Utilize a vertex-centric index
- retrieving incident edges with an indexed constraint/filter (e.g. on ‘time’) is far faster than linearly scanning all incident edges and filtering in memory
- a vertex-centric index also mitigates the super-node problem ( https://github.com/thinkaurelius/titan/issues/11 )
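A minimal sketch of defining such an index through the Titan management API (names assume Titan 1.0; the label and key names are illustrative):

```java
// Build a vertex-centric (edge) index on the 'time' property of
// 'battled' edges, sorted in decreasing order.
TitanManagement mgmt = graph.openManagement();
PropertyKey time = mgmt.makePropertyKey("time").dataType(Long.class).make();
EdgeLabel battled = mgmt.makeEdgeLabel("battled").make();
mgmt.buildEdgeIndex(battled, "battlesByTime", Direction.BOTH, Order.decr, time);
mgmt.commit();

// An incident-edge lookup like the following can now use the index
// instead of scanning every incident edge:
// g.V(v).outE("battled").has("time", P.gt(100L)).inV()
```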
Turn on cache to improve latency
- configure setVertexCacheSize(..) on TransactionBuilder; it sets the number of vertices this transaction caches in memory.
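A sketch of setting the per-transaction vertex cache via the setVertexCacheSize(..) method named above (the size value is illustrative):

```java
// Larger vertex cache reduces repeated backend reads in
// read-heavy transactions, at the cost of more heap.
TitanTransaction tx = graph.buildTransaction()
    .setVertexCacheSize(20000)  // vertices cached in memory by this tx
    .start();
try {
    // ... read-heavy traversal work ...
    tx.commit();
} finally {
    if (tx.isOpen()) tx.rollback();
}
```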
Avoid Vertex Traversal over Edge
- if a query must hop from V1 via E1 to V2 (2 hops) just to read properties of V2, the extra vertex hop is an expensive operation
- so we should store the ‘most frequently accessed properties’ of V1 and V2 on E1
- let’s study how queries improve when filtering on edge properties
- Graph-global retrieval vs. graph-local walk
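A sketch of the difference in Gremlin (TinkerPop 3 syntax; the edge label and property name are illustrative):

```java
// Two-hop read: fetch V2 only to read its 'name' property (expensive).
g.V(v1).outE("E1").inV().values("name");

// After denormalizing 'name' onto the edge, one hop suffices and the
// read is served from the edge's own row:
g.V(v1).outE("E1").values("name");
```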
Handle failures in the application
When committing a transaction, Titan will attempt to persist all changes to the storage backend. This might not always succeed due to IO exceptions, network errors, machine crashes or resource unavailability. The application must roll back failed transactions, because only the application knows the transactional boundary.
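A minimal commit/rollback sketch of this pattern (Titan 1.0 API):

```java
TitanTransaction tx = graph.newTransaction();
try {
    // ... mutations within one transactional boundary ...
    tx.commit();    // may fail: IO errors, network errors, crashes
} catch (TitanException e) {
    tx.rollback();  // only the application knows what to undo/retry
    // optionally retry here if the failure looks transient
}
```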
- Check whether vertex existence needs to be verified
- TransactionBuilder.checkExternalVertexExistence(boolean) determines whether this transaction should verify the existence of vertices for user-provided vertex ids. Such checks require access to the database, which takes time. The existence check should only be disabled if the user is absolutely sure that the vertex exists – otherwise data corruption can ensue.
- TransactionBuilder.checkInternalVertexExistence(boolean) – whether this transaction should double-check the existence of vertices during query execution. This can be useful to avoid phantom vertices on eventually consistent storage backends. Disabled by default. Enabling this setting can slow down query processing.
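A sketch of tuning both checks per transaction, using the TransactionBuilder methods described above:

```java
TitanTransaction tx = graph.buildTransaction()
    .checkExternalVertexExistence(false) // skip lookups for user-supplied ids;
                                         // only safe if the vertex surely exists
    .checkInternalVertexExistence(true)  // double-check during queries to avoid
                                         // phantom vertices on eventually
                                         // consistent backends (off by default)
    .start();
```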
- Convert date/time values into long (epoch) timestamps, so they can be indexed and compared as numbers
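A minimal sketch of the conversion with the standard java.time API (helper name is illustrative):

```java
import java.time.Instant;

public class TimestampDemo {
    // Convert an ISO-8601 date-time string to epoch milliseconds,
    // suitable for storing in a Long-typed Titan property key.
    static long toEpochMillis(String isoDateTime) {
        return Instant.parse(isoDateTime).toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println(toEpochMillis("2016-01-01T00:00:00Z")); // 1451606400000
    }
}
```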
- Enable Batch Loading
- TransactionBuilder.enableBatchLoading() – enables batch-loading for an individual transaction.
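A sketch of a batch-loading transaction (the mutation loop is illustrative):

```java
// Batch loading relaxes consistency checks and locking, which
// speeds up bulk inserts; use only for trusted bulk data.
TitanTransaction tx = graph.buildTransaction()
    .enableBatchLoading()
    .start();
// ... bulk addVertex()/addEdge() calls ...
tx.commit();
```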
Data Partition Strategy
- Partition data within a single graph context
- Partition uber graph into multiple graphs within a single address space
- Partitions can be joined / merged and cross-partition data can be queried for analytics
- Verify Titan Keyspace in Cassandra
Large scale data analysis
- Note that Titan stores 2 wide rows (1. adjacency list of incident edges + target vertices, 2. adjacency list of vertices/edges).
- so for fast, near real-time computation (aggregation of data over vertices), implement in-memory map-reduce (preferably using SparkGraphComputer) that executes code per vertex in parallel
- Note – a VertexProgram is a piece of code that is executed at each vertex in a logically parallel manner until some termination condition is met (e.g. a certain number of iterations has occurred, or no more data is changing in the graph).
- Bulk-parallel execution through message passing
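A sketch of running a built-in VertexProgram over the graph with TinkerPop's OLAP engine (names assume TinkerPop 3; SparkGraphComputer is its Spark-backed GraphComputer):

```java
// Submit a VertexProgram (here: PageRank) that runs at every vertex
// in bulk-parallel fashion, exchanging data via message passing.
ComputerResult result = graph.compute(SparkGraphComputer.class)
    .program(PageRankVertexProgram.build().create(graph))
    .submit().get();
Graph resultGraph = result.graph(); // vertices now carry computed values
```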
Query against Patterns
- Match a topology / sub-graph: https://github.com/tinkerpop/gremlin/wiki/Pattern-Match-Pattern
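The linked page describes the TinkerPop 2 idiom; in TinkerPop 3 the equivalent is the match() step. A sketch (labels and edge names are illustrative):

```java
// Find pairs (a, c) such that a knows some b and that b created c.
g.V().match(
        __.as("a").out("knows").as("b"),
        __.as("b").out("created").as("c"))
    .select("a", "c");
```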
Text Search (full / predicate)
- turn on indexing (a mixed index backed by an external index backend) for full-text / predicate search
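A sketch of defining a mixed index (names assume the Titan 1.0 management API; "search" is the configured external index backend, e.g. Elasticsearch):

```java
// Index the 'text' property for full-text search via the
// external index backend named "search".
TitanManagement mgmt = graph.openManagement();
PropertyKey text = mgmt.makePropertyKey("text").dataType(String.class).make();
mgmt.buildIndex("byText", Vertex.class)
    .addKey(text, Mapping.TEXT.asParameter())
    .buildMixedIndex("search");
mgmt.commit();

// Query with a text predicate:
// g.V().has("text", Text.textContains("hello"))
```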