Hash Indexes
Hash indexes are the most common type of index in a property graph. They enable fast lookups for exact matches on property values, dramatically improving query performance.
What are Hash Indexes?
A hash index maps property values to vertices, allowing you to quickly find all vertices with a specific property value without scanning the entire graph. It is backed by a hash table, so a lookup takes O(1) time on average regardless of graph size.
In this diagram:
- The graph on the right contains vertices (A, B, C, D) with properties.
- The index on the left is specifically for the name property.
- The dashed arrows show the mapping:
  - Looking up "Alice" in the index quickly leads to vertices A and C.
  - Looking up "Bob" leads to vertex B.
This allows a query like "find all vertices where name is 'Alice'" to directly access nodes A and C via the index, instead of checking every vertex in the graph.
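The mapping described above can be sketched with a plain standard-library HashMap. This is only an illustration of the idea, not Graph API's implementation: the Person struct, the helper functions, and the name given to vertex D ("Dave") are assumptions made up for the example.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a vertex; not a Graph API type.
#[derive(Debug, Clone, PartialEq)]
struct Person {
    id: char,
    name: String,
}

/// Sample data mirroring the diagram. The name of vertex D is invented.
fn sample_vertices() -> Vec<Person> {
    vec![
        Person { id: 'A', name: "Alice".to_string() },
        Person { id: 'B', name: "Bob".to_string() },
        Person { id: 'C', name: "Alice".to_string() },
        Person { id: 'D', name: "Dave".to_string() },
    ]
}

/// Build a map from name to the ids of all vertices with that name.
fn build_name_index(vertices: &[Person]) -> HashMap<String, Vec<char>> {
    let mut index: HashMap<String, Vec<char>> = HashMap::new();
    for v in vertices {
        index.entry(v.name.clone()).or_default().push(v.id);
    }
    index
}

/// Baseline without an index: compare against every vertex.
fn scan_by_name(vertices: &[Person], name: &str) -> Vec<char> {
    vertices
        .iter()
        .filter(|v| v.name == name)
        .map(|v| v.id)
        .collect()
}

fn main() {
    let vertices = sample_vertices();
    let index = build_name_index(&vertices);

    // Indexed lookup: one hash probe, independent of graph size.
    println!("{:?}", index.get("Alice")); // Some(['A', 'C'])

    // Full scan: same answer, but visits all four vertices.
    println!("{:?}", scan_by_name(&vertices, "Alice")); // ['A', 'C']
}
```

Both calls return the same vertices; the difference is that the scan does work proportional to the number of vertices, while the indexed lookup does not.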
Defining Hash Indexes
In Graph API, you define a hash index by adding the #[index(hash)] attribute to a field in your vertex enum:
#[derive(Debug, VertexExt)]
pub enum IndexedVertex {
    // Person vertex with various properties
    Person {
        name: String, // Not indexed
        #[index(hash)] // Hash index for exact lookups
        username: String,
    },
}
How Hash Indexes Work
When you apply the #[index(hash)] attribute to a field:
- The derive macro generates a hash index entry for that field
- The graph implementation maintains a hash map from property values to vertices
- When you query using the index, the graph can directly look up matching vertices in constant time
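To make the second and third points concrete, here is a minimal sketch of a store that keeps a hash map in sync with its data. It is an assumption-laden toy, not how any Graph API backend is actually written: ToyGraph, add_person, rename, and find are hypothetical names, and "bryn123" is an invented username.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical toy store; not a Graph API type.
#[derive(Default)]
struct ToyGraph {
    /// Vertex storage: the vertex id is the position in this Vec.
    usernames: Vec<String>,
    /// The hash index: username -> ids of vertices with that username.
    by_username: HashMap<String, HashSet<usize>>,
}

impl ToyGraph {
    /// Insert a vertex and record it in the index.
    fn add_person(&mut self, username: &str) -> usize {
        let id = self.usernames.len();
        self.usernames.push(username.to_string());
        self.by_username
            .entry(username.to_string())
            .or_default()
            .insert(id);
        id
    }

    /// Change a username, moving the vertex between index buckets.
    /// Panics if `id` was not returned by `add_person`.
    fn rename(&mut self, id: usize, new_username: &str) {
        let old = std::mem::replace(&mut self.usernames[id], new_username.to_string());

        // Remove the stale entry and drop the bucket if it became empty.
        let now_empty = match self.by_username.get_mut(&old) {
            Some(ids) => {
                ids.remove(&id);
                ids.is_empty()
            }
            None => false,
        };
        if now_empty {
            self.by_username.remove(&old);
        }

        self.by_username
            .entry(new_username.to_string())
            .or_default()
            .insert(id);
    }

    /// Exact-match lookup through the index; ids are returned sorted.
    fn find(&self, username: &str) -> Vec<usize> {
        let mut ids: Vec<usize> = self
            .by_username
            .get(username)
            .map(|set| set.iter().copied().collect())
            .unwrap_or_default();
        ids.sort_unstable();
        ids
    }
}

fn main() {
    let mut graph = ToyGraph::default();
    let julia = graph.add_person("julia456");
    graph.add_person("bryn123");

    println!("{:?}", graph.find("julia456"));
    graph.rename(julia, "julia789");
    println!("{:?}", graph.find("julia456")); // old value no longer matches
    println!("{:?}", graph.find("julia789"));
}
```

Every write touches the index as well as the data, which is the maintenance cost that each additional index brings.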
Querying with Hash Indexes
The real power of hash indexes becomes apparent when querying the graph:
// Find a person by their username (using hash index)
let julia = graph
    .walk()
    .vertices(Vertex::person_by_username("julia456"))
    .first();
println!("Found Julia: {}", julia.is_some());
Performance Benefits
Hash indexes offer significant performance advantages:
- Constant time lookups: O(1) on average, rather than O(n) for a full scan
- Reduced memory pressure: Only relevant vertices are loaded
- Improved scalability: Lookup time stays roughly constant as the graph grows
When to Use Hash Indexes
Hash indexes are ideal for:
- Unique identifiers: User IDs, product codes, etc.
- Common lookup fields: Names, titles, categories
- Fields used in equality predicates: Where you need exact matches
Best Practices
When using hash indexes:
- Be selective: Only index fields you frequently query
- Choose appropriate fields: Index fields with high selectivity, where each value matches only a small fraction of the vertices
- Consider cardinality: Fields with many unique values benefit most from indexing
- Balance maintenance cost: Each index adds storage and update overhead
Index Limitations
Hash indexes have some limitations:
- Exact matches only: Cannot handle range or partial text queries
- Memory overhead: Each index increases memory usage
- Update cost: Indexes must be maintained when data changes
For range queries or partial text matching, consider range indexes or full-text indexes respectively.
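The exact-match limitation follows from the data structure. The sketch below contrasts a standard-library HashMap with an ordered BTreeMap; it says nothing about how Graph API implements its range indexes. The age property, the sample people, and all function names are assumptions for illustration.

```rust
use std::collections::{BTreeMap, HashMap};

type Names = Vec<&'static str>;

/// Build a hash-based and an ordered index over the same (name, age) pairs.
fn build_indexes(people: &[(&'static str, u32)]) -> (HashMap<u32, Names>, BTreeMap<u32, Names>) {
    let mut hashed: HashMap<u32, Names> = HashMap::new();
    let mut ordered: BTreeMap<u32, Names> = BTreeMap::new();
    for &(name, age) in people {
        hashed.entry(age).or_default().push(name);
        ordered.entry(age).or_default().push(name);
    }
    (hashed, ordered)
}

/// Exact match: a single probe into the hash table.
fn exact_age(index: &HashMap<u32, Names>, age: u32) -> Names {
    index.get(&age).cloned().unwrap_or_default()
}

/// Range match: needs keys kept in sorted order, which a hash table does not do.
/// Assumes lo <= hi; BTreeMap::range panics otherwise.
fn age_between(index: &BTreeMap<u32, Names>, lo: u32, hi: u32) -> Names {
    index
        .range(lo..=hi)
        .flat_map(|(_, names)| names.iter().copied())
        .collect()
}

fn main() {
    // Invented sample data.
    let (hashed, ordered) = build_indexes(&[("bryn", 25), ("julia", 32), ("eve", 41)]);

    println!("{:?}", exact_age(&hashed, 32)); // ["julia"]
    println!("{:?}", age_between(&ordered, 30, 45)); // ["julia", "eve"]
}
```

Answering the range query from the HashMap alone would mean iterating over every key, which is the full scan the index was meant to avoid.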