Implementing Indexes
Indexes are a critical component for efficient graph operations, allowing for quick lookups of elements based on property values. This chapter guides you through implementing different types of indexes in your graph backend.
Index Types Overview
The Graph API supports four main types of indexes:
- Label Indexes: Index vertices and edges by their label (enum variant)
- Hash Indexes: Lookup elements by a property value using exact matching
- Range Indexes: Find elements with property values in a specific range
- Full-Text Indexes: Search text properties using more complex matching
The Index Infrastructure
The Index Trait
The Graph API defines the Index
trait to represent element indexes:
#![allow(unused)] fn main() { pub trait Index where Self: Sized + Copy + Eq + Hash + Debug, { /// The type of the element being indexed fn ty(&self) -> TypeId; /// The index ordinal fn ordinal(&self) -> usize; /// The type of index fn index_type(&self) -> IndexType; } }
Index Types
The IndexType
enum defines the supported index types:
#![allow(unused)] fn main() { pub enum IndexType { Hash, // Exact matching Range, // Range queries FullText, // Text search } }
Declare Index Support
To support indexes in your implementation, implement the appropriate support traits:
#![allow(unused)] fn main() { impl<Vertex, Edge> Graph for MyGraph<Vertex, Edge> where Vertex: Element, Edge: Element, { // Core Graph functionality // ... } // Implement support traits for the indexing features you want to provide impl<Vertex, Edge> SupportsVertexLabelIndex for MyGraph<Vertex, Edge> where Vertex: Element, Edge: Element, {} impl<Vertex, Edge> SupportsEdgeLabelIndex for MyGraph<Vertex, Edge> where Vertex: Element, Edge: Element, {} impl<Vertex, Edge> SupportsVertexHashIndex for MyGraph<Vertex, Edge> where Vertex: Element, Edge: Element, {} impl<Vertex, Edge> SupportsEdgeHashIndex for MyGraph<Vertex, Edge> where Vertex: Element, Edge: Element, {} impl<Vertex, Edge> SupportsVertexRangeIndex for MyGraph<Vertex, Edge> where Vertex: Element, Edge: Element, {} impl<Vertex, Edge> SupportsEdgeRangeIndex for MyGraph<Vertex, Edge> where Vertex: Element, Edge: Element, {} impl<Vertex, Edge> SupportsVertexFullTextIndex for MyGraph<Vertex, Edge> where Vertex: Element, Edge: Element, {} }
Implementing Label Indexes
Label indexes allow for quickly finding all vertices or edges of a specific label (enum variant).
Data Structure
A simple label index can be implemented using a vector of sets:
#![allow(unused)] fn main() { /// Stores vertex IDs by label struct LabelIndex { // One entry per label, containing all vertex IDs with that label indexes: Vec<HashSet<VertexId>>, } impl LabelIndex { fn new(label_count: usize) -> Self { Self { indexes: (0..label_count).map(|_| HashSet::new()).collect(), } } fn insert(&mut self, label: usize, id: VertexId) { self.indexes[label].insert(id); } fn remove(&mut self, label: usize, id: &VertexId) { self.indexes[label].remove(id); } fn get(&self, label: usize) -> impl Iterator<Item=VertexId> + '_ { self.indexes[label].iter().copied() } } }
Integration with Vertex/Edge Operations
Update label indexes during vertex/edge operations:
#![allow(unused)] fn main() { fn add_vertex(&mut self, vertex: Self::Vertex) -> Self::VertexId { let label_idx = vertex.label().ordinal(); let vertex_id = // create a new vertex ID // Add to label index self.label_index.insert(label_idx, vertex_id); // Rest of implementation // ... } }
Implementing Hash Indexes
Hash indexes allow for quick lookups by property value using exact matching.
Data Structure
A hash index maps property values to sets of element IDs:
#![allow(unused)] fn main() { struct HashIndex<K, V> { map: HashMap<K, HashSet<V>>, } impl<K: Hash + Eq, V: Copy + Eq + Hash> HashIndex<K, V> { fn new() -> Self { Self { map: HashMap::new(), } } fn insert(&mut self, key: K, value: V) { self.map.entry(key).or_default().insert(value); } fn remove(&mut self, key: &K, value: &V) { if let Some(values) = self.map.get_mut(key) { values.remove(value); if values.is_empty() { self.map.remove(key); } } } fn get(&self, key: &K) -> impl Iterator<Item=V> + '_ { self.map .get(key) .into_iter() .flat_map(|values| values.iter().copied()) } } }
Integration with Mutation Handling
When a vertex or edge property changes, you need to update the hash indexes:
#![allow(unused)] fn main() { pub struct MyMutationListener<'reference, Element> { indexes: &'reference mut IndexCollection, id: VertexInternalId, } impl<'reference, Element> MutationListener<'reference, Element> for MyMutationListener<'reference, Element> where Element: Element, { fn update(&mut self, index: <Element::Label as Label>::Index, before: Value, after: Value) { // Remove the old value from the index self.indexes[index.ordinal()].remove(&before, self.id); // Add the new value to the index self.indexes[index.ordinal()].insert(after, self.id); } } }
Implementing Range Indexes
Range indexes allow for finding elements with property values in a specific range.
Data Structure
A range index typically uses an ordered map like BTreeMap
:
#![allow(unused)] fn main() { struct RangeIndex<K, V> { map: BTreeMap<K, HashSet<V>>, } impl<K: Ord, V: Copy + Eq + Hash> RangeIndex<K, V> { fn new() -> Self { Self { map: BTreeMap::new(), } } fn insert(&mut self, key: K, value: V) { self.map.entry(key).or_default().insert(value); } fn remove(&mut self, key: &K, value: &V) { if let Some(values) = self.map.get_mut(key) { values.remove(value); if values.is_empty() { self.map.remove(key); } } } fn get(&self, key: &K) -> impl Iterator<Item=V> + '_ { self.map .get(key) .into_iter() .flat_map(|values| values.iter().copied()) } fn range<'a, R>(&'a self, range: R) -> impl Iterator<Item=V> + 'a where R: RangeBounds<K> + 'a, K: 'a, { self.map .range(range) .flat_map(|(_, values)| values.iter().copied()) } } }
Handling Range Queries
Implement range query support in your vertex and edge iterators:
#![allow(unused)] fn main() { fn vertices<'search>( &self, search: &VertexSearch<'search, Self>, ) -> Self::VertexIter<'search, '_> { // ...other search handling if let VertexSearch::Range { index, range, .. } = search { let index_storage = &self.indexes[index.ordinal()]; return VertexIter { iter: index_storage.range(range.clone(), index), // ...other fields }; } // ...default handling } }
Implementing Full-Text Indexes
Full-text indexes enable searching through text properties with capabilities like prefix matching, fuzzy matching, or tokenized searches.
Data Structure
A simple full-text index implementation might use an inverted index:
#![allow(unused)] fn main() { struct FullTextIndex<V> { // Maps tokens to element IDs tokens: HashMap<String, HashSet<V>>, // Maps element IDs to their full text contents: HashMap<V, String>, } impl<V: Copy + Eq + Hash> FullTextIndex<V> { fn new() -> Self { Self { tokens: HashMap::new(), contents: HashMap::new(), } } fn insert(&mut self, id: V, text: &str) { // Remove old content if it exists self.remove(&id); // Store the full text self.contents.insert(id, text.to_string()); // Tokenize the text and add to inverted index for token in tokenize(text) { self.tokens.entry(token).or_default().insert(id); } } fn remove(&mut self, id: &V) { if let Some(text) = self.contents.remove(id) { // Remove from token index for token in tokenize(&text) { if let Some(ids) = self.tokens.get_mut(&token) { ids.remove(id); if ids.is_empty() { self.tokens.remove(&token); } } } } } fn search(&self, query: &str) -> impl Iterator<Item=V> + '_ { // Tokenize the query let query_tokens: Vec<_> = tokenize(query).collect(); // Find matches query_tokens .into_iter() .filter_map(move |token| self.tokens.get(&token)) .flatten() .copied() .collect::<HashSet<_>>() // Deduplicate .into_iter() } } // Helper function to tokenize text fn tokenize(text: &str) -> impl Iterator<Item=String> + '_ { text.to_lowercase() .split_whitespace() .map(|s| s.to_string()) } }
Advanced Full-Text Features
For more advanced full-text search capabilities, consider:
- Stemming: Reducing words to their root form (e.g., "running" → "run")
- N-grams: Creating token sequences for partial matching
- Fuzzy Matching: Supporting approximate matching with edit distance
- Relevance Scoring: Ranking results by relevance to the query
Efficient Index Updates
For efficient index updates, consider these strategies:
Lazy Updates
Only update indexes when needed:
#![allow(unused)] fn main() { fn remove_vertex(&mut self, id: Self::VertexId) -> Option<Self::Vertex> { // Fetch the vertex let vertex = self.vertices.get(id.index())?; // Only update indexes if the vertex exists for index in vertex.label().indexes() { self.indexes[index.ordinal()].remove(&self.get_value(vertex, index), id.index()); } // Remove the vertex // ... } }
Batched Updates
For bulk operations, batch index updates:
#![allow(unused)] fn main() { fn add_vertices_batch(&mut self, vertices: Vec<Self::Vertex>) -> Vec<Self::VertexId> { let mut ids = Vec::with_capacity(vertices.len()); let mut index_updates = Vec::new(); // First pass: add vertices and collect index updates for vertex in vertices { let id = self.add_vertex_no_index(vertex); ids.push(id); // Collect index updates for index in vertex.label().indexes() { index_updates.push((index, self.get_value(&vertex, index), id)); } } // Second pass: apply all index updates for (index, value, id) in index_updates { self.indexes[index.ordinal()].insert(value, id); } ids } }
Common Pitfalls
When implementing indexes, watch out for these common issues:
- Index Inconsistency: Ensure indexes are always updated when elements change
- Memory Overhead: Indexes increase memory usage; consider selective indexing
- Type Safety: Ensure index type safety, especially with range and full-text indexes
- Empty Sets: Handle empty index entries to avoid wasted memory
- Concurrency: If supporting concurrent access, protect indexes with appropriate synchronization
Performance Considerations
For optimal index performance:
- Index Selection: Only index properties that will be frequently queried
- Data Structure Choice: Choose appropriate data structures based on workload
- Memory vs. Speed: Balance memory usage with lookup speed
- Measure: Benchmark index operations to identify bottlenecks
Testing Indexes
Test your index implementation thoroughly:
#![allow(unused)] fn main() { #[test] fn test_hash_index() { let mut graph = MyGraph::new(); // Add vertices with indexed properties let v1 = graph.add_vertex(Vertex::Person { name: "Alice".to_string(), age: 30, // ...other fields }); let v2 = graph.add_vertex(Vertex::Person { name: "Bob".to_string(), age: 25, // ...other fields }); // Test index lookup let results = graph.walk() .vertices(Vertex::person_by_name("Alice")) .collect::<Vec<_>>(); assert_eq!(results.len(), 1); assert_eq!(results[0].id(), v1); // Test index update after mutation graph.vertex_mut(v1).unwrap() .project_mut::<PersonMut<_>>().unwrap() .set_name("Alicia"); let results = graph.walk() .vertices(Vertex::person_by_name("Alicia")) .collect::<Vec<_>>(); assert_eq!(results.len(), 1); assert_eq!(results[0].id(), v1); } }
Next Steps
Implementing efficient indexes is key to building a high-performance graph backend. By carefully designing index structures and ensuring proper updates during mutations, you can provide fast lookup capabilities while maintaining reasonable memory usage. Remember that different use cases may require different indexing strategies, so consider the expected query patterns when deciding which indexes to implement and how to optimize them.