Architecture

CypherLite is organized as a five-layer architecture: four Rust crates, each responsible for a distinct concern, sit beneath the application layer that consumes them.

5-Layer Architecture

+--------------------------------------------------+
|              Application Layer                    |
|   Rust API  |  Python  |  Go  |  Node.js  |  C   |
+--------------------------------------------------+
|              FFI Layer (cypherlite-ffi)           |
|   C ABI  |  PyO3  |  CGo  |  napi-rs            |
+--------------------------------------------------+
|           Query Engine (cypherlite-query)         |
|   Parser  |  Analyzer  |  Optimizer  |  Executor  |
+--------------------------------------------------+
|            Core Layer (cypherlite-core)           |
|   Graph Model  |  Temporal  |  Plugin System      |
+--------------------------------------------------+
|         Storage Engine (cypherlite-storage)       |
|   B+Tree  |  WAL  |  BufferPool  |  PageManager   |
+--------------------------------------------------+

Crate Dependency Graph

cypherlite-ffi
    |
    v
cypherlite-query
    |
    v
cypherlite-core
    |
    v
cypherlite-storage

Each crate depends only on the layer directly below it, ensuring clean separation of concerns:

Crate              | Responsibility
-------------------+--------------------------------------------------------------------------------------------
cypherlite-storage | Disk I/O, page management, B+Tree index, WAL, buffer pool, crash recovery
cypherlite-core    | Graph data model, property types, temporal versioning, subgraphs, hyperedges, plugin traits
cypherlite-query   | Cypher parser, semantic analysis, cost-based optimizer, volcano executor
cypherlite-ffi     | C ABI exports, PyO3 bindings, CGo bridge, napi-rs addon

Storage Engine

The storage engine is responsible for durable persistence of all graph data.

B+Tree Index

  • O(log n) lookup for nodes and edges
  • Index-free adjacency for relationship traversal
  • Supports range scans for ordered property access
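
As a rough illustration of the range-scan point, the sketch below uses the standard library's `BTreeSet` as an in-memory stand-in for the on-disk B+Tree. The composite key layout `(property value, node id)` and the function names are assumptions made for this example, not CypherLite's actual index format.

```rust
use std::collections::BTreeSet;

/// Builds a small sample index keyed by (property value, node id).
/// The key layout is hypothetical and only serves the illustration.
fn sample_age_index() -> BTreeSet<(i64, u64)> {
    let mut index = BTreeSet::new();
    for (node_id, age) in [(1u64, 34i64), (2, 27), (3, 41), (4, 27)] {
        index.insert((age, node_id));
    }
    index
}

/// Returns the ids of all nodes whose indexed value lies in `lo..=hi`,
/// in key order. The ordered tree lets the scan start at `lo` and stop
/// at `hi` without visiting keys outside the range.
fn nodes_with_age_between(index: &BTreeSet<(i64, u64)>, lo: i64, hi: i64) -> Vec<u64> {
    index
        .range((lo, u64::MIN)..=(hi, u64::MAX))
        .map(|&(_, node_id)| node_id)
        .collect()
}
```

Calling `nodes_with_age_between(&sample_age_index(), 25, 35)` yields the node ids ordered by age first and id second.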

Write-Ahead Log (WAL)

  • All mutations are first written to the WAL before applying to data pages
  • Crash recovery replays committed WAL frames on startup; frames from transactions that never committed are discarded
  • Single-writer / multiple-reader concurrency model
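
A minimal sketch of WAL recovery follows, assuming the conventional rule that only frames belonging to committed transactions are reapplied. The `WalRecord` type and the `recover` function are hypothetical; CypherLite's real frame format is not described here.

```rust
use std::collections::{BTreeMap, HashSet};

/// One record in a hypothetical write-ahead log.
#[derive(Debug, Clone, PartialEq)]
enum WalRecord {
    /// A full page image written by transaction `txn`.
    Frame { txn: u64, page_id: u32, data: Vec<u8> },
    /// Marks transaction `txn` as durably committed.
    Commit { txn: u64 },
}

/// Rebuilds page contents from the log after a crash.
fn recover(log: &[WalRecord]) -> BTreeMap<u32, Vec<u8>> {
    // Pass 1: collect the ids of transactions that reached their commit record.
    let committed: HashSet<u64> = log
        .iter()
        .filter_map(|record| match record {
            WalRecord::Commit { txn } => Some(*txn),
            WalRecord::Frame { .. } => None,
        })
        .collect();

    // Pass 2: replay frames in log order; a later frame for the same page
    // overwrites an earlier one. Frames of uncommitted transactions are skipped.
    let mut pages = BTreeMap::new();
    for record in log {
        if let WalRecord::Frame { txn, page_id, data } = record {
            if committed.contains(txn) {
                pages.insert(*page_id, data.clone());
            }
        }
    }
    pages
}
```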

Buffer Pool

  • LRU page cache sits between the query engine and disk
  • Dirty page tracking with batched writeback
  • Configurable cache size for memory vs. I/O tradeoff
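
The interplay of LRU eviction and dirty-page tracking can be sketched as below. All names are hypothetical, and the `written_back` vector merely records which pages would be flushed; a real buffer pool would write those pages to disk, likely in batches.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical in-memory page cache with least-recently-used eviction.
struct BufferPool {
    capacity: usize,
    /// page id -> (page bytes, dirty flag)
    pages: HashMap<u32, (Vec<u8>, bool)>,
    /// Recency order; the front is the least recently used page.
    lru: VecDeque<u32>,
    /// Ids of dirty pages that were "flushed" on eviction (illustration only).
    written_back: Vec<u32>,
}

impl BufferPool {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be at least one page");
        BufferPool {
            capacity,
            pages: HashMap::new(),
            lru: VecDeque::new(),
            written_back: Vec::new(),
        }
    }

    /// Moves `page_id` to the most-recently-used end of the queue.
    fn touch(&mut self, page_id: u32) {
        self.lru.retain(|&p| p != page_id);
        self.lru.push_back(page_id);
    }

    /// Caches a page, evicting the least recently used one if the pool is full.
    fn insert(&mut self, page_id: u32, data: Vec<u8>, dirty: bool) {
        let was_dirty = self.pages.get(&page_id).map_or(false, |(_, d)| *d);
        if !self.pages.contains_key(&page_id) && self.pages.len() >= self.capacity {
            self.evict_one();
        }
        self.pages.insert(page_id, (data, dirty || was_dirty));
        self.touch(page_id);
    }

    /// Returns the cached bytes, if present, and refreshes the page's recency.
    fn get(&mut self, page_id: u32) -> Option<&[u8]> {
        if self.pages.contains_key(&page_id) {
            self.touch(page_id);
        }
        self.pages.get(&page_id).map(|(data, _)| data.as_slice())
    }

    /// Drops the least recently used page, noting a writeback if it was dirty.
    fn evict_one(&mut self) {
        if let Some(victim) = self.lru.pop_front() {
            if let Some((_, dirty)) = self.pages.remove(&victim) {
                if dirty {
                    self.written_back.push(victim);
                }
            }
        }
    }
}
```

The `capacity` parameter is where the memory vs. I/O tradeoff shows up: a larger pool evicts less often and therefore touches the disk less.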

Page Manager

  • Fixed-size page allocation and deallocation
  • Free-list management for space reuse
  • Page-level locking with parking_lot
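
Free-list reuse can be sketched in a few lines. The `PageManager` below is a simplified, hypothetical version that tracks only page ids; it leaves out locking and the on-disk representation of the free list.

```rust
/// Hypothetical allocator for fixed-size pages, identified by number.
struct PageManager {
    /// Next never-used page id at the end of the file.
    next_page: u32,
    /// Ids of pages that were freed and can be handed out again.
    free_list: Vec<u32>,
}

impl PageManager {
    fn new() -> Self {
        PageManager { next_page: 0, free_list: Vec::new() }
    }

    /// Prefers a previously freed page; grows the file only when none is available.
    fn allocate(&mut self) -> u32 {
        if let Some(page_id) = self.free_list.pop() {
            page_id
        } else {
            let page_id = self.next_page;
            self.next_page += 1;
            page_id
        }
    }

    /// Returns a page to the free list so a later allocation can reuse it.
    fn free(&mut self, page_id: u32) {
        self.free_list.push(page_id);
    }
}
```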

Query Execution Pipeline

A Cypher query goes through four stages before producing results:

Cypher String
    |
    v
+-------------------+
|   1. Parser       |  Recursive descent with Pratt expression parsing
+-------------------+  Produces: Abstract Syntax Tree (AST)
    |
    v
+-------------------+
|   2. Analyzer     |  Variable scope validation, label/type resolution
+-------------------+  Produces: Validated AST with semantic info
    |
    v
+-------------------+
|   3. Optimizer    |  Logical-to-physical plan, predicate pushdown
+-------------------+  Produces: Physical execution plan
    |
    v
+-------------------+
|   4. Executor     |  Volcano iterator model with 12 operators
+-------------------+  Produces: Result rows (streaming)

Parser

  • Hand-written recursive descent parser (no parser generator)
  • Pratt parsing for expression precedence
  • Supports 28+ Cypher keywords
  • Inline property filter: MATCH (n:Label {key: value})
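
To show how Pratt parsing handles precedence, here is a small sketch over arithmetic operators. The token and AST types are assumptions for this example and are far simpler than a Cypher grammar; only the binding-power loop is the point.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Num(i64),
    Op(char),
}

#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    Binary(Box<Expr>, char, Box<Expr>),
}

/// Left and right binding powers; a higher value binds tighter.
/// Left < right makes the operator left-associative.
fn binding_power(op: char) -> Option<(u8, u8)> {
    match op {
        '+' | '-' => Some((1, 2)),
        '*' | '/' => Some((3, 4)),
        _ => None,
    }
}

/// Parses an expression whose operators all bind at least as tightly as `min_bp`.
fn parse_expr(tokens: &[Token], pos: &mut usize, min_bp: u8) -> Option<Expr> {
    // Prefix position: this sketch accepts only number literals.
    let mut lhs = match tokens.get(*pos)? {
        Token::Num(n) => Expr::Num(*n),
        Token::Op(_) => return None,
    };
    *pos += 1;

    loop {
        let op = match tokens.get(*pos) {
            Some(Token::Op(c)) => *c,
            _ => break,
        };
        let (left_bp, right_bp) = binding_power(op)?;
        if left_bp < min_bp {
            break; // the operator belongs to an enclosing, looser expression
        }
        *pos += 1;
        let rhs = parse_expr(tokens, pos, right_bp)?;
        lhs = Expr::Binary(Box::new(lhs), op, Box::new(rhs));
    }
    Some(lhs)
}

/// Renders the tree in prefix form so the grouping is visible.
fn to_sexpr(expr: &Expr) -> String {
    match expr {
        Expr::Num(n) => n.to_string(),
        Expr::Binary(l, op, r) => format!("({} {} {})", op, to_sexpr(l), to_sexpr(r)),
    }
}
```

Parsing the tokens for `1 + 2 * 3 - 4` with `min_bp = 0` groups the multiplication first and keeps subtraction left-associative.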

Semantic Analyzer

  • Variable scope validation across WITH boundaries
  • Label and relationship type resolution
  • Type checking for property comparisons
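
Scope validation across WITH can be illustrated with a simplified clause model, assuming that WITH narrows the visible variables to exactly those it projects. The `Clause` enum and `check_scope` are hypothetical and ignore aliases, expressions, and every other clause type.

```rust
use std::collections::HashSet;

/// Hypothetical, heavily simplified clause model.
enum Clause {
    /// Variables bound by a MATCH pattern.
    Match(Vec<String>),
    /// Variables projected by WITH; only these remain visible afterwards.
    With(Vec<String>),
    /// Variables referenced by RETURN.
    Return(Vec<String>),
}

/// Checks that every referenced variable is in scope at the point of use.
fn check_scope(clauses: &[Clause]) -> Result<(), String> {
    let mut scope: HashSet<&str> = HashSet::new();
    for clause in clauses {
        match clause {
            Clause::Match(vars) => scope.extend(vars.iter().map(String::as_str)),
            Clause::With(vars) => {
                for v in vars {
                    if !scope.contains(v.as_str()) {
                        return Err(format!("undefined variable: {}", v));
                    }
                }
                // WITH starts a new scope containing only the projected names.
                scope = vars.iter().map(String::as_str).collect();
            }
            Clause::Return(vars) => {
                for v in vars {
                    if !scope.contains(v.as_str()) {
                        return Err(format!("undefined variable: {}", v));
                    }
                }
            }
        }
    }
    Ok(())
}
```

Under this model, `MATCH (n), (m) WITH n RETURN m` is rejected because `m` does not survive the WITH boundary.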

Cost-Based Optimizer

  • Logical-to-physical plan conversion
  • Predicate pushdown to reduce intermediate tuples
  • Index selection based on available B+Tree indexes
  • Join order optimization for multi-pattern MATCH
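
Predicate pushdown can be sketched as a tree rewrite that moves a filter below a join when the predicate refers to a variable bound on only one side. The `Plan` type is an assumption for this example; predicates are kept as opaque strings tagged with the single variable they reference.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { var: String },
    Filter { var: String, predicate: String, input: Box<Plan> },
    Join { left: Box<Plan>, right: Box<Plan> },
}

/// True if `plan` produces a binding for `var`.
fn binds(plan: &Plan, var: &str) -> bool {
    match plan {
        Plan::Scan { var: v } => v == var,
        Plan::Filter { input, .. } => binds(input, var),
        Plan::Join { left, right } => binds(left, var) || binds(right, var),
    }
}

/// Pushes each filter as close to the scan of its variable as possible.
fn push_down(plan: Plan) -> Plan {
    match plan {
        Plan::Filter { var, predicate, input } => match push_down(*input) {
            // The filter only needs the left side: apply it before joining.
            Plan::Join { left, right } if binds(&left, &var) => Plan::Join {
                left: Box::new(push_down(Plan::Filter { var, predicate, input: left })),
                right,
            },
            // Same for the right side.
            Plan::Join { left, right } if binds(&right, &var) => Plan::Join {
                left,
                right: Box::new(push_down(Plan::Filter { var, predicate, input: right })),
            },
            // Nothing to push through: keep the filter where it is.
            other => Plan::Filter { var, predicate, input: Box::new(other) },
        },
        Plan::Join { left, right } => Plan::Join {
            left: Box::new(push_down(*left)),
            right: Box::new(push_down(*right)),
        },
        other => other,
    }
}
```

Filtering before the join is what reduces intermediate tuples: rows that fail the predicate never take part in the join.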

Volcano Executor

  • Iterator-based pull model (lazy evaluation)
  • 12 physical operators: Scan, IndexScan, Filter, Project, Sort, Limit, Join, Create, Delete, Set, Merge, Aggregate
  • Three-valued logic for NULL propagation per openCypher spec
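
The pull model and NULL handling can be sketched together. Each operator below pulls one row at a time from its input, and the filter keeps a row only when its predicate evaluates to true, so NULL (modelled as `None`) is treated like false at the filter boundary. The trait, operator structs, and helper names are hypothetical, and only three of the twelve operators are shown.

```rust
/// A row of nullable integer columns; `None` models NULL.
type Row = Vec<Option<i64>>;

/// Volcano-style operator: each call produces at most one row on demand.
trait Operator {
    fn next_row(&mut self) -> Option<Row>;
}

struct Scan {
    rows: std::vec::IntoIter<Row>,
}

impl Operator for Scan {
    fn next_row(&mut self) -> Option<Row> {
        self.rows.next()
    }
}

struct Filter<P> {
    input: Box<dyn Operator>,
    predicate: P,
}

impl<P: Fn(&Row) -> Option<bool>> Operator for Filter<P> {
    fn next_row(&mut self) -> Option<Row> {
        // Pull from the child until a row satisfies the predicate.
        while let Some(row) = self.input.next_row() {
            if (self.predicate)(&row) == Some(true) {
                return Some(row);
            }
        }
        None
    }
}

struct Limit {
    input: Box<dyn Operator>,
    remaining: usize,
}

impl Operator for Limit {
    fn next_row(&mut self) -> Option<Row> {
        if self.remaining == 0 {
            return None; // stop pulling: upstream work is never performed
        }
        self.remaining -= 1;
        self.input.next_row()
    }
}

/// Three-valued AND: false dominates, otherwise NULL propagates.
fn and3(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        _ => None,
    }
}

/// Comparison against NULL yields NULL rather than false.
fn greater_than(value: Option<i64>, bound: i64) -> Option<bool> {
    value.map(|v| v > bound)
}

/// Builds Scan -> Filter(col0 > 30) -> Limit(2) and drains it.
fn run_pipeline(rows: Vec<Row>) -> Vec<Row> {
    let scan = Scan { rows: rows.into_iter() };
    let filter = Filter {
        input: Box::new(scan),
        predicate: |row: &Row| greater_than(row[0], 30),
    };
    let mut root = Limit { input: Box::new(filter), remaining: 2 };
    let mut out = Vec::new();
    while let Some(row) = root.next_row() {
        out.push(row);
    }
    out
}
```

Because `Limit` stops pulling after two rows, any remaining input is never scanned, which is the practical benefit of lazy evaluation.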

Concurrency Model

CypherLite uses a single-writer / multiple-reader model inspired by SQLite:

  • One write transaction at a time (serialized via mutex)
  • Multiple concurrent read transactions
  • WAL-based snapshot isolation ensures readers see a consistent view
  • No read blocking during writes (readers use WAL frame index)

This concurrency model suits embedded use cases where the database is accessed from a single process. For multi-process access, coordinate writers with OS-level file locking.
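
The single-writer / multiple-reader idea can be sketched with standard library primitives. This example substitutes copy-on-write snapshots for the WAL frame index, so it illustrates the visible behavior (serialized writers, readers holding a stable view) rather than CypherLite's mechanism. The `Db` type and its methods are hypothetical.

```rust
use std::sync::{Arc, Mutex, RwLock};

/// Hypothetical database handle; the "graph" is just a list of node names.
struct Db {
    /// Serializes write transactions: only one writer at a time.
    write_lock: Mutex<()>,
    /// The most recently committed, immutable snapshot.
    current: RwLock<Arc<Vec<String>>>,
}

impl Db {
    fn new() -> Self {
        Db {
            write_lock: Mutex::new(()),
            current: RwLock::new(Arc::new(Vec::new())),
        }
    }

    /// Starts a read: the returned snapshot never changes, even if
    /// writers commit afterwards.
    fn snapshot(&self) -> Arc<Vec<String>> {
        let guard = self.current.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Runs a write transaction that adds one node.
    fn add_node(&self, name: &str) {
        let _writer = self.write_lock.lock().unwrap();
        // Build the next version off to the side; readers are not blocked.
        let mut next = (*self.snapshot()).clone();
        next.push(name.to_string());
        // Publish the new version; the lock is held only for the pointer swap.
        *self.current.write().unwrap() = Arc::new(next);
    }
}
```

A reader that took its snapshot before a commit keeps seeing the old version, while readers that start afterwards see the new one.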

Feature Flags

CypherLite uses Cargo feature flags to enable optional functionality at compile time:

Flag          | Default | Crate | Description
--------------+---------+-------+-------------------------------------------------------------------
temporal-core | Yes     | core  | Temporal graph support
temporal-edge | No      | core  | Temporal edge attributes
subgraph      | No      | core  | Named subgraph entities
hypergraph    | No      | core  | N:M hyperedge relationships (requires subgraph)
full-temporal | No      | core  | All temporal features
plugin        | No      | core  | Plugin system (ScalarFunction, IndexPlugin, Serializer, Trigger)
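
Inside a crate, Cargo features are typically tested with `cfg`. The sketch below uses the standard `cfg!` macro with three of the flag names from the table; the function itself is hypothetical and only shows how compile-time selection looks in code.

```rust
/// Lists which of the optional capabilities were compiled into this build.
/// Each `cfg!` call is resolved at compile time to `true` or `false`.
fn enabled_features() -> Vec<&'static str> {
    let mut enabled = Vec::new();
    if cfg!(feature = "temporal-core") {
        enabled.push("temporal-core");
    }
    if cfg!(feature = "subgraph") {
        enabled.push("subgraph");
    }
    if cfg!(feature = "hypergraph") {
        enabled.push("hypergraph");
    }
    enabled
}
```

For gating whole items rather than branches, the attribute form `#[cfg(feature = "...")]` removes the item from the build entirely when the feature is off.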