rust-kgdb
Production-ready RDF/hypergraph database with 100% W3C SPARQL 1.1 + RDF 1.2 compliance, worst-case optimal joins (WCOJ), and pluggable storage backends.
This npm package provides the high-performance in-memory database. For distributed cluster deployment (1B+ triples, horizontal scaling), contact: gonnect.uk@gmail.com
Deployment Modes
rust-kgdb supports three deployment modes:
| Mode | Use Case | Scalability | This Package |
|---|---|---|---|
| In-Memory | Development, embedded apps, testing | Single node, volatile | ✅ Included |
| Single Node (RocksDB/LMDB) | Production, persistence needed | Single node, persistent | Via Rust crate |
| Distributed Cluster | Enterprise, 1B+ triples | Horizontal scaling, 9+ partitions | Contact us |
Distributed Cluster Mode (Enterprise)
For enterprise deployments requiring 1B+ triples and horizontal scaling:
Key Features:
- Subject-Anchored Partitioning: All triples for a subject are guaranteed on the same partition for optimal locality
- Arrow-Powered OLAP: High-performance analytical queries executed as optimized SQL at scale
- Automatic Query Routing: The coordinator intelligently routes queries to the right executors
- Kubernetes-Native: StatefulSet-based executors with automatic failover
- Linear Horizontal Scaling: Add more executor pods to scale throughput
How It Works:
Your SPARQL queries work unchanged. For large-scale aggregations, the cluster automatically optimizes execution:
-- Your SPARQL query
SELECT (COUNT(*) AS ?count) (AVG(?salary) AS ?avgSalary)
WHERE {
?employee <http://ex/type> <http://ex/Employee> .
?employee <http://ex/salary> ?salary .
}
-- Cluster executes as optimized SQL internally
-- Results aggregated across all partitions automatically
Request a demo: gonnect.uk@gmail.com
Why rust-kgdb?
| Feature | rust-kgdb | Apache Jena | RDFox |
|---|---|---|---|
| Lookup Speed | 2.78 µs | ~50 µs | 50-100 µs |
| Memory/Triple | 24 bytes | 50-60 bytes | 32 bytes |
| SPARQL 1.1 | 100% | 100% | 95% |
| RDF 1.2 | 100% | Partial | No |
| WCOJ | ✅ LeapFrog | ❌ | ❌ |
| Mobile-Ready | ✅ iOS/Android | ❌ | ❌ |
Core Technical Innovations
1. Worst-Case Optimal Joins (WCOJ)
Traditional databases use nested-loop joins with O(n²) to O(n⁴) complexity. rust-kgdb implements the LeapFrog TrieJoin algorithm—a worst-case optimal join that achieves O(n log n) for multi-way joins.
How it works:
- Trie Data Structure: Triples indexed hierarchically (S→P→O) using BTreeMap for sorted access
- Variable Ordering: Frequency-based analysis orders variables for optimal intersection
- LeapFrog Iterator: Binary search across sorted iterators finds intersections without materializing intermediate results
Query: SELECT ?x ?y ?z WHERE { ?x :p ?y . ?y :q ?z . ?x :r ?z }
Nested Loop: O(n³) - examines every combination
WCOJ: O(n log n) - iterates in sorted order, seeks forward on mismatch
| Query Pattern | Before (Nested Loop) | After (WCOJ) | Speedup |
|---|---|---|---|
| 3-way star | O(n³) | O(n log n) | 50-100x |
| 4+ way complex | O(n⁴) | O(n log n) | 100-1000x |
| Chain queries | O(n²) | O(n log n) | 10-20x |
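The core of the LeapFrog approach is a sorted intersection of several iterators that seeks forward with binary search rather than scanning. A minimal sketch of that intersection step (illustrative only, not the crate's internals — the real implementation works over trie levels, not plain arrays):

```typescript
// LeapFrog-style intersection of k sorted arrays: repeatedly seek the
// lagging iterators forward to the current maximum head value; emit a
// result only when all heads agree.
function leapfrogIntersect(lists: number[][]): number[] {
  const out: number[] = [];
  const pos = lists.map(() => 0);
  while (true) {
    // Current candidate = maximum of all iterator heads.
    let max = -Infinity;
    for (let i = 0; i < lists.length; i++) {
      if (pos[i] >= lists[i].length) return out; // any iterator exhausted
      if (lists[i][pos[i]] > max) max = lists[i][pos[i]];
    }
    let allEqual = true;
    for (let i = 0; i < lists.length; i++) {
      // "Seek": binary search forward to the first value >= max.
      let lo = pos[i], hi = lists[i].length;
      while (lo < hi) {
        const mid = (lo + hi) >> 1;
        if (lists[i][mid] < max) lo = mid + 1; else hi = mid;
      }
      pos[i] = lo;
      if (lo >= lists[i].length) return out;
      if (lists[i][lo] !== max) allEqual = false;
    }
    if (allEqual) {
      out.push(max);
      for (let i = 0; i < lists.length; i++) pos[i]++;
    }
  }
}
```

Because every iterator only moves forward and each move is a binary search, the intersection never materializes intermediate cross-products — the key property behind the speedups in the table above.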
2. Sparse Matrix Engine (CSR Format)
Binary relations (e.g., foaf:knows, rdfs:subClassOf) are converted to Compressed Sparse Row (CSR) matrices for cache-efficient join evaluation:
- Memory: O(nnz) where nnz = number of edges (not O(n²))
- Matrix Multiplication: Replaces nested-loop joins
- Transitive Closure: Semi-naive Δ-matrix evaluation (not iterated powers)
// Traditional: O(n²) nested loops
for (s, p, o) in triples { ... }
// CSR Matrix: O(nnz) cache-friendly iteration
row_ptr[i] → col_indices[j] → values[j]
Used for: RDFS/OWL reasoning, transitive closure, Datalog evaluation.
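The CSR layout can be sketched in a few lines — here is an illustrative build-and-iterate example (node IDs and edges are made up; the crate's actual matrix engine is in Rust):

```typescript
// CSR (Compressed Sparse Row) representation of a binary relation such as
// foaf:knows. rowPtr[i]..rowPtr[i+1] delimits the out-neighbours of node i,
// so storage and iteration are O(nnz), not O(n²).
interface Csr { rowPtr: number[]; colIndices: number[] }

function toCsr(numNodes: number, edges: [number, number][]): Csr {
  const rowPtr = new Array(numNodes + 1).fill(0);
  for (const [s] of edges) rowPtr[s + 1]++;                     // out-degrees
  for (let i = 0; i < numNodes; i++) rowPtr[i + 1] += rowPtr[i]; // prefix sums
  const colIndices = new Array(edges.length);
  const next = rowPtr.slice(0, numNodes);
  for (const [s, o] of edges) colIndices[next[s]++] = o;        // scatter edges
  return { rowPtr, colIndices };
}

// Cache-friendly neighbour scan: one contiguous slice per node.
function neighbours(m: Csr, node: number): number[] {
  return m.colIndices.slice(m.rowPtr[node], m.rowPtr[node + 1]);
}
```

Joining two relations then becomes sparse matrix multiplication over these contiguous arrays instead of pointer-chasing through a triple table.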
3. SIMD + PGO Compiler Optimizations
Zero code changes—pure compiler-level performance gains.
| Optimization | Technology | Effect |
|---|---|---|
| SIMD Vectorization | AVX2/BMI2 (Intel), NEON (ARM) | 8-wide parallel operations |
| Profile-Guided Optimization | LLVM PGO | Hot path optimization, branch prediction |
| Link-Time Optimization | LTO (fat) | Cross-crate inlining, dead code elimination |
Benchmark Results (LUBM, Intel Skylake):
| Query | Before | After (SIMD+PGO) | Improvement |
|---|---|---|---|
| Q5: 2-hop chain | 230ms | 53ms | 77% faster |
| Q3: 3-way star | 177ms | 62ms | 65% faster |
| Q4: 3-hop chain | 254ms | 101ms | 60% faster |
| Q8: Triangle | 410ms | 193ms | 53% faster |
| Q7: Hierarchy | 343ms | 198ms | 42% faster |
| Q6: 6-way complex | 641ms | 464ms | 28% faster |
| Q2: 5-way star | 234ms | 183ms | 22% faster |
| Q1: 4-way star | 283ms | 258ms | 9% faster |
Average speedup: 44.5% across all queries.
4. Quad Indexing (SPOC)
Four complementary indexes enable O(1) pattern matching regardless of query shape:
| Index | Pattern | Use Case |
|---|---|---|
| SPOC | (?s, ?p, ?o, ?c) | Subject-centric queries |
| POCS | (?p, ?o, ?c, ?s) | Property enumeration |
| OCSP | (?o, ?c, ?s, ?p) | Object lookups (reverse links) |
| CSPO | (?c, ?s, ?p, ?o) | Named graph iteration |
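The idea behind having four orderings is that whichever positions a pattern binds, some index has those positions as its prefix. A hypothetical selection rule (illustrative only — the crate's real planner may differ):

```typescript
// Pick an index whose key prefix matches the bound positions of a quad
// pattern. Undefined fields are variables.
type Pattern = { s?: string; p?: string; o?: string; g?: string };

function chooseIndex(q: Pattern): 'SPOC' | 'POCS' | 'OCSP' | 'CSPO' {
  if (q.g !== undefined) return 'CSPO'; // named-graph iteration
  if (q.s !== undefined) return 'SPOC'; // subject-anchored lookup
  if (q.o !== undefined) return 'OCSP'; // reverse-link lookup
  if (q.p !== undefined) return 'POCS'; // property enumeration
  return 'SPOC';                        // full scan: any ordering works
}
```

With a matching prefix, every pattern lookup is a direct keyed access followed by a contiguous range scan over the remaining positions.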
Storage Backends
rust-kgdb uses a pluggable storage architecture. Default is in-memory (zero configuration). For persistence, enable RocksDB.
| Backend | Feature Flag | Use Case | Status |
|---|---|---|---|
| InMemory | default | Development, testing, embedded | ✅ Production Ready |
| RocksDB | rocksdb-backend | Production, large datasets | ✅ 61 tests passing |
| LMDB | lmdb-backend | Read-heavy workloads | ✅ 31 tests passing |
InMemory (Default)
Zero configuration, maximum performance. Data is volatile (lost on process exit).
High-Performance Data Structures:
| Component | Structure | Why |
|---|---|---|
| Triple Store | DashMap | Lock-free concurrent hash map, 100K pre-allocation |
| WCOJ Trie | BTreeMap | Sorted iteration for LeapFrog intersection |
| Dictionary | FxHashSet | String interning with rustc-optimized hashing |
| Hypergraph | FxHashMap | Fast node→edge adjacency lists |
| Reasoning | AHashMap | RDFS/OWL inference with DoS-resistant hashing |
| Datalog | FxHashMap | Semi-naive evaluation with delta propagation |
Why these structures enable sub-microsecond performance:
- DashMap: Sharded locks (16 shards default) → near-linear scaling on multi-core
- FxHashMap: Rust compiler's hash function → 30% faster than std HashMap
- BTreeMap: O(log n) ordered iteration → enables binary search in LeapFrog
- Pre-allocation: 100K capacity avoids rehashing during bulk inserts
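String interning is what makes the 24-bytes/triple figure possible: every IRI or literal is stored once, and a triple holds three small integer IDs. An illustrative sketch (class and method names here are hypothetical, not the crate's API):

```typescript
// Interning dictionary: maps each distinct term to a dense integer ID and
// supports reverse lookup. Triples then store three IDs instead of three
// heap-allocated strings.
class Dictionary {
  private ids = new Map<string, number>();
  private terms: string[] = [];

  intern(term: string): number {
    let id = this.ids.get(term);
    if (id === undefined) {
      id = this.terms.length;   // next dense ID
      this.ids.set(term, id);
      this.terms.push(term);
    }
    return id;
  }

  resolve(id: number): string { return this.terms[id]; }
}

const dict = new Dictionary();
const triple = [
  dict.intern('http://example.org/alice'),
  dict.intern('http://xmlns.com/foaf/0.1/knows'),
  dict.intern('http://example.org/bob'),
]; // three numeric IDs rather than three strings
```

Interning also makes equality checks during joins a single integer comparison, which is what the SIMD paths described above vectorize.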
use storage::{QuadStore, InMemoryBackend};
let store = QuadStore::new(InMemoryBackend::new());
// Ultra-fast: 2.78 µs lookups, zero disk I/O
RocksDB (Persistent)
LSM-tree based storage with ACID transactions. Tested with 61 comprehensive tests.
# Cargo.toml - Enable RocksDB backend
[dependencies]
storage = { version = "0.1.10", features = ["rocksdb-backend"] }
use storage::{QuadStore, RocksDbBackend};
// Create persistent database
let backend = RocksDbBackend::new("/path/to/data")?;
let store = QuadStore::new(backend);
// Features:
// - ACID transactions
// - Snappy compression (automatic)
// - Crash recovery
// - Range & prefix scanning
// - 1MB+ value support
// Force sync to disk
store.flush()?;
RocksDB Test Coverage:
- Basic CRUD operations (14 tests)
- Range scanning (8 tests)
- Prefix scanning (6 tests)
- Batch operations (8 tests)
- Transactions (8 tests)
- Concurrent access (5 tests)
- Unicode & binary data (4 tests)
- Large key/value handling (8 tests)
LMDB (Memory-Mapped Persistent)
B+tree based storage with memory-mapped I/O (via heed crate). Optimized for read-heavy workloads with MVCC (Multi-Version Concurrency Control). Tested with 31 comprehensive tests.
# Cargo.toml - Enable LMDB backend
[dependencies]
storage = { version = "0.1.12", features = ["lmdb-backend"] }
use storage::{QuadStore, LmdbBackend};
// Create persistent database (default 10GB map size)
let backend = LmdbBackend::new("/path/to/data")?;
let store = QuadStore::new(backend);
// Or with custom map size (1GB)
let backend = LmdbBackend::with_map_size("/path/to/data", 1024 * 1024 * 1024)?;
// Features:
// - Memory-mapped I/O (zero-copy reads)
// - MVCC for concurrent readers
// - Crash-safe ACID transactions
// - Range & prefix scanning
// - Excellent for read-heavy workloads
// Sync to disk
store.flush()?;
When to use LMDB vs RocksDB:
| Characteristic | LMDB | RocksDB |
|---|---|---|
| Read Performance | ✅ Faster (memory-mapped) | Good |
| Write Performance | Good | ✅ Faster (LSM-tree) |
| Concurrent Readers | ✅ Unlimited | Limited by locks |
| Write Amplification | Low | Higher (compaction) |
| Memory Usage | Higher (map size) | Lower (cache-based) |
| Best For | Read-heavy, OLAP | Write-heavy, OLTP |
LMDB Test Coverage:
- Basic CRUD operations (8 tests)
- Range scanning (4 tests)
- Prefix scanning (3 tests)
- Batch operations (3 tests)
- Large key/value handling (4 tests)
- Concurrent access (4 tests)
- Statistics & flush (3 tests)
- Edge cases (2 tests)
TypeScript SDK
The npm package uses the in-memory backend—ideal for:
- Knowledge graph queries
- SPARQL execution
- Data transformation pipelines
- Embedded applications
import { GraphDB } from 'rust-kgdb'
// In-memory database (default, no configuration needed)
const db = new GraphDB('http://example.org/app')
// For persistence, export via CONSTRUCT:
const ntriples = db.queryConstruct('CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }')
fs.writeFileSync('backup.nt', ntriples)
Installation
npm install rust-kgdb
Platform Support (v0.2.1)
| Platform | Architecture | Status | Notes |
|---|---|---|---|
| macOS | Intel (x64) | ✅ Works out of the box | Pre-built binary included |
| macOS | Apple Silicon (arm64) | ⏳ v0.2.2 | Coming soon |
| Linux | x64 | ⏳ v0.2.2 | Coming soon |
| Linux | arm64 | ⏳ v0.2.2 | Coming soon |
| Windows | x64 | ⏳ v0.2.2 | Coming soon |
This release (v0.2.1) includes a pre-built binary for macOS x64 only. Other platforms will be added in the next release.
Quick Start
Complete Working Example
import { GraphDB } from 'rust-kgdb'
// 1. Create database
const db = new GraphDB('http://example.org/myapp')
// 2. Load data (Turtle format)
db.loadTtl(`
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex: <http://example.org/> .
ex:alice a foaf:Person ;
foaf:name "Alice" ;
foaf:age 30 ;
foaf:knows ex:bob, ex:charlie .
ex:bob a foaf:Person ;
foaf:name "Bob" ;
foaf:age 25 ;
foaf:knows ex:charlie .
ex:charlie a foaf:Person ;
foaf:name "Charlie" ;
foaf:age 35 .
`, null)
// 3. Query: Find friends-of-friends (WCOJ optimized!)
const fof = db.querySelect(`
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ex: <http://example.org/>
SELECT ?person ?friend ?fof WHERE {
?person foaf:knows ?friend .
?friend foaf:knows ?fof .
FILTER(?person != ?fof)
}
`)
console.log('Friends of Friends:', fof)
// [{ person: 'ex:alice', friend: 'ex:bob', fof: 'ex:charlie' }]
// 4. Aggregation: Average age
const stats = db.querySelect(`
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT (COUNT(?p) AS ?count) (AVG(?age) AS ?avgAge) WHERE {
?p a foaf:Person ; foaf:age ?age .
}
`)
console.log('Stats:', stats)
// [{ count: '3', avgAge: '30.0' }]
// 5. ASK query
const hasAlice = db.queryAsk(`
PREFIX ex: <http://example.org/>
ASK { ex:alice a <http://xmlns.com/foaf/0.1/Person> }
`)
console.log('Has Alice?', hasAlice) // true
// 6. CONSTRUCT query
const graph = db.queryConstruct(`
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ex: <http://example.org/>
CONSTRUCT { ?p foaf:knows ?f }
WHERE { ?p foaf:knows ?f }
`)
console.log('Extracted graph:', graph)
// 7. Count and cleanup
console.log('Triple count:', db.count()) // 12
db.clear()
Save to File
import { writeFileSync } from 'fs'
// Save as N-Triples
const db = new GraphDB('http://example.org/export')
db.loadTtl(`<http://example.org/s> <http://example.org/p> "value" .`, null)
const ntriples = db.queryConstruct(`CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }`)
writeFileSync('output.nt', ntriples)
SPARQL 1.1 Features (100% W3C Compliant)
Query Forms
// SELECT - return bindings
db.querySelect('SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10')
// ASK - boolean existence check
db.queryAsk('ASK { <http://example.org/x> ?p ?o }')
// CONSTRUCT - build new graph
db.queryConstruct('CONSTRUCT { ?s <http://new/prop> ?o } WHERE { ?s ?p ?o }')
Aggregates
db.querySelect(`
SELECT ?type (COUNT(*) AS ?count) (AVG(?value) AS ?avg)
WHERE { ?s a ?type ; <http://ex/value> ?value }
GROUP BY ?type
HAVING (COUNT(*) > 5)
ORDER BY DESC(?count)
`)
Property Paths
// Transitive closure (rdfs:subClassOf*)
db.querySelect('SELECT ?class WHERE { ?class rdfs:subClassOf* <http://top/Class> }')
// Alternative paths
db.querySelect('SELECT ?name WHERE { ?x (foaf:name|rdfs:label) ?name }')
// Sequence paths
db.querySelect('SELECT ?grandparent WHERE { ?x foaf:parent/foaf:parent ?grandparent }')
Named Graphs
// Load into named graph
db.loadTtl('<http://s> <http://p> "o" .', 'http://example.org/graph1')
// Query specific graph
db.querySelect(`
SELECT ?s ?p ?o WHERE {
GRAPH <http://example.org/graph1> { ?s ?p ?o }
}
`)
UPDATE Operations
// INSERT DATA - Add new triples
db.updateInsert(`
PREFIX ex: <http://example.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
INSERT DATA {
ex:david a foaf:Person ;
foaf:name "David" ;
foaf:age 28 ;
foaf:email "david@example.org" .
ex:project1 ex:hasLead ex:david ;
ex:budget 50000 ;
ex:status "active" .
}
`)
// Verify insert
const count = db.count()
console.log(`Total triples after insert: ${count}`)
// DELETE WHERE - Remove matching triples
db.updateDelete(`
PREFIX ex: <http://example.org/>
DELETE WHERE { ?s ex:status "completed" }
`)
Bulk Data Loading Example
import { GraphDB } from 'rust-kgdb'
import { readFileSync } from 'fs'
const db = new GraphDB('http://example.org/bulk-load')
// Load Turtle file
const ttlData = readFileSync('data/knowledge-graph.ttl', 'utf-8')
db.loadTtl(ttlData, null) // null = default graph
// Load into named graph
const orgData = readFileSync('data/organization.ttl', 'utf-8')
db.loadTtl(orgData, 'http://example.org/graphs/org')
// Load N-Triples format
const ntData = readFileSync('data/triples.nt', 'utf-8')
db.loadNTriples(ntData, null)
console.log(`Loaded ${db.count()} triples`)
// Query across all graphs
const results = db.querySelect(`
SELECT ?g (COUNT(*) AS ?count) WHERE {
GRAPH ?g { ?s ?p ?o }
}
GROUP BY ?g
`)
console.log('Triples per graph:', results)
Sample Application
Knowledge Graph Demo
A complete, production-ready sample application demonstrating enterprise knowledge graph capabilities is available in the repository.
Location: examples/knowledge-graph-demo/
Features Demonstrated:
- Complete organizational knowledge graph (employees, departments, projects, skills)
- SPARQL SELECT queries with star and chain patterns (WCOJ-optimized)
- Aggregations (COUNT, AVG, GROUP BY, HAVING)
- Property paths for transitive closure (organizational hierarchy)
- SPARQL ASK and CONSTRUCT queries
- Named graphs for multi-tenant data isolation
- Data export to Turtle format
Run the Demo:
cd examples/knowledge-graph-demo
npm install
npm start
Sample Output:
The demo creates a realistic knowledge graph with:
- 5 employees across 4 departments
- 13 technical and soft skills
- 2 software projects
- Reporting hierarchies and salary data
- Named graph for sensitive compensation data
Example Query from Demo (finds all direct and indirect reports):
const pathQuery = `
PREFIX ex: <http://example.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?employee ?name WHERE {
?employee ex:reportsTo+ ex:alice . # Transitive closure
?employee foaf:name ?name .
}
ORDER BY ?name
`
const results = db.querySelect(pathQuery)
Learn More: See the demo README for full documentation, query examples, and how to customize the knowledge graph.
API Reference
GraphDB Class
class GraphDB {
constructor(baseUri: string) // Create with base URI
static inMemory(): GraphDB // Create anonymous in-memory DB
// Data Loading
loadTtl(data: string, graph: string | null): void
loadNTriples(data: string, graph: string | null): void
// SPARQL Queries (WCOJ-optimized)
querySelect(sparql: string): Array<Record<string, string>>
queryAsk(sparql: string): boolean
queryConstruct(sparql: string): string // Returns N-Triples
// SPARQL Updates
updateInsert(sparql: string): void
updateDelete(sparql: string): void
// Database Operations
count(): number
clear(): void
getVersion(): string
}
Node Class
class Node {
static iri(uri: string): Node
static literal(value: string): Node
static langLiteral(value: string, lang: string): Node
static typedLiteral(value: string, datatype: string): Node
static integer(value: number): Node
static boolean(value: boolean): Node
static blank(id: string): Node
}
Performance Characteristics
Complexity Analysis
| Operation | Complexity | Notes |
|---|---|---|
| Triple lookup | O(1) | Hash-based SPOC index |
| Pattern scan | O(k) | k = matching triples |
| Star join (WCOJ) | O(n log n) | LeapFrog intersection |
| Complex join (WCOJ) | O(n log n) | Trie-based |
| Transitive closure | O(n²) worst | CSR matrix optimization |
| Bulk insert | O(n) | Batch indexing |
Memory Layout
Triple: 24 bytes
├── Subject: 8 bytes (dictionary ID)
├── Predicate: 8 bytes (dictionary ID)
└── Object: 8 bytes (dictionary ID)
String Interning: All URIs/literals stored once in Dictionary
Index Overhead: ~4x base triple size (4 indexes)
Total: ~120 bytes/triple including indexes
Performance Benchmarks
By Deployment Mode
| Mode | Lookup | Insert | Memory | Dataset Size |
|---|---|---|---|---|
| In-Memory (npm) | 2.78 µs | 146K/sec | 24 bytes/triple | <10M triples |
| Single Node (RocksDB) | 5-10 µs | 100K/sec | On-disk | <100M triples |
| Distributed Cluster | 10-50 µs | 500K+/sec* | Distributed | 1B+ triples |
*Aggregate throughput across all executors with HDRF partitioning
SIMD + PGO Query Performance (LUBM Benchmark)
| Query | Pattern | Time | Improvement |
|---|---|---|---|
| Q5 | 2-hop chain | 53ms | 77% faster |
| Q3 | 3-way star | 62ms | 65% faster |
| Q4 | 3-hop chain | 101ms | 60% faster |
| Q8 | Triangle | 193ms | 53% faster |
| Q7 | Hierarchy | 198ms | 42% faster |
Average: 44.5% speedup with zero code changes (compiler optimizations only).
Version History
v0.2.2 (2025-12-08) - Enhanced Documentation
- Added comprehensive INSERT DATA examples with PREFIX syntax
- Added bulk data loading example with named graphs
- Enhanced SPARQL UPDATE section with real-world patterns
- Improved documentation for data import workflows
v0.2.1 (2025-12-08) - npm Platform Fix
- Fixed native module loading for platform-specific binaries
- This release includes a pre-built binary for macOS x64 only
- Other platforms coming in next release
v0.2.0 (2025-12-08) - Distributed Cluster Support
- NEW: Distributed cluster architecture with HDRF partitioning
- Subject-Hash Filter for accurate COUNT deduplication across replicas
- Arrow-powered OLAP query path for high-performance analytical queries
- Coordinator-Executor pattern with gRPC communication
- 9-partition default for optimal data distribution
- Contact for cluster deployment: gonnect.uk@gmail.com
- Coming soon: Embedding support for semantic search (v0.3.0)
v0.1.12 (2025-12-01) - LMDB Backend Release
- LMDB storage backend fully implemented (31 tests passing)
- Memory-mapped I/O for optimal read performance
- MVCC concurrency for unlimited concurrent readers
- Complete LMDB vs RocksDB comparison documentation
- Sample application with 87 triples demonstrating all features
v0.1.9 (2025-12-01) - SIMD + PGO Release
- 44.5% average speedup via SIMD + PGO compiler optimizations
- WCOJ execution with LeapFrog TrieJoin
- Release automation infrastructure
- All packages updated to gonnect-uk namespace
v0.1.8 (2025-12-01) - WCOJ Execution
- WCOJ execution path activated
- Variable ordering analysis for optimal joins
- 577 tests passing
v0.1.7 (2025-11-30)
- Query optimizer with automatic strategy selection
- WCOJ algorithm integration (planning phase)
v0.1.3 (2025-11-18)
- Initial TypeScript SDK
- 100% W3C SPARQL 1.1 compliance
- 100% W3C RDF 1.2 compliance
Use Cases
| Domain | Application |
|---|---|
| Knowledge Graphs | Enterprise ontologies, taxonomies |
| Semantic Search | Structured queries over unstructured data |
| Data Integration | ETL with SPARQL CONSTRUCT |
| Compliance | SHACL validation, provenance tracking |
| Graph Analytics | Pattern detection, community analysis |
| Mobile Apps | Embedded RDF on iOS/Android |
License
Apache License 2.0
Built with Rust + NAPI-RS