# ComposeCache

Adaptive compositional semantic caching for LLM APIs and RAG pipelines.
## Why ComposeCache?

Existing semantic caches such as GPTCache treat every query atomically. ComposeCache decomposes compositional queries (e.g., "Compare X and Y") into sub-queries, caches each independently, and serves partial hits, cutting LLM API costs by 50% or more.
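For intuition, here is a toy decomposer showing how a compositional query could split into independently cacheable sub-queries. This is a hypothetical sketch for illustration, not ComposeCache's actual decomposition algorithm:

```javascript
// Toy rule: split "Compare X and Y" into one sub-query per entity.
// Real decomposition would handle far more query shapes than this.
function decompose(query) {
  const m = query.match(/^Compare (.+) and (.+)$/i);
  if (!m) return [query]; // non-compositional queries pass through whole
  return [`Describe ${m[1]}`, `Describe ${m[2]}`];
}

decompose('Compare France and Germany');
// → ['Describe France', 'Describe Germany']
```

Because each sub-query is cached separately, a later query like "Compare France and Spain" could reuse the cached France half (a partial hit) and only pay for the Spain sub-query.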
## Quick Start

```bash
npm install composecache
npx composecache init --db postgres://localhost/myapp
```

```js
import { ComposeCache } from 'composecache';

const cache = new ComposeCache({
  database: process.env.DATABASE_URL,
  openaiApiKey: process.env.OPENAI_API_KEY
});

const response = await cache.complete({
  model: 'gpt-3.5-turbo',
  messages: [{ role: 'user', content: 'Compare France and Germany' }],
  documents: retrievedDocs // Optional: for RAG
});

console.log(response.content);   // The answer
console.log(response.cacheType); // 'exact' | 'semantic' | 'partial' | 'miss'
console.log(response.costSaved); // Dollars saved on this request
```

## Features
- Compositional query decomposition (novel)
- Document-aware cache keys via MinHash
- Uncertainty-gated population (blocks hallucinations)
- Drop-in SDK for Node.js and Python
- Works with your own PostgreSQL database
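The document-aware keys can be pictured with a small MinHash example: near-identical sets of retrieved documents produce near-identical signatures, so minor retrieval differences can still map to the same cache entry. This is an illustrative sketch of the MinHash technique, not the library's internal code:

```javascript
// MinHash signature of a token set: for each of `numHashes` seeded hash
// functions, keep the minimum hash value over all tokens.
function minhash(tokens, numHashes = 16) {
  const sig = new Array(numHashes).fill(Infinity);
  for (let i = 0; i < numHashes; i++) {
    for (const t of tokens) {
      // Simple seeded FNV-style string hash, kept in 32-bit range
      let h = (2166136261 ^ i) >>> 0;
      for (const c of t) {
        h = (Math.imul(h, 16777619) + c.charCodeAt(0)) >>> 0;
      }
      if (h < sig[i]) sig[i] = h;
    }
  }
  return sig;
}

// Fraction of matching signature slots estimates Jaccard similarity
// between the two underlying token sets.
function similarity(a, b) {
  return a.filter((v, i) => v === b[i]).length / a.length;
}
```

Two document sets with high token overlap yield signatures whose slot-match fraction approximates their Jaccard similarity, which makes the signature a natural fuzzy component of a cache key.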
## Architecture
[TODO: diagram]
## API Reference
[TODO: full typing reference]
## Benchmarks
[TODO: HotpotQA results table]
## License
MIT