composecache

0.1.1

Compositional semantic caching for LLM APIs and RAG pipelines

Package Exports

  • composecache

Readme

ComposeCache

Adaptive compositional semantic caching for LLM APIs and RAG pipelines.

Why ComposeCache?

Existing semantic caches such as GPTCache treat every query atomically. ComposeCache decomposes compositional queries (e.g., "Compare X and Y") into sub-queries, caches each one independently, and serves partial hits, which can save 50%+ on LLM API costs.
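As a rough illustration of the idea (ComposeCache itself uses an LLM for this step, per the Architecture section), a comparative query can be split into independently cacheable sub-queries. The `decompose` function and the "Describe X" phrasing below are hypothetical, not the package's API:

```typescript
// Hypothetical sketch of compositional decomposition. The regex only
// handles the "Compare X and Y" pattern; the real system handles
// arbitrary compositional queries via an LLM.
function decompose(query: string): { atomic: boolean; subQueries: string[] } {
  const m = query.match(/^compare (.+) and (.+)$/i);
  if (!m) {
    return { atomic: true, subQueries: [query] };
  }
  // Each sub-query gets its own cache entry, so a later query about
  // only one of the two subjects can still produce a partial hit.
  return { atomic: false, subQueries: [`Describe ${m[1]}`, `Describe ${m[2]}`] };
}
```

Under this sketch, `decompose('Compare France and Germany')` yields the sub-queries `'Describe France'` and `'Describe Germany'`; caching those separately is what enables the partial-hit savings described above.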

Quick Start

npm install composecache
npx composecache init --db postgres://localhost/myapp

import { ComposeCache } from 'composecache';

const cache = new ComposeCache({
  database: process.env.DATABASE_URL,
  openaiApiKey: process.env.OPENAI_API_KEY
});

const response = await cache.complete({
  model: 'gpt-3.5-turbo',
  messages: [{ role: 'user', content: 'Compare France and Germany' }],
  documents: retrievedDocs // Optional: for RAG
});

console.log(response.content); // The answer
console.log(response.cacheType); // 'exact' | 'semantic' | 'partial' | 'miss'
console.log(response.costSaved); // $ saved
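Because every call reports `cacheType` and `costSaved` (the fields shown above), aggregating them gives a quick view of how well the cache is working. `summarize` below is an illustrative helper, not part of the SDK:

```typescript
// Illustrative helper (not part of the composecache SDK): tally the
// cacheType and costSaved fields returned by cache.complete().
type CacheResult = { cacheType: 'exact' | 'semantic' | 'partial' | 'miss'; costSaved: number };

function summarize(results: CacheResult[]) {
  const counts: Record<CacheResult['cacheType'], number> = { exact: 0, semantic: 0, partial: 0, miss: 0 };
  let totalSaved = 0;
  for (const r of results) {
    counts[r.cacheType] += 1;
    totalSaved += r.costSaved;
  }
  // Exact, semantic, and partial all count as hits.
  const hits = results.length - counts.miss;
  return { counts, totalSaved, hitRate: results.length ? hits / results.length : 0 };
}
```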

Features

  • Compositional query decomposition (novel)
  • Document-aware cache keys via MinHash
  • Uncertainty-gated population (blocks hallucinations)
  • Drop-in SDK for Node.js and Python
  • Works with your own PostgreSQL database
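The document-aware keys can be pictured with a toy MinHash: hash every word shingle under several seeded hash functions and keep the minimum per seed. The sketch below uses FNV-1a as the hash (an assumption; the package's actual hash and parameters are not documented here), and expects input text of at least `shingleSize` words:

```typescript
// FNV-1a over a string, salted with a per-hash-function seed.
function fnv1a(str: string, seed: number): number {
  let h = (0x811c9dc5 ^ seed) >>> 0;
  for (let i = 0; i < str.length; i++) {
    h = (h ^ str.charCodeAt(i)) >>> 0;
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Toy MinHash signature over word shingles. Similar documents produce
// signatures that agree in many positions, so the signature can act as
// the document fingerprint component of a cache key.
function minhashSignature(text: string, numHashes = 16, shingleSize = 3): number[] {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  const shingles: string[] = [];
  for (let i = 0; i + shingleSize <= words.length; i++) {
    shingles.push(words.slice(i, i + shingleSize).join(' '));
  }
  // One minimum per seeded hash function; the fraction of matching
  // positions between two signatures estimates Jaccard similarity.
  return Array.from({ length: numHashes }, (_, seed) =>
    Math.min(...shingles.map((s) => fnv1a(s, seed)))
  );
}

function estimatedSimilarity(a: number[], b: number[]): number {
  return a.filter((v, i) => v === b[i]).length / a.length;
}
```

Identical retrieved documents hash to identical signatures, so they map to the same cache key; near-duplicate documents agree in most signature positions, which is what makes the keys document-aware rather than brittle.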

Architecture

Query Processing Flow

flowchart TD
  Q[Incoming query q] --> C{Classify: atomic or compositional?}

  C -->|atomic| A[Compute SHA-256 key<br/>norm(q) || fD || theta]
  C -->|compositional| D[Decompose into sub-queries<br/>s1 ... sk with deps E]

  A --> P[Probe cache for each query<br/>exact hash, then semantic + doc]
  D --> P

  P --> H{All hits?}
  H -->|yes| R[Return cached response<br/>or compose from subs]
  H -->|no / partial| G[Generate missing sub-answers<br/>via RAG + LLM API]

  R --> F[Compose final response]
  G --> F

  F --> U[Uncertainty gate: u <= umax?<br/>Write to cache if yes]
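The atomic branch's key can be sketched directly from the diagram: SHA-256 over `norm(q) || fD || theta`. The normalization (trim, lowercase, collapse whitespace) and the JSON serialization of the generation parameters below are assumptions, not the package's documented scheme:

```typescript
import { createHash } from 'node:crypto';

// Sketch of the exact-match key from the flow above. Assumed layout:
// normalized query, document fingerprint fD, and generation parameters
// theta, concatenated and hashed with SHA-256.
function exactKey(query: string, docFingerprint: string, params: object): string {
  const norm = query.trim().toLowerCase().replace(/\s+/g, ' ');
  const theta = JSON.stringify(params); // e.g. model + temperature
  return createHash('sha256')
    .update(`${norm}||${docFingerprint}||${theta}`)
    .digest('hex');
}
```

With this normalization, queries differing only in case or whitespace hash to the same key, while a change in the retrieved documents (`fD`) or the model parameters (`theta`) forces a fresh entry.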

System Architecture

flowchart TD
  APP[Developer application<br/>Node.js / Python]

  subgraph SDK[ComposeCache middleware - SDK / npm package]
    direction LR
    S1[1. Decompose] --> S2[2. Probe] --> S3[3. Resolve] --> S4[4. Compose] --> S5[5. Populate]
  end

  subgraph MODS[Core modules]
    direction LR
    E[Embedder<br/>all-MiniLM-L6-v2]
    L[Decomposition LLM<br/>GPT-4o-mini]
    M[MinHash + uncertainty<br/>estimator]
  end

  DB[Developer PostgreSQL + pgvector<br/>exact keys + semantic vectors]
  API[Upstream LLM API<br/>OpenAI / Anthropic]

  APP --> SDK
  SDK --> MODS
  SDK -->|cache read / write| DB
  SDK -->|miss only| API
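The probe step (2) can be pictured as a two-stage lookup: exact key match first, then a semantic fallback over stored embeddings above a similarity threshold. In production this is a pgvector query (cosine distance via the `<=>` operator); the in-memory version below is an illustrative sketch with invented names and an assumed threshold:

```typescript
type CacheRow = { key: string; embedding: number[]; response: string };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / Math.sqrt(na * nb);
}

// Stage 1: exact hash-key match. Stage 2: nearest stored embedding at or
// above the similarity threshold. Anything else is a miss, which is the
// only case that reaches the upstream LLM API.
function probe(rows: CacheRow[], key: string, embedding: number[], threshold = 0.9) {
  const exact = rows.find((r) => r.key === key);
  if (exact) return { type: 'exact' as const, row: exact };
  let best: { row: CacheRow; sim: number } | null = null;
  for (const r of rows) {
    const sim = cosine(embedding, r.embedding);
    if (sim >= threshold && (!best || sim > best.sim)) best = { row: r, sim };
  }
  return best ? { type: 'semantic' as const, row: best.row } : { type: 'miss' as const };
}
```

The two stages mirror the `exact hash, then semantic + doc` probe in the query-processing flow above; the threshold trades recall against the risk of returning a semantically similar but wrong cached answer.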

API Reference

[TODO: full typing reference]

Benchmarks

[TODO: HotpotQA results table]

License

MIT