Audrey
Biological memory architecture for AI agents. Gives agents cognitive memory that decays, consolidates, self-validates, and learns from experience — not just a database.
Why Audrey Exists
Every AI memory tool today (Mem0, Zep, LangChain Memory) is a filing cabinet. Store stuff, retrieve stuff. None of them do what biological memory actually does:
- Memories don't decay. A fact from 6 months ago has the same weight as one from today.
- No consolidation. Raw events never become general principles.
- No contradiction detection. Conflicting facts coexist silently.
- No self-defense. If an agent hallucinates and encodes the hallucination, it becomes "truth."
Audrey fixes all of this by modeling memory the way the brain does:
| Brain Structure | Audrey Component | What It Does |
|---|---|---|
| Hippocampus | Episodic Memory | Fast capture of raw events and observations |
| Neocortex | Semantic Memory | Consolidated principles and patterns |
| Sleep Replay | Consolidation Engine | Extracts patterns from episodes, promotes to principles |
| Prefrontal Cortex | Validation Engine | Truth-checking, contradiction detection |
| Amygdala | Salience Scorer | Importance weighting for retention priority |
Install
MCP Server for Claude Code (one command)
```bash
npx audrey install
```

That's it. Audrey auto-detects API keys from your environment:

- `OPENAI_API_KEY` set? Uses real OpenAI embeddings (1536d) for semantic search.
- `ANTHROPIC_API_KEY` set? Enables LLM-powered consolidation and contradiction detection.
- Neither? Runs with mock embeddings — fully functional, upgrade anytime.

To upgrade later, set the keys and re-run `npx audrey install`.

```bash
# Check status
npx audrey status

# Uninstall
npx audrey uninstall
```

Every Claude Code session now has 5 memory tools: `memory_encode`, `memory_recall`, `memory_consolidate`, `memory_introspect`, `memory_resolve_truth`.
SDK in Your Code
```bash
npm install audrey
```

Zero external infrastructure. One SQLite file.
Usage
```javascript
import { Audrey } from 'audrey';

// 1. Create a brain
const brain = new Audrey({
  dataDir: './agent-memory',
  agent: 'my-agent',
  embedding: { provider: 'mock', dimensions: 8 }, // or 'openai' for production
});

// 2. Encode observations
await brain.encode({
  content: 'Stripe API returns 429 above 100 req/s',
  source: 'direct-observation',
  tags: ['stripe', 'rate-limit'],
});

// 3. Recall what you know
const memories = await brain.recall('stripe rate limits', { limit: 5 });
// Returns: [{ content, type, confidence, score, ... }]

// 4. Consolidate episodes into principles (the "sleep" cycle)
await brain.consolidate();

// 5. Check brain health
const stats = brain.introspect();
// { episodic: 47, semantic: 12, procedural: 3, dormant: 8, ... }

// 6. Clean up
brain.close();
```

Configuration
```javascript
const brain = new Audrey({
  dataDir: './audrey-data', // SQLite database directory
  agent: 'my-agent',        // Agent identifier

  // Embedding provider (required)
  embedding: {
    provider: 'mock', // 'mock' for testing, 'openai' for production
    dimensions: 8,    // 8 for mock, 1536 for openai text-embedding-3-small
    apiKey: '...',    // Required for openai
  },

  // LLM provider (optional — enables smart consolidation + contradiction detection)
  llm: {
    provider: 'anthropic',      // 'mock', 'anthropic', or 'openai'
    apiKey: '...',              // Required for anthropic/openai
    model: 'claude-sonnet-4-6', // Optional model override
  },

  // Consolidation settings
  consolidation: {
    minEpisodes: 3, // Minimum cluster size for principle extraction
  },

  // Decay settings
  decay: {
    dormantThreshold: 0.1, // Below this confidence = dormant
  },
});
```

Without an LLM provider, consolidation uses a default text-based extractor and contradiction detection is similarity-only. With an LLM provider, Audrey extracts real generalized principles, detects semantic contradictions, and resolves context-dependent truths.
Environment Variables (MCP Server)
| Variable | Default | Purpose |
|---|---|---|
| `AUDREY_DATA_DIR` | `~/.audrey/data` | SQLite database directory |
| `AUDREY_AGENT` | `claude-code` | Agent identifier |
| `AUDREY_EMBEDDING_PROVIDER` | `mock` | `mock` or `openai` |
| `AUDREY_EMBEDDING_DIMENSIONS` | `8` | Vector dimensions (1536 for openai) |
| `OPENAI_API_KEY` | — | Required when embedding/LLM provider is `openai` |
| `AUDREY_LLM_PROVIDER` | — | `mock`, `anthropic`, or `openai` |
| `ANTHROPIC_API_KEY` | — | Required when LLM provider is `anthropic` |
Core Concepts
Four Memory Types
Episodic (hot, fast decay) — Raw events. "Stripe returned 429 at 3pm." Immutable. Append-only. Never modified.
Semantic (warm, slow decay) — Consolidated principles. "Stripe enforces 100 req/s rate limit." Extracted automatically from clusters of episodic memories.
Procedural (cold, slowest decay) — Learned workflows. "When Stripe rate-limits, implement exponential backoff." Skills the agent has acquired.
Causal — Why things happened. Not just "A then B" but "A caused B because of mechanism C." Prevents correlation-as-causation.
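The "mechanism, not just sequence" rule can be sketched as a simple validity check. Field names follow the `causal_links` schema (cause, effect, mechanism); the function itself is illustrative, not Audrey's internal code:

```javascript
// A causal link is only meaningful when it articulates a mechanism —
// "A caused B because of C" — not just an observed before/after pair.
function isValidCausalLink(link) {
  return Boolean(link.cause && link.effect && link.mechanism);
}

isValidCausalLink({ cause: 'A', effect: 'B' });                 // false — mere sequence
isValidCausalLink({ cause: 'A', effect: 'B', mechanism: 'C' }); // true
```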
Confidence Formula
Every memory has a compositional confidence score:
```
C(m, t) = w_s * S + w_e * E + w_r * R(t) + w_ret * Ret(t)
```

| Component | What It Measures | Default Weight |
|---|---|---|
| S — Source reliability | How trustworthy is the origin? | 0.30 |
| E — Evidence agreement | Do observations agree or contradict? | 0.35 |
| R(t) — Recency decay | How old is the memory? (Ebbinghaus curve) | 0.20 |
| Ret(t) — Retrieval reinforcement | How often is this memory accessed? | 0.15 |
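As a sketch, the formula with the default weights looks like this in plain JavaScript. The component values S, E, R(t), Ret(t) are computed internally by Audrey; the inputs below are illustrative:

```javascript
// Compositional confidence with the default weights from the table above.
// Each component is normalized to [0, 1].
const WEIGHTS = { source: 0.30, evidence: 0.35, recency: 0.20, retrieval: 0.15 };

function confidence({ source, evidence, recency, retrieval }) {
  return (
    WEIGHTS.source * source +
    WEIGHTS.evidence * evidence +
    WEIGHTS.recency * recency +
    WEIGHTS.retrieval * retrieval
  );
}

// A fresh direct observation (S = 0.95) with full agreement and no retrieval history:
confidence({ source: 0.95, evidence: 1.0, recency: 1.0, retrieval: 0.0 });
// 0.30*0.95 + 0.35*1.0 + 0.20*1.0 + 0.15*0.0 = 0.835
```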
Source reliability hierarchy:
| Source Type | Reliability |
|---|---|
| `direct-observation` | 0.95 |
| `told-by-user` | 0.90 |
| `tool-result` | 0.85 |
| `inference` | 0.60 |
| `model-generated` | 0.40 (capped at 0.6 confidence) |
The model-generated cap prevents circular self-confirmation — an agent can't boost its own hallucinations into high-confidence "facts."
Decay (Forgetting Curves)
Unreinforced memories lose confidence over time following Ebbinghaus exponential decay:
| Memory Type | Half-Life | Rationale |
|---|---|---|
| Episodic | 7 days | Raw events go stale fast |
| Semantic | 30 days | Principles are hard-won |
| Procedural | 90 days | Skills are slowest to forget |
Retrieval resets the decay clock. Frequently accessed memories persist. Memories below the dormant threshold (0.1) become dormant — still searchable with includeDormant: true, but excluded from default recall.
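A minimal sketch of the half-life math. The half-lives are the defaults from the table above; expressing decay as `0.5^(t / halfLife)` is an assumed (but standard) formulation of exponential half-life decay, not necessarily Audrey's exact internal function:

```javascript
// Per-type half-lives in days (defaults from the table above).
const HALF_LIFE_DAYS = { episodic: 7, semantic: 30, procedural: 90 };

function decayedRecency(type, daysSinceLastAccess) {
  // After one half-life the recency component drops to 0.5, after two to 0.25, ...
  return Math.pow(0.5, daysSinceLastAccess / HALF_LIFE_DAYS[type]);
}

decayedRecency('episodic', 7);  // 0.5  — one half-life
decayedRecency('semantic', 60); // 0.25 — two half-lives
// Retrieval resets daysSinceLastAccess to 0, restoring recency to 1.0.
```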
Consolidation (The "Sleep" Cycle)
Audrey's consolidation engine periodically clusters similar episodic memories and extracts general principles:
```
3 episodes about Stripe 429 errors
  → 1 semantic principle: "Stripe enforces ~100 req/s rate limit"
```

The pipeline: Cluster (embedding similarity) → Extract (LLM or callback) → Validate (check for contradictions) → Promote (write semantic memory) → Audit (log everything).
Consolidation is idempotent. Re-running on the same data produces no duplicates. Every run creates an audit record with input/output IDs for full traceability.
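The clustering step can be sketched as greedy grouping by cosine similarity. This is an illustrative stand-in only — Audrey's actual engine runs KNN queries through sqlite-vec, and the greedy single-pass strategy here is an assumption:

```javascript
// Cosine similarity between two embedding vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Greedy single-pass clustering: each episode joins the first cluster whose
// seed it resembles, otherwise starts a new one. Only clusters that reach
// minClusterSize become candidates for principle extraction.
function cluster(episodes, { threshold = 0.8, minClusterSize = 3 } = {}) {
  const clusters = [];
  for (const ep of episodes) {
    const home = clusters.find(c => cosine(c[0].embedding, ep.embedding) >= threshold);
    if (home) home.push(ep); else clusters.push([ep]);
  }
  return clusters.filter(c => c.length >= minClusterSize);
}
```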
Contradiction Handling
When memories conflict, Audrey doesn't force a winner. Contradictions have a lifecycle:
```
open → resolved | context_dependent | reopened
```

Context-dependent truths are modeled explicitly:
```javascript
// "Stripe rate limit is 100 req/s" (live keys)
// "Stripe rate limit is 25 req/s" (test keys)
// Both true — under different conditions
```

New high-confidence evidence can reopen resolved disputes.
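The lifecycle can be modeled as a small transition table. This is a sketch of the documented states; the assumption that `context_dependent` resolutions can be reopened (in addition to `resolved` ones) goes slightly beyond what the docs state:

```javascript
// Allowed contradiction state transitions (illustrative model).
const TRANSITIONS = {
  open: ['resolved', 'context_dependent'],
  resolved: ['reopened'],          // new high-confidence evidence arrives
  context_dependent: ['reopened'], // assumption: these can be reopened too
  reopened: ['resolved', 'context_dependent'],
};

function canTransition(from, to) {
  return (TRANSITIONS[from] || []).includes(to);
}
```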
Rollback
Bad consolidation? Undo it:
```javascript
const history = brain.consolidationHistory();
brain.rollback(history[0].id);
// Semantic memories → rolled_back state
// Source episodes → un-consolidated
// Full audit trail preserved
```

Circular Self-Confirmation Defense
The most dangerous exploit in AI memory: agent hallucinates X, encodes it, later retrieves it, "reinforcement" boosts confidence, X eventually consolidates as "established truth."
Audrey's defenses:
- Source diversity requirement — Consolidation requires evidence from 2+ distinct source types
- Model-generated cap — Memories from `model-generated` sources are capped at 0.6 confidence
- Source lineage tracking — Provenance chains detect when all evidence traces back to a single inference
- Source diversity score — Every semantic memory tracks how many different source types contributed
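The first two defenses can be sketched in a few lines. The 0.6 cap and the 2+ source-type requirement match the values stated above; the functions themselves are illustrative, not Audrey's internals:

```javascript
// Defense 2: model-generated memories can never exceed 0.6 confidence,
// so an agent's own output can't bootstrap itself into a high-confidence fact.
const MODEL_GENERATED_CAP = 0.6;

function capConfidence(source, rawConfidence) {
  return source === 'model-generated'
    ? Math.min(rawConfidence, MODEL_GENERATED_CAP)
    : rawConfidence;
}

// Defense 1: consolidation only proceeds when the evidence spans
// at least 2 distinct source types.
function passesDiversityGate(episodes) {
  return new Set(episodes.map(e => e.source)).size >= 2;
}
```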
API Reference
new Audrey(config)
See Configuration above for all options.
brain.encode(params) → Promise<string>
Encode an episodic memory. Returns the memory ID.
```javascript
const id = await brain.encode({
  content: 'What happened',       // Required. Non-empty string.
  source: 'direct-observation',   // Required. See source types above.
  salience: 0.8,                  // Optional. 0-1. Default: 0.5
  causal: {                       // Optional. What caused this / what it caused.
    trigger: 'batch-processing',
    consequence: 'queue-backed-up',
  },
  tags: ['stripe', 'production'], // Optional. Array of strings.
  supersedes: 'previous-id',      // Optional. ID of episode this corrects.
});
```

Episodes are immutable. Corrections create new records with supersedes links. The original is preserved.
brain.recall(query, options) → Promise<Memory[]>
Retrieve memories ranked by similarity * confidence.
```javascript
const memories = await brain.recall('stripe rate limits', {
  minConfidence: 0.5,      // Filter below this confidence
  types: ['semantic'],     // Filter by memory type
  limit: 5,                // Max results
  includeProvenance: true, // Include evidence chains
  includeDormant: false,   // Include dormant memories
});
```

Each result:
```javascript
{
  id: '01ABC...',
  content: 'Stripe enforces ~100 req/s rate limit',
  type: 'semantic',
  confidence: 0.87,
  score: 0.74, // similarity * confidence
  source: 'consolidation',
  state: 'active',
  provenance: { // When includeProvenance: true
    evidenceEpisodeIds: ['01XYZ...', '01DEF...'],
    evidenceCount: 3,
    supportingCount: 3,
    contradictingCount: 0,
  },
}
```

Retrieval automatically reinforces matched memories (boosts confidence, resets decay clock).
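Reinforcement can be sketched as follows. The confidence cap at 1.0 and the decay-clock reset follow the behavior described above; the boost size (0.05) is purely an assumption for illustration:

```javascript
// On each successful recall, the matched memory gains confidence and its
// Ebbinghaus decay clock resets to "now". Boost size is illustrative.
function reinforce(memory, now = Date.now()) {
  return {
    ...memory,
    confidence: Math.min(1, memory.confidence + 0.05), // capped at 1.0
    lastAccessedAt: now, // resets the decay clock
  };
}
```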
brain.encodeBatch(paramsList) → Promise<string[]>
Encode multiple episodes in one call. Same params as encode(), but as an array.
```javascript
const ids = await brain.encodeBatch([
  { content: 'Stripe returned 429', source: 'direct-observation' },
  { content: 'Redis timed out', source: 'tool-result' },
  { content: 'User reports slow checkout', source: 'told-by-user' },
]);
```

brain.recallStream(query, options) → AsyncGenerator<Memory>
Streaming version of recall(). Yields results one at a time. Supports early break.
```javascript
for await (const memory of brain.recallStream('stripe issues', { limit: 10 })) {
  console.log(memory.content, memory.score);
  if (memory.score > 0.9) break;
}
```

brain.consolidate(options) → Promise<ConsolidationResult>
Run the consolidation engine manually.
```javascript
const result = await brain.consolidate({
  minClusterSize: 3,
  similarityThreshold: 0.80,
  extractPrinciple: (episodes) => ({ // Optional LLM callback
    content: 'Extracted principle text',
    type: 'semantic',
  }),
});
// { runId, status, episodesEvaluated, clustersFound, principlesExtracted }
```

brain.decay(options) → DecayResult
Apply forgetting curves. Transitions low-confidence memories to dormant.
```javascript
const result = brain.decay({ dormantThreshold: 0.1 });
// { totalEvaluated, transitionedToDormant, timestamp }
```

brain.rollback(runId) → RollbackResult
Undo a consolidation run.
```javascript
brain.rollback('01ABC...');
// { rolledBackMemories: 3, restoredEpisodes: 9 }
```

brain.resolveTruth(contradictionId) → Promise<Resolution>
Resolve an open contradiction using LLM reasoning. Requires an LLM provider configured.
```javascript
const resolution = await brain.resolveTruth('contradiction-id');
// { resolution: 'context_dependent', conditions: { a: 'live keys', b: 'test keys' }, explanation: '...' }
```

brain.introspect() → Stats
Get memory system health stats.
```javascript
brain.introspect();
// {
//   episodic: 247, semantic: 31, procedural: 8,
//   causalLinks: 42, dormant: 15,
//   contradictions: { open: 2, resolved: 7, context_dependent: 3, reopened: 0 },
//   lastConsolidation: '2026-02-18T22:00:00Z',
//   totalConsolidationRuns: 14,
// }
```

brain.consolidationHistory() → ConsolidationRun[]
Full audit trail of all consolidation runs.
Events
```javascript
brain.on('encode', ({ id, content, source }) => { ... });
brain.on('reinforcement', ({ episodeId, targetId, similarity }) => { ... });
brain.on('contradiction', ({ episodeId, contradictionId, semanticId, resolution }) => { ... });
brain.on('consolidation', ({ runId, principlesExtracted }) => { ... });
brain.on('decay', ({ totalEvaluated, transitionedToDormant }) => { ... });
brain.on('rollback', ({ runId, rolledBackMemories }) => { ... });
brain.on('error', (err) => { ... });
```

brain.close()
Close the database connection.
Architecture
```
audrey-data/
  audrey.db         ← Single SQLite file. WAL mode. That's your brain.
```

```
src/
  audrey.js         Main class. EventEmitter. Public API surface.
  causal.js         Causal graph management. LLM-powered mechanism articulation.
  confidence.js     Compositional confidence formula. Pure math.
  consolidate.js    "Sleep" cycle. KNN clustering → LLM extraction → promote.
  db.js             SQLite + sqlite-vec. Schema, vec0 tables, migrations.
  decay.js          Ebbinghaus forgetting curves.
  embedding.js      Pluggable providers (Mock, OpenAI). Batch embedding.
  encode.js         Immutable episodic memory creation + vec0 writes.
  introspect.js     Health dashboard queries.
  llm.js            Pluggable LLM providers (Mock, Anthropic, OpenAI).
  prompts.js        Structured prompt templates for LLM operations.
  recall.js         KNN retrieval + confidence scoring + async streaming.
  rollback.js       Undo consolidation runs.
  utils.js          Date math, safe JSON parse.
  validate.js       KNN validation + LLM contradiction detection.
  index.js          Barrel export.
mcp-server/
  index.js          MCP tool server (5 tools, stdio transport) + CLI subcommands.
  config.js         Shared config (env var parsing, install arg builder).
```

Database Schema
| Table | Purpose |
|---|---|
| `episodes` | Immutable raw events (content, source, salience, causal context) |
| `semantics` | Consolidated principles (content, state, evidence chain) |
| `procedures` | Learned workflows (trigger conditions, success/failure counts) |
| `causal_links` | Causal relationships (cause, effect, mechanism, link type) |
| `contradictions` | Dispute tracking (claims, state, resolution) |
| `consolidation_runs` | Audit trail (inputs, outputs, status) |
| `vec_episodes` | sqlite-vec KNN index for episode embeddings |
| `vec_semantics` | sqlite-vec KNN index for semantic embeddings |
| `vec_procedures` | sqlite-vec KNN index for procedural embeddings |
| `audrey_config` | Dimension configuration and metadata |
All mutations use SQLite transactions. CHECK constraints enforce valid states and source types. Vector search uses sqlite-vec with cosine distance.
Running Tests
```bash
npm test            # 208 tests across 17 files
npm run test:watch
```

Running the Demo

```bash
node examples/stripe-demo.js
```

Demonstrates the full pipeline: encode 3 rate-limit observations → consolidate into principle → recall proactively.
Roadmap
v0.1.0 — Foundation
- Immutable episodic memory with append-only records
- Compositional confidence formula (source + evidence + recency + retrieval)
- Ebbinghaus-inspired forgetting curves with configurable half-lives
- Dormancy transitions for low-confidence memories
- Confidence-weighted recall across episodic/semantic/procedural types
- Provenance chains (which episodes contributed to which principles)
- Retrieval reinforcement (frequently accessed memories resist decay)
- Consolidation engine with clustering and principle extraction
- Idempotent consolidation with checkpoint cursors
- Full consolidation audit trail (input/output IDs per run)
- Consolidation rollback (undo bad runs, restore episodes)
- Contradiction lifecycle (open/resolved/context_dependent/reopened)
- Circular self-confirmation defense (model-generated cap at 0.6)
- Source type diversity tracking on semantic memories
- Supersedes links for correcting episodic memories
- Pluggable embedding providers (Mock for tests, OpenAI for production)
- Causal context storage (trigger/consequence per episode)
- Introspection API (memory counts, contradiction stats, consolidation history)
- EventEmitter lifecycle hooks (encode, reinforcement, consolidation, decay, rollback, error)
- SQLite with WAL mode, CHECK constraints, indexes, foreign keys
- Transaction safety on all multi-step mutations
- Input validation on public API (content, salience, tags, source)
- Shared utility extraction (cosine similarity, date math, safe JSON parse)
- 104 tests across 12 test files
- Proof-of-concept demo (Stripe rate limit scenario)
v0.2.0 — LLM Integration
- LLM-powered principle extraction (replace callback with Anthropic/OpenAI calls)
- LLM-based contradiction detection during validation
- Causal mechanism articulation via LLM (not just trigger/consequence)
- Spurious correlation detection (require mechanistic explanation for causal links)
- Context-dependent truth resolution via LLM
- Configurable LLM provider for consolidation (Mock, Anthropic, OpenAI)
- Structured prompt templates for all LLM operations
- 142 tests across 15 test files
v0.3.0 — Vector Performance
- sqlite-vec native vector indexing (vec0 virtual tables with cosine distance)
- KNN queries for recall, validation, and consolidation clustering (all vector math in C)
- SQL-native metadata filtering in KNN (state, source, consolidated)
- Batch encoding API (`encodeBatch` — encode N episodes in one call)
- Streaming recall with async generators (`recallStream`)
- Dimension configuration and mismatch validation
- Automatic migration from v0.2.0 embedding BLOBs to vec0 tables
- 168 tests across 16 test files
v0.3.1 — MCP Server + JSDoc Types
- MCP tool server via `@modelcontextprotocol/sdk` with stdio transport
- 5 tools: `memory_encode`, `memory_recall`, `memory_consolidate`, `memory_introspect`, `memory_resolve_truth`
- Configuration via environment variables (data dir, embedding provider, LLM provider)
- One-command install: `npx audrey install` (auto-detects API keys)
- CLI subcommands: `install`, `uninstall`, `status`
- JSDoc type annotations on all public exports (16 source files)
- Published to npm with proper package metadata
- 194 tests across 17 test files
v0.3.3 — Hardening (current)
- Fix status command dimension mismatch (read stored dimensions from existing database)
- Safe JSON parsing in LLM providers (descriptive errors on malformed responses)
- Fetch timeouts on all API calls (configurable, default 30s)
- Config validation in Audrey constructor (dormantThreshold, minEpisodes)
- encodeBatch error isolation tests
- 208 tests across 17 test files
v0.4.0 — Type Safety & Developer Experience
- Full TypeScript conversion with strict mode
- Published type declarations (.d.ts)
- Schema versioning and migration system
- Structured logging (optional, pluggable)
v0.4.5 — Embedding Migration (deferred from v0.3.0)
- Embedding migration pipeline (re-embed when models change)
- Re-consolidation queue (re-run consolidation with new embedding model)
v0.5.0 — Advanced Memory Features
- Adaptive consolidation threshold (learn optimal N per domain, not fixed N=3)
- Source-aware confidence for semantic memories (track strongest source composition)
- Configurable decay rates per Audrey instance
- Configurable confidence weights per Audrey instance
- PII detection and redaction (opt-in)
- Memory export/import (JSON snapshot)
- Auto-consolidation scheduling (setInterval with configurable interval)
v0.6.0 — Scale
- pgvector adapter for PostgreSQL backend
- Redis adapter for distributed caching
- Connection pooling for concurrent agent access
- Pagination on recall queries (cursor-based)
- Benchmarks: encode throughput, recall latency at 10k/100k/1M memories
v1.0.0 — Production Ready
- Comprehensive error handling at all boundaries
- Rate limiting on embedding API calls
- Memory usage profiling and optimization
- Security audit (injection, data isolation)
- Cross-agent knowledge sharing protocol (Hivemind)
- Documentation site
- Integration guides (LangChain, CrewAI, Claude Code, custom agents)
Design Decisions
Why SQLite, not Postgres? Zero infrastructure. npm install and you have a brain. The adapter pattern means you can migrate to pgvector when you need to scale.
Why append-only episodes? Immutability creates a reliable audit trail. Corrections use supersedes links rather than mutations. You can always trace back to what actually happened.
Why Ebbinghaus curves? Biological forgetting is an adaptive feature, not a bug. It prevents cognitive overload, maintains relevance, and enables generalization. Audrey's forgetting works the same way.
Why model-generated cap at 0.6? Prevents the most dangerous exploit in AI memory: circular self-confirmation where an agent's own inferences bootstrap themselves into high-confidence "facts" through repeated retrieval.
Why no TypeScript yet? Prototyping speed. TypeScript conversion is on the roadmap for v0.4.0. The pure-math modules (confidence.js, utils.js) are already type-safe in practice.
License
MIT