RecallMEM
Persistent Private AI. Powered by Gemma 4 running locally on your own machine.
Two chats. Different sessions. The AI remembers.
What is this
A personal AI chat app with real memory that runs 100% on your machine. Your conversations stay local. The AI builds a profile of who you are over time, extracts facts after every chat, and vector-searches across your entire history to find relevant context. By the time you've used it for a week, it knows you better than any cloud AI because it never forgets.
The default model is Gemma 4 (Apache 2.0) running locally via Ollama. Pick any size from E2B (runs on a phone) up to 31B Dense (best quality, needs a workstation). Or skip Ollama entirely and bring your own API key for Claude, GPT, Groq, Together, OpenRouter, or anything OpenAI-compatible.
The memory is the actual differentiator. Not the model. Not the UI. Memory reads are deterministic SQL + cosine similarity, not LLM tool calls. The chat model never touches your database. Facts are proposed by a local LLM but validated by TypeScript before storage. Deep dive on the architecture →
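To make "deterministic reads" concrete, here is a minimal sketch of a vector recall query using pgvector. The table and column names (`chunks`, `embedding`, `content`) are illustrative, not RecallMEM's actual schema; see the architecture doc for the real read path.

```ts
// Minimal sketch of a deterministic memory read, assuming pgvector.
// Table and column names here are illustrative, not RecallMEM's schema.
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function recallChunks(queryEmbedding: number[], limit = 8) {
  // pgvector's <=> operator is cosine distance; 1 - distance = similarity.
  const { rows } = await pool.query(
    `SELECT chat_id, content, created_at,
            1 - (embedding <=> $1::vector) AS similarity
       FROM chunks
      ORDER BY embedding <=> $1::vector
      LIMIT $2`,
    [`[${queryEmbedding.join(",")}]`, limit],
  );
  return rows; // Same inputs, same rows: no LLM anywhere in this path.
}
```

Because recall is plain SQL over stored embeddings, it is repeatable and auditable in a way that tool-calling memory is not.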
Features
- Three-layer memory across every chat: synthesized profile, extracted facts table, and vector search over all past conversations
- Temporal awareness so the model knows what's current vs. historical. Auto-retires stale facts when the truth changes.
- Live fact extraction after every assistant reply, not just when the chat ends (see the sketch after this list)
- Memory inspector where you can view, edit, or delete every fact
- Vector search across past conversations with dated recall
- Custom rules for how you want the AI to talk to you
- File uploads (images, PDFs, code). Gemma 4 handles vision natively.
- Web search when using Anthropic or Ollama (via Brave Search)
- Wipe memory unrecoverably with `DELETE` + `VACUUM FULL` + `CHECKPOINT`
- Bring any LLM: Ollama, Anthropic, OpenAI, or any OpenAI-compatible API
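The "LLM proposes, TypeScript decides" split behind the fact extraction feature can be pictured like this. This is a hedged sketch, not the code in lib/facts.ts: the field names and the use of zod are assumptions for illustration.

```ts
// Illustrative sketch of fact validation; the real pipeline lives in lib/facts.ts.
// Field names and the zod dependency are assumptions, not RecallMEM's schema.
import { z } from "zod";

const FactSchema = z.object({
  fact: z.string().min(1).max(500),   // hypothetical shape
  category: z.string().min(1).max(100),
  confidence: z.number().min(0).max(1),
});

// The local LLM returns candidate facts as JSON text. Nothing reaches the
// database unless deterministic TypeScript validation accepts it.
function validateProposedFacts(llmOutput: string) {
  let parsed: unknown;
  try {
    parsed = JSON.parse(llmOutput);   // malformed JSON is rejected outright
  } catch {
    return [];
  }
  const result = z.array(FactSchema).safeParse(parsed);
  return result.success ? result.data : [];
}
```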
Quick start (Mac)
RecallMEM is built and tested on macOS. Mac is the supported platform.
Prerequisites: Node.js 20+ and Homebrew.
```
npx recallmem
```

That's the whole install. Here's what happens after you hit Enter:
- It checks what you already have on your Mac (Node, Postgres, Ollama). Anything already installed gets skipped.
- It shows you a list of what's missing with ✓ and ✗ marks.
- It asks one question: `Install everything now? [Y/n]`. Hit Enter to say yes.
- It runs `brew install` for Postgres 17, pgvector, and Ollama. You'll see real-time progress in your terminal.
- It starts Postgres and Ollama as background services so they keep running across reboots.
- It downloads EmbeddingGemma (~600 MB, ~1-2 min). This is required for the memory system.
- It asks which Gemma 4 model you want. Three options:
- 1) Gemma 4 26B — 18 GB, fast, recommended for most people
- 2) Gemma 4 31B — 19 GB, slower, smartest answers
- 3) Gemma 4 E2B — 2 GB, very fast, good for testing or older laptops
- It downloads the model you picked. E2B finishes in 2-3 min. The 18 GB option takes 10-30 min depending on your internet.
- It runs database migrations (~5 seconds).
- It builds the app for production (~30-60 seconds, first install only).
- It starts the server. Open http://localhost:3000 in your browser and start chatting.
Total time: 5-45 minutes depending on which model you picked and your internet speed. Most of that is the model download. You only have to interact with it twice — once to confirm install, once to pick a model. After that, walk away.
Subsequent runs are instant. Just `npx recallmem` and the chat opens.
Just want cloud models? (Claude / GPT)
You still need Postgres for local memory storage, but you can skip Ollama entirely:
```
brew install postgresql@17 pgvector
brew services start postgresql@17
npx recallmem
```

After the app starts, go to Settings → Providers → Add a new provider, paste your API key, and pick that model from the chat dropdown.
Linux (not officially supported, manual install)
Auto-install isn't wired up for Linux. You'll need to install everything by hand:
```
# Postgres + pgvector (apt example)
sudo apt install postgresql-17 postgresql-17-pgvector
sudo systemctl start postgresql

# Ollama
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl start ollama
ollama pull embeddinggemma
ollama pull gemma4:26b

# Run
npx recallmem
```

Windows (not supported, use WSL2)
Native Windows is not supported. Use WSL2 with Ubuntu and follow the Linux steps above inside WSL.
CLI commands
```
npx recallmem           # Setup if needed, then start the app
npx recallmem init      # Setup only (deps, DB, models, env)
npx recallmem start     # Start the server (assumes setup done)
npx recallmem doctor    # Check what's missing or broken
npx recallmem upgrade   # Pull latest code, run pending migrations
npx recallmem version   # Print version
```

Privacy
If you only use Ollama, nothing leaves your machine, ever. You can air-gap the computer and it keeps working. If you add a cloud provider, only the chat messages and your assembled system prompt go to that provider's servers. Your database, embeddings, and saved API keys stay local.
For developers
Underneath the chat UI, RecallMEM is a deterministic memory framework you can fork and use in your own AI app. The whole lib/ folder is intentionally framework-shaped.
```
lib/
├── memory.ts       Memory orchestrator (profile + facts + vector recall in parallel)
├── prompts.ts      System prompt assembly with all memory context
├── facts.ts        Fact extraction (LLM proposes) + validation (TypeScript decides)
├── profile.ts      Synthesizes a structured profile from active facts
├── chunks.ts       Transcript splitting, embedding, vector search
├── chats.ts        Chat CRUD + transcript serialization
├── post-chat.ts    Post-chat pipeline (title, facts, profile rebuild, embed)
├── rules.ts        Custom user rules / instructions
├── embeddings.ts   EmbeddingGemma calls via Ollama
├── llm.ts          LLM router (Ollama, Anthropic, OpenAI, OpenAI-compatible)
└── db.ts           Postgres pool + configurable user ID resolver
```

Wire in your own auth with two calls at startup and every lib function respects it. See the developer docs for embedding the memory layer into your own app, the database schema, testing, and optional Langfuse observability.
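As a sketch of what that wiring can look like (the function names below are hypothetical; check lib/db.ts and the developer guide for the real exports):

```ts
// Hypothetical wiring sketch; the real export names live in lib/db.ts.
import { Pool } from "pg";
import { setPool, setUserIdResolver } from "recallmem/lib/db"; // assumed names

// Call 1: point the memory framework at your Postgres.
setPool(new Pool({ connectionString: process.env.DATABASE_URL }));

// Call 2: resolve the current user from your own auth layer. Every lib/
// function then scopes its reads and writes to this ID.
setUserIdResolver(async (req: Request) => {
  // Replace with your real session lookup.
  return req.headers.get("x-user-id") ?? "anonymous";
});
```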
Docs
| Doc | What's in it |
|---|---|
| Architecture deep dive | How deterministic memory works, read/write paths, validation pipeline, why the LLM is not in charge |
| Developer guide | Embedding the memory framework, auth wiring, schema, testing, Langfuse setup |
| Hardware guide | Which model fits which machine, RAM requirements, cloud vs. local tradeoffs |
| Troubleshooting | Every gotcha I've hit and how to fix it |
| Manual install | Step-by-step if you don't want to use the CLI |
Limitations (v0.1)
Text only (no voice yet). No multi-user. No mobile app. OpenAI vision not fully wired. Reasoning models (o1/o3, extended thinking) may have edge cases. Fact supersession is LLM-judged and intentionally conservative. See the full limitations list.
Contributing
Forks, PRs, bug reports, ideas, all welcome. See CONTRIBUTING.md for the dev setup.
License
Apache 2.0. See LICENSE and NOTICE. Use it, modify it, fork it, ship it commercially.
Status
v0.1. It works. I use it every day. There's no CI, no error monitoring, no SLA. If you want to use it as your daily AI tool, fork it, make it yours, and expect to read the code if something breaks. That's the deal.