    RecallMEM

    Persistent personal AI that actually remembers you.

    Chat products like ChatGPT, Claude.ai, and Gemini forget you the moment you end your session. RecallMEM doesn't. It builds a profile of who you are, extracts facts after every conversation, and runs vector search across your entire history to find relevant context. After a week of use, it knows you better than any stateless AI ever could.

    Use it with Claude or OpenAI for fast responses and the best models (~5 minute setup). Or run everything locally with Gemma 4 for 100% privacy. You'll get the same memory framework either way. Your call.

    RecallMEM chat UI showing the AI remembering the user's name across conversations

    Two chats. Different sessions. The AI remembers.


    What is this

    A personal AI chatbot with REAL memory. Plug in any LLM you want and RecallMEM gives it persistent memory of who you are, what you've talked about, and what's currently true vs historical.

    The best part: the LLM never touches the database that holds your memory. Every retrieval is deterministic SQL + cosine similarity, assembled by TypeScript before the LLM ever sees it. The LLM only proposes new facts; a TypeScript validator decides what gets stored. Facts have timestamps and get auto-retired when you contradict them ("works at Acme" → "left Acme"). Deep dive on the architecture →
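
    The retrieval idea can be sketched without a database: rank stored chunks by cosine similarity against a query embedding, entirely in application code (in RecallMEM the same ranking happens in SQL via pgvector). The names here (`MemoryChunk`, `rankBySimilarity`) are illustrative, not RecallMEM's actual API.

```typescript
// Minimal sketch of deterministic recall: pure cosine-similarity ranking,
// no LLM involved anywhere in the read path.
interface MemoryChunk {
  text: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the topK chunks most similar to the query embedding.
function rankBySimilarity(query: number[], chunks: MemoryChunk[], topK: number): MemoryChunk[] {
  return [...chunks]
    .sort((x, y) => cosineSimilarity(query, y.embedding) - cosineSimilarity(query, x.embedding))
    .slice(0, topK);
}
```

    The output of a ranking like this is plain text that TypeScript splices into the system prompt, which is what makes the read path fully deterministic.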

    You can run it three ways:

    • Cloud LLMs (recommended for most people). Add a Claude or OpenAI API key in Settings. Fast, smart, works on any computer. Your memory still stays local in your own Postgres database. Only the chat messages go to the provider.
    • Local LLMs (recommended for privacy). Run Gemma 4 via Ollama. Nothing leaves your machine, ever. Slower setup (~18 GB model download) and slower responses, but truly air-gappable.
    • Both. Use cloud for daily chat, switch to local for the sensitive stuff. The model dropdown lets you pick per-conversation.

    Features

    • Three-layer memory across every chat: synthesized profile, extracted facts table, and vector search over all past conversations
    • Temporal awareness so the model knows what's current vs. historical. Auto-retires stale facts when the truth changes.
    • Live fact extraction after every assistant reply, not just when the chat ends
    • Memory inspector where you can view, edit, or delete every fact
    • Vector search across past conversations with dated recall
    • Custom rules for how you want the AI to talk to you
    • File uploads (images, PDFs, code). Gemma 4 handles vision natively.
    • Web search when using Anthropic or Ollama (via Brave Search)
    • Wipe memory unrecoverably with DELETE + VACUUM FULL + CHECKPOINT
    • Bring any LLM. Ollama, Anthropic, OpenAI, or any OpenAI-compatible API.
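
    The temporal-awareness bullet above can be sketched as a supersession rule: when a new fact arrives for the same subject and attribute, the old one is retired (timestamped), not deleted. Field and function names here are illustrative, not RecallMEM's actual schema.

```typescript
// Hedged sketch of timestamped fact supersession ("works at Acme" -> "left Acme").
interface Fact {
  subject: string;
  attribute: string;
  value: string;
  createdAt: Date;
  retiredAt?: Date;
}

// Retire any still-active fact for the same subject + attribute, then append.
function addFact(store: Fact[], incoming: Omit<Fact, "retiredAt">): Fact[] {
  for (const f of store) {
    if (!f.retiredAt && f.subject === incoming.subject && f.attribute === incoming.attribute) {
      f.retiredAt = incoming.createdAt;
    }
  }
  return [...store, { ...incoming }];
}

// Only active facts go into the prompt; retired ones remain queryable as history.
const activeFacts = (store: Fact[]) => store.filter(f => !f.retiredAt);
```

    Keeping retired rows around is what lets the model answer both "where do I work?" and "where did I used to work?".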

    Quick start (Mac)

    Two options. Pick whichever fits your priority.

    Option A: Cloud LLM (Claude or OpenAI) — fastest, ~5 minutes

    You need Node.js 20+ and Homebrew. Then:

    npx recallmem

    The installer sets up Postgres, pgvector, and Ollama (for the embedding model that powers memory). When the browser opens to localhost:3000:

    1. Click Settings in the top right
    2. Click Providers
    3. Add your Claude or OpenAI API key
    4. Pick that model from the dropdown in the chat header
    5. Start chatting

    Total time: ~5 minutes. The AI remembers everything across every chat. Your memory stays in your local Postgres database. Only the chat messages go to the cloud provider.

    Option B: Local Gemma 4 — 100% private, ~15-45 minutes

    Same npx recallmem command. When the app opens, click Settings → Manage models and download one of these:

    • Gemma 4 E4B (4 GB, ~5 minute download) — fastest to test
    • Gemma 4 26B (18 GB, ~20-30 minute download) — recommended for daily use
    • Gemma 4 31B (19 GB, slower, best quality)

    Then pick that model from the dropdown and chat. Nothing leaves your machine.

    Linux (not officially supported, manual install)

    Auto-install isn't wired up for Linux. You'll need to install everything by hand:

    # Postgres + pgvector (apt example)
    sudo apt install postgresql-17 postgresql-17-pgvector
    sudo systemctl start postgresql
    
    # Ollama
    curl -fsSL https://ollama.com/install.sh | sh
    sudo systemctl start ollama
    ollama pull embeddinggemma
    ollama pull gemma4:26b
    
    # Run
    npx recallmem

    Windows (not supported, use WSL2)

    Native Windows is not supported. Use WSL2 with Ubuntu and follow the Linux steps above inside WSL.

    CLI commands

    npx recallmem            # Setup if needed, then start the app
    npx recallmem init       # Setup only (deps, DB, models, env)
    npx recallmem start      # Start the server (assumes setup done)
    npx recallmem doctor     # Check what's missing or broken
    npx recallmem upgrade    # Pull latest code, run pending migrations
    npx recallmem version    # Print version

    Privacy

    If you only use Ollama, nothing leaves your machine, ever. You can air-gap the computer and it keeps working. If you add a cloud provider, only the chat messages and your assembled system prompt go to that provider's servers. Your database, embeddings, and saved API keys stay local.

    For developers

    Underneath the chat UI, RecallMEM is a deterministic memory framework you can fork and use in your own AI app. The whole lib/ folder is intentionally framework-shaped.

    lib/
    ├── memory.ts        Memory orchestrator (profile + facts + vector recall in parallel)
    ├── prompts.ts       System prompt assembly with all memory context
    ├── facts.ts         Fact extraction (LLM proposes) + validation (TypeScript decides)
    ├── profile.ts       Synthesizes a structured profile from active facts
    ├── chunks.ts        Transcript splitting, embedding, vector search
    ├── chats.ts         Chat CRUD + transcript serialization
    ├── post-chat.ts     Post-chat pipeline (title, facts, profile rebuild, embed)
    ├── rules.ts         Custom user rules / instructions
    ├── embeddings.ts    EmbeddingGemma calls via Ollama
    ├── llm.ts           LLM router (Ollama, Anthropic, OpenAI, OpenAI-compatible)
    └── db.ts            Postgres pool + configurable user ID resolver

    Wire in your own auth with two calls at startup, and every lib function respects it. See the developer docs for embedding the memory layer into your own app, plus the database schema, testing, and optional Langfuse observability.
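
    The auth wiring might look roughly like this resolver-registration pattern: the app registers a callback once at startup, and every data-access call asks it for the current user ID instead of hard-coding one. The names (`setUserIdResolver`, `currentUserId`) are hypothetical; the real API lives in lib/db.ts.

```typescript
// Illustrative sketch of a configurable user ID resolver, not RecallMEM's API.
type UserIdResolver = () => string;

// Default: single-user local mode, no auth required.
let resolveUserId: UserIdResolver = () => "local-user";

// Called once at startup by the embedding app to plug in its own auth.
function setUserIdResolver(fn: UserIdResolver): void {
  resolveUserId = fn;
}

// Every lib function calls this instead of assuming a fixed user.
function currentUserId(): string {
  return resolveUserId();
}
```

    With this shape, the memory layer stays auth-agnostic: swapping session cookies for API tokens only changes the resolver you register.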

    Docs

    • Architecture deep dive: how deterministic memory works, read/write paths, validation pipeline, why the LLM is not in charge
    • Developer guide: embedding the memory framework, auth wiring, schema, testing, Langfuse setup
    • Hardware guide: which model fits which machine, RAM requirements, cloud vs. local tradeoffs
    • Troubleshooting: every gotcha I've hit and how to fix it
    • Manual install: step-by-step if you don't want to use the CLI

    Limitations (v0.1)

    Text only (no voice yet). No multi-user. No mobile app. OpenAI vision not fully wired. Reasoning models (o1/o3, extended thinking) may have edge cases. Fact supersession is LLM-judged and intentionally conservative. See the full limitations list.

    Contributing

    Forks, PRs, bug reports, ideas, all welcome. See CONTRIBUTING.md for the dev setup.

    License

    Apache 2.0. See LICENSE and NOTICE. Use it, modify it, fork it, ship it commercially.

    Status

    v0.1.2. It works. I use it every day.

    I built RecallMEM because I wanted an AI that actually knows me. Not because I'm paranoid about privacy (though that's a nice bonus). The chat models you use today forget you the second you close the tab and that drives me crazy. So I fixed it.

    There's no CI, no error monitoring, no SLA. If you want to use it as your daily AI tool, fork it, make it yours, and expect to read the code if something breaks. That's the deal. If this is useful to you, that's cool. If not, no hard feelings.

    github.com/RealChrisSean/RecallMEM