Package Exports
- @vinaes/succ
- @vinaes/succ/api
- @vinaes/succ/mcp
Semantic Understanding for Code Contexts
Quick Start • Features • Commands • Configuration • Docs
Persistent semantic memory for any MCP-compatible AI editor. Remember decisions, learn from mistakes, never lose context.
Works with
| Editor | Setup |
|---|---|
| Claude Code | succ init (auto-configured) |
| Cursor | succ setup cursor |
| Windsurf | succ setup windsurf |
| Continue.dev | succ setup continue |
| Codex | succ setup codex, then always launch via succ codex |
See Editor Guides for detailed setup.
Quick Start
npm install -g @vinaes/succ
cd your-project
succ init
succ index
succ index-code
succ analyze
That's it. Claude Code now has persistent memory for your project.
Features
| Feature | Description |
|---|---|
| Hybrid Search | Semantic embeddings + BM25 keyword matching with cross-encoder reranking and AST symbol boost |
| AST Code Indexing | Tree-sitter parsing for 21 languages — 13 with full symbol extraction, 8 grammar-only |
| Code Scanning | Recursive code discovery and indexing via succ_index action="scan" with .succignore support |
| Brain Vault | Obsidian-compatible markdown knowledge base with hierarchical summaries |
| Persistent Memory | Decisions, learnings, patterns across sessions with auto-extraction |
| Cross-Project | Global memories shared between all projects; cross-repo search |
| Knowledge Graph | Directed graph with PPR, SCC, articulation points, bridge edges (code↔memory), LLM-enriched relations |
| Graph Algorithms | Personalized PageRank, Louvain communities, Dijkstra shortest path, betweenness centrality, co-change analysis |
| MCP Native | 15 consolidated tools — Claude uses succ tools directly |
| Reranker | ONNX cross-encoder (ms-marco-MiniLM-L6-v2) for search result post-processing |
| HyDE | Hypothetical Document Embeddings — LLM generates code snippets for NL queries to bridge embedding gap |
| Late Chunking | Long-context embedding with per-AST-chunk pooling for context-aware chunks |
| Web Search | Real-time web search via Perplexity Sonar (quick, quality, deep research) |
| Web Fetch | Fetch any URL as clean Markdown via md.succ.ai (Readability + Playwright) |
| LSP Integration | Language Server Protocol client for definition, references, hover queries |
| Working Memory | Priority scoring, validity filtering, diversity, pinned memories |
| Dynamic Hook Rules | Save memories that auto-fire as pre-tool rules — inject context, block, ask confirmation, or auto-approve permissions |
| File-Linked Memories | Link memories to files; auto-recalled when editing those files |
| Dead-End Tracking | Record failed approaches to prevent retrying |
| Debug Sessions | Structured debugging with hypothesis testing, 13-language instrumentation |
| Session Surgeon | Auto compact stats, trim tool content/thinking/images, manual compact with chain integrity |
| Observability | Search latency, embedding times, LLM call metrics; retrieval feedback loop |
| PRD Pipeline | Generate PRDs, parse into tasks, execute with quality gates |
| Team Mode | Parallel task execution with git worktrees |
| Multi-Backend Storage | SQLite, PostgreSQL, Qdrant — scale from laptop to cloud |
All features
- AST Code Indexing — Tree-sitter parsing for 21 languages (13 with full symbol extraction + 8 grammar-only); symbol-aware BM25 tokenization boosts function/class names in search results
- Web Search — Real-time search via Perplexity Sonar through OpenRouter (quick $1/MTok, quality $3-15/MTok, deep research); search history tracking with cost auditing
- PRD Pipeline — Generate PRDs from feature descriptions, parse into executable tasks, run with Claude Code agent, export workflow to Obsidian (Mermaid Gantt + dependency DAG)
- Team Mode — Parallel task execution using git worktrees; each worker gets an isolated checkout, results merge via cherry-pick
- Quality Gates — Auto-detected (TypeScript, Go, Python, Rust) or custom; run after each task to verify code quality
- Graph Algorithms — Personalized PageRank (PPR) retrieval, Tarjan's SCC, articulation points, Dijkstra shortest path, betweenness centrality, Louvain communities with LLM summaries, bridge edges (code↔memory graph)
- Cross-encoder Reranker — ONNX ms-marco-MiniLM-L6-v2 rescores (query, document) pairs; configurable weight, topK clamping, graceful degradation
- HyDE — Hypothetical Document Embeddings via LLM for natural language → code search; tree-sitter AST code detection
- Late Chunking — Long-context embedding (jina 8192 tokens) with per-AST-chunk pooling for context-aware embeddings
- Hierarchical Summaries (RAPTOR-style) — bottom-up LLM summarization at file → directory → module → repo zoom levels with query routing
- Code Scanning — `succ_index action="scan"` recursively discovers and indexes code files via git ls-files / directory walk with .succignore, size filtering, symlink rejection
- Observability — search latency, embedding times, LLM call metrics; retrieval feedback loop for ranking adjustment
- Auto-memory Extraction — session-end fact extraction via LLM with quality gate + periodic dimension-bucketed consolidation
- Cross-repo Search — search across multiple succ-indexed repositories
- Diff-brain Analysis — LLM-powered diff analysis for brain vault document changes
- LSP Integration — language server protocol client, installer, and server registry (Kotlin + Swift added)
- MCP Review Tool — `succ_review` for code review with blast-radius estimation
- Co-change Analysis — git log mining to detect files frequently changed together
- Brain Vault Export — structured export with metadata
- API Versioning — `/v1/` route prefix aliases for all daemon endpoints
- Graph Enrichment — LLM-classified relations (implements, leads_to, contradicts...), contextual proximity, Label Propagation communities, degree centrality with recall boost
- Dead-End Tracking — Record failed approaches; auto-boosted in recall to prevent retrying
- AGENTS.md Auto-Export — Auto-generate editor instructions from decisions, patterns, dead-ends
- Learning Delta — Track knowledge growth per session (memories added, types, quality)
- Confidence Retention — Time-decay scoring with auto-cleanup of low-value memories
- Safe Consolidation — Soft-delete with undo support; no data loss on merge
- Skill Discovery — Auto-suggest relevant skills based on user prompt (opt-in, disabled by default)
- Skyll Integration — Access community skills from Skyll registry (requires skills.enabled = true)
- Soul Document — Define AI personality and values
- Dynamic Hook Rules — Memories tagged `hook-rule` auto-fire before matching tool calls; filter by `tool:{Name}` and `match:{regex}` tags; `error` type blocks, `pattern` asks confirmation, `allow` type auto-approves permission dialogs (v2.1.63+), others inject as context
- PermissionRequest — Auto-approve or deny Claude Code permission dialogs based on memory rules (requires Claude Code v2.1.63+)
- HTTP Hooks — Direct HTTP hooks to daemon (no process spawn) for faster, more reliable hook execution (requires Claude Code v2.1.63+, auto-detected at `succ init`)
- File-Linked Memories — Attach memories to files via `files` parameter; pre-tool hook auto-recalls related memories when editing those files
- Auto-Hooks — Context injection at session start/end
- Idle Reflections — AI generates insights during idle time
- Session Context — Auto-generated briefings for next session
- Security Hardening — 3-tier prompt injection detection (structural + multilingual regex + embedding semantic), content sanitization for 13 entry points, Bell-LaPadula IFC with compartments, file operation guards, exfiltration detection, post-tool secret scanning
- LLM Guardrails — Optional Tier 3 LLM classification (Llama Guard, safeguard-20b) for sensitivity, code policy (OWASP SC2-SC7), and injection detection with LRU caching
- Sensitive Filter — Detect and redact PII, API keys, secrets
- Quality Scoring — Local ONNX classification to filter noise
- Token Savings — Track RAG efficiency vs full files
- Temporal Awareness — Time decay, validity periods, point-in-time queries
- Unified Daemon — Single background process for watch, analyze, idle tracking
- Watch Mode — Auto-reindex on file changes via @parcel/watcher
- Fast Analyze — `--fast` mode with fewer agents and smaller context for quick onboarding
- Incremental Analyze — Git-based change detection, skip unchanged agents
- Local LLM — Ollama, LM Studio, llama.cpp support
- Sleep Agent — Offload heavy operations to local LLM
- Checkpoints — Backup and restore full succ state
- AI-Readiness Score — Measure project readiness for AI collaboration
- Multiple LLM Backends — Local (Ollama), OpenRouter, or Claude CLI
- Storage Backends — SQLite (default), PostgreSQL + pgvector, Qdrant
- Data Migration — Export/import JSON, migrate between backends
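The Dynamic Hook Rules entry in the list above describes a small decision procedure: find the memories tagged `hook-rule` that match the tool call, then act on their type. A minimal Python sketch of that logic follows. The `Memory` class, its field names, and the precedence used when several rules match (block over ask over approve over inject) are illustrative assumptions, not succ's actual implementation.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Hypothetical stand-in for a stored memory; field names are assumptions."""
    content: str
    type: str                                   # e.g. "error", "pattern", "allow"
    tags: list = field(default_factory=list)


def matches(memory, tool_name, tool_input):
    """True when a hook-rule memory applies to this tool call."""
    if "hook-rule" not in memory.tags:
        return False
    for tag in memory.tags:
        # tool:{Name} restricts the rule to one tool
        if tag.startswith("tool:") and tag[len("tool:"):] != tool_name:
            return False
        # match:{regex} restricts the rule to inputs matching the pattern
        if tag.startswith("match:") and not re.search(tag[len("match:"):], tool_input):
            return False
    return True


# Memory type -> action, as described in the feature list; other types inject context.
ACTIONS = {"error": "block", "pattern": "ask", "allow": "approve"}
PRECEDENCE = ["block", "ask", "approve", "inject"]   # assumed: strictest wins


def decide(memories, tool_name, tool_input):
    """Return the action for a tool call, or None when no rule matches."""
    actions = [ACTIONS.get(m.type, "inject")
               for m in memories if matches(m, tool_name, tool_input)]
    return min(actions, key=PRECEDENCE.index) if actions else None


rules = [
    Memory("Never force-push", "error",
           ["hook-rule", "tool:Bash", "match:git push .*--force"]),
    Memory("Prefer pnpm over npm", "learning", ["hook-rule", "tool:Bash"]),
]
print(decide(rules, "Bash", "git push origin main --force"))
print(decide(rules, "Bash", "ls -la"))
```

With these two example rules, the force-push command hits both rules and the stricter one wins, while `ls -la` only picks up the context-injecting rule.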
Claude Code Agents
succ ships with 20 specialized agents in .claude/agents/ that run as subagents inside Claude Code:
| Agent | What it does |
|---|---|
| `succ-explore` | Codebase exploration powered by semantic search |
| `succ-plan` | TDD-enforced implementation planning with red-green-refactor cycles |
| `succ-code-reviewer` | Full code review with OWASP Top 10 checklist — works with any language |
| `succ-diff-reviewer` | Fast pre-commit diff review for security, bugs, and regressions |
| `succ-deep-search` | Cross-search memories, brain vault, and code |
| `succ-memory-curator` | Consolidate, deduplicate, and clean up memories |
| `succ-memory-health-monitor` | Detect decayed, stale, or low-quality memories |
| `succ-pattern-detective` | Surface recurring patterns and anti-patterns from sessions |
| `succ-session-handoff-orchestrator` | Extract summary and briefing at session end |
| `succ-session-reviewer` | Review past sessions, extract missed learnings |
| `succ-decision-auditor` | Find contradictions and reversals in architectural decisions |
| `succ-knowledge-indexer` | Index documentation and code into the knowledge base |
| `succ-knowledge-mapper` | Maintain knowledge graph, find orphaned memories |
| `succ-checkpoint-manager` | Create and manage state backups |
| `succ-context-optimizer` | Optimize what gets preloaded at session start |
| `succ-quality-improvement-coach` | Analyze memory quality, suggest improvements |
| `succ-readiness-improver` | Actionable steps to improve AI-readiness score |
| `succ-general` | General-purpose agent with semantic search, web search, and all tools |
| `succ-debug` | Structured debugging — hypothesize, instrument, reproduce, fix with dead-end tracking |
| `succ-style-tracker` | Track communication style changes, update soul.md and brain vault |
Agents are auto-discovered by Claude Code from .claude/agents/ and can be launched via the Task tool with subagent_type.
Commands
| Command | Description |
|---|---|
| `succ init` | Interactive setup wizard |
| `succ setup <editor>` | Configure MCP for any editor |
| `succ codex-chat` | Launch Codex chat with succ briefing/hooks |
| `succ analyze` | Generate brain vault with Claude agents |
| `succ index [path]` | Index files for semantic search |
| `succ scan-code [path]` | Recursive code discovery and indexing |
| `succ search <query>` | Semantic search in brain vault |
| `succ remember <content>` | Save to memory |
| `succ memories` | List and search memories |
| `succ watch` | Watch for changes and auto-reindex |
| `succ daemon <action>` | Manage unified daemon |
| `succ prd generate` | Generate PRD from feature description |
| `succ prd run` | Execute PRD tasks with quality gates |
| `succ session analyze` | Token breakdown by type and tool name |
| `succ session trim` | Trim tool content from session transcript |
| `succ session compact` | Manual compact with dialogue summary |
| `succ status` | Show index statistics |
All commands
| Command | Description |
|---|---|
| `succ index-code [path]` | Index source code (AST chunking via tree-sitter) |
| `succ index --memories` | Re-embed all memories with current embedding model |
| `succ reindex` | Detect and fix stale/deleted index entries |
| `succ chat <query>` | RAG chat with context |
| `succ train-bpe` | Train BPE vocabulary from indexed code |
| `succ forget` | Delete memories |
| `succ graph <action>` | Knowledge graph: stats, auto-link, enrich, proximity, communities, centrality |
| `succ consolidate` | Merge duplicate memories (soft-delete with undo) |
| `succ agents-md` | Generate .claude/AGENTS.md from memories |
| `succ progress` | Show learning delta history |
| `succ retention` | Memory retention analysis and cleanup |
| `succ soul` | Generate personalized soul.md |
| `succ config` | Interactive configuration |
| `succ stats` | Show token savings statistics |
| `succ checkpoint <action>` | Create, restore, or list checkpoints |
| `succ score` | Show AI-readiness score |
| `succ prd parse <file>` | Parse PRD markdown into tasks |
| `succ prd list` | List all PRDs |
| `succ prd status [id]` | Show PRD status and tasks |
| `succ prd archive [id]` | Archive a PRD |
| `succ prd export [id]` | Export PRD workflow to Obsidian (Mermaid diagrams) |
| `succ session trim` | Trim tool content from session (--tools, --only-inputs, --only-results) |
| `succ session trim-thinking` | Trim thinking blocks only |
| `succ session trim-all` | Trim all strippable content (tools, thinking, images) |
| `succ session compact` | Manual compact with dialogue summary and chain integrity |
| `succ clear` | Clear index and/or memories |
| `succ benchmark` | Run performance benchmarks |
| `succ migrate` | Migrate data between storage backends |
succ init
succ init # Interactive mode
succ init --yes # Non-interactive (defaults)
succ init --force # Reinitialize existing project
Creates .succ/ structure, configures MCP server, sets up hooks.
succ analyze
succ analyze # Run via Claude CLI (recommended)
succ analyze --fast # Fast mode (fewer agents, smaller context)
succ analyze --force # Force full re-analysis (skip incremental)
succ analyze --local # Use local LLM (Ollama, LM Studio)
succ analyze --openrouter # Use OpenRouter API
succ analyze --background # Run in background
Generates brain vault structure:
.succ/brain/
├── CLAUDE.md              # Navigation hub
├── project/               # Project knowledge
│   ├── technical/         # Architecture, API, Conventions
│   ├── systems/           # Core systems/modules
│   ├── strategy/          # Project goals
│   └── features/          # Implemented features
├── knowledge/             # Research notes
└── archive/               # Old/superseded
succ watch
succ watch # Start watch service (via daemon)
succ watch --ignore-code # Watch only docs
succ watch --status # Check watch service status
succ watch --stop # Stop watch service
succ daemon
succ daemon status # Show daemon status
succ daemon sessions # List active Claude Code sessions
succ daemon start # Start daemon manually
succ daemon stop # Stop daemon
succ daemon logs # Show recent logs
succ prd
succ prd generate "Add JWT authentication" # Generate PRD + parse tasks
succ prd run # Execute sequentially (default)
succ prd run --mode team # Execute in parallel (git worktrees)
succ prd run --mode team --concurrency 5 # Parallel with 5 workers
succ prd run --resume # Resume interrupted run
succ prd run --dry-run # Preview execution plan
succ prd status # Show latest PRD status
succ prd list # List all PRDs
succ prd export # Export latest PRD to Obsidian
succ prd export --all # Export all PRDs
succ prd export prd_abc123 # Export specific PRD
Team mode runs independent tasks in parallel using git worktrees for isolation. Each worker gets its own checkout; results merge via cherry-pick. Quality gates (typecheck, test, lint, build) run automatically after each task.
Export generates Obsidian-compatible markdown with Mermaid diagrams (Gantt timeline, dependency DAG), per-task detail pages with gate results, and wiki-links between pages. Output goes to .succ/brain/prd/.
Configuration
No API key required. Uses local embeddings by default.
{
  "llm": {
    "embeddings": {
      "mode": "local",
      "model": "Xenova/all-MiniLM-L6-v2"
    }
  },
  "chunk_size": 500,
  "chunk_overlap": 50
}
Embedding modes
Local (default):
{
  "llm": { "embeddings": { "mode": "local" } }
}
Ollama (unified namespace):
{
  "llm": {
    "embeddings": {
      "mode": "api",
      "model": "nomic-embed-text",
      "api_url": "http://localhost:11434/v1/embeddings"
    }
  }
}
OpenRouter:
{
  "embedding_mode": "openrouter",
  "openrouter_api_key": "sk-or-..."
}
MRL dimension override (Matryoshka models):
{
  "llm": {
    "embeddings": {
      "mode": "api",
      "model": "nomic-embed-text-v1.5",
      "api_url": "http://localhost:11434/v1/embeddings",
      "dimensions": 256
    }
  }
}
GPU acceleration
succ uses native ONNX Runtime for embedding inference with automatic GPU detection:
| Platform | Backend | GPUs |
|---|---|---|
| Windows | DirectML | AMD, Intel, NVIDIA |
| Linux | CUDA | NVIDIA |
| macOS | CoreML | Apple Silicon |
| Fallback | CPU | Any |
GPU is enabled by default. No manual configuration needed — the best available backend is auto-detected.
{
  "gpu_enabled": true,
  "gpu_device": "directml"
}
Set gpu_device to override auto-detection: cuda, directml, coreml, or cpu.
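The selection rule described in this section (per-platform preference, CPU fallback, explicit `gpu_device` override) can be sketched in a few lines of Python. This is an illustration only: `pick_backend` and its `available` argument (a stand-in for whatever probe reports which backends actually loaded) are hypothetical, and treating `gpu_enabled: false` as taking priority over `gpu_device` is an assumption.

```python
import platform

# Preferred backend per OS, taken from the platform table in this section.
PREFERRED = {"Windows": "directml", "Linux": "cuda", "Darwin": "coreml"}
VALID_DEVICES = {"cuda", "directml", "coreml", "cpu"}


def pick_backend(config, system=None, available=("cpu",)):
    """Choose an inference backend from config, OS name, and usable backends.

    `available` is a hypothetical probe result listing backends that loaded.
    """
    if not config.get("gpu_enabled", True):    # GPU is enabled by default
        return "cpu"
    override = config.get("gpu_device")
    if override is not None:
        if override not in VALID_DEVICES:
            raise ValueError(f"unknown gpu_device: {override!r}")
        return override                        # explicit setting skips detection
    preferred = PREFERRED.get(system or platform.system(), "cpu")
    return preferred if preferred in available else "cpu"


print(pick_backend({}, system="Linux", available=("cuda", "cpu")))
print(pick_backend({}, system="Linux", available=("cpu",)))
print(pick_backend({"gpu_device": "directml"}, system="Windows"))
```

`platform.system()` returns `"Darwin"` on macOS, which is why the table's macOS row is keyed that way in the sketch.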
Idle watcher
{
  "idle_watcher": {
    "enabled": true,
    "idle_minutes": 2,
    "check_interval": 30,
    "min_conversation_length": 5
  }
}
Pre-commit review
Automatically run the succ-diff-reviewer agent before every git commit to catch security issues, bugs, and regressions:
{
  "preCommitReview": true
}
When enabled, Claude will run a diff review before each commit. Critical findings block the commit; high findings trigger a warning.
Disabled by default. Set via succ_config(action="set", key="preCommitReview", value="true").
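The severity rule for pre-commit review (critical findings block, high findings warn) reduces to a small gate function. The sketch below assumes each finding is a dict with a `severity` key; the reviewer agent's real output format is not documented here, so that shape and the `commit_gate` name are hypothetical.

```python
def commit_gate(findings):
    """Map review findings to a commit decision: block, warn, or allow."""
    severities = {f["severity"] for f in findings}
    if "critical" in severities:
        return "block"      # critical findings stop the commit
    if "high" in severities:
        return "warn"       # high findings only raise a warning
    return "allow"


print(commit_gate([{"severity": "high"}, {"severity": "critical"}]))
print(commit_gate([{"severity": "high"}, {"severity": "low"}]))
print(commit_gate([]))
```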
Autonomous mode (bypass security guards)
When running with --dangerously-skip-permissions, succ's security guards can block autonomous operations. Enable trustAgentPermissions to downgrade deny/ask to context warnings:
{
  "security": {
    "trustAgentPermissions": true
  }
}
Injection detection stays ON (protects the agent). See Security Hardening for details.
Sleep agent
Offload heavy operations to local LLM:
{
  "idle_reflection": {
    "sleep_agent": {
      "enabled": true,
      "mode": "local",
      "model": "qwen2.5-coder:14b",
      "api_url": "http://localhost:11434/v1"
    }
  }
}
Storage backends
succ supports multiple storage backends for different deployment scenarios:
| Setup | Use Case | Requirements |
|---|---|---|
| SQLite + sqlite-vec | Local development (default) | None |
| PostgreSQL + pgvector | Production/cloud | PostgreSQL 15+ with pgvector |
| SQLite + Qdrant | Local + powerful vector search | Qdrant server |
| PostgreSQL + Qdrant | Full production scale | PostgreSQL + Qdrant |
Example: PostgreSQL + pgvector
{
  "storage": {
    "backend": "postgresql",
    "postgresql": {
      "connection_string": "postgresql://user:pass@localhost:5432/succ"
    }
  }
}
Example: PostgreSQL + Qdrant
{
  "storage": {
    "backend": "postgresql",
    "vector": "qdrant",
    "postgresql": { "connection_string": "postgresql://..." },
    "qdrant": { "url": "http://localhost:6333" }
  }
}
See Storage Configuration for all options.
LLM Backend Configuration
succ supports multiple LLM backends for operations like analyze, idle reflection, and skill suggestions:
{
  "llm": {
    "type": "local",
    "model": "qwen2.5:7b",
    "local": {
      "endpoint": "http://localhost:11434/v1/chat/completions"
    },
    "openrouter": {
      "model": "anthropic/claude-3-haiku"
    }
  }
}
| Key | Values | Default | Description |
|---|---|---|---|
| `llm.type` | `local` / `openrouter` / `claude` | `local` | LLM provider |
| `llm.model` | string | per-type | Model name for the active type |
| `llm.transport` | `process` / `ws` / `http` | auto | How to talk to the backend |
Transport auto-selects based on type: claude uses process (or ws for persistent WebSocket), local/openrouter use http.
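As a rough illustration of the auto-selection rule, the following Python sketch resolves a transport from an `llm` config block. `resolve_transport` is a hypothetical helper, not part of succ; it assumes an explicit `transport` value always wins and that `ws` is only chosen when set explicitly.

```python
VALID_TRANSPORTS = {"process", "ws", "http"}


def resolve_transport(llm_config):
    """Return the transport for an `llm` config dict (hypothetical helper)."""
    explicit = llm_config.get("transport")
    if explicit is not None:
        if explicit not in VALID_TRANSPORTS:
            raise ValueError(f"unknown transport: {explicit!r}")
        return explicit
    # Auto mode: claude talks to a spawned process, the other backends use HTTP.
    backend = llm_config.get("type", "local")   # llm.type defaults to local
    return "process" if backend == "claude" else "http"


print(resolve_transport({"type": "claude"}))
print(resolve_transport({"type": "claude", "transport": "ws"}))
print(resolve_transport({"type": "openrouter"}))
```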
WebSocket transport (transport: "ws") keeps a persistent connection to Claude CLI, avoiding process spawn overhead on repeated calls:
{
  "llm": {
    "type": "claude",
    "model": "sonnet",
    "transport": "ws"
  }
}
Per-backend model overrides for the fallback chain:
{
  "llm": {
    "type": "claude",
    "model": "sonnet",
    "transport": "ws",
    "local": { "endpoint": "http://localhost:11434/v1/chat/completions", "model": "qwen2.5:7b" },
    "openrouter": { "model": "anthropic/claude-3-haiku" }
  }
}
Claude backend usage
The claude backend integrates with an existing, locally running Claude Code session and is intended only for in-session developer assistance by the same user, including tasks such as file analysis, documentation, indexing, and session summarization.
It is not supported for unattended background processing, cloud deployments, or multi-user scenarios. For automated, long-running, or cloud workloads, use the local or openrouter backends instead.
Retention policies
{
  "retention": {
    "enabled": true,
    "decay_rate": 0.01,
    "access_weight": 0.1,
    "keep_threshold": 0.3,
    "delete_threshold": 0.15
  }
}
Hybrid Search
Combines semantic embeddings with BM25 keyword search. Code search includes AST symbol boost, regex post-filtering, and symbol type filtering (function, method, class, interface, type_alias). Three output modes: full (code blocks), lean (file+lines), signatures (symbol names only).
| Aspect | Documents | Code |
|---|---|---|
| Tokenizer | Markdown-aware + stemming | Naming convention splitter + AST symbol boost |
| Stemming | Yes | No |
| Stop words | Filtered | Kept |
| Segmentation | Standard | Ronin + BPE |
| Symbol metadata | N/A | function, class, interface names via tree-sitter |
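This README does not say how the semantic and BM25 result lists are merged. One common technique for combining two rankings is reciprocal rank fusion (RRF), sketched below in Python purely as an illustration of the hybrid idea. The function name, the constant `k=60`, and the example file names are assumptions, and succ's reranking and AST symbol boost are not modeled.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so items ranked well by both retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


semantic = ["auth.ts", "session.ts", "db.ts"]     # embedding-similarity order
keyword = ["session.ts", "token.ts", "auth.ts"]   # BM25 order
fused = reciprocal_rank_fusion([semantic, keyword])
print(fused)
```

Here `session.ts` ends up first because it ranks near the top of both lists, ahead of `auth.ts`, which is first in one list but third in the other.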
Code tokenizer handles all naming conventions:
| Convention | Example | Tokens |
|---|---|---|
| camelCase | `getUserName` | `get`, `user`, `name` |
| PascalCase | `UserService` | `user`, `service` |
| snake_case | `get_user_name` | `get`, `user`, `name` |
| SCREAMING_SNAKE | `MAX_RETRY_COUNT` | `max`, `retry`, `count` |
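A naming-convention splitter that reproduces the table's token output can be written with one regular expression. The Python sketch below is a simplified stand-in, not succ's tokenizer: `split_identifier` is a hypothetical name, and the Ronin + BPE segmentation mentioned in the comparison table is not implemented here.

```python
import re

# An acronym run (HTTP in HTTPServer), a capitalized or lowercase word, or digits.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_identifier(name):
    """Split an identifier into lowercase tokens across naming conventions."""
    return [m.group(0).lower() for m in _WORD.finditer(name)]


for ident in ["getUserName", "UserService", "get_user_name", "MAX_RETRY_COUNT"]:
    print(ident, split_identifier(ident))
```

Underscores and other separators are skipped because they never match the pattern, so the same function covers all four conventions in the table.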
Memory System
Local memory — stored in .succ/succ.db, project-specific.
Global memory — stored in ~/.succ/global.db, shared across projects.
succ remember "User prefers TypeScript" --global
succ memories --global
Architecture
your-project/
├── .claude/
│   └── settings.json      # Claude Code hooks config
└── .succ/
    ├── brain/             # Obsidian-compatible vault
    ├── hooks/             # Hook scripts
    ├── config.json        # Project configuration
    ├── soul.md            # AI personality
    └── succ.db            # Vector database
~/.succ/
├── global.db              # Global memories
└── config.json            # Global configuration
Documentation
- Configuration Reference — All config options with examples
- PRD Pipeline — Generate, execute, and verify tasks with quality gates
- Storage Backends — SQLite, PostgreSQL, Qdrant setup and benchmarks
- Benchmarks — Performance and accuracy metrics
- Temporal Awareness — Time decay, validity periods
- Ollama Setup — Recommended local LLM setup
- llama.cpp GPU — GPU-accelerated embeddings
- MCP Integration — Claude Code tools and resources
- Security Hardening — Injection detection, IFC, guardrails, content sanitization
- Troubleshooting — Common issues and fixes
- Development — Contributing and testing
License
FSL-1.1-Apache-2.0 — Free to use, modify, self-host. Commercial cloud hosting restricted until Apache 2.0 date.