# just-bash-wiki
A just-bash plugin that implements Karpathy's LLM Wiki pattern — a persistent, LLM-maintained knowledge base with semantic search.
Built on top of `just-bash-data` (`db` + `vec`).
## Install

```sh
npm install just-bash-wiki
```

Peer dependency: `just-bash >= 2.14.0`.
## Quick Start

```ts
import { Bash, InMemoryFs } from "just-bash";
import { createWikiPlugin } from "just-bash-wiki";

const bash = new Bash({
  fs: new InMemoryFs({}),
  customCommands: createWikiPlugin({ rootDir: "/wiki" }),
});

// Initialize wiki (creates collections + vector indexes)
await bash.exec(`wiki init --dim=1536`);

// Add a source
await bash.exec(`wiki source add '{"title":"AI Overview","type":"article","content":"..."}'`);

// Create pages
await bash.exec(`wiki page create '{"slug":"ai","title":"Artificial Intelligence","type":"concept","content":"# AI\\n...","tags":["ai"],"links_to":["ml"]}'`);

// Store embeddings for semantic search
await bash.exec(`wiki embed page ai '[0.1, 0.2, ...]'`);

// Search by vector similarity
await bash.exec(`wiki search '[0.1, 0.2, ...]' --k=5`);

// Health check
await bash.exec(`wiki lint`);
```

## How It Works
The LLM uses `wiki` as a bash tool to maintain a structured knowledge base. The pattern has three layers:
| Layer | Storage | Description |
|---|---|---|
| Sources | `db` `sources` | Immutable raw documents (articles, papers, notes). The LLM reads them but never modifies them. |
| Pages | `db` `pages` + `vec` `page_embeddings` | LLM-generated wiki pages with content, cross-references, and vector embeddings. The LLM writes and maintains these. |
| Log | `db` `log` | Chronological record of all wiki operations. |
Three core operations:
- Ingest — add a source, extract information, create/update wiki pages, store embeddings
- Query — semantic search across pages, read relevant content, synthesize answers
- Lint — find orphan pages, broken links, missing embeddings, unreferenced sources
## API

### createWikiPlugin(opts?)

Returns an array of `Command` objects (includes the `db`, `vec`, and `wiki` commands).
```ts
interface WikiOptions {
  rootDir?: string;       // Data directory (default: "/wiki")
  encryptionKey?: string; // AES-256-GCM encryption key
  authSecret?: string;    // JWT signing secret
  salt?: string;          // PBKDF2 salt prefix
  embeddingDim?: number;  // Vector dimension (default: 1536)
  metric?: "cosine" | "euclidean" | "dot"; // default: "cosine"
  quantize?: "float32" | "int8";           // default: "float32"
  logMaxEntries?: number; // cap on db log; auto-trims past 1.5×
}
```

### Exported types

The package exports TypeScript types for all data structures:

```ts
import type { Page, Source, LogEntry, LintIssue, LintResult } from "just-bash-wiki";
```

## Command Reference
### Initialization

```sh
wiki init [--dim=1536] [--metric=cosine] [--quantize=float32]
```

Creates all collections and indexes. Safe to call multiple times — existing collections are skipped.
### Sources

```sh
# Add a source document
wiki source add '{"title":"...","type":"article","content":"...","url":"...","author":"..."}'

# List sources (optionally filter, paginate)
wiki source list [--type=article] [--status=raw] [--limit=50] [--offset=0]

# Get a source by ID
wiki source get <id>

# Count sources
wiki source count

# Update a source (MongoDB-style update operators)
wiki source update <id> '{"$set":{"status":"processed"}}'

# Delete a source (also removes its embedding)
wiki source delete <id>
```

Source fields: `_id` (auto-generated), `title` (required), `type`, `content`, `url`, `author`, `date`. Auto-added: `ingested_at`, `status` (default: `"raw"`).
### Pages

```sh
# Create a page
wiki page create '{"slug":"ai","title":"Artificial Intelligence","type":"concept","content":"# AI\n...","tags":["ai","tech"],"links_to":["ml","neural-nets"],"source_ids":["src-id-1"]}'

# Update a page (MongoDB-style update operators)
wiki page update <slug> '{"$set":{"content":"...","tags":["ai","updated"]}}'

# Get a page by slug
wiki page get <slug>

# List pages with optional filters and pagination
wiki page list [--type=concept] [--tag=ai] [--status=draft] [--limit=50] [--offset=0]

# Delete a page (cleans up cross-references and embeddings)
wiki page delete <slug>

# Rename a page (updates all cross-references and re-keys the embedding)
wiki page rename <old-slug> <new-slug>

# Find pages with no inbound links (paginated)
wiki page orphans [--limit=50] [--offset=0]
```

Page fields: `_id` (auto-generated), `slug` (required, unique), `title` (required), `type` (defaults to `"concept"`), `content` (defaults to `""`), `tags` (defaults to `[]`), `links_to` (defaults to `[]`), `source_ids` (defaults to `[]`), `status` (optional free-form lifecycle marker, e.g. `"draft"` / `"published"`; not set automatically — callers manage it; filterable via `--status`). Auto-managed: `linked_from`, `created_at`, `updated_at`.

Slug format: must match `^[a-z0-9][a-z0-9_-]*$` — lowercase letters, digits, hyphens, and underscores, starting with a lowercase letter or digit.
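The slug rule is easy to check up front, e.g. before batching `wiki page create` calls. A minimal sketch (`isValidSlug` is an illustration, not exported by the package):

```ts
// Hypothetical helper mirroring the slug rule above — not part of just-bash-wiki.
const SLUG_RE = /^[a-z0-9][a-z0-9_-]*$/;

function isValidSlug(slug: string): boolean {
  return SLUG_RE.test(slug);
}

isValidSlug("neural-nets");   // true
isValidSlug("ai_2024");       // true
isValidSlug("-leading-dash"); // false: must start with a lowercase letter or digit
isValidSlug("Upper");         // false: uppercase is not allowed
```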
Page types: `entity`, `concept`, `source-summary`, `comparison`, `synthesis`, `overview`, `index` (or any custom type).
### Embeddings

```sh
# Store/update a page embedding (with optional metadata)
wiki embed page <slug> '[0.1, 0.2, ...]' [--meta='{"key":"value"}']

# Store/update a source embedding
wiki embed source <id> '[0.1, 0.2, ...]' [--meta='{"key":"value"}']
```

Embeddings are stored in vector collections (`page_embeddings`, `source_embeddings`) for semantic search. The embedding dimension must match what was set in `wiki init --dim=N`. The optional `--meta` JSON is attached to the vector record and returned by `vec get` / `wiki search`; useful for round-tripping titles, model identifiers, or chunk indices alongside the vector. Position-independent — `--meta=...` may appear before or after the vector argument.
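Because the dimension must match the index, it can be worth validating vectors before shelling out to `wiki embed`. A minimal sketch (`assertDim` is hypothetical; the plugin enforces the dimension itself):

```ts
// Hypothetical pre-flight check — the plugin also rejects mismatched dims at store time.
function assertDim(vector: number[], dim: number): void {
  if (vector.length !== dim) {
    throw new Error(`embedding has ${vector.length} dims, index expects ${dim}`);
  }
}

const embedding = [0.1, 0.2, 0.3]; // stand-in for a real model output
assertDim(embedding, 3);           // ok
// assertDim(embedding, 1536);     // would throw
```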
### Search

```sh
# Search pages by vector similarity
wiki search '[0.1, 0.2, ...]' --k=10

# Search sources only
wiki search '[0.1, 0.2, ...]' --k=5 --type=sources

# Search across pages AND sources
wiki search '[0.1, 0.2, ...]' --k=10 --type=all
```

Returns results sorted by similarity using the metric configured in `wiki init --metric=<cosine|euclidean|dot>` (default: cosine). Each hit includes the score and any metadata stored alongside the vector via `wiki embed --meta=...`.
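For intuition on what the three metrics rank by, here is a self-contained sketch (illustrative implementations, not the package's internals):

```ts
// Illustrative versions of the three supported similarity metrics.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

function cosine(a: number[], b: number[]): number {
  // Angle-only: magnitude cancels out.
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

function euclidean(a: number[], b: number[]): number {
  // A distance, not a similarity: lower means closer.
  return Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));
}

cosine([1, 0], [2, 0]); // 1 — parallel vectors score 1 regardless of magnitude
dot([1, 0], [2, 0]);    // 2 — dot is magnitude-sensitive
```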
### Lint

```sh
wiki lint
```

Runs a health check and reports issues:

| Issue | Severity | Description |
|---|---|---|
| `orphan` | warning | Page has no inbound links |
| `broken-link` | error | Page links to a non-existent slug |
| `empty-content` | warning | Page's content is missing, `null`, or `""` (pure-whitespace content is not flagged — see note below) |
| `no-tags` | info | Page has no tags |
| `no-sources` | info | Page has no source references |
| `missing-embeddings` | warning | Some pages lack vector embeddings |
| `unreferenced-source` | info | Source not referenced by any page |
Note on `empty-content`: since v1.2.0, lint stops projecting the full `content` field on every page (it can be MBs each) and runs an `$exists: false` / `null` / `""` query instead. Pure-whitespace content (`" \n"`) is therefore not detected — an accepted tradeoff for capping the lint payload at metadata size. If you need that check, run a one-shot `db pages aggregate` query that materialises `content` for the few pages you care about.
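To make the tradeoff concrete, this sketch mirrors which values the cheap query catches versus misses (`flaggedByLint` is an illustration, not the actual lint code):

```ts
// Mirrors the $exists:false / null / "" check that lint runs since v1.2.0.
function flaggedByLint(content: string | null | undefined): boolean {
  return content === undefined || content === null || content === "";
}

flaggedByLint(undefined); // true  — field missing
flaggedByLint(null);      // true
flaggedByLint("");        // true
flaggedByLint(" \n");     // false — pure whitespace slips through
```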
### Log

```sh
# View recent log entries
wiki log [--last=20] [--type=ingest]

# Add a custom log entry
wiki log add '{"type":"note","summary":"Started research on topic X"}'

# Trim the log to the N most recent entries (older ones are deleted)
wiki log trim --keep=1000
```

All wiki operations are automatically logged with timestamps. For long-running agents, set `WikiOptions.logMaxEntries` to enable opportunistic auto-trim — the plugin samples the log size every 16 commands and trims back to the cap when the count exceeds 1.5× the cap. The log size therefore oscillates between the cap (just after a trim) and approximately 1.5 × cap + 16 (just before the next sample fires); pick a cap of at most ⅔ of the largest acceptable log size. `wiki log trim --keep=N` is always available for explicit trims regardless of the option.
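The oscillation bound can be sanity-checked with a small simulation of the sampling/trim policy described above (a sketch of the policy, not the plugin's actual code):

```ts
// Simulate: one log entry per command, sample every 16 commands,
// trim back to `cap` when the count exceeds 1.5 × cap.
function simulateMaxLogSize(cap: number, commands: number): number {
  let size = 0;
  let maxSeen = 0;
  for (let i = 1; i <= commands; i++) {
    size += 1;
    if (size > maxSeen) maxSeen = size;
    if (i % 16 === 0 && size > 1.5 * cap) size = cap; // sampled trim
  }
  return maxSeen;
}

// With cap = 1000, the log never grows past ~1.5 × cap + 16 entries.
simulateMaxLogSize(1000, 100_000) <= 1.5 * 1000 + 16; // true
```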
### Stats

```sh
wiki stats
```

Returns a comprehensive overview: page/source/log counts, pages grouped by type, vector collection stats, and recent activity.
### Index

```sh
wiki index [--limit=N] [--offset=M]
wiki index --rebuild
```

Default: returns pages grouped by type with their slugs, titles, tags, and last update timestamps; `--limit` / `--offset` paginate this view. `--rebuild` re-derives all `linked_from` arrays from the `links_to` graph and ignores pagination flags (correctness over scalability).
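What `--rebuild` computes is an inversion of the outbound-link graph. A minimal sketch of that derivation (`buildLinkedFrom` is an illustration, not the plugin's code):

```ts
// Invert links_to into linked_from: for every edge a -> b,
// record a in b's inbound list.
function buildLinkedFrom(
  pages: { slug: string; links_to: string[] }[],
): Map<string, string[]> {
  const inbound = new Map<string, string[]>(pages.map((p) => [p.slug, []]));
  for (const page of pages) {
    for (const target of page.links_to) {
      // Links to unknown slugs are skipped here; lint reports them as broken-link.
      inbound.get(target)?.push(page.slug);
    }
  }
  return inbound;
}

const linkedFrom = buildLinkedFrom([
  { slug: "ai", links_to: ["ml"] },
  { slug: "ml", links_to: ["ai", "neural-nets"] },
  { slug: "neural-nets", links_to: ["ml"] },
]);
linkedFrom.get("ml"); // ["ai", "neural-nets"]
```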
## Direct Access to `db` and `vec`

The wiki plugin includes all `just-bash-data` commands, so you can use `db` and `vec` directly for advanced operations:

```sh
# Aggregation: find most-linked pages
db pages aggregate '[{"$unwind":"$linked_from"},{"$group":{"_id":"$slug","inbound":{"$sum":1}}},{"$sort":{"inbound":-1}}]'

# Complex queries
db pages find '{"$or":[{"tags":{"$contains":"ai"}},{"type":"overview"}]}'

# Cross-collection vector search
vec search-across "page_embeddings,source_embeddings" '[0.1,...]' --k=10
```

## Example: LLM Agent Workflow
A typical LLM agent session using just-bash-wiki:

1. Agent receives a new article to process
2. `wiki source add '{"title":"...","content":"..."}'`
3. Agent reads the source content and extracts key entities/concepts
4. `wiki page create '{"slug":"entity-name",...}'` (for each entity)
5. `wiki page update existing-page '{"$set":{"content":"updated..."}}'` (update related pages)
6. `wiki embed page entity-name '[...]'` (store embeddings)
7. `wiki lint` (check for issues)
8. `wiki stats` (verify state)

When answering questions:

1. Generate an embedding for the question
2. `wiki search '[...]' --k=5` (find relevant pages)
3. `wiki page get relevant-slug` (read full content)
4. Synthesize the answer from page content
5. Optionally: `wiki page create '{"slug":"analysis-topic",...}'` (save the answer as a new page)

## Architecture
```
┌─────────────────────────────────────────┐
│              LLM Agent                  │
│    (generates bash commands)            │
└─────────────┬───────────────────────────┘
              │ tool_use
┌─────────────▼───────────────────────────┐
│              just-bash                  │
│     (sandboxed bash interpreter)        │
├─────────────────────────────────────────┤
│           just-bash-wiki                │
│  ┌──────────┬──────────┬────────┐       │
│  │   wiki   │    db    │  vec   │       │
│  │ command  │ command  │command │       │
│  └────┬─────┴────┬─────┴───┬────┘       │
│       │          │         │            │
│  ┌────▼──────────▼─────────▼────┐       │
│  │       just-bash-data         │       │
│  │  ┌──────────┐ ┌───────────┐  │       │
│  │  │ DocStore │ │VectorStore│  │       │
│  │  │(js-doc-  │ │(js-vector-│  │       │
│  │  │  store)  │ │  store)   │  │       │
│  │  └──────────┘ └───────────┘  │       │
│  └──────────────────────────────┘       │
├─────────────────────────────────────────┤
│      InMemoryFs / ReadWriteFs           │
│      (virtual filesystem)               │
└─────────────────────────────────────────┘
```

## Collections Created by `wiki init`
| Collection | Type | Description |
|---|---|---|
| `sources` | db | Raw source documents (unique index on `title`) |
| `pages` | db | Wiki pages (unique index on `slug`, index on `type`) |
| `log` | db | Operation log |
| `page_embeddings` | vec | Page vector embeddings |
| `source_embeddings` | vec | Source vector embeddings |
## Validation

`npm test` runs the unit suite (86 tests, in-memory). For an end-to-end run against real embeddings, `npm run e2e` exercises the full plugin surface against Cloudflare Workers AI (`@cf/baai/bge-base-en-v1.5`, 768 dims by default).

The script auto-detects credentials in this order:

1. `CLOUDFLARE_API_TOKEN` + `CLOUDFLARE_ACCOUNT_ID` env vars (preferred).
2. `~/.wrangler/config/default.toml` OAuth token + `WRANGLER_ACCOUNT_ID`.

If neither is available, it exits 0 with a "skipped" notice, so the script is safe to wire into CI without making the build red on credential-less environments. Override the model with `E2E_MODEL=…` and `E2E_DIM=…`.
## License

MIT