just-bash-wiki

A just-bash plugin that implements Karpathy's LLM Wiki pattern — a persistent, LLM-maintained knowledge base with semantic search.

Built on top of just-bash-data (db + vec).

Install

npm install just-bash-wiki

Peer dependency: just-bash >= 2.14.0

Quick Start

import { Bash, InMemoryFs } from "just-bash";
import { createWikiPlugin } from "just-bash-wiki";

const bash = new Bash({
  fs: new InMemoryFs({}),
  customCommands: createWikiPlugin({ rootDir: "/wiki" }),
});

// Initialize wiki (creates collections + vector indexes)
await bash.exec(`wiki init --dim=1536`);

// Add a source
await bash.exec(`wiki source add '{"title":"AI Overview","type":"article","content":"..."}'`);

// Create pages
await bash.exec(`wiki page create '{"slug":"ai","title":"Artificial Intelligence","type":"concept","content":"# AI\\n...","tags":["ai"],"links_to":["ml"]}'`);

// Store embeddings for semantic search
await bash.exec(`wiki embed page ai '[0.1, 0.2, ...]'`);

// Search by vector similarity
await bash.exec(`wiki search '[0.1, 0.2, ...]' --k=5`);

// Health check
await bash.exec(`wiki lint`);

How It Works

The LLM uses wiki as a bash tool to maintain a structured knowledge base. The pattern has three layers:

  • Sources (db sources): immutable raw documents (articles, papers, notes). The LLM reads them but never modifies them.
  • Pages (db pages + vec page_embeddings): LLM-generated wiki pages with content, cross-references, and vector embeddings. The LLM writes and maintains these.
  • Log (db log): chronological record of all wiki operations.

Three core operations:

  • Ingest — add a source, extract information, create/update wiki pages, store embeddings
  • Query — semantic search across pages, read relevant content, synthesize answers
  • Lint — find orphan pages, broken links, missing embeddings, unreferenced sources
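The ingest operation, for instance, is just a short sequence of the commands shown in Quick Start. A minimal sketch, parameterised over an exec function so it works with any Bash-like runner (the helper and its argument shapes are illustrative, not part of the package):

```typescript
type Exec = (cmd: string) => Promise<string>;

// Ingest: store the raw source, create a page distilled from it,
// then attach an embedding for semantic search.
async function ingest(
  exec: Exec,
  source: { title: string; content: string },
  page: { slug: string; title: string; content: string },
  embedding: number[],
): Promise<void> {
  await exec(`wiki source add '${JSON.stringify(source)}'`);
  await exec(`wiki page create '${JSON.stringify(page)}'`);
  await exec(`wiki embed page ${page.slug} '${JSON.stringify(embedding)}'`);
}
```

With a real setup you would pass `(cmd) => bash.exec(cmd)` as the exec function.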

API

createWikiPlugin(opts?)

Returns an array of Command objects (includes db, vec, and wiki commands).

interface WikiOptions {
  rootDir?: string;        // Data directory (default: "/wiki")
  encryptionKey?: string;  // AES-256-GCM encryption key
  authSecret?: string;     // JWT signing secret
  salt?: string;           // PBKDF2 salt prefix
  embeddingDim?: number;   // Vector dimension (default: 1536)
  metric?: "cosine" | "euclidean" | "dot";  // default: "cosine"
  quantize?: "float32" | "int8";            // default: "float32"
  logMaxEntries?: number;                   // cap on db log; auto-trims past 1.5×
}

Exported types

The package exports TypeScript types for all data structures:

import type { Page, Source, LogEntry, LintIssue, LintResult } from "just-bash-wiki";

Command Reference

Initialization

wiki init [--dim=1536] [--metric=cosine] [--quantize=float32]

Creates all collections and indexes. Safe to call multiple times — skips existing collections.

Sources

# Add a source document
wiki source add '{"title":"...","type":"article","content":"...","url":"...","author":"..."}'

# List sources (optionally filter, paginate)
wiki source list [--type=article] [--status=raw] [--limit=50] [--offset=0]

# Get a source by ID
wiki source get <id>

# Count sources
wiki source count

# Update a source (MongoDB-style update operators)
wiki source update <id> '{"$set":{"status":"processed"}}'

# Delete a source (also removes its embedding)
wiki source delete <id>

Source fields: _id (auto-generated), title (required), type, content, url, author, date. Auto-added: ingested_at, status (default: "raw").

Pages

# Create a page
wiki page create '{"slug":"ai","title":"Artificial Intelligence","type":"concept","content":"# AI\n...","tags":["ai","tech"],"links_to":["ml","neural-nets"],"source_ids":["src-id-1"]}'

# Update a page (MongoDB-style update operators)
wiki page update <slug> '{"$set":{"content":"...","tags":["ai","updated"]}}'

# Get a page by slug
wiki page get <slug>

# List pages with optional filters and pagination
wiki page list [--type=concept] [--tag=ai] [--status=draft] [--limit=50] [--offset=0]

# Delete a page (cleans up cross-references and embeddings)
wiki page delete <slug>

# Rename a page (updates all cross-references and re-keys embedding)
wiki page rename <old-slug> <new-slug>

# Find pages with no inbound links (paginated)
wiki page orphans [--limit=50] [--offset=0]

Page fields:

  • _id: auto-generated
  • slug: required, unique
  • title: required
  • type: defaults to "concept"
  • content: defaults to ""
  • tags: defaults to []
  • links_to: defaults to []
  • source_ids: defaults to []
  • status: optional free-form lifecycle marker (e.g. "draft" / "published"); not set automatically (callers manage it); filterable via --status
  • Auto-managed: linked_from, created_at, updated_at

Slug format: must match ^[a-z0-9][a-z0-9_-]*$ (lowercase letters, digits, hyphens, and underscores). Must start with a lowercase letter or digit.
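A quick way to pre-validate slugs before calling wiki page create (isValidSlug is a hypothetical helper, not part of the package):

```typescript
// The documented slug pattern: lowercase alphanumerics, hyphens, underscores;
// the first character must be a lowercase letter or digit.
const SLUG_RE = /^[a-z0-9][a-z0-9_-]*$/;

function isValidSlug(slug: string): boolean {
  return SLUG_RE.test(slug);
}

isValidSlug("neural-nets"); // valid
isValidSlug("ml_2024");     // valid
isValidSlug("-bad");        // invalid: starts with a hyphen
isValidSlug("AI");          // invalid: uppercase
```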

Page types: entity, concept, source-summary, comparison, synthesis, overview, index (or any custom type).

Embeddings

# Store/update a page embedding (with optional metadata)
wiki embed page <slug> '[0.1, 0.2, ...]' [--meta='{"key":"value"}']

# Store/update a source embedding
wiki embed source <id> '[0.1, 0.2, ...]' [--meta='{"key":"value"}']

Embeddings are stored in vector collections (page_embeddings, source_embeddings) for semantic search. The embedding dimension must match what was set in wiki init --dim=N. The optional --meta JSON is attached to the vector record and returned by vec get/wiki search; useful for round-tripping titles, model identifiers, or chunk indices alongside the vector. Position-independent — --meta=... may appear before or after the vector argument.

# Search pages by vector similarity
wiki search '[0.1, 0.2, ...]' --k=10

# Search sources only
wiki search '[0.1, 0.2, ...]' --k=5 --type=sources

# Search across pages AND sources
wiki search '[0.1, 0.2, ...]' --k=10 --type=all

Returns results sorted by similarity using the metric configured in wiki init --metric=<cosine|euclidean|dot> (default: cosine). Each hit includes the score and any metadata stored alongside the vector via wiki embed --meta=....
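The stores' exact scoring is internal to just-bash-data, but the default cosine metric can be sketched in a few lines (a minimal illustration, not the package's implementation):

```typescript
// Cosine similarity: dot product of the vectors divided by the product of
// their magnitudes; 1 means identical direction, 0 means orthogonal.
function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

cosine([1, 0], [1, 0]); // 1: same direction
cosine([1, 0], [0, 1]); // 0: orthogonal
```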

Lint

wiki lint

Runs a health check and reports issues:

  • orphan (warning): page has no inbound links
  • broken-link (error): page links to a non-existent slug
  • empty-content (warning): page content is missing, null, or "" (pure-whitespace content is not flagged; see note below)
  • no-tags (info): page has no tags
  • no-sources (info): page has no source references
  • missing-embeddings (warning): some pages lack vector embeddings
  • unreferenced-source (info): source not referenced by any page

Note on empty-content: since v1.2.0, lint no longer projects the full content field for every page (content can run to megabytes per page) and instead runs an $exists: false / null / "" query. Pure-whitespace content (" \n") is therefore not detected; this is an accepted tradeoff that caps the lint payload at metadata size. If you need that check, run a one-shot db pages aggregate query that materializes content for the few pages you care about.
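The predicate behind the check can be stated directly (a sketch of the documented semantics, not the plugin's query):

```typescript
// empty-content flags missing, null, or "": pure whitespace passes,
// because the lint query never materializes page content to trim it.
function isEmptyContent(content: string | null | undefined): boolean {
  return content === undefined || content === null || content === "";
}

isEmptyContent(undefined); // true: field missing
isEmptyContent("");        // true: empty string
isEmptyContent(" \n");     // false: whitespace is not flagged
```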

Log

# View recent log entries
wiki log [--last=20] [--type=ingest]

# Add a custom log entry
wiki log add '{"type":"note","summary":"Started research on topic X"}'

# Trim the log to the N most recent entries (older ones are deleted)
wiki log trim --keep=1000

All wiki operations are automatically logged with timestamps. For long-running agents, set WikiOptions.logMaxEntries to enable opportunistic auto-trim — the plugin samples the log size every 16 commands and trims back to the cap when the count exceeds 1.5× the cap. The log size therefore oscillates between cap (just after a trim) and approximately 1.5 × cap + 16 (just before the next sample fires); pick a cap of ≤ ⅔ of the largest acceptable log size. wiki log trim --keep=N is always available for explicit trims regardless of the option.
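The sizing rule above is simple arithmetic (illustrative only; the 1.5x threshold and 16-command sample interval come from the description above):

```typescript
// Worst-case log size just before a trim fires: the trim threshold
// (1.5 * cap) plus up to one full sample interval of unsampled commands.
function worstCaseLogSize(cap: number, sampleInterval = 16): number {
  return Math.ceil(1.5 * cap) + sampleInterval;
}

worstCaseLogSize(1000); // 1516
```

So a cap of 1000 keeps the log near or below ~1516 entries, comfortably under a 1500-entry budget only if you shrink the cap accordingly.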

Stats

wiki stats

Returns a comprehensive overview: page/source/log counts, pages grouped by type, vector collection stats, and recent activity.

Index

wiki index [--limit=N] [--offset=M]
wiki index --rebuild

Default: returns pages grouped by type with their slugs, titles, tags, and last update timestamps. --limit / --offset paginate this view.

--rebuild re-derives all linked_from arrays from the links_to graph and ignores pagination flags (correctness over scalability).
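Conceptually, the rebuild inverts the links_to graph; a minimal sketch of that inversion (not the plugin's actual code):

```typescript
// Invert links_to into linked_from: for every page P that links to T,
// add P's slug to T's inbound list.
interface PageLinks { slug: string; links_to: string[] }

function rebuildLinkedFrom(pages: PageLinks[]): Map<string, string[]> {
  const linkedFrom = new Map<string, string[]>(pages.map(p => [p.slug, []]));
  for (const page of pages) {
    for (const target of page.links_to) {
      linkedFrom.get(target)?.push(page.slug); // links to unknown slugs are skipped
    }
  }
  return linkedFrom;
}
```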

Direct Access to db and vec

The wiki plugin includes all just-bash-data commands. You can use db and vec directly for advanced operations:

# Aggregation: find most-linked pages
db pages aggregate '[{"$unwind":"$linked_from"},{"$group":{"_id":"$slug","inbound":{"$sum":1}}},{"$sort":{"inbound":-1}}]'

# Complex queries
db pages find '{"$or":[{"tags":{"$contains":"ai"}},{"type":"overview"}]}'

# Cross-collection vector search
vec search-across "page_embeddings,source_embeddings" '[0.1,...]' --k=10

Example: LLM Agent Workflow

A typical LLM agent session using just-bash-wiki:

1. Agent receives a new article to process
2. wiki source add '{"title":"...","content":"..."}'
3. Agent reads the source content and extracts key entities/concepts
4. wiki page create '{"slug":"entity-name",...}'  (for each entity)
5. wiki page update existing-page '{"$set":{"content":"updated..."}}'  (update related pages)
6. wiki embed page entity-name '[...]'  (store embeddings)
7. wiki lint  (check for issues)
8. wiki stats  (verify state)

When answering questions:

1. Generate embedding for the question
2. wiki search '[...]' --k=5  (find relevant pages)
3. wiki page get relevant-slug  (read full content)
4. Synthesize answer from page content
5. Optionally: wiki page create '{"slug":"analysis-topic",...}'  (save the answer as a new page)
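The retrieval steps above can be sketched as a small driver over any exec function (a hedged sketch: the hit shape { slug } is an assumption; adapt it to the actual wiki search output):

```typescript
type Exec = (cmd: string) => Promise<string>;

// Retrieve the k most similar pages for a question embedding, then
// fetch each page's full content for synthesis.
async function retrieveForQuestion(
  exec: Exec,
  embedding: number[],
  k = 5,
): Promise<string[]> {
  const hits: { slug: string }[] = JSON.parse(
    await exec(`wiki search '${JSON.stringify(embedding)}' --k=${k}`),
  );
  return Promise.all(hits.map(hit => exec(`wiki page get ${hit.slug}`)));
}
```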

Architecture

┌─────────────────────────────────────────┐
│              LLM Agent                  │
│         (generates bash commands)       │
└─────────────┬───────────────────────────┘
              │ tool_use
┌─────────────▼───────────────────────────┐
│            just-bash                    │
│      (sandboxed bash interpreter)       │
├─────────────────────────────────────────┤
│          just-bash-wiki                 │
│    ┌──────────┬──────────┬────────┐     │
│    │  wiki    │   db     │  vec   │     │
│    │ command  │ command  │command │     │
│    └────┬─────┴────┬─────┴───┬────┘     │
│         │          │         │          │
│    ┌────▼──────────▼─────────▼────┐     │
│    │      just-bash-data          │     │
│    │  ┌──────────┐ ┌───────────┐  │     │
│    │  │ DocStore │ │VectorStore│  │     │
│    │  │(js-doc-  │ │(js-vector-│  │     │
│    │  │  store)  │ │  store)   │  │     │
│    │  └──────────┘ └───────────┘  │     │
│    └──────────────────────────────┘     │
├─────────────────────────────────────────┤
│         InMemoryFs / ReadWriteFs        │
│          (virtual filesystem)           │
└─────────────────────────────────────────┘

Collections Created by wiki init

  • sources (db): raw source documents (unique index on title)
  • pages (db): wiki pages (unique index on slug; index on type)
  • log (db): operation log
  • page_embeddings (vec): page vector embeddings
  • source_embeddings (vec): source vector embeddings

Validation

npm test runs the unit suite (86 tests, in-memory). For an end-to-end run against real embeddings, npm run e2e exercises the full plugin surface against Cloudflare Workers AI (@cf/baai/bge-base-en-v1.5, 768 dim by default).

The script auto-detects credentials in this order:

  1. CLOUDFLARE_API_TOKEN + CLOUDFLARE_ACCOUNT_ID env vars (preferred).
  2. ~/.wrangler/config/default.toml OAuth token + WRANGLER_ACCOUNT_ID.

If neither is available it exits 0 with a "skipped" notice, so the script is safe to wire into CI without making the build red on credential-less environments. Override the model with E2E_MODEL=… and E2E_DIM=….

License

MIT