just-bash-wiki

A just-bash plugin that implements Karpathy's LLM Wiki pattern — a persistent, LLM-maintained knowledge base with semantic search.

Built on top of just-bash-data (db + vec).

Install

npm install just-bash-wiki

Peer dependency: just-bash >= 2.14.0

Quick Start

import { Bash, InMemoryFs } from "just-bash";
import { createWikiPlugin } from "just-bash-wiki";

const bash = new Bash({
  fs: new InMemoryFs({}),
  customCommands: createWikiPlugin({ rootDir: "/wiki" }),
});

// Initialize wiki (creates collections + vector indexes)
await bash.exec(`wiki init --dim=1536`);

// Add a source
await bash.exec(`wiki source add '{"title":"AI Overview","type":"article","content":"..."}'`);

// Create pages
await bash.exec(`wiki page create '{"slug":"ai","title":"Artificial Intelligence","type":"concept","content":"# AI\\n...","tags":["ai"],"links_to":["ml"]}'`);

// Store embeddings for semantic search
await bash.exec(`wiki embed page ai '[0.1, 0.2, ...]'`);

// Search by vector similarity
await bash.exec(`wiki search '[0.1, 0.2, ...]' --k=5`);

// Health check
await bash.exec(`wiki lint`);

How It Works

The LLM uses wiki as a bash tool to maintain a structured knowledge base. The pattern has three layers:

  • Sources (db sources): immutable raw documents (articles, papers, notes). The LLM reads them but never modifies them.
  • Pages (db pages + vec page_embeddings): LLM-generated wiki pages with content, cross-references, and vector embeddings. The LLM writes and maintains these.
  • Log (db log): chronological record of all wiki operations.

Three core operations:

  • Ingest — add a source, extract information, create/update wiki pages, store embeddings
  • Query — semantic search across pages, read relevant content, synthesize answers
  • Lint — find orphan pages, broken links, missing embeddings, unreferenced sources
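The ingest operation, for instance, is just a short sequence of the commands shown in Quick Start. A minimal sketch, parameterised over an exec function so it works with any Bash-like runner (the helper and its argument shapes are illustrative, not part of the package):

```typescript
type Exec = (cmd: string) => Promise<string>;

// Ingest: store the raw source, create a page distilled from it,
// then attach an embedding for semantic search.
async function ingest(
  exec: Exec,
  source: { title: string; content: string },
  page: { slug: string; title: string; content: string },
  embedding: number[],
): Promise<void> {
  await exec(`wiki source add '${JSON.stringify(source)}'`);
  await exec(`wiki page create '${JSON.stringify(page)}'`);
  await exec(`wiki embed page ${page.slug} '${JSON.stringify(embedding)}'`);
}
```

With a real setup you would pass `(cmd) => bash.exec(cmd)` as the exec function.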

API

createWikiPlugin(opts?)

Returns an array of Command objects (includes db, vec, and wiki commands).

interface WikiOptions {
  rootDir?: string;        // Data directory (default: "/wiki")
  encryptionKey?: string;  // AES-256-GCM encryption key
  authSecret?: string;     // JWT signing secret
  salt?: string;           // PBKDF2 salt prefix
  embeddingDim?: number;   // Vector dimension (default: 1536)
  metric?: "cosine" | "euclidean" | "dot";  // default: "cosine"
  quantize?: "float32" | "int8";            // default: "float32"
  logMaxEntries?: number;                   // cap on db log; auto-trims past 1.5×
}

Exported types

The package exports TypeScript types for all data structures:

import type { Page, Source, LogEntry, LintIssue, LintResult } from "just-bash-wiki";

Command Reference

Initialization

wiki init [--dim=1536] [--metric=cosine] [--quantize=float32]

Creates all collections and indexes. Safe to call multiple times — skips existing collections.

Sources

# Add a source document
wiki source add '{"title":"...","type":"article","content":"...","url":"...","author":"..."}'

# List sources (optionally filter, paginate)
wiki source list [--type=article] [--status=raw] [--limit=50] [--offset=0]

# Get a source by ID
wiki source get <id>

# Count sources
wiki source count

# Update a source (MongoDB-style update operators)
wiki source update <id> '{"$set":{"status":"processed"}}'

# Delete a source (also removes its embedding)
wiki source delete <id>

Source fields: _id (auto-generated), title (required), type, content, url, author, date. Auto-added: ingested_at, status (default: "raw").

Pages

# Create a page
wiki page create '{"slug":"ai","title":"Artificial Intelligence","type":"concept","content":"# AI\n...","tags":["ai","tech"],"links_to":["ml","neural-nets"],"source_ids":["src-id-1"]}'

# Update a page (MongoDB-style update operators)
wiki page update <slug> '{"$set":{"content":"...","tags":["ai","updated"]}}'

# Get a page by slug
wiki page get <slug>

# List pages with optional filters and pagination
wiki page list [--type=concept] [--tag=ai] [--status=draft] [--limit=50] [--offset=0]

# Delete a page (cleans up cross-references and embeddings)
wiki page delete <slug>

# Rename a page (updates all cross-references and re-keys embedding)
wiki page rename <old-slug> <new-slug>

# Find pages with no inbound links (paginated)
wiki page orphans [--limit=50] [--offset=0]

Page fields:

  • _id: auto-generated
  • slug: required, unique
  • title: required
  • type: defaults to "concept"
  • content: defaults to ""
  • tags: defaults to []
  • links_to: defaults to []
  • source_ids: defaults to []
  • status: optional free-form lifecycle marker (e.g. "draft" / "published"); not set automatically (callers manage it); filterable via --status
  • Auto-managed: linked_from, created_at, updated_at

Slug format: must match ^[a-z0-9][a-z0-9_-]*$ (lowercase letters, digits, hyphens, and underscores). Must start with a lowercase letter or digit.
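A quick way to pre-validate slugs before calling wiki page create (isValidSlug is a hypothetical helper, not part of the package):

```typescript
// The documented slug pattern: lowercase alphanumerics, hyphens, underscores;
// the first character must be a lowercase letter or digit.
const SLUG_RE = /^[a-z0-9][a-z0-9_-]*$/;

function isValidSlug(slug: string): boolean {
  return SLUG_RE.test(slug);
}

isValidSlug("neural-nets"); // valid
isValidSlug("ml_2024");     // valid
isValidSlug("-bad");        // invalid: starts with a hyphen
isValidSlug("AI");          // invalid: uppercase
```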

Page types: entity, concept, source-summary, comparison, synthesis, overview, index (or any custom type).

Embeddings

# Store/update a page embedding (with optional metadata)
wiki embed page <slug> '[0.1, 0.2, ...]' [--meta='{"key":"value"}']

# Store/update a source embedding
wiki embed source <id> '[0.1, 0.2, ...]' [--meta='{"key":"value"}']

Embeddings are stored in vector collections (page_embeddings, source_embeddings) for semantic search. The embedding dimension must match what was set in wiki init --dim=N. The optional --meta JSON is attached to the vector record and returned by vec get/wiki search; useful for round-tripping titles, model identifiers, or chunk indices alongside the vector. Position-independent — --meta=... may appear before or after the vector argument.

# Search pages by vector similarity
wiki search '[0.1, 0.2, ...]' --k=10

# Search sources only
wiki search '[0.1, 0.2, ...]' --k=5 --type=sources

# Search across pages AND sources
wiki search '[0.1, 0.2, ...]' --k=10 --type=all

Returns results sorted by similarity using the metric configured in wiki init --metric=<cosine|euclidean|dot> (default: cosine). Each hit includes the score and any metadata stored alongside the vector via wiki embed --meta=....
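The stores' exact scoring is internal to just-bash-data, but the default cosine metric can be sketched in a few lines (a minimal illustration, not the package's implementation):

```typescript
// Cosine similarity: dot product of the vectors divided by the product of
// their magnitudes; 1 means identical direction, 0 means orthogonal.
function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

cosine([1, 0], [1, 0]); // 1: same direction
cosine([1, 0], [0, 1]); // 0: orthogonal
```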

Lint

wiki lint

Runs a health check and reports issues:

  • orphan (warning): page has no inbound links
  • broken-link (error): page links to a non-existent slug
  • empty-content (warning): page content is missing, null, or "" (pure-whitespace content is not flagged; see note below)
  • no-tags (info): page has no tags
  • no-sources (info): page has no source references
  • missing-embeddings (warning): some pages lack vector embeddings
  • unreferenced-source (info): source not referenced by any page

Note on empty-content: since v1.2.0, lint no longer projects the full content field for every page (content can run to megabytes per page) and instead runs an $exists: false / null / "" query. Pure-whitespace content (" \n") is therefore not detected; this is an accepted tradeoff that caps the lint payload at metadata size. If you need that check, run a one-shot db pages aggregate query that materializes content for the few pages you care about.
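The predicate behind the check can be stated directly (a sketch of the documented semantics, not the plugin's query):

```typescript
// empty-content flags missing, null, or "": pure whitespace passes,
// because the lint query never materializes page content to trim it.
function isEmptyContent(content: string | null | undefined): boolean {
  return content === undefined || content === null || content === "";
}

isEmptyContent(undefined); // true: field missing
isEmptyContent("");        // true: empty string
isEmptyContent(" \n");     // false: whitespace is not flagged
```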

Log

# View recent log entries
wiki log [--last=20] [--type=ingest]

# Add a custom log entry
wiki log add '{"type":"note","summary":"Started research on topic X"}'

# Trim the log to the N most recent entries (older ones are deleted)
wiki log trim --keep=1000

All wiki operations are automatically logged with timestamps. For long-running agents, set WikiOptions.logMaxEntries to enable opportunistic auto-trim — the plugin samples the log size every 16 commands and trims back to the cap when the count exceeds 1.5× the cap. The log size therefore oscillates between cap (just after a trim) and approximately 1.5 × cap + 16 (just before the next sample fires); pick a cap of ≤ ⅔ of the largest acceptable log size. wiki log trim --keep=N is always available for explicit trims regardless of the option.
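The sizing rule above is simple arithmetic (illustrative only; the 1.5x threshold and 16-command sample interval come from the description above):

```typescript
// Worst-case log size just before a trim fires: the trim threshold
// (1.5 * cap) plus up to one full sample interval of unsampled commands.
function worstCaseLogSize(cap: number, sampleInterval = 16): number {
  return Math.ceil(1.5 * cap) + sampleInterval;
}

worstCaseLogSize(1000); // 1516
```

So a cap of 1000 keeps the log near or below ~1516 entries, comfortably under a 1500-entry budget only if you shrink the cap accordingly.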

Stats

wiki stats

Returns a comprehensive overview: page/source/log counts, pages grouped by type, vector collection stats, and recent activity.

Index

wiki index [--limit=N] [--offset=M]
wiki index --rebuild

Default: returns pages grouped by type with their slugs, titles, tags, and last update timestamps. --limit / --offset paginate this view.

--rebuild re-derives all linked_from arrays from the links_to graph and ignores pagination flags (correctness over scalability).
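Conceptually, the rebuild inverts the links_to graph; a minimal sketch of that inversion (not the plugin's actual code):

```typescript
// Invert links_to into linked_from: for every page P that links to T,
// add P's slug to T's inbound list.
interface PageLinks { slug: string; links_to: string[] }

function rebuildLinkedFrom(pages: PageLinks[]): Map<string, string[]> {
  const linkedFrom = new Map<string, string[]>(pages.map(p => [p.slug, []]));
  for (const page of pages) {
    for (const target of page.links_to) {
      linkedFrom.get(target)?.push(page.slug); // links to unknown slugs are skipped
    }
  }
  return linkedFrom;
}
```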

Direct Access to db and vec

The wiki plugin includes all just-bash-data commands. You can use db and vec directly for advanced operations:

# Aggregation: find most-linked pages
db pages aggregate '[{"$unwind":"$linked_from"},{"$group":{"_id":"$slug","inbound":{"$sum":1}}},{"$sort":{"inbound":-1}}]'

# Complex queries
db pages find '{"$or":[{"tags":{"$contains":"ai"}},{"type":"overview"}]}'

# Cross-collection vector search
vec search-across "page_embeddings,source_embeddings" '[0.1,...]' --k=10

Example: LLM Agent Workflow

A typical LLM agent session using just-bash-wiki:

1. Agent receives a new article to process
2. wiki source add '{"title":"...","content":"..."}'
3. Agent reads the source content and extracts key entities/concepts
4. wiki page create '{"slug":"entity-name",...}'  (for each entity)
5. wiki page update existing-page '{"$set":{"content":"updated..."}}'  (update related pages)
6. wiki embed page entity-name '[...]'  (store embeddings)
7. wiki lint  (check for issues)
8. wiki stats  (verify state)

When answering questions:

1. Generate embedding for the question
2. wiki search '[...]' --k=5  (find relevant pages)
3. wiki page get relevant-slug  (read full content)
4. Synthesize answer from page content
5. Optionally: wiki page create '{"slug":"analysis-topic",...}'  (save the answer as a new page)
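The retrieval steps above can be sketched as a small driver over any exec function (a hedged sketch: the hit shape { slug } is an assumption; adapt it to the actual wiki search output):

```typescript
type Exec = (cmd: string) => Promise<string>;

// Retrieve the k most similar pages for a question embedding, then
// fetch each page's full content for synthesis.
async function retrieveForQuestion(
  exec: Exec,
  embedding: number[],
  k = 5,
): Promise<string[]> {
  const hits: { slug: string }[] = JSON.parse(
    await exec(`wiki search '${JSON.stringify(embedding)}' --k=${k}`),
  );
  return Promise.all(hits.map(hit => exec(`wiki page get ${hit.slug}`)));
}
```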

Architecture

┌─────────────────────────────────────────┐
│              LLM Agent                  │
│         (generates bash commands)       │
└─────────────┬───────────────────────────┘
              │ tool_use
┌─────────────▼───────────────────────────┐
│            just-bash                    │
│      (sandboxed bash interpreter)       │
├─────────────────────────────────────────┤
│          just-bash-wiki                 │
│    ┌──────────┬──────────┬────────┐     │
│    │  wiki    │   db     │  vec   │     │
│    │ command  │ command  │command │     │
│    └────┬─────┴────┬─────┴───┬────┘     │
│         │          │         │          │
│    ┌────▼──────────▼─────────▼────┐     │
│    │      just-bash-data          │     │
│    │  ┌──────────┐ ┌───────────┐  │     │
│    │  │ DocStore │ │VectorStore│  │     │
│    │  │(js-doc-  │ │(js-vector-│  │     │
│    │  │  store)  │ │  store)   │  │     │
│    │  └──────────┘ └───────────┘  │     │
│    └──────────────────────────────┘     │
├─────────────────────────────────────────┤
│         InMemoryFs / ReadWriteFs        │
│          (virtual filesystem)           │
└─────────────────────────────────────────┘

Collections Created by wiki init

  • sources (db): raw source documents (unique index on title)
  • pages (db): wiki pages (unique index on slug; index on type)
  • log (db): operation log
  • page_embeddings (vec): page vector embeddings
  • source_embeddings (vec): source vector embeddings

Validation

npm test runs the unit suite (86 tests, in-memory). For an end-to-end run against real embeddings, npm run e2e exercises the full plugin surface against Cloudflare Workers AI (@cf/baai/bge-base-en-v1.5, 768 dim by default).

The script auto-detects credentials in this order:

  1. CLOUDFLARE_API_TOKEN + CLOUDFLARE_ACCOUNT_ID env vars (preferred).
  2. ~/.wrangler/config/default.toml OAuth token + WRANGLER_ACCOUNT_ID.

If neither is available it exits 0 with a "skipped" notice, so the script is safe to wire into CI without making the build red on credential-less environments. Override the model with E2E_MODEL=… and E2E_DIM=….

License

MIT