just-bash-harness

0.2.5
  • License MIT

Single-agent harness on top of just-bash and the agent-skills ecosystem. Sandboxed tool execution, derived approval gates, persisted sessions, swappable LLM providers (Anthropic + Cloudflare Workers AI), cross-session memory, optional AES-256-GCM at rest.

Single-agent loop on top of just-bash and the agent-skills ecosystem. Sandboxed tool execution, derived approval gates, persisted sessions, swappable LLM providers.

Version: 0.2.2 · Status: v0 contract complete, packaged, CI'd, and polished. Shipped features: applicable_when filter, cross-session memory, search/stats/export, compaction, AES-256-GCM at rest, retrieval bench, interactive REPL, chains, and a Hermes-style <tool_call> parser (works with Hermes 2 Pro alongside Granite/Gemma/Llama). 134/134 unit tests pass. Validated end to end against real Gemma 4 26B and Hermes 2 Pro on Cloudflare Workers AI, with the public agent-skills-pack@v2.2.0 subscribed. Distributable as a harness binary via npm run build.

What it is

A thin orchestrator (~1700 LOC TypeScript) that:

  • Runs a turn loop: prompt → tool calls → results → next turn → end.
  • Resolves tool calls to agent-skills subscribed in a local FileBank.
  • Executes each skill in runExec's per-skill sandboxed just-bash instance (FS scratch, network allowlist, env scoping — already provided by @rckflr/agent-skills-cli).
  • Categorizes each tool call as prohibited, explicit, or regular. The category is derived from existing skill metadata (no new spec field), and the policy matrix is then applied to it.
  • Persists session state via db sessions/turns/approvals collections in a dedicated just-bash-data bank.
  • Speaks to Anthropic Messages API or Cloudflare Workers AI (default model: @cf/google/gemma-4-26b-a4b-it).
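
The turn loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/loop.ts: the Message, ToolCall, ModelReply, Provider, and ToolRunner shapes, the runLoop name, and the maxTurns cap are all hypothetical, and the approval gate is only marked with a comment.

```typescript
// Hypothetical shapes for illustration; the real contracts live in src/types.ts.
type ToolCall = { id: string; name: string; args: Record<string, unknown> };
type ModelReply = { text: string; toolCalls: ToolCall[] };
type Message =
  | { role: "user" | "assistant"; content: string }
  | { role: "tool"; callId: string; content: string };

interface Provider {
  complete(history: Message[]): Promise<ModelReply>;
}
type ToolRunner = (call: ToolCall) => Promise<string>;

// Runs turns until the model stops requesting tools or maxTurns is reached.
async function runLoop(
  provider: Provider,
  runTool: ToolRunner,
  prompt: string,
  maxTurns = 8, // assumed safety cap, not taken from the README
): Promise<Message[]> {
  const history: Message[] = [{ role: "user", content: prompt }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await provider.complete(history);
    history.push({ role: "assistant", content: reply.text });
    if (reply.toolCalls.length === 0) break; // no tool calls: the loop ends
    for (const call of reply.toolCalls) {
      // A real harness would consult the approval gate before executing.
      const result = await runTool(call);
      history.push({ role: "tool", callId: call.id, content: result });
    }
  }
  return history;
}
```

Keeping the provider and the tool runner behind small interfaces is what lets a loop like this be driven by scripted replies in tests, with no live LLM.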

What it is not

  • A multi-agent orchestrator. Single agent only.
  • A multi-tenant deployment. It assumes a single user.
  • A sandbox for untrusted user code. The user is trusted; the LLM and skills are not (see DESIGN.md §2).
  • A web UI. CLI / TTY only.

Quickstart

git clone <this-repo> harness
cd harness
npm install
npm run build                                         # → dist/cli.js, dist/index.js

# Pick a provider via env (auto-detected). Either of these works:
export CF_ACCOUNT_ID=...   CF_API_TOKEN=...           # → Gemma 4 26B
export ANTHROPIC_API_KEY=...                          # → claude-opus-4-7

# Optional: real semantic retrieval (else stub embedder)
export OLLAMA_MODEL=nomic-embed-text                  # or OPENAI_*, CF_*

# Subscribe a skill pack (signed git tag enforced by default)
node dist/cli.js skills add github.com/foo/bar@v1.0.0

# Run a chat turn
SID=$(node dist/cli.js new)
echo "say hi using the available tools" | node dist/cli.js chat "$SID"

# Resume later
node dist/cli.js resume "$SID"

Install globally

npm link                       # makes `harness` available on PATH
harness --help

Or run directly through tsx during development without building:

npx tsx src/cli.ts --help

Architecture in one diagram

┌──────────────────────────────────────────┐
│  cli (TTY)                               │  user-facing
├──────────────────────────────────────────┤
│  loop                                    │  turn protocol
├──────────────────────────────────────────┤
│  provider   approval   session   policy  │  cross-cutting
├──────────────────────────────────────────┤
│  toolbox  ←  FileBank + runQuery/runExec │  skill resolution + execution
└──────────────────────────────────────────┘
                 │
                 ▼
   agent-skills-cli (handles per-skill sandbox)
                 │
                 ▼
            just-bash + just-bash-data

There is no Sandbox layer of our own. runExec already builds a per-skill sandboxed just-bash instance with the skill's declared network / filesystem / required_env constraints from the spec. Re-implementing that here would diverge from the canonical enforcement.

See DESIGN.md for full layer contracts and DESIGN.md §4 for the turn protocol.

Providers

Two LLM providers ship today; the factory auto-detects from env:

Provider                  Default model                    Required env
Anthropic Messages API    claude-opus-4-7                  ANTHROPIC_API_KEY
Cloudflare Workers AI     @cf/google/gemma-4-26b-a4b-it    CF_ACCOUNT_ID, CF_API_TOKEN

Auto-detect prefers Cloudflare when both sets of credentials are present. Override via HARNESS_PROVIDER=anthropic|cloudflare. Override the model via --model <id> flag, HARNESS_DEFAULT_MODEL (Anthropic), or CF_LLM_MODEL (Cloudflare).
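
The selection rules above could be expressed along these lines. This is a sketch under stated assumptions, not the factory in src/provider.ts: detectProvider and resolveModel are hypothetical names, the env is passed in as a plain object, and three behaviours are guesses the text does not state: the --model flag wins over the env variable, an unrecognised HARNESS_PROVIDER value falls through to auto-detection, and missing credentials throw.

```typescript
type ProviderName = "anthropic" | "cloudflare";
type Env = Record<string, string | undefined>;

// Defaults copied from the provider table.
const DEFAULT_MODELS: Record<ProviderName, string> = {
  anthropic: "claude-opus-4-7",
  cloudflare: "@cf/google/gemma-4-26b-a4b-it",
};

function detectProvider(env: Env): ProviderName {
  const forced = env["HARNESS_PROVIDER"];
  if (forced === "anthropic" || forced === "cloudflare") return forced;
  const hasCloudflare = Boolean(env["CF_ACCOUNT_ID"] && env["CF_API_TOKEN"]);
  const hasAnthropic = Boolean(env["ANTHROPIC_API_KEY"]);
  if (hasCloudflare) return "cloudflare"; // preferred when both sets are present
  if (hasAnthropic) return "anthropic";
  throw new Error("no provider credentials found"); // assumed failure mode
}

function resolveModel(provider: ProviderName, env: Env, flagModel?: string): string {
  if (flagModel) return flagModel; // assumption: the --model flag takes precedence
  const fromEnv =
    provider === "anthropic" ? env["HARNESS_DEFAULT_MODEL"] : env["CF_LLM_MODEL"];
  return fromEnv || DEFAULT_MODELS[provider];
}
```

For example, an env holding both CF_* credentials and ANTHROPIC_API_KEY resolves to "cloudflare" here, while adding HARNESS_PROVIDER=anthropic switches it back.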

See PROVIDERS.md for adding a new provider.

Approval categories — derived, not declared

The harness derives a category for every tool call from existing spec fields:

Signal                                                                                 Effect
provenance.signature_status !== "valid" while policy.signature.require_signed: true    prohibited (hard deny)
network[] non-empty                                                                    escalate to explicit
filesystem[] non-empty                                                                 escalate to explicit
idempotent: false                                                                      escalate to explicit
Override map matches by full id or shortId                                             forced category (escape hatch)
Otherwise                                                                              regular
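
As a rough illustration of the table, the derivation might look like the function below. It is a sketch, not the deriveCategory in src/approval.ts: the SkillMeta shape is a hypothetical subset of the spec fields, the reasons strings are invented, and checking the override map before the signature check is an assumption, since the table does not state the precedence.

```typescript
type Category = "prohibited" | "explicit" | "regular";

// Hypothetical subset of skill metadata, named after the spec fields in the table.
interface SkillMeta {
  id: string;
  shortId: string;
  network: string[];
  filesystem: string[];
  idempotent: boolean;
  provenance: { signature_status: string };
}

interface Derived {
  category: Category;
  reasons: string[];
}

function deriveCategory(
  skill: SkillMeta,
  requireSigned: boolean,
  overrides: Partial<Record<string, Category>> = {},
): Derived {
  // Assumption: the override map (escape hatch) is consulted first.
  const forced = overrides[skill.id] ?? overrides[skill.shortId];
  if (forced) return { category: forced, reasons: ["override"] };

  if (requireSigned && skill.provenance.signature_status !== "valid") {
    return { category: "prohibited", reasons: ["signature not valid"] };
  }

  const reasons: string[] = [];
  if (skill.network.length > 0) reasons.push("network declared");
  if (skill.filesystem.length > 0) reasons.push("filesystem declared");
  if (!skill.idempotent) reasons.push("not idempotent");
  return { category: reasons.length > 0 ? "explicit" : "regular", reasons };
}
```

Returning the reasons alongside the category mirrors the "derivation reasons" that the approvals audit is described as recording.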

Default policy matrix:

prohibited → deny       (hard, never asks)
explicit   → ask        (TTY prompt unless host injects custom gate)
regular    → allow      (auto-approved, audit-only)

This means no spec changes were needed to ship the harness — the security category is a function of fields the spec already defines (network, filesystem, idempotent, provenance).
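
Applying the default policy matrix to an already derived category could then be sketched as below. The Action, Gate, and Decision types and the decide function are hypothetical names for illustration; the gate stands in for the TTY prompt or a host-injected replacement.

```typescript
type Category = "prohibited" | "explicit" | "regular";
type Action = "deny" | "ask" | "allow";

// Hypothetical gate: resolves to true when the user approves the call.
type Gate = (toolName: string) => Promise<boolean>;

// The default matrix as listed above.
const DEFAULT_MATRIX: Record<Category, Action> = {
  prohibited: "deny",
  explicit: "ask",
  regular: "allow",
};

interface Decision {
  allowed: boolean;
  source: "policy" | "user";
}

async function decide(
  category: Category,
  toolName: string,
  gate: Gate,
  matrix: Record<Category, Action> = DEFAULT_MATRIX,
): Promise<Decision> {
  const action = matrix[category];
  if (action === "deny") return { allowed: false, source: "policy" }; // never asks
  if (action === "allow") return { allowed: true, source: "policy" }; // audit-only
  return { allowed: await gate(toolName), source: "user" };
}
```

Only the "ask" action ever reaches the gate, so a prohibited call cannot be approved interactively in this sketch.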

Sessions

Each session lives under <sessionsRoot>/<sessionId>/ and is backed by a dedicated just-bash-data bank with three collections:

db sessions    — one document with policy snapshot + metadata
db turns       — append-only history; each Turn includes user message,
                 LLM output, tool calls, approvals
db approvals   — flat audit of every approval decision (allow/deny,
                 source: policy or user, derivation reasons)

harness resume <id> re-opens the dir; db turns find '{}' --sort ts:1 rehydrates history. db <coll> export produces JSON snapshots; db <coll> import round-trips them.

The skills FileBank and the session bank live in separate directories and never share state.
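
To make the collection contents more concrete, here is a hypothetical TypeScript view of the turn and approval documents, with a helper that orders turns the way --sort ts:1 would. Every field name is an assumption based on the descriptions above (the real shapes are in src/types.ts), including the choice of a numeric ts, and flattenApprovals only illustrates how a flat audit relates to per-turn approvals, not how the harness writes it.

```typescript
// Assumed document shapes; field names are illustrative only.
interface ApprovalDoc {
  toolCallId: string;
  decision: "allow" | "deny";
  source: "policy" | "user";
  reasons: string[]; // derivation reasons
}

interface TurnDoc {
  ts: number; // assumed numeric timestamp
  userMessage: string;
  llmOutput: string;
  toolCalls: { id: string; name: string }[];
  approvals: ApprovalDoc[];
}

// Orders turns ascending by ts without mutating the input.
function rehydrate(turns: TurnDoc[]): TurnDoc[] {
  return [...turns].sort((a, b) => a.ts - b.ts);
}

// Collects every approval decision across turns into one flat list.
function flattenApprovals(turns: TurnDoc[]): ApprovalDoc[] {
  const flat: ApprovalDoc[] = [];
  for (const turn of rehydrate(turns)) flat.push(...turn.approvals);
  return flat;
}
```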

Testing

Layer                        Tests                           Where
Unit                         100 in 6 suites                 src/*.test.ts
Integration (no LLM)         4 (slice) + 5 (e2e scripted)    scratch/{slice,e2e}.ts
Live LLM (Gemma)             1 PASS                          scratch/e2e-cf-driven.ts
Live LLM (CF, real fetch)    listed, opt-in                  scratch/e2e-cloudflare.ts

npm run test               # all unit tests, compact reporter
npm run test:list          # all unit tests, spec reporter
npm run smoke:slice        # FileBank + runExec round-trip
npm run smoke:e2e          # full loop, 5 approval scenarios
npm run smoke:cf-driven    # full loop, replayed Gemma decisions

CI runs typecheck, tests, build, and the three credential-free smokes on every push to main and every PR. See .github/workflows/ci.yml and TESTING.md for what's covered and what's intentionally not unit-tested.

Layout

src/
  index.ts                library barrel — programmatic API
  types.ts                shared interface contracts (DESIGN §3)
  toolbox.ts              FileBank + runQuery + runExec
  provider.ts             provider barrel + env factory
  provider-anthropic.ts   Anthropic Messages API adapter
  provider-cloudflare.ts  Cloudflare Workers AI (OpenAI-compat endpoint)
  approval.ts             gate + deriveCategory + TTY prompt
  session.ts              createBankBash-backed db wrappers
  policy.ts               YAML loader + DEFAULT_POLICY
  loop.ts                 turn orchestrator
  cli.ts                  entry point — built into bin/harness
  cli-args.ts             argv parser (extracted for testability)
  *.test.ts               70 unit tests (cli-args, approval, policy,
                          provider factory, cloudflare provider)
scratch/                  smoke/integration scripts
examples/                 example policy YAML
dist/                     build output (gitignored, npm-published)
  cli.js                  the harness binary (shebang preserved)
  index.js                programmatic library entry
  *.d.ts                  TypeScript declarations
tsup.config.ts            build config (ESM, node22 target)
DESIGN.md                 contract — read first
PROVIDERS.md              provider abstraction + how to add one
TESTING.md                test layout + coverage notes
COEVOLUTION.md            upstream changes plan (mostly cancelled — see file)
CHANGELOG.md              project journey, v0 + v0.1

Stack version pins

Package                     Version pinned to                 Notes
just-bash                   ^2.14.3                           beta but stable surface
@rckflr/agent-skills-cli    local file:../agent-skills-cli    uses STABLE-tier exports + one INTERNAL (createBankBash)
just-bash-data              local file:../just-bash-data      bash-first; db/vec subcommands
@anthropic-ai/sdk           ^0.40.0                           for Anthropic provider
yaml                        ^2.5.0                            policy parsing
Node                        >=22                              required by agent-skills-cli + native fetch/ReadableStream

License

Same as the surrounding ecosystem (MIT). Copy attribution from contributing repos when forking.