just-bash-harness
Single-agent loop on top of just-bash and the agent-skills ecosystem. Sandboxed tool execution, derived approval gates, persisted sessions, swappable LLM providers.
Version: 0.2.2 · Status: v0 contract complete + packaged + CI'd + polished + applicable_when filter + cross-session memory + search/stats/export + compaction + AES-256-GCM at rest + retrieval bench + interactive REPL + chains + Hermes-style <tool_call> parser (works with Hermes 2 Pro alongside Granite/Gemma/Llama). 134/134 unit tests pass. Validated end to end against real Gemma 4 26B and Hermes 2 Pro on Cloudflare Workers AI, with the public agent-skills-pack@v2.2.0 subscribed. Distributable as a harness binary via npm run build.
What it is
A thin orchestrator (~1700 LOC TypeScript) that:
- Runs a turn loop: prompt → tool calls → results → next turn → end.
- Resolves tool calls to agent-skills subscribed in a local FileBank.
- Executes each skill in runExec's per-skill sandboxed just-bash instance (FS scratch, network allowlist, env scoping, all already provided by @rckflr/agent-skills-cli).
- Categorizes each tool call as prohibited / explicit / regular, derived from existing skill metadata (no new spec field), and applies the policy matrix.
- Persists session state via db sessions/turns/approvals collections in a dedicated just-bash-data bank.
- Speaks to the Anthropic Messages API or Cloudflare Workers AI (default model: @cf/google/gemma-4-26b-a4b-it).
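The turn loop in the first bullet can be pictured with a minimal sketch. Everything here is hypothetical: the type names, the `runTurnLoop` function, the `maxTurns` cap, and the synchronous callbacks are illustrative assumptions, not the actual contents of loop.ts (which presumably is async and also routes each call through the approval gate).

```typescript
// Hypothetical shapes for illustration only; the real contracts live in src/types.ts.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface LlmReply {
  text: string;
  toolCalls: ToolCall[];
}

interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
}

// Injected dependencies, kept synchronous to keep the sketch short.
type CallLlm = (history: Message[]) => LlmReply;
type RunTool = (call: ToolCall) => string;

// prompt → tool calls → results → next turn → end
function runTurnLoop(
  prompt: string,
  callLlm: CallLlm,
  runTool: RunTool,
  maxTurns = 8, // assumed safety cap, not taken from the harness
): Message[] {
  const history: Message[] = [{ role: "user", content: prompt }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = callLlm(history);
    history.push({ role: "assistant", content: reply.text });
    // A reply without tool calls ends the loop.
    if (reply.toolCalls.length === 0) break;
    // Otherwise each result is fed back to the model on the next turn.
    for (const call of reply.toolCalls) {
      history.push({ role: "tool", content: runTool(call) });
    }
  }
  return history;
}
```

Injecting the LLM and the tool runner as plain functions is what makes a scripted, LLM-free run of the loop possible.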
What it is not
- A multi-agent orchestrator. Single agent only.
- A multi-tenant deployment. It assumes a single user.
- A sandbox for untrusted user code. The user is trusted; the LLM and skills are not (see DESIGN.md §2).
- A web UI. CLI / TTY only.
Quickstart
git clone <this-repo> harness
cd harness
npm install
npm run build # → dist/cli.js, dist/index.js
# Pick a provider via env (auto-detected). Either of these works:
export CF_ACCOUNT_ID=... CF_API_TOKEN=... # → Gemma 4 26B
export ANTHROPIC_API_KEY=... # → claude-opus-4-7
# Optional: real semantic retrieval (else stub embedder)
export OLLAMA_MODEL=nomic-embed-text # or OPENAI_*, CF_*
# Subscribe a skill pack (signed git tag enforced by default)
node dist/cli.js skills add github.com/foo/bar@v1.0.0
# Run a chat turn
SID=$(node dist/cli.js new)
echo "say hi using the available tools" | node dist/cli.js chat "$SID"
# Resume later
node dist/cli.js resume "$SID"
Install globally
npm link # makes `harness` available on PATH
harness --help
Or run directly through tsx during development without building:
npx tsx src/cli.ts --help
Architecture in one diagram
┌──────────────────────────────────────────┐
│ cli (TTY) │ user-facing
├──────────────────────────────────────────┤
│ loop │ turn protocol
├──────────────────────────────────────────┤
│ provider approval session policy │ cross-cutting
├──────────────────────────────────────────┤
│ toolbox ← FileBank + runQuery/runExec │ skill resolution + execution
└──────────────────────────────────────────┘
│
▼
agent-skills-cli (handles per-skill sandbox)
│
▼
just-bash + just-bash-data
There is no Sandbox layer of our own. runExec already builds a per-skill sandboxed just-bash instance with the skill's declared network / filesystem / required_env constraints from the spec. Re-implementing that here would diverge from the canonical enforcement.
See DESIGN.md for full layer contracts and DESIGN.md §4 for the turn protocol.
Providers
Two LLM providers ship today; the factory auto-detects from env:
| Provider | Default model | Required env |
|---|---|---|
| Anthropic Messages API | claude-opus-4-7 | ANTHROPIC_API_KEY |
| Cloudflare Workers AI | @cf/google/gemma-4-26b-a4b-it | CF_ACCOUNT_ID, CF_API_TOKEN |
Auto-detect prefers Cloudflare when both sets of credentials are present. Override via HARNESS_PROVIDER=anthropic|cloudflare. Override the model via --model <id> flag, HARNESS_DEFAULT_MODEL (Anthropic), or CF_LLM_MODEL (Cloudflare).
See PROVIDERS.md for adding a new provider.
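The detection and override rules above can be sketched as two small functions. This is an illustration under stated assumptions, not the factory in provider.ts: the names `detectProvider` and `resolveModel` are hypothetical, and the ordering "`--model` flag beats the env variable" is assumed rather than documented.

```typescript
type ProviderName = "anthropic" | "cloudflare";
type Env = Record<string, string | undefined>;

// Hypothetical helper: picks a provider from env using the rules described above.
function detectProvider(env: Env): ProviderName {
  const forced = env.HARNESS_PROVIDER;
  if (forced === "anthropic" || forced === "cloudflare") return forced;
  if (forced) throw new Error(`Unknown HARNESS_PROVIDER: ${forced}`);
  // Cloudflare is preferred when both sets of credentials are present.
  if (env.CF_ACCOUNT_ID && env.CF_API_TOKEN) return "cloudflare";
  if (env.ANTHROPIC_API_KEY) return "anthropic";
  throw new Error("No provider credentials found in env");
}

// Hypothetical helper: resolves the model id.
// Assumption: an explicit --model flag wins over the env variables.
function resolveModel(provider: ProviderName, env: Env, flagModel?: string): string {
  if (flagModel) return flagModel;
  if (provider === "anthropic") {
    return env.HARNESS_DEFAULT_MODEL ?? "claude-opus-4-7";
  }
  return env.CF_LLM_MODEL ?? "@cf/google/gemma-4-26b-a4b-it";
}
```

In a real process the `env` argument would typically be `process.env`; passing it in explicitly keeps the logic testable without touching the environment.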
Approval categories — derived, not declared
The harness derives a category for every tool call from existing spec fields:
| Signal | Effect |
|---|---|
| provenance.signature_status !== "valid" while policy.signature.require_signed: true | → prohibited (hard deny) |
| network[] non-empty | → escalate to explicit |
| filesystem[] non-empty | → escalate to explicit |
| idempotent: false | → escalate to explicit |
| Override map matches by full id or shortId | → forced category (escape hatch) |
| Otherwise | → regular |
Default policy matrix:
prohibited → deny (hard, never asks)
explicit → ask (TTY prompt unless host injects custom gate)
regular → allow (auto-approved, audit-only)
This means no spec changes were needed to ship the harness: the security category is a function of fields the spec already defines (network, filesystem, idempotent, provenance).
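As a rough illustration of the derivation, the signal table and default matrix might translate to something like the following. The interfaces, the `overrides` field name, and the `decide` helper are hypothetical, and so is the evaluation order: this sketch assumes the override map is consulted before any derived signal, which the table does not state. The real `deriveCategory` in approval.ts may differ.

```typescript
type Category = "prohibited" | "explicit" | "regular";
type Decision = "deny" | "ask" | "allow";

// Hypothetical subset of skill metadata; field names follow the signal table.
interface SkillMeta {
  id: string;
  shortId: string;
  network: string[];
  filesystem: string[];
  idempotent: boolean;
  provenance: { signature_status: string };
}

// Hypothetical policy shape; `overrides` is keyed by full id or shortId.
interface Policy {
  signature: { require_signed: boolean };
  overrides: Record<string, Category>;
}

interface Derived {
  category: Category;
  reasons: string[];
}

const DEFAULT_MATRIX: Record<Category, Decision> = {
  prohibited: "deny",
  explicit: "ask",
  regular: "allow",
};

function deriveCategory(skill: SkillMeta, policy: Policy): Derived {
  // Assumption: an override wins over every derived signal.
  const forced = policy.overrides[skill.id] ?? policy.overrides[skill.shortId];
  if (forced !== undefined) return { category: forced, reasons: ["override"] };

  if (policy.signature.require_signed && skill.provenance.signature_status !== "valid") {
    return { category: "prohibited", reasons: ["signature not valid"] };
  }

  // Any single escalation signal is enough to require explicit approval.
  const reasons: string[] = [];
  if (skill.network.length > 0) reasons.push("network");
  if (skill.filesystem.length > 0) reasons.push("filesystem");
  if (!skill.idempotent) reasons.push("non-idempotent");
  return { category: reasons.length > 0 ? "explicit" : "regular", reasons };
}

function decide(skill: SkillMeta, policy: Policy): Decision {
  return DEFAULT_MATRIX[deriveCategory(skill, policy).category];
}
```

Returning the reasons alongside the category mirrors the audit trail described under Sessions, where each approval record keeps its derivation reasons.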
Sessions
Each session lives under <sessionsRoot>/<sessionId>/ and is backed by a dedicated just-bash-data bank with three collections:
db sessions — one document with policy snapshot + metadata
db turns — append-only history; each Turn includes user message,
LLM output, tool calls, approvals
db approvals — flat audit of every approval decision (allow/deny,
source: policy or user, derivation reasons)
harness resume <id> re-opens the dir; db turns find '{}' --sort ts:1 rehydrates history. db <coll> export produces JSON snapshots; db <coll> import round-trips them.
The skills FileBank and the session bank live in separate directories and never share state.
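To make the collection contents concrete, here is a hypothetical in-memory model of a turn and its approval records. The field names (`userMessage`, `llmOutput`, `toolCallName`, and so on) are assumptions based on the description above, not the actual document schema, and `rehydrate` only imitates what an ascending sort on `ts` would return; it does not call the db CLI.

```typescript
// Hypothetical document shapes; the real schema is defined by the harness.
interface ApprovalRecord {
  toolCallName: string; // assumed identifier for the audited call
  decision: "allow" | "deny";
  source: "policy" | "user";
  reasons: string[]; // derivation reasons
}

interface Turn {
  ts: number; // timestamp used for ordering
  userMessage: string;
  llmOutput: string;
  toolCalls: { name: string; args: Record<string, unknown> }[];
  approvals: ApprovalRecord[];
}

// Mimics `--sort ts:1`: oldest turn first, without mutating the input.
function rehydrate(turns: Turn[]): Turn[] {
  return [...turns].sort((a, b) => a.ts - b.ts);
}

// Flattens per-turn approvals into one audit list, in chronological order.
function auditTrail(turns: Turn[]): ApprovalRecord[] {
  return rehydrate(turns).flatMap((t) => t.approvals);
}
```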
Testing
| Layer | Tests | Where |
|---|---|---|
| Unit | 100 in 6 suites | src/*.test.ts |
| Integration (no LLM) | 4 (slice) + 5 (e2e scripted) | scratch/{slice,e2e}.ts |
| Live LLM (Gemma) | 1 PASS | scratch/e2e-cf-driven.ts |
| Live LLM (CF, real fetch) | listed, opt-in | scratch/e2e-cloudflare.ts |
npm run test # all unit tests, compact reporter
npm run test:list # all unit tests, spec reporter
npm run smoke:slice # FileBank + runExec round-trip
npm run smoke:e2e # full loop, 5 approval scenarios
npm run smoke:cf-driven # full loop, replayed Gemma decisions
CI runs the typecheck + tests + build + the three credential-free smokes on every push to main and every PR. See .github/workflows/ci.yml and TESTING.md for what's covered and what's intentionally not unit-tested.
Layout
src/
index.ts library barrel — programmatic API
types.ts shared interface contracts (DESIGN §3)
toolbox.ts FileBank + runQuery + runExec
provider.ts provider barrel + env factory
provider-anthropic.ts Anthropic Messages API adapter
provider-cloudflare.ts Cloudflare Workers AI (OpenAI-compat endpoint)
approval.ts gate + deriveCategory + TTY prompt
session.ts createBankBash-backed db wrappers
policy.ts YAML loader + DEFAULT_POLICY
loop.ts turn orchestrator
cli.ts entry point — built into bin/harness
cli-args.ts argv parser (extracted for testability)
*.test.ts 70 unit tests (cli-args, approval, policy,
provider factory, cloudflare provider)
scratch/ smoke/integration scripts
examples/ example policy YAML
dist/ build output (gitignored, npm-published)
cli.js the harness binary (shebang preserved)
index.js programmatic library entry
*.d.ts TypeScript declarations
tsup.config.ts build config (ESM, node22 target)
DESIGN.md contract — read first
PROVIDERS.md provider abstraction + how to add one
TESTING.md test layout + coverage notes
COEVOLUTION.md upstream changes plan (mostly cancelled — see file)
CHANGELOG.md project journey, v0 + v0.1
Stack version pins
| Package | Version pinned to | Notes |
|---|---|---|
| just-bash | ^2.14.3 | beta but stable surface |
| @rckflr/agent-skills-cli | local file:../agent-skills-cli | uses STABLE-tier exports + one INTERNAL (createBankBash) |
| just-bash-data | local file:../just-bash-data | bash-first; db/vec subcommands |
| @anthropic-ai/sdk | ^0.40.0 | for Anthropic provider |
| yaml | ^2.5.0 | policy parsing |
| Node | >=22 | required by agent-skills-cli + native fetch/ReadableStream |
License
Same as the surrounding ecosystem (MIT). Copy attribution from contributing repos when forking.