cclaw
cclaw is a lightweight harness-first flow toolkit for coding agents. Three slash commands. Seven hops (Detect → Triage → Pre-flight → Dispatch → Pause → Compound → Finalize). Four stages (plan → build → review → ship, where build IS a TDD cycle: RED → GREEN → REFACTOR). Six on-demand specialists, every one of them dispatched with a mandatory contract read (.cclaw/lib/agents/<name>.md + wrapper skill) before it acts, all running as isolated sub-agents and emitting a calibrated Confidence: high | medium | low signal. Two read-only research helpers (repo-research, learnings-research) that every plan dispatch invokes before authoring, so plans are grounded in real repo signals and prior shipped lessons rather than training memory. Three Acceptance-Criteria modes (inline / soft / strict) so trivial edits do not pay the price of risky migrations. A five-axis review (correctness · readability · architecture · security · performance) with a five-tier severity vocabulary, a strict-mode adversarial pre-mortem before ship, and a source-driven mode that grounds framework code in current docs. A deep content layer of skills, templates, runbooks, patterns, examples, and recovery playbooks wrapped around a runtime under 1 KLOC — so Claude Code, Cursor, OpenCode, or Codex can move from idea to shipped change with a clear plan, the right amount of ceremony, and almost no orchestrator bloat.
idea
│
▼
/cc <task>
│
┌─────────┴──────────────────────────────────────────┐
│ Hop 1: Detect — fresh start? or resume active flow? │
└─────────┬──────────────────────────────────────────┘
│ fresh
▼
┌────────────────────────────────────────────────────┐
│ Hop 2: Triage — auto-classify task, │
│ recommend path + acMode, runMode (step/auto) │
└─────────┬──────────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────┐
│ Hop 2.5: Pre-flight — surface 3-7 assumptions │
│ (stack, conventions, defaults, out-of-scope); │
│ user confirms; persisted to triage.assumptions. │
│ skipped on inline + on resume │
└─────────┬──────────────────────────────────────────┘
│
trivial │ small-medium │ large-risky
acMode │ acMode soft │ acMode strict
inline │ │
▼ ▼ ▼
edit + commit plan → build → review → ship brainstorm? → architect? → plan → build → review → ship
(no plan) each stage in a fresh sub-agent each stage in a fresh sub-agent, parallel-build allowed
│ │ five-axis review · adversarial pre-mortem
└─────────┬────────────┘
▼
compound (auto, gated by quality)
│
▼
active flows → shipped/<slug>/
Three slash commands (/cc, /cc-cancel, /cc-idea). Four stages (plan → build → review → ship). Six specialists, all on-demand, all running as sub-agents, all emitting Confidence: high | medium | low. Seventeen skills including the always-on triage-gate, flow-resume, pre-flight-assumptions, tdd-cycle, conversation-language, anti-slop, and the strict-mode-default source-driven. Ten templates including plan-soft.md and build-soft.md for the soft-mode path. Four runbooks. Eight reference patterns. Three research playbooks. Five recovery playbooks. Eight worked examples. Two mandatory gates in strict mode (AC traceability + TDD phase chain); soft mode keeps both as advisory; inline mode skips both.
What changed in 8.11
8.11 is a non-breaking orchestrator-spec cleanup release on top of 8.10.1. Five concrete UX regressions from a real session log got fixed:
- Discovery sub-phase always pauses regardless of runMode. In large-risky flows the brainstormer → architect → planner chain used to blow through in auto mode — the user never saw the brainstormer's selected_direction before architect's tradeoffs landed on top. Now each discovery step renders its slim summary and ends the turn; the user types /cc to advance. The auto-mode chain only applies to plan → build → review → ship transitions, never to discovery-internal handoffs.
- Cancel is no longer a clickable option in any picker. Hop 1 detect, Hop 2.5 pre-flight, Hop 4 hard gates, flow-resume picker, and interpretation forks all dropped their Cancel row. /cc-cancel is a separate explicit user-typed command for nuking flow state — the orchestrator surfaces it only in plain prose, only when the user looks stuck. Putting a destructive command behind a one-keystroke option was a footgun.
- /cc is the single resume verb. Step mode used to say I type "continue" to advance (three places: start-command, triage-gate, flow-resume) — two competing magic words for the same action. Now step mode = render slim summary, end the turn; the user sends /cc (the same verb that resumes any other paused flow). One mechanic, one verb.
- Slug naming format is YYYYMMDD-<semantic-kebab>. Hop 2 Triage now mandates a date prefix on every minted slug (20260510-billing-rewrite). Same-day collisions resolve by appending -2, -3, etc. The date prefix is mandatory and ASCII regardless of conversation language. orchestrator-routing.semanticSlugTokens(slug) strips the prefix before Jaccard matching, so same-topic flows on different days are still reliably matched against shipped artefacts.
- Structured asks render in the user's conversation language. Every fenced askUserQuestion(...) example in start-command.ts, skills.ts (triage-gate, pre-flight-assumptions, interpretation-forks, flow-resume), and conversation-language.md now uses <option label conveying: ...> placeholder notation instead of literal English option strings. The agent cannot copy a literal English string because there isn't one — the slot describes the intent and the agent must verbalise it in the user's language. Mechanical tokens (/cc, /cc-cancel, stage names, mode names, slugs, file paths, JSON keys, AC-N, complexity / acMode keywords) stay in their original form. The conversation-language skill's worked example was rewritten as a language-neutral schema. The brainstormer.ts and architect.ts specialist prompts now explicitly require checkpoint_question, What changed, Notes, and open_questions values to render in the user's language.
23 new tests (tests/unit/v811-cleanup.test.ts) cover all five fixes; the existing start-command resume test was updated to match the new placeholder shape. 385 tests across 40 files, all green.
No breaking changes, no new CLI commands, no new config keys, no new dependencies. Existing flows with non-dated slugs continue to work; the date prefix is only required for new slugs minted on or after 8.11.
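The 8.11 slug rules can be sketched roughly as follows. This is an illustration under stated assumptions, not the cclaw source: datePrefix, mintSlug, semanticTokens, and jaccard are hypothetical names, and the real orchestrator-routing.semanticSlugTokens(slug) may tokenise differently (for example, it may also drop the collision suffix).

```typescript
// Illustrative sketch only: these helper names are hypothetical, not cclaw's API.

/** Format a date as an ASCII YYYYMMDD prefix (UTC). */
function datePrefix(date: Date): string {
  return date.toISOString().slice(0, 10).replace(/-/g, "");
}

/** Mint YYYYMMDD-<semantic-kebab>; same-day collisions get -2, -3, ... */
function mintSlug(semanticKebab: string, date: Date, existing: string[]): string {
  const base = datePrefix(date) + "-" + semanticKebab;
  if (existing.indexOf(base) === -1) return base;
  let n = 2;
  while (existing.indexOf(base + "-" + n) !== -1) n += 1;
  return base + "-" + n;
}

/** Drop the date prefix, then split the remainder into kebab tokens (assumed behaviour). */
function semanticTokens(slug: string): string[] {
  return slug.replace(/^\d{8}-/, "").split("-").filter((t) => t.length > 0);
}

/** Jaccard similarity |A ∩ B| / |A ∪ B| over two token lists. */
function jaccard(a: string[], b: string[]): number {
  const union = a.concat(b).filter((t, i, all) => all.indexOf(t) === i);
  if (union.length === 0) return 0;
  const shared = union.filter((t) => a.indexOf(t) !== -1 && b.indexOf(t) !== -1);
  return shared.length / union.length;
}

// A second flow minted on the same day collides and receives the -2 suffix.
const minted = mintSlug(
  "billing-rewrite",
  new Date(Date.UTC(2026, 4, 10)),
  ["20260510-billing-rewrite"],
);

// Same topic on different days: the date prefix does not affect the score.
const similarity = jaccard(
  semanticTokens("20260510-billing-rewrite"),
  semanticTokens("20260301-billing-rewrite"),
);
console.log(minted, similarity);
```

Stripping the prefix before comparison is what lets a flow minted today match a shipped flow from months ago on topic alone.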
What changed in 8.10.1
A real install run on 8.10.0 surfaced two regressions:
- The picker's full-screen clear (\u001b[2J\u001b[H) was wiping the freshly-printed banner and welcome card on first render. Users never saw the new ASCII logo when the picker was involved.
- The picker frame stayed in the terminal scrollback after Enter — cleanup never erased it. Visual leftover sitting above install progress.
8.10.1 fixes both with the same idea: stop using full-screen escapes, let the picker manage only its own region. New pure helpers eraseLines(count) and frameLineCount(frame) in harness-prompt.ts. runPicker tracks lastFrameHeight and erases only that many rows on redraw and on cleanup. Banner / welcome stay intact above the picker; after confirm the picker vanishes and install progress lines start at the row where the picker frame began. As a bonus, renderBanner / renderSummary / renderWelcome now end with \n\n (was \n) so there's exactly one blank line of breathing room between every two sections — between banner and Usage:, between summary and the final completion line, between welcome and the first ✓ step on the auto-detect path.
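A minimal sketch of the region-only erase idea, assuming the cursor sits on the row directly below the last printed frame. The function names match the helpers mentioned above, but the bodies here are a guess at one plausible implementation using standard ANSI sequences (cursor up, erase line), not the code in harness-prompt.ts.

```typescript
// Hypothetical bodies: the real helpers in harness-prompt.ts may differ.

/** Number of terminal rows a frame occupies (ignores soft line wrapping). */
function frameLineCount(frame: string): number {
  if (frame.length === 0) return 0;
  const endsWithNewline = frame.charAt(frame.length - 1) === "\n";
  const body = endsWithNewline ? frame.slice(0, -1) : frame;
  return body.split("\n").length;
}

/** Escape string that erases the `count` rows above the cursor, bottom to top. */
function eraseLines(count: number): string {
  let out = "";
  for (let i = 0; i < count; i += 1) {
    out += "\u001b[1A"; // move the cursor up one row
    out += "\u001b[2K"; // erase that entire row
  }
  // Return to column 0 so the next write starts where the frame began.
  return count > 0 ? out + "\r" : out;
}

// Redraw: erase exactly the rows of the previous frame, then print the new one.
const previousFrame = "Pick harnesses\n> [x] cursor\n  [ ] codex\n";
const redrawPrefix = eraseLines(frameLineCount(previousFrame));
```

Because only the tracked rows are erased, anything printed above the picker (banner, welcome card) is never touched.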
What changed in 8.10
8.10 is a non-breaking install-UX polish release on top of 8.9. cclaw init / sync / upgrade were silent for ~2 seconds and the help / version surface looked dated next to peer tools. Worse, cclaw --version was reporting 8.7.0 even on the npm-shipped 8.9.0 — the CCLAW_VERSION constant was hardcoded in src/constants.ts and nobody bumped it during the 8.8 / 8.9 releases. A tool that lies about its own version is a tool you can't debug.
- ASCII logo banner. New src/ui.ts module renders a block-letter CCLAW banner (Unicode box-drawing) followed by cclaw vX.Y.Z — harness-first flow toolkit for coding agents. Shown on init / sync / upgrade / uninstall / help. Not shown on version. Honours NO_COLOR, FORCE_COLOR, and stdout.isTTY — piped output, CI logs, and NO_COLOR=1 get plain ASCII; live TTYs get cyan.
- Per-step progress feedback. syncCclaw now accepts onProgress: (event) => void in SyncOptions; the CLI wires it to a ✓ <step> — <detail> line printer. Twelve major steps emit progress (runtime root, specialists, hooks, skills, templates, runbooks, patterns, research, recovery, examples, anti-patterns, harness assets, config). Programmatic callers (smoke, MCP wrappers, tests) leave onProgress undefined to stay silent.
- Final summary block. init / sync / upgrade print an Installed block at the end with one row per asset family (Agents, Skills, Templates, Runbooks, Patterns, Research, Recovery, Examples, Hooks, Commands) plus counts. uninstall reports which harnesses were removed.
- First-run welcome. On init when .cclaw/config.yaml does not exist yet, a two-line welcome card prints before the picker / sync starts. Suppressed on re-init, sync, upgrade.
- Polished harness picker. Cyan header, dim description column per harness, green [x] selected, dim [ ] unselected, cyan > cursor pointer, dim cyan (detected) tag, dim hotkey legend. Frame renderer is now a pure renderPickerFrame(state, detected, useColor): string function so tests can assert on layout without a TTY.
- Coloured help body. cclaw help renders cyan flag names, dim descriptions, yellow section headings. Auto-aligned per section.
- CCLAW_VERSION is single-source-of-truth. src/constants.ts no longer hardcodes the version. The constant now reads from package.json at module-load time. Version bumps require updating exactly one file (package.json).
No breaking changes. No new dependencies (raw ANSI escapes — no chalk / kleur / picocolors). No new commands, no new config keys, no new flags. Drop-in upgrade — the existing [cclaw] init complete. line is unchanged; CI parsers reading that line still work.
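The colour decision described in the 8.10 notes could look something like the sketch below. The names shouldUseColor and cyan are hypothetical, and the precedence (NO_COLOR first, then FORCE_COLOR, then the TTY check) is an assumption; the notes only say that all three signals are honoured.

```typescript
// Hypothetical sketch: precedence between the variables is an assumption.
interface ColorEnv {
  NO_COLOR?: string;
  FORCE_COLOR?: string;
}

/** Decide whether to emit ANSI colour for a given environment and stream. */
function shouldUseColor(env: ColorEnv, isTTY: boolean): boolean {
  // Any non-empty NO_COLOR disables colour.
  if (env.NO_COLOR !== undefined && env.NO_COLOR !== "") return false;
  // A non-empty FORCE_COLOR other than "0" enables colour even when piped.
  if (env.FORCE_COLOR !== undefined && env.FORCE_COLOR !== "" && env.FORCE_COLOR !== "0") {
    return true;
  }
  // Otherwise colour only live terminals; pipes and CI logs stay plain.
  return isTTY;
}

/** Wrap text in raw ANSI cyan (SGR 36) when colour is enabled. */
function cyan(text: string, useColor: boolean): string {
  return useColor ? "\u001b[36m" + text + "\u001b[0m" : text;
}

const piped = cyan("cclaw", shouldUseColor({}, false));
const live = cyan("cclaw", shouldUseColor({}, true));
```

Passing the environment and the TTY flag as arguments keeps the decision pure, which is the same property that lets renderPickerFrame be tested without a terminal.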
What changed in 8.9
8.9 is a non-breaking improvement release on top of 8.8. Three concrete additions distilled from a parallel audit of addyosmani-skills, everyinc-compound, and gsd-v1 against cclaw v8.8 — most of those references' ideas were rejected (multi-flow factories, marketplace converters, 30+ specialists, prose-lock contract tests, version-archaeology rhetoric). The three that survived address concrete failure modes already happening in real flows.
- knowledge.jsonl near-duplicate detection on append. KnowledgeEntry carries optional touchSurface[] and dedupeOf fields. New findNearDuplicate(projectRoot, candidate, options?) helper computes Jaccard similarity (default 0.6 threshold) over tags ∪ touchSurface against the most recent 50 entries. runCompoundAndShip runs dedup before append and stamps dedupeOf: <earlier-slug> on near-duplicates. The append stays append-only — learnings-research and human readers see the chain via dedupeOf, the file never gets rewritten, concurrent-write safety preserved. Stops knowledge.jsonl from snowballing across 50+ shipped flows with near-identical "rate-limit middleware bug fixed in src/auth/" entries.
- Slice-builder coverage-assess beat between GREEN and REFACTOR. New hard rule 17 in slice-builder.ts: after GREEN passes the full suite and before REFACTOR is committed, the slice-builder writes one explicit Coverage line per AC to build.md with verdict full / partial / refactor-only. Silence and "looks fine" are not acceptable. Strict BUILD_TEMPLATE gains a ## Coverage assessment table; soft BUILD_TEMPLATE_SOFT gains a **Coverage**: bullet. The slice-builder's self_review[] array now carries five rules (was four): the new coverage-assessed rule joins tests-fail-then-pass / build-clean / no-shims / touch-surface-respected. The orchestrator's pause hop bounces the slice in fix-only mode if the rule has verified: false or empty evidence — no reviewer cycle paid for an under-evidenced slice.
- Flow-pressure advisory in session-start.mjs. The hook now sums the byte size of every flows/<slug>/*.md artefact for the active slug and emits an advisory message at three thresholds: ≥30 KB (elevated → "let the orchestrator dispatch a fresh sub-agent for the next AC"), ≥60 KB (high → "finish the active slice and resume from a clean session"), ≥100 KB (critical → "consider /cc-cancel and resplitting"). Advisory only — never blocks. Folded into the existing session-start.mjs so installer surface stays the same; no new hook file, no new harness wiring.
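The near-duplicate check from the first bullet can be illustrated with an in-memory sketch. It is not the real findNearDuplicate(projectRoot, candidate, options?): findNearDuplicateIn is a hypothetical stand-in that takes already-loaded entries, and treating the threshold as inclusive and returning the highest-scoring match are assumptions.

```typescript
// Hypothetical in-memory stand-in; the real helper reads knowledge.jsonl from disk.
interface KnowledgeEntry {
  slug: string;
  tags?: string[];
  touchSurface?: string[];
  dedupeOf?: string;
}

/** tags ∪ touchSurface as a de-duplicated list. */
function features(entry: KnowledgeEntry): string[] {
  const all = (entry.tags || []).concat(entry.touchSurface || []);
  return all.filter((t, i) => all.indexOf(t) === i);
}

/** Jaccard similarity |A ∩ B| / |A ∪ B|. */
function jaccard(a: string[], b: string[]): number {
  const union = a.concat(b).filter((t, i, all) => all.indexOf(t) === i);
  if (union.length === 0) return 0;
  const shared = union.filter((t) => a.indexOf(t) !== -1 && b.indexOf(t) !== -1);
  return shared.length / union.length;
}

/** Best match among the most recent `windowSize` entries, if any reaches the threshold. */
function findNearDuplicateIn(
  entries: KnowledgeEntry[],
  candidate: KnowledgeEntry,
  threshold: number = 0.6,
  windowSize: number = 50,
): KnowledgeEntry | undefined {
  const target = features(candidate);
  let best: KnowledgeEntry | undefined = undefined;
  let bestScore = 0;
  for (const entry of entries.slice(-windowSize)) {
    const score = jaccard(target, features(entry));
    if (score >= threshold && score > bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  return best;
}

const shipped: KnowledgeEntry[] = [
  { slug: "20260401-auth-rate-limit", tags: ["auth", "rate-limit"], touchSurface: ["src/auth/"] },
];
const candidate: KnowledgeEntry = {
  slug: "20260510-auth-throttle",
  tags: ["auth", "rate-limit"],
  touchSurface: ["src/auth/", "src/middleware/"],
};

// Stamp the link instead of rewriting the file: the log stays append-only.
const earlier = findNearDuplicateIn(shipped, candidate);
if (earlier) candidate.dedupeOf = earlier.slug;
```

Here the two entries share three of four features (0.75), so the candidate is appended with dedupeOf pointing at the earlier slug.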
No breaking changes. Drop-in upgrade — touchSurface / tags / dedupeOf are optional fields, slice-builders running on 8.8 prompts will catch up on the first fix-only bounce, session-start advisory appears after cclaw sync.
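For the flow-pressure advisory added in 8.9, the threshold mapping can be sketched as a small pure function. The function name, the "ok" level, and the choice of 1 KB = 1024 bytes are assumptions made for illustration; only the three thresholds and their labels come from the notes above.

```typescript
// Hypothetical sketch of the advisory levels; never blocks, only classifies.
type FlowPressure = "ok" | "elevated" | "high" | "critical";

/** Map the summed byte size of the active flow's artefacts to an advisory level. */
function flowPressure(totalBytes: number): FlowPressure {
  const kb = totalBytes / 1024; // assumption: binary kilobytes
  if (kb >= 100) return "critical"; // consider /cc-cancel and resplitting
  if (kb >= 60) return "high"; // finish the active slice, resume from a clean session
  if (kb >= 30) return "elevated"; // dispatch a fresh sub-agent for the next AC
  return "ok";
}

// Sum the artefact sizes for one slug, then classify.
const artefactSizes = [12_000, 18_500, 4_200];
const level = flowPressure(artefactSizes.reduce((sum, n) => sum + n, 0));
```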
What changed in 8.8
8.8 is a non-breaking cleanup release on top of 8.7. Seven concrete bugs found in an audit against mattpocock/skills post-30-Apr commits and cclaw's own codebase, paired with two pruning passes the user explicitly asked for: "too many tests and they're not useful" + "the tool is cluttered with v8 / v7 / A1 / etc all over the place".
- B1 — interpretationForks is wired (no longer a no-op). 8.7 added triage.interpretationForks to flow-state.json and the structured-ask UI but never plumbed it through the specialist prompts. brainstormer, planner, architect, and slice-builder now all read it from the dispatch envelope and respect the chosen reading; planner and architect copy it verbatim next to assumptions in plan.md / decisions.md and surface conflicts as feasibility / decision blockers. flow-state.ts assertTriageOrNull now validates the field shape; new interpretationForksOf(triage) helper mirrors assumptionsOf.
- B2 — TDD anti-patterns rebuilt against antipatterns.ts. The ## Anti-patterns section in tdd-cycle now cites the actual catalogued A-numbers (A-2 phase integrity, A-3 git add -A, A-12 single-test green, A-13 horizontal slicing, A-14 pushing past failing test, A-15 mocking-what-should-not-be-mocked) — phantom A-18 / A-19 / A-20 references removed (those numbers don't exist or got renumbered with totally different meanings in 8.7). New A-N parity test scans every skill body, every specialist prompt, the start-command, all stage playbooks, and the recovery playbook to keep this from regressing.
- B3 — Slice-builder hard rule 6 scoped to strict mode. Used to read "use commit-helper, never git commit directly" unconditionally — which contradicted the soft-mode commit table earlier in the same prompt. Now reads "In strict mode: commit-helper. In soft mode: plain git commit is fine." Matches the table.
- B4 — Severity scale aligned with the reviewer. Slice-builder env-shim rule, planner edge-case finding, security-reviewer Output section, worked example, and JSON summary all migrated from the legacy block / warn / info / security-block vocabulary to the canonical 5-tier critical / required / consider / nit / fyi scale. Security-reviewer JSON summary now reports by_axis + by_severity instead of legacy 3-tier counts.
- B5 — v7 paths replaced everywhere (47 occurrences across 6 files). plans/<slug>.md / decisions/<slug>.md / builds/<slug>.md / reviews/<slug>.md / ships/<slug>.md / learnings/<slug>.md → flows/<slug>/<artifact>.md. The active flow lives at flows/<slug>/, shipped flows at flows/shipped/<slug>/ — the v7 directory layout is gone, this release just aligns the prompt text with the actual layout.
- B6 — Architect Sub-agent context numbering fixed. Two bullets numbered "6." → renumbered 6 / 7.
- B7 — TDD gate name unified to red_test_written. Was red_test_recorded in tdd-cycle and red_test_written in stage-playbooks — picked the latter (more accurate; the gate verifies the RED commit exists, not just that "a test was recorded"). Test added to lock it.
- Tier 2 — Test suite pruned 569 → 298. Six version-snapshot regression files (v82- through v87-) were almost entirely prose-locks (expect(skill.body).toMatch(/v8\.7\+/)) that froze wording without protecting behaviour. Removed wholesale; the 7 tests that did protect behaviour (discriminator narrowing, schema validation) were extracted and consolidated into flow-state.test.ts. New v88-cleanup.test.ts (42 tests) replaces the deleted suites with targeted regression guards: B1-B7 verification, version-marker absence, A-N parity, path-normalisation. Net: 287 tests removed, 49 added (7 extracted + 42 new), 298 total, all green, faster runs.
- Tier 3 — Version markers stripped from skill bodies and specialist prompts. (v8.4+) / (v8.7+) / (NEW sub-step, v8.7+) / since v8.5 / Severity legacy note / v7-era constraint / the v7 mistake / the v8.X bug / cclaw v8.X+ replaces / Cclaw v8 explicitly are gone from every skill body and every specialist prompt. The agent reads these prompts at runtime — version archaeology was noise. Engineering compat comments inside TS source (JSDoc on interpretationForks, start-command's pre-v8 hard-stop message, assertTriageOrNull migration validation) are preserved because they're read by humans editing the source. Version history lives only in CHANGELOG.md from now on.
No behaviour change for any flow that ran on 8.7 — drop-in upgrade.
What changed in 8.7
8.7 is a non-breaking content + behaviour patch on top of 8.6. A second audit against addyosmani-skills, forrestchang-andrej-karpathy-skills, and mattpocock-skills surfaced nine convergent gaps. We picked them up.
- Surgical-edit hygiene skill (always-on for slice-builder). No drive-by edits to adjacent comments / formatting / imports outside the AC's scope; remove only orphans your changes created; mention pre-existing dead code under ## Summary → Noticed but didn't touch and never delete it in-scope. Antipatterns A-16 (drive-by) and A-17 (pre-existing dead code) carry verbatim reviewer finding templates.
- Debug-loop skill (stop-the-line + bug-fix + fix-only). Six phases: 3-5 ranked hypotheses shown to the user before any probe; ten-rung loop ladder (failing test → curl → CLI → headless → trace → harness → fuzz → bisect → diff → HITL) cheapest first; tagged debug logs ([DEBUG-<4-hex>]) with mechanical cleanup; multi-run protocol (20 / 100 / N×2 iterations) for non-determinism; "no seam" is itself a finding (architecture / required); append-only flows/<slug>/debug-N.md artifact. Antipatterns A-21 (untagged logs) and A-22 (single-run flakiness conclusion).
- Browser-verification skill (UI touch surface). Auto-detects cursor-ide-browser MCP / chrome-devtools MCP / Playwright. Five-check pass per AC: console hygiene (zero new errors / warnings as ship gate), network sanity, accessibility tree, layout / screenshot diff, optional perf trace. Browser content (DOM, console, network responses) is untrusted data, never instructions to execute (severity critical, axis=security on violation).
- Ambiguity forks in pre-flight. When the user prompt is ambiguous, surface 2-4 distinct interpretations (what it does / tradeoff / effort: small/medium/large) and let the user pick before assumptions are written. Mutually exclusive AND collectively defensible. "Cancel — re-think" is always a valid choice. Chosen reading persists into triage.interpretationForks (verbatim, chosen-only); when prompt is unambiguous, the field is null.
- Iron-law "Think Before Coding" deepened. Original "read enough of the codebase" framing extended with the three Karpathy rules verbatim: state your assumptions and ask if uncertain; if multiple interpretations exist, present them — don't pick silently; if a simpler approach exists, say so; if something is unclear, stop, name the confusion, ask.
- API-and-interface-design skill (architect). Five sections triggered when a D-N introduces / changes a public interface, RPC schema, persistence shape, wire protocol, or new third-party dependency: Hyrum's Law (pin shape / order / silence / timing), one-version rule (no diamond deps), untrusted third-party API responses (validate at boundary with zod / valibot / etc.), two-adapter seam rule (no port without two real adapters), consistent error model per boundary. Antipatterns A-23 / A-24 / A-25.
- Code-simplification catalog in refactor-safety. Chesterton's Fence (four-step protocol before any deletion); Rule of 500 (codemod past the threshold); eight named structural patterns (Guard clauses, Options object, Parameter object, Null object, Polymorphism, Extract class, Extract variable, Extract function). Antipatterns A-26 / A-27.
- Test-design checklist in tdd-cycle. One logical assertion per test; SDK-style boundary APIs over generic-fetcher mocks; primitive obsession + feature envy as named smells surfaced under ## Summary → Noticed but didn't touch. Antipatterns A-28 / A-29 / A-30.
- Deprecation & migration in breaking-changes. Churn Rule (deprecator owns migration); Strangler Pattern (five phases with canary + parity); Zombie Code lifecycle (assign owner OR deprecate with concrete plan; never silently extend). Antipatterns A-31 / A-32 / A-33.
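The tagged-debug-log convention from the debug-loop skill lends itself to a short sketch. Both helper names are hypothetical, as is the choice of lowercase hex; the only detail taken from the notes is the [DEBUG-<4-hex>] shape and the idea that cleanup is mechanical (delete every line carrying the tag).

```typescript
// Hypothetical sketch of tagged debug logs with mechanical cleanup.

/** Generate a tag such as [DEBUG-1a2b] (assumption: lowercase hex digits). */
function newDebugTag(): string {
  const hex = Math.floor(Math.random() * 0x10000).toString(16);
  return "[DEBUG-" + ("0000" + hex).slice(-4) + "]";
}

/** Remove every source line that contains the given tag. */
function stripTaggedLogs(source: string, tag: string): string {
  return source
    .split("\n")
    .filter((line) => line.indexOf(tag) === -1)
    .join("\n");
}

const tag = "[DEBUG-1a2b]";
const instrumented = [
  "function charge(amount: number) {",
  "  console.log('" + tag + " amount', amount);",
  "  return amount * 100;",
  "}",
].join("\n");

// After the investigation, one pass removes exactly the probes that were added.
const cleaned = stripTaggedLogs(instrumented, tag);
```

An untagged log line cannot be removed this way, which is presumably why untagged logs are listed as an antipattern.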
What changed in 8.6
8.6 is a non-breaking content + behaviour patch on top of 8.5. Two reference libraries — addyosmani-skills and chachamaru127-claude-code-harness — pointed at six things 8.5 still didn't do. We picked them up.
- Three-section Summary block in every primary artifact. A new always-on skill summary-format.md defines the canonical block. Every plan.md, decisions.md, build.md, and review.md now ends with ## Summary — <specialist>[ — iteration N] containing ### Changes made, ### Noticed but didn't touch, and ### Potential concerns. Each is a bullet list. The reviewer adds one block per iteration.
- Anti-sycophancy reviewer + verification story. The reviewer's iteration output now carries ### What's done well (≥1 evidence-backed item per iteration, with file:line / hunk / test name; no generic praise) and ### Verification story (three explicit yes/no rows: tests run, build/typecheck run, security pre-screen run, each with concrete evidence). An iteration without all three Verification rows or with zero What's done well bullets is a contract violation.
- Self-review gate before reviewer dispatch. Slice-builder's strict-mode summary block now carries self_review[] with four mandatory rules: tests-fail-then-pass, build-clean, no-shims, touch-surface-respected. Each carries verified: <true | false> and evidence. The orchestrator's Hop 4 — Pause — bounces the slice back to the slice-builder in fix-only mode (no reviewer dispatched) if any rule has verified: false or empty evidence. Saves a full reviewer round-trip on incomplete slices.
- Repo-wide ADR catalogue. A new skill documentation-and-adrs.md describes Architectural Decision Records living at docs/decisions/ADR-NNNN-<slug>.md. Lifecycle is PROPOSED → ACCEPTED → SUPERSEDED (plus REJECTED on flow cancel). The architect proposes (writes status: PROPOSED ADRs alongside decisions.md during large-risky flows). The orchestrator promotes PROPOSED → ACCEPTED during Hop 6 — Finalize. The orchestrator rewrites PROPOSED → REJECTED on /cc-cancel. ADRs are NEVER deleted.
- SDD doc cache for source-driven mode. source-driven.md grew a "Cache lookup before fetch" section. Cache lives at .cclaw/cache/sdd/<host>/<url-path>.{html,etag,last-modified} (per-project, gitignored). Lookup rules: fresh (< 24h) → cache_status: fresh-cache; stale + 304 → cache_status: revalidated-cache; stale + 200 → cache_status: refetched; network failure with stale cache → cache_status: stale-cache (treated as a consider finding by the reviewer); miss → cache_status: fetched. .cclaw/cache/ added to REQUIRED_GITIGNORE_PATTERNS.
- Mandatory pre-task read order in architect and planner (brownfield). Both prompts gained a Phase 2.5 — Pre-task read order step that runs before any authoring on brownfield repos: target file → tests → neighbour pattern → types. Architect's self-review checklist now requires every D-N decision to cite which read produced the supporting evidence. Planner's self-review checklist now requires every AC's touchSurface path to have been physically read in step 1, NOT picked from repo-research.md's summary. Greenfield writes "no existing files — N/A" against each step and continues.
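The SDD cache lookup rules from the 8.6 notes read naturally as a decision function. The sketch below is an illustration only: cacheStatus and the FetchOutcome shape are hypothetical, and cases the notes do not cover (for example a network failure on a cache miss) return null rather than guessing.

```typescript
// Hypothetical sketch of the "Cache lookup before fetch" rules.
type CacheStatus =
  | "fresh-cache"
  | "revalidated-cache"
  | "refetched"
  | "stale-cache"
  | "fetched";

type FetchOutcome =
  | { kind: "http"; status: number }
  | { kind: "network-failure" };

const FRESH_WINDOW_MS = 24 * 60 * 60 * 1000; // fresh means younger than 24h

/**
 * cachedAgeMs is null on a cache miss. For a fresh entry no request is made,
 * so the outcome argument is ignored in that branch.
 */
function cacheStatus(cachedAgeMs: number | null, outcome: FetchOutcome): CacheStatus | null {
  if (cachedAgeMs === null) {
    return outcome.kind === "http" && outcome.status === 200 ? "fetched" : null;
  }
  if (cachedAgeMs < FRESH_WINDOW_MS) return "fresh-cache";
  if (outcome.kind === "network-failure") return "stale-cache"; // reviewer: consider finding
  if (outcome.status === 304) return "revalidated-cache";
  if (outcome.status === 200) return "refetched";
  return null; // not specified by the lookup rules
}

const twoDays = 2 * FRESH_WINDOW_MS;
const offline = cacheStatus(twoDays, { kind: "network-failure" });
```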
What changed in 8.5
8.5 picked up six things that broke in a real test run: ship duplicating the flow dir, specialists being dispatched without their full contracts, discovery rendering as both a stage entry and a sub-phase, pre-mortem.md not being archived, lastSpecialist not updating mid-discovery, and no mechanism for grounding plans in repo signals + prior shipped lessons.
- Hop 6 — Finalize (orchestrator-only, git mv semantics). A new explicit hop replaces the one-line ship-finalize instruction. The orchestrator runs git mv (or mv when files aren't tracked) on every artifact, asserts the active dir empties out, and resets flow-state.json. The word "copy" is forbidden anywhere in finalize.
- Mandatory contract reads in every dispatch envelope. Every dispatch envelope now starts with two non-negotiable reads: .cclaw/lib/agents/<specialist>.md (the contract) + .cclaw/lib/skills/<wrapper>.md (the wrapping skill). A sub-agent that skips either is acting on a hallucinated contract.
- Brainstormer rewritten as an explicit 8-phase workflow. Bootstrap → Posture pick → Repo signals scan → (optional) repo-research dispatch → Clarifying questions (one at a time, max 3) → Author → 9-item self-review checklist → Return slim summary + JSON.
- Two read-only research helpers — repo-research and learnings-research. Lightweight on-demand sub-agents the planner / architect / brainstormer dispatch before authoring. repo-research scans manifests, AGENTS.md / CLAUDE.md, focus-surface dirs, test conventions. learnings-research scans knowledge.jsonl, scores entries, picks 1-3 with score ≥ 4, opens each candidate's learnings.md. They never become lastSpecialist.
- discovery is a sub-phase of plan, never a triage.path entry. triage.path ⊆ {plan, build, review, ship}. On large-risky, the plan stage expands into brainstormer → checkpoint → architect → checkpoint → planner instead of dispatching planner directly. Pre-v8.5 state files containing "discovery" in the path are normalised on read.
- pre-mortem.md is a first-class artifact stage. ArtifactStage widens; compound.runCompoundAndShip's allStages array gains "pre-mortem"; the Hop 6 finalize move list includes it.
- lastSpecialist widened from DiscoverySpecialistId to SpecialistId. Updated after every dispatch, not only at end-of-stage.
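The triage.path constraint from 8.5 can be sketched as a read-time normaliser. The notes say only that a legacy "discovery" entry is normalised on read; dropping it (rather than, say, mapping it to plan) is an assumption here, and normaliseTriagePath is a hypothetical name.

```typescript
// Hypothetical sketch: how legacy paths are normalised is an assumption.
type Stage = "plan" | "build" | "review" | "ship";
const STAGES: string[] = ["plan", "build", "review", "ship"];

/** Enforce triage.path ⊆ {plan, build, review, ship}, dropping legacy "discovery". */
function normaliseTriagePath(rawPath: string[]): Stage[] {
  const withoutDiscovery = rawPath.filter((entry) => entry !== "discovery");
  const unknown = withoutDiscovery.filter((entry) => STAGES.indexOf(entry) === -1);
  if (unknown.length > 0) {
    throw new Error("Unknown stage(s) in triage.path: " + unknown.join(", "));
  }
  return withoutDiscovery as Stage[];
}

// A pre-8.5 state file that still lists discovery as a stage entry.
const legacyPath = ["discovery", "plan", "build", "review", "ship"];
const normalised = normaliseTriagePath(legacyPath);
```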
What changed in 8.4
8.4 is a non-breaking content + behaviour patch on top of 8.3, picking up seven things three reference skill libraries do that cclaw 8.3 didn't.
- Confidence calibration in slim summaries. Every specialist emits Confidence: high | medium | low. The orchestrator's Hop 4 — Pause — treats Confidence: low as a hard gate in both step and auto modes: it pauses, refuses to chain, and offers expand <stage> (re-dispatch with a richer envelope), show, override, or cancel.
- Pre-flight assumptions (Hop 2.5). A new orchestrator hop runs after triage, before the first specialist dispatch, on every fresh non-inline flow. It surfaces 3-7 numbered assumptions (stack + version, repo conventions, architecture defaults, out-of-scope items) using the harness's structured ask, persists them to triage.assumptions (string array), and makes them immutable for the lifetime of the flow. Both planner and architect read them verbatim before authoring; a decision that would break an assumption surfaces as a feasibility blocker, not a silent override.
- Five-axis review. The reviewer's code mode now mandates five axes — correctness, readability, architecture, security, performance — every iteration. Findings carry axis and a five-tier severity: critical | required | consider | nit | fyi. Ship gates: strict blocks on any open critical or required; soft blocks only on critical. Legacy block | warn | info ledgers are migrated forward by the reviewer prompt.
- Source-driven mode. A new always-on skill source-driven.md instructs architect and planner (and indirectly slice-builder) to detect stack + versions, fetch the version-pinned official doc page, implement against documented patterns, and cite URLs in decisions.md and code comments. Default in strict mode for framework-specific work, opt-in for soft. Integrates with the user-context7 MCP tool when available, falls back to WebFetch. When docs are unreachable: write UNVERIFIED — implementing against training memory next to the affected line.
- Adversarial pre-mortem before ship (strict only). Hop 5 — Ship + Compound — now dispatches reviewer mode=adversarial in parallel with reviewer mode=release. The adversarial reviewer picks the most pessimistic plausible reading and writes flows/<slug>/pre-mortem.md listing 3-7 likely failure modes (data-loss, race, regression, blast-radius, rollback-impossibility, accidental-scope, hidden-coupling). Uncovered risks become required / critical findings, escalating the ship gate.
- Cross-flow learning in the planner. The planner reads .cclaw/knowledge.jsonl at every dispatch and surfaces 1-3 relevant prior entries — lessons captured by compound from past shipped slugs — in a new ## Prior lessons section in plan.md, citing learnings/<slug>.md. Filtering: surface-area overlap, tag overlap, recency.
- Test-impact-aware GREEN. The tdd-cycle.md skill's GREEN phase now distinguishes a fast inner loop (affected-test pattern) from a safe outer loop (full project suite). REFACTOR still always runs the full suite. Mandatory gate green_two_stage_suite is added to commit-helper.mjs --phase=green guidance.
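The ship-gate rule from the five-axis review bullet (strict blocks on open critical or required, soft blocks only on critical) can be expressed in a few lines. The Finding shape and the shipBlocked name are hypothetical, and treating inline mode as never blocked is an assumption, since the notes do not describe a ship gate for inline edits.

```typescript
// Hypothetical sketch of the 8.4 ship gates.
type Axis = "correctness" | "readability" | "architecture" | "security" | "performance";
type Severity = "critical" | "required" | "consider" | "nit" | "fyi";
type AcMode = "inline" | "soft" | "strict";

interface Finding {
  axis: Axis;
  severity: Severity;
  open: boolean;
}

/** True when at least one open finding blocks shipping in the given mode. */
function shipBlocked(acMode: AcMode, findings: Finding[]): boolean {
  let blocking: Severity[] = []; // assumption: inline has no review gate
  if (acMode === "strict") blocking = ["critical", "required"];
  if (acMode === "soft") blocking = ["critical"];
  return findings.some((f) => f.open && blocking.indexOf(f.severity) !== -1);
}

const findings: Finding[] = [
  { axis: "security", severity: "required", open: true },
  { axis: "readability", severity: "nit", open: true },
  { axis: "correctness", severity: "critical", open: false }, // already resolved
];

const strictBlocked = shipBlocked("strict", findings);
const softBlocked = shipBlocked("soft", findings);
```

The same ledger blocks a strict flow (open required finding) while a soft flow may ship, because the only critical finding is closed.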
What changed in 8.3
8.3 is a non-breaking content + UX patch on top of 8.2.
- Triage as a structured ask, not a code block. The orchestrator now uses the harness's structured question tool (AskUserQuestion / AskQuestion / prompt) to render the triage. Two questions, in order: pick the path, then pick the run mode. The fenced form remains as a fallback only.
- Run mode: step (default) vs auto. step pauses after every stage and waits for continue (8.2 behaviour). auto chains plan → build → review → ship without pausing; stops only on block findings, cap-reached, security findings, or before ship. New optional field triage.runMode in flow-state.json.
- Explicit parallel-build fan-out in Hop 3. The /cc body now carries a full ASCII fan-out diagram for the strict-mode parallel-build path — git worktree per slice, max 5 slices, one slice-builder sub-agent per slice, integration reviewer, merge sequence. The skill parallel-build.md already had this; the orchestrator now sees it at the dispatch site.
- TDD cycle deepening. Four new sections in tdd-cycle.md: vertical slicing / tracer bullets, stop-the-line rule, Prove-It pattern for bug fixes, writing-good-tests rules (state-not-interactions, DAMP over DRY, real-over-mock, test pyramid). Three new antipatterns: A-13 horizontal slicing, A-14 pushing past a failing test, A-15 mocking what should not be mocked.
What changed in 8.2
8.2 is a non-breaking redesign of the /cc orchestrator on top of 8.1.
- Triage gate. Every fresh flow runs the `triage-gate` skill, which classifies the task as `trivial` / `small-medium` / `large-risky` from six heuristics, recommends a path and an `acMode`, and asks the user to accept or override. The decision is persisted into `flow-state.json` so resumes never re-prompt.
- Graduated AC. Acceptance Criteria are no longer one-size-fits-all. `inline` (trivial) skips them entirely. `soft` (small-medium) uses a bullet list of testable conditions with no AC IDs and an advisory commit-helper. `strict` (large-risky) is the 8.1 behaviour byte-for-byte: AC IDs, mandatory `commit-helper.mjs --ac-id=AC-N --phase=red|green|refactor`, per-AC TDD chain.
- Sub-agent dispatch. `plan`, `build`, `review`, and `ship` each run in a fresh sub-agent invocation. The orchestrator hands a slim envelope (slug / stage / acMode / artifact paths) and gets back a fixed 5-to-7-line summary plus the artifact on disk. No specialist reasoning leaks into the orchestrator context.
- Resume. Invoking `/cc` while a flow is active triggers the `flow-resume` skill: 4-line summary plus `r` resume / `s` show / `c` cancel / `n` start new. The triage decision is preserved across sessions.
- Schema bump. `flow-state.json` is now `schemaVersion: 3` with a `triage` field. Existing v2 files are auto-migrated on first read with `acMode: strict` so existing flows behave exactly as in 8.1.
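The graduated-AC mapping and the v2 → v3 migration described above could look roughly like this. It is a sketch under stated assumptions: only `schemaVersion`, `triage`, `acMode` and the three class names come from the notes; nesting `acMode` inside `triage`, the function name, and the error on other versions are illustrative guesses.

```javascript
// Class → acMode mapping as described in the "Graduated AC" item.
const AC_MODE_BY_CLASS = {
  "trivial": "inline",
  "small-medium": "soft",
  "large-risky": "strict",
};

// Hypothetical migration: returns a v3 state without mutating the input.
function migrateFlowState(state) {
  if (state.schemaVersion === 3) return state; // already current
  if (state.schemaVersion !== 2) {
    throw new Error(`unsupported schemaVersion: ${state.schemaVersion}`);
  }
  // v2 flows predate triage, so they keep the 8.1 behaviour (strict).
  // Assumption: acMode lives under the triage field.
  return { ...state, schemaVersion: 3, triage: { acMode: "strict" } };
}
```

Returning a new object keeps the on-disk v2 data intact until the caller decides to write the migrated state back.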
What changed in v8
cclaw v8.0 was a breaking redesign that dropped the 7.x stage machine: no more mandatory brainstorm / scope / design / spec / tdd stages, no more 18 specialists, no more 9 state files, no more 30 stage gates. v7.x runs are not migrated; see docs/migration-v7-to-v8.md.
What we kept and made deeper:
- plans with acceptance criteria + YAML frontmatter (`slug`, `stage`, `status`, `ac[]`, `last_specialist`, `refines`, `shipped_at`, `ship_commit`, `review_iterations`, `security_flag`);
- build is a TDD stage: every AC goes through RED → GREEN → REFACTOR; `commit-helper.mjs --phase=red|green|refactor` enforces the cycle (production files in RED are rejected, GREEN without prior RED is rejected, REFACTOR is mandatory);
- AC ↔ commit traceability enforced by `commit-helper.mjs`;
- artifact templates for every stage (`plan`, `build`, `review`, `ship`, `decisions`, `learnings`, `manifest`, `ideas`, `iron-laws`);
- twelve auto-trigger skills: plan-authoring, AC traceability, refinement, parallel-build, security-review, review-loop, commit-message-quality, AC-quality, refactor-safety, breaking-changes, conversation-language (always-on), anti-slop (always-on), plus a meta-skill that ties them together;
- stage runbooks (`.cclaw/lib/runbooks/{plan,build,review,ship}.md`): strict checklists per stage with common pitfalls;
- reference patterns (`.cclaw/lib/patterns/`): eight task-type playbooks (api-endpoint, auth-flow, schema-migration, ui-component, perf-fix, refactor, security-hardening, doc-rewrite) the orchestrator opens before authoring AC;
- research playbooks (`.cclaw/lib/research/`): reading the codebase (files + tests + integration boundaries), time-boxing, using prior shipped slugs;
- recovery playbooks (`.cclaw/lib/recovery/`): AC traceability break, review hard cap reached, parallel-build slice conflict, frontmatter corruption, schemaVersion mismatch;
- examples library (`.cclaw/lib/examples/`): eight real-looking plan / build / review / ship / decision / learning / commit-helper artifacts;
- antipatterns (`.cclaw/lib/antipatterns.md`): twelve known failure modes the reviewer cites as findings;
- decision protocol (`.cclaw/lib/decision-protocol.md`): short-form digest of "is this even a decision?"; full D-N schema lives in `lib/agents/architect.md`, worked decisions in `lib/examples/`;
- resumable refinement via frontmatter on shipped slugs (`refines: <old-slug>`);
- durable artifacts your team and graph tools (Graphify, GitNexus, etc.) can index.
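The TDD phase gate in the list above (production files rejected in RED, GREEN requires a prior RED, REFACTOR mandatory) can be sketched as follows. This is not the real `commit-helper.mjs`: the test-file pattern, the function names, and the extra "REFACTOR requires a prior GREEN" check are assumptions for illustration.

```javascript
// Assumption: a "production file" is any staged path that does not look like a test.
const isTestFile = (path) =>
  /(^|\/)(tests?|__tests__)\/|\.(test|spec)\.[^\/]+$/.test(path);

// priorPhases: phases already committed for this AC, in order (e.g. ["red"]).
function checkPhase(phase, priorPhases, stagedFiles) {
  if (phase === "red") {
    const production = stagedFiles.filter((file) => !isTestFile(file));
    if (production.length > 0) {
      return { ok: false, reason: `production files in RED: ${production.join(", ")}` };
    }
  }
  if (phase === "green" && !priorPhases.includes("red")) {
    return { ok: false, reason: "GREEN without prior RED" };
  }
  // Assumption (not stated in the README): REFACTOR needs a prior GREEN.
  if (phase === "refactor" && !priorPhases.includes("green")) {
    return { ok: false, reason: "REFACTOR without prior GREEN" };
  }
  return { ok: true };
}

// REFACTOR is mandatory, so an AC counts as complete only after all three phases.
const isCycleComplete = (phases) =>
  ["red", "green", "refactor"].every((phase) => phases.includes(phase));
```

For example, `checkPhase("red", [], ["src/pill.ts"])` is rejected under this pattern, while the same call with `["src/pill.test.ts"]` passes.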
First 5 minutes
Requirements: Node.js 20+ and a git project.
cd /path/to/your/repo
npx cclaw-cli init # interactive picker; auto-detected harness pre-selected
npx cclaw-cli init --harness=claude,cursor,opencode,codex # explicit, no picker
init resolves harnesses in this order:
- `--harness=<id>[,<id>]` flag if passed.
- Existing `.cclaw/config.yaml` (so subsequent `init` / `sync` / `upgrade` are deterministic).
- Interactive picker when stdin/stdout are a TTY: a checkbox over the four harnesses with auto-detected ones pre-selected and tagged `(detected)`. Up/Down or k/j to move, Space to toggle, `a` to select all, `n` to deselect all, Enter to confirm, Esc/Ctrl-C to cancel.
- Non-TTY (CI, piped input, `npm exec --yes`): auto-detect from project root markers: `.claude/`, `.cursor/`, `.opencode/`, `.codex/`, `.agents/skills/`, `CLAUDE.md`, `opencode.json`, `opencode.jsonc`.
- If nothing detected and no flag passed → exit with an actionable error. cclaw never silently picks a harness for you.
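The resolution order above reads naturally as a chain of early returns. The sketch below is illustrative only: `resolveHarnesses`, its parameter names, and the `pick` callback (standing in for the interactive picker) are hypothetical, not part of cclaw-cli.

```javascript
// Hypothetical function mirroring the five steps listed above.
function resolveHarnesses({ flag, configHarnesses, isTTY, detected = [], pick }) {
  // 1. Explicit --harness flag wins.
  if (flag) return flag.split(",").map((id) => id.trim()).filter(Boolean);
  // 2. An existing config keeps later runs deterministic.
  if (configHarnesses && configHarnesses.length > 0) return configHarnesses;
  // 3. Interactive picker, with detected harnesses pre-selected.
  if (isTTY) return pick(detected);
  // 4. Non-TTY: fall back to auto-detection from root markers.
  if (detected.length > 0) return detected;
  // 5. Never guess silently.
  throw new Error("no harness detected; pass --harness=<id>[,<id>]");
}
```

Passing the picker in as a callback keeps the ordering logic testable without a terminal.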
Then work entirely inside your harness:
/cc <task>    plan / build / review / ship; orchestrator routes everything
/cc-cancel    stop the active run cleanly (artifacts move to .cclaw/flows/cancelled/<slug>/)
/cc-idea      drop a half-formed idea into .cclaw/ideas.md (no flow started)
There is no cclaw plan, cclaw status, cclaw ship, or cclaw migrate CLI command. Flow control lives in /cc inside the harness.
Six specialists, all on demand
| id | modes | when |
|---|---|---|
| `brainstormer` | frame / scope / alternatives | ambiguous request, need a frame and scope |
| `architect` | architecture / feasibility | structural decisions or feasibility check |
| `planner` | research / work-breakdown / topology | breaking work into AC and choosing topology |
| `reviewer` | code / text-review / integration / release / adversarial | reviews of any kind |
| `security-reviewer` | threat-model / sensitive-change | auth / secrets / supply chain / data exposure |
| `slice-builder` | build / fix-only | implementing AC and applying scoped fixes |
Specialists are proposed only when the task is large, abstract, risky, security-sensitive, or spans multiple components. Trivial and small/medium tasks run inline. Each prompt is 150-280 lines and includes an explicit output schema, two or more worked examples, edge cases, common pitfalls, and hard rules (see .cclaw/lib/agents/*.md after install). The orchestrator pulls additional context from runbooks, patterns, examples, and recovery playbooks as needed; see docs/skills.md for the auto-trigger layer that wraps every invocation.
Plan artifact, by example
---
slug: approval-page
stage: plan
status: active
ac:
  - id: AC-1
    text: "User sees an approval status pill on the dashboard."
    status: pending
  - id: AC-2
    text: "Pending approvals show a tooltip with the approver's name."
    status: pending
last_specialist: null
refines: null
shipped_at: null
ship_commit: null
review_iterations: 0
security_flag: false
---
# approval-page
> One paragraph: what we are doing and why.
## Acceptance Criteria
| id | text | status | commit |
| --- | --- | --- | --- |
| AC-1 | User sees an approval status pill on the dashboard. | pending | — |
| AC-2 | Pending approvals show a tooltip with the approver's name. | pending | — |
The same shape applies to build.md (commit log), review.md (findings + Five Failure Modes pass), ship.md (release notes + push/PR refs), decisions.md (architect output), learnings.md (compound output). Templates live in .cclaw/lib/templates/.
Artifact tree
.cclaw/
  config.yaml              cclaw config (harness, flow defaults)
  ideas.md                 append-only idea backlog (/cc-idea)
  knowledge.jsonl          cross-feature learnings index, append-only
  state/
    flow-state.json        ~500 bytes, schemaVersion: 3
  hooks/
    session-start.mjs      rehydrates flow state on harness boot
    stop-handoff.mjs       short reminder when stopping mid-flow
    commit-helper.mjs      atomic commit per AC + traceability + TDD phase gate
  flows/                   everything that comes out of a /cc run
    <slug>/                one folder per active flow
      plan.md              current work + AC
      build.md             implementation log + TDD evidence
      review.md            Concern Ledger + iteration logs
      ship.md              preflight + AC↔commit map + rollback + finalization
      decisions.md         architect output (optional; only when architect ran)
      learnings.md         compound output (optional; only when gated)
    shipped/<slug>/        plan.md, build.md, review.md, ship.md,
                           decisions.md, learnings.md, manifest.md
    cancelled/<slug>/      when /cc-cancel is invoked
  lib/                     reference content shipped by the installer
    agents/                6 specialist prompts (each ends with a Composition footer
                           locking it to its lane; no nested orchestration)
    skills/                12 auto-trigger skills (2 always-on: conversation-language,
                           anti-slop; 10 stage- or event-gated)
    templates/             9 templates (plan, build, review, ship, decisions,
                           learnings, manifest, ideas, iron-laws)
    runbooks/              4 stage runbooks (plan, build, review, ship)
    patterns/              8 task-type playbooks
    research/              3 research playbooks
    recovery/              5 recovery playbooks
    examples/              8 worked examples
    antipatterns.md        12 named failure modes
    decision-protocol.md   short-form digest; full schema in lib/agents/architect.md
.cclaw/state/ and .cclaw/worktrees/ are appended to .gitignore on init (transient per-session data). The rest of .cclaw/ is committable; graphify, team review, and the next agent all need it.
The split is deliberate. Active and archived flow artifacts go under flows/ so the orchestrator never confuses them with the read-only library under lib/. Runtime (state/, hooks/) stays at the top so harness hooks can find it without traversal. Active flows are grouped by slug — open flows/<slug>/ and every artifact for that flow is right there, instead of scattered across six per-stage subdirectories.
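The `.gitignore` step mentioned above (appending `.cclaw/state/` and `.cclaw/worktrees/` on init) can be sketched as a pure text transform. Whether the real installer skips entries that are already present is an assumption here; the function name is hypothetical.

```javascript
// Returns the new .gitignore text. Assumption: entries already present
// are not appended a second time, so repeated init runs stay idempotent.
function appendIgnoreEntries(gitignoreText, entries = [".cclaw/state/", ".cclaw/worktrees/"]) {
  const existing = new Set(gitignoreText.split(/\r?\n/).map((line) => line.trim()));
  const missing = entries.filter((entry) => !existing.has(entry));
  if (missing.length === 0) return gitignoreText;
  // Make sure the appended block starts on its own line.
  const separator = gitignoreText === "" || gitignoreText.endsWith("\n") ? "" : "\n";
  return gitignoreText + separator + missing.join("\n") + "\n";
}
```

Applying the function to its own output returns the text unchanged, which is the property a re-runnable `init` would want.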
AC traceability gate (mandatory)
Ship is blocked unless every AC in the active plan is status: committed with a real commit SHA. The commit-helper.mjs hook is the only supported way to commit during /cc:
git add path/to/changed/file
node .cclaw/hooks/commit-helper.mjs --ac=AC-1 --message="implement approval pill"
The hook checks that AC-1 is declared in plan.md, refuses to run when flow-state.json schemaVersion is not 3, runs git commit, captures the new SHA, and writes it back into flow-state.json. If you commit by hand, AC traceability breaks and ship will refuse.
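The ship gate itself reduces to one filter over the plan's AC list. A minimal sketch, assuming each AC record carries `id`, `status` and `commit` (the `commit` field name and the 7 to 40 hex-character SHA pattern are assumptions, not cclaw's schema):

```javascript
// Assumption: a "real commit SHA" is 7-40 lowercase hex characters.
const SHA_PATTERN = /^[0-9a-f]{7,40}$/;

// Returns the ids of every AC that still blocks ship; an empty array means ship may proceed.
function shipBlockers(acList) {
  return acList
    .filter((ac) => ac.status !== "committed" || !SHA_PATTERN.test(ac.commit ?? ""))
    .map((ac) => ac.id);
}
```

Returning the blocking ids, rather than a bare boolean, lets the caller say exactly which AC still needs a commit.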
Compound learnings (automatic, gated)
After ship, cclaw automatically checks whether the run produced something worth remembering:
- a non-trivial decision was recorded by `architect` or `planner`, or
- review needed three or more iterations, or
- a security review ran or `security_flag` is true, or
- the user explicitly asked to capture (`/cc <task> --capture-learnings`).
If yes → flows/<slug>/learnings.md is written from the template, and one line is appended to knowledge.jsonl recording the slug, ship_commit, signals, and refines chain. If no → silently skipped, so the index stays signal-rich. Then everything moves to flows/shipped/<slug>/ with a manifest.md.
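The gate and the resulting `knowledge.jsonl` line might be modelled like this. The four conditions mirror the list above; the input field names, the signal labels, and the exact JSON keys are assumptions made for the sketch.

```javascript
// Collects which of the four capture conditions fired for a shipped run.
function learningSignals(run) {
  const signals = [];
  if (run.nonTrivialDecision) signals.push("decision");
  if (run.reviewIterations >= 3) signals.push("review-iterations");
  if (run.securityReviewRan || run.securityFlag) signals.push("security");
  if (run.captureRequested) signals.push("explicit-capture");
  return signals;
}

// One JSON object per line, as in any .jsonl file.
// Returns null when no condition fired, so nothing is appended.
function knowledgeLine(run) {
  const signals = learningSignals(run);
  if (signals.length === 0) return null;
  return JSON.stringify({
    slug: run.slug,
    ship_commit: run.shipCommit,
    signals,
    refines: run.refines ?? null,
  });
}
```

A run with one review iteration and no other signal yields `null`, which matches the "silently skipped" branch.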
Parallel-build (cap: 5 slices, git worktree)
Inline is the default. Parallel-build is opt-in and only when planner declares it. Pre-conditions: ≥4 AC, ≥2 distinct touchSurface clusters, every AC parallelSafe: true, no AC depends on outputs of another AC in the same wave.
A slice = 1+ AC with a shared touchSurface. If planner produces more than 5 slices, planner must merge thinner slices into fatter ones — never generate "wave 2", "wave 3". The 5-slice cap is the v7-era constraint kept on purpose: orchestration cost grows non-linearly past 5 sub-agents, and 5 fits comfortably under every harness's sub-agent quota.
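One way to picture "merge thinner slices into fatter ones" is to keep merging the two thinnest slices until the cap holds. This is only a sketch of the idea: it assumes "thinner" means fewer AC and that any two slices may be merged, whereas a real planner would also weigh `touchSurface` overlap.

```javascript
// slices: arrays of AC ids. Returns at most `cap` slices, never a second wave.
function capSlices(slices, cap = 5) {
  const result = slices.map((slice) => [...slice]); // copy, do not mutate input
  while (result.length > cap) {
    result.sort((a, b) => a.length - b.length); // thinnest first
    const [first, second] = result.splice(0, 2);
    result.push([...first, ...second]);
  }
  return result;
}
```

Seven single-AC slices come out as five slices that still cover all seven AC.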
When the harness supports sub-agent dispatch, each parallel slice runs in its own worktree:
git worktree add .cclaw/worktrees/<slug>-slice-1 -b cclaw/<slug>/slice-1
git worktree add .cclaw/worktrees/<slug>-slice-2 -b cclaw/<slug>/slice-2
git worktree add .cclaw/worktrees/<slug>-slice-3 -b cclaw/<slug>/slice-3
Each slice-builder runs RED → GREEN → REFACTOR for every AC it owns sequentially inside its worktree. After the wave, reviewer in integration mode reads from each worktree's branch and the orchestrator merges them in. If the harness does not support sub-agent dispatch (or worktree creation fails), parallel-build degrades silently to inline-sequential; this is recorded but is not an error.
For ≤4 AC the orchestrator picks inline even when AC look "parallelSafe". Dispatch overhead is not worth saving 1-2 AC of wall-clock.
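Taken together, the pre-conditions and the inline rule can be sketched as one decision function. The AC shape (`touchSurface`, `parallelSafe`, `dependsOn`) and the function name are hypothetical, distinct `touchSurface` values stand in for "clusters", and the sketch assumes planner has already opted in. Both thresholds are applied exactly as written above, so in effect only 5 or more AC fan out.

```javascript
// Hypothetical AC shape: { id, touchSurface, parallelSafe, dependsOn: [ids] }.
function chooseTopology(acs, harnessSupportsSubAgents) {
  const surfaces = new Set(acs.map((ac) => ac.touchSurface));
  const ids = new Set(acs.map((ac) => ac.id));
  const preconditionsMet =
    acs.length >= 4 &&
    surfaces.size >= 2 &&
    acs.every((ac) => ac.parallelSafe === true) &&
    // No AC may depend on another AC in the same wave.
    acs.every((ac) => (ac.dependsOn ?? []).every((dep) => !ids.has(dep)));
  // 4 or fewer AC stay inline even when they look parallel-safe.
  if (!preconditionsMet || acs.length <= 4) return "inline";
  // Without sub-agent dispatch, degrade to inline-sequential (recorded, not an error).
  return harnessSupportsSubAgents ? "parallel-build" : "inline-sequential";
}
```

Keeping the two thresholds as separate checks makes it easy to adjust either one without touching the other.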
When sub-agents help (and when they don't)
Use a sub-agent for:
- Parallel slice dispatch during `parallel-build` (cap: 5).
- Specialist context isolation for `architect`, `security-reviewer`, integration `reviewer` when the harness supports it. A fresh sub-agent reads a small focused filebag instead of the orchestrator's full history.
Don't use a sub-agent for:
- Trivial / small / medium slugs (≤4 AC). Run inline.
- Sequential work that doesn't actually parallelize.
- Routine work the orchestrator can finish in 1-2 turns.
Five Failure Modes + review Ralph loop
Reviews check the Five Failure Modes — hallucinated actions, scope creep, cascading errors, context loss, tool misuse — every iteration. The Five Failure Modes pass is wrapped by the review-loop auto-trigger skill so the agent cannot skip it.
Reviews are not single-shot. They are a Ralph loop with an explicit ledger:
- Iteration 1 lists every finding as F-1, F-2, … in an append-only Concern Ledger at the top of `flows/<slug>/review.md`. Each row carries severity (`block` / `warn`), status (`open` / `closed` / `superseded`), and a `file:line` citation.
- Iteration N+1 must reread every open row, mark it `closed | open | superseded by F-K`, and append new findings as F-(max+1). It cannot delete or rewrite earlier rows.
- The loop ends when (a) every row is `closed`, (b) two consecutive iterations record zero new `block` findings AND every open row is `warn`, or (c) the 5-iteration hard cap fires with at least one open block row, at which point `/cc` stops and reports instead of looping forever.
A typical run converges in 1-3 iterations. The hard cap is a circuit breaker, not a target.
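The three exit conditions can be expressed as a small state function over the ledger. This is a sketch: the row shape and return labels are assumptions, and it stops unconditionally at the cap, which is a simplification of condition (c), where the README describes the cap firing with an open block row.

```javascript
// ledger: rows { id, severity: "block" | "warn", status: "open" | "closed" | "superseded" }.
// newBlocksPerIteration[i]: count of new block findings recorded in iteration i + 1.
function reviewLoopState(ledger, newBlocksPerIteration, hardCap = 5) {
  // (a) every row is closed
  if (ledger.every((row) => row.status === "closed")) return "done";
  // (b) two quiet iterations in a row and only warn rows still open
  const open = ledger.filter((row) => row.status === "open");
  const lastTwo = newBlocksPerIteration.slice(-2);
  const quiet = lastTwo.length === 2 && lastTwo.every((count) => count === 0);
  if (quiet && open.every((row) => row.severity === "warn")) return "done-with-warnings";
  // (c) circuit breaker; simplified to stop whenever the cap is reached
  if (newBlocksPerIteration.length >= hardCap) return "cap-reached";
  return "continue";
}
```

The order of the checks matters: a loop that converges on its fifth iteration still reports `done` rather than `cap-reached`.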
Conversation language
cclaw replies in the user's language for prose. It NEVER translates wire-protocol identifiers — slugs, AC-N, D-N, F-N, frontmatter keys, file paths, hook output, specialist names, or commit tags. This is enforced by the always-on conversation-language skill so a Russian-speaking user, for example, gets Russian explanations but still sees flow-state.json and AC-1 verbatim.
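A quick way to picture the rule is a check that every wire-protocol identifier in a source message survives in the localized reply. The sketch below is hypothetical and covers only the `AC-N` / `D-N` / `F-N` patterns; slugs, file paths and frontmatter keys would need their own patterns.

```javascript
// Assumption: identifiers look like AC-1, D-2, F-3 (letters, dash, digits).
const WIRE_ID = /\b(?:AC|D|F)-\d+\b/g;

// Returns the identifiers present in the source text but missing from the reply.
function droppedIdentifiers(sourceText, translatedText) {
  const wanted = new Set(sourceText.match(WIRE_ID) ?? []);
  const present = new Set(translatedText.match(WIRE_ID) ?? []);
  return [...wanted].filter((id) => !present.has(id));
}
```

An empty result means the reply kept every identifier verbatim; anything else points at a token that was translated or dropped.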
Hooks (default profile: minimal)
Three hooks ship by default and only commit-helper.mjs is mandatory:
- `session-start.mjs`: rehydrates flow state and prints active slug
- `stop-handoff.mjs`: short reminder when stopping mid-flow
- `commit-helper.mjs`: atomic commit per AC + traceability check
CLI commands
cclaw init # install assets in the current project
cclaw sync # reapply assets to match the current code
cclaw upgrade # sync after upgrading the npm package
cclaw uninstall # remove cclaw assets from the project
cclaw version # print version
cclaw help # short help
Flow-control commands (plan, status, ship, migrate, build, review) are intentionally not part of the CLI. They live as /cc instructions inside the harness.
More docs
- docs/v8-vision.md: locked decisions, full kill-list, references review
- docs/scheme-of-work.md: flow walk-through with all checkpoints
- docs/skills.md: the twelve auto-trigger skills and what they enforce
- docs/config.md: `.cclaw/config.yaml` reference
- docs/harnesses.md: what each harness installs
- docs/quality-gates.md: AC traceability + Five Failure Modes
- docs/migration-v7-to-v8.md: from cclaw 7.x
License
MIT. See LICENSE.