cclaw
A multi-stage planning + review harness for coding agents.
cclaw drops a /cc slash command into Claude Code, Cursor, OpenCode, or Codex. It routes the task, picks the right amount of ceremony, and runs the work through a fixed pipeline: triage → plan → build → qa → review → critic → ship. Each stage emits a slim summary back to the harness and writes a tracked artifact under .cclaw/flows/<slug>/. Sub-agents are isolated; the orchestrator keeps the slug's history.
cclaw installs /cc and /cc-cancel into each harness. Inside /cc, three entry modes cover task work, research, and continuation flows.
Why cclaw
- One pipeline, depth scales. Every task runs triage → architect → builder → reviewer → critic → ship. Plan-stage depth scales with ceremonyMode (lite for soft, rich for strict) instead of branching to a different specialist stack. Five gated specialists insert without changing the spine: investigator (v8.77; before architect when triage.taskShape == "debug"), plan-critic (between architect and builder on strict + complexity≠trivial + AC≥2), plan-design (after plan-critic on UI / design / frontend / UX surfaces in soft+strict), plan-devex (v8.82; after plan-design on SDK / API / CLI / library surfaces in soft+strict), and qa-runner (before reviewer on UI / web surfaces, ceremonyMode≠inline).
- Always-auto, hard stops on failure. The flow runs end-to-end without approval pickers at plan / review / critic gates. Hard failures stop and report with a plain-prose status block; resume with /cc, discard with /cc-cancel.
- Two-model review. A read-only reviewer walks twelve axes; an adversarial critic falsifies what the reviewer cleared. They share no context and write to separate artifacts (review.md, critic.md). On high-stakes work (security_flag / irreversible D-N) or when invoked with --critic-cross-model, the critic optionally runs a second adversarial pass via a different model (Codex / Gemini via MCP) for an independent second opinion (v8.72).
- Right-sized ceremony. Trivial edits run inline (one commit, no plan). Small/medium tasks get a soft-mode plan + a single TDD cycle. Large-risky tasks get a per-slice build with a pre-implementation plan-critic gate.
- Pre-build design audit on UI surfaces. On any flow whose triage detects a UI / design / frontend / UX surface in soft or strict ceremony, the v8.75 plan-design specialist walks plan.md against a 7-dimension design-quality rubric (visual hierarchy / type system / color / spacing / interaction affordances / accessibility (WCAG AA) / responsive) before the build runs. Below-6 grades become PD-N findings appended to plan.md; severity ≥ medium blocks ship in strict mode. The same rubric the post-build reviewer applies to the rendered diff is applied here to the plan's design commitments — design bets are challenged once at plan-time and once after the diff lands, with the rubric pinned in a single shared TypeScript const so the two surfaces never drift.
- Pre-build DevEx audit on SDK / API / CLI / library surfaces (v8.82). On any flow whose triage detects an SDK / API / CLI / library / public-interface surface in soft or strict ceremony, the v8.82 plan-devex specialist walks plan.md against a 6-dimension developer-experience rubric (Getting Started (TTHW) / API ergonomics / Error messages / Docs / Upgrade path / Measurement) before the build runs. Below-6 grades become DX-N findings appended to plan.md's ## Plan-devex findings section; severity ≥ medium blocks ship in strict mode. Getting-started escalates one tier (TTHW is load-bearing for first impression); upgrade-path on breaking changes caps at high regardless of mode (ships-a-regression baseline). The rubric is pinned in src/content/devex-quality-rubric.ts so a future post-build reviewer devex axis or research-devex lens can consume the same dimensions. plan-devex runs after plan-critic and plan-design when their gates fire — sequential, not parallel — keeping prompt budget manageable.
- Parallel by default. Independent slices in a plan run in parallel — N independent slices finish in the time of the longest, not the sum.
- Research as a separate entry point. /cc research <topic> runs an open-ended discovery dialogue, surfaces an Approaches Gate (Phase 1.5; 2-3 candidate framings of the question, user picks one / many / all, with all as the default), and dispatches research lenses in parallel (engineer / product / architecture / history / skeptic / design — six lenses on standard+ depth when the topic touches UI/UX, --lens=design force-includes / --lens=-design force-excludes). Depth tiers (--light / --standard / --deep-product) gate the lens set; lenses dispatch first-class web search via MCP (user-exa, user-context7); synthesis runs a four-scan self-review before research.md lands. Optional handoff into a follow-up /cc <task> that consumes it as context.
- Continuation flow. /cc extend <slug> <task> loads a previously-shipped slug's plan.md / build.md / learnings.md (and review.md / critic.md / qa.md when present) as load-bearing context.
- Same runtime, four harnesses. Claude Code, Cursor, OpenCode, and Codex all read the same .cclaw/ install. Each harness gets the same /cc body plus harness-namespaced ambient rules.
- Compound learnings. Non-trivial slugs emit a learnings.md. Future runs read prior shipped lessons through knowledge.jsonl before authoring a plan; outcome signals (good / unknown / manual-fix / follow-up-bug / reverted) down-weight priors that didn't hold up.
When to use which command
| Intent | Command | What it does |
|---|---|---|
| Execute a task end-to-end (code change) | /cc <task> | Full flow: triage → plan → build → review → critic → ship |
| Think / brainstorm / research a topic without committing to a task | /cc research <topic> | Open-ended discovery dialogue + Approaches Gate (2-3 framings) + up to 6 parallel research lenses (design lens fires on UI/UX topics at standard+) + synthesised research.md; optional handoff to /cc <task> |
| Extend a previously-shipped slug with related work | /cc extend <slug> <task> | New flow with parent's plan/build/learnings loaded as context |
| Cancel the active flow | /cc-cancel | Discards current .cclaw/flows/<slug>/, frees the orchestrator |
Quickstart
cd /path/to/your/repo
npx cclaw-cli@latest
# Inside your harness:
/cc add caching to the search endpoint
ls .cclaw/flows/20260515-search-caching/
# plan.md build.md review.md critic.md ship.md

The flow runs end-to-end. cclaw stops only on a hard failure (build broken, reviewer can't converge in 3 fixes, critic block-ship, catastrophic git/dispatch failure). Resume with /cc; discard with /cc-cancel. See Failure handling below.
For CI / scripted installs, use the non-interactive escape hatch:
npx cclaw-cli@latest --non-interactive install --harness=cursor

There is no cclaw plan, cclaw build, or cclaw status. Flow control lives inside /cc.
/cc invocation matrix
| Invocation | Active flow? | Behaviour |
|---|---|---|
| /cc (no args) | yes | Continue the active flow silently. No "resume?" picker. |
| /cc (no args) | no | Error: "No active flow. Start with /cc <task>, /cc research <topic>, or /cc extend <slug> <task>." |
| /cc <task> | yes | Error: "Active flow: <slug> (stage: <stage>). Continue with /cc. Cancel with /cc-cancel." cclaw does NOT auto-cancel or queue. |
| /cc <task> | no | Start a new flow (dispatch triage). |
| /cc research <topic> | yes / no | Same pattern — error when active, start when not. |
| /cc research go | yes (research, in discovery dialogue) | v8.78 force-exit verb: ends the research-mode discovery dialogue immediately regardless of the per-dimension ambiguity score; the orchestrator jumps to the Approaches Gate with whatever context the dialogue has gathered. |
| /cc extend <slug> <task> | yes / no | Same pattern — error when active, start when not. |
| /cc-cancel | yes | Cancel active flow (move artifacts to flows/cancelled/<slug>/, reset state). |
| /cc-cancel | no | Error: "No active flow to cancel." |
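The matrix reduces to a small decision function. The sketch below is illustrative only: the type names, the routeInvocation function, and the shape of the flow state are assumptions, not cclaw's actual internals, and the /cc research go force-exit verb is deliberately not modelled. It also assumes the research and extend rows reuse the same "Active flow" error text as /cc <task>.

```typescript
// Hypothetical types: cclaw's real state shape is not shown in this README.
interface ActiveFlow {
  slug: string;
  stage: string;
}

type Invocation =
  | { command: "cc"; args: string } // args is everything after /cc, possibly ""
  | { command: "cc-cancel" };

type Outcome =
  | { action: "continue" | "start" | "cancel" }
  | { action: "error"; message: string };

function routeInvocation(inv: Invocation, active: ActiveFlow | null): Outcome {
  if (inv.command === "cc-cancel") {
    return active
      ? { action: "cancel" }
      : { action: "error", message: "No active flow to cancel." };
  }

  const hasArgs = inv.args.trim().length > 0;
  if (!hasArgs) {
    return active
      ? { action: "continue" } // silent resume, no picker
      : {
          action: "error",
          message:
            "No active flow. Start with /cc <task>, /cc research <topic>, or /cc extend <slug> <task>.",
        };
  }

  // /cc <task>, /cc research <topic>, and /cc extend <slug> <task> share one pattern:
  // error while a flow is active, start a new flow otherwise.
  if (active) {
    return {
      action: "error",
      message: `Active flow: ${active.slug} (stage: ${active.stage}). Continue with /cc. Cancel with /cc-cancel.`,
    };
  }
  return { action: "start" };
}
```

The function never cancels or queues on its own, which mirrors the "cclaw does NOT auto-cancel or queue" rule in the matrix.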
Modes
Three top-level entry points share the /cc slash-command surface. The orchestrator picks the mode from the first token after /cc.
/cc <task> — task mode
Runs the full triage → plan → build → review → critic → ship pipeline. The triage sub-agent picks complexity × ceremonyMode × path from heuristics, announces the choice in one line, and dispatches the first specialist. No clarifying questions; no structured ask.
Pin a ceremony level explicitly with mutually-exclusive flags:
/cc --inline <task> # forces inline edit (one commit, no plan)
/cc --soft <task> # forces soft-mode plan → build → review → ship
/cc --strict <task> # forces strict + architect's full Frame → Compose pass + per-slice commits + plan-critic gate

When the project has no .git/, the router auto-downgrades strict → soft even with --strict (per-slice commits need a SHA chain to be useful).
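As a rough illustration of the flag override and the no-.git/ downgrade described above, a resolver could look like the following. The function name and parameters are hypothetical, and the sketch assumes the downgrade applies whether strict came from the flag or from triage.

```typescript
type Ceremony = "inline" | "soft" | "strict";

// flag: the ceremony pinned via --inline / --soft / --strict, if any.
// triagePick: the ceremony the triage heuristics would choose on their own.
function resolveCeremony(
  flag: Ceremony | undefined,
  triagePick: Ceremony,
  hasGitDir: boolean,
): Ceremony {
  const chosen = flag ?? triagePick;
  // Per-slice commits need a SHA chain, so strict is downgraded without .git/.
  return chosen === "strict" && !hasGitDir ? "soft" : chosen;
}
```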
Classification work — surface detection, assumption capture, prior-learnings lookup, interpretation forks — lives inside the architect's Bootstrap + Frame phases on strict + soft, nothing on inline.
Debug-shape routing (v8.77 + v8.81)
Debug-branch routing (v8.77; triage.taskShape == "debug"). When the task is a bug report — bug-shape keyword (regression / error / broken / failing / wrong / incorrect / slow / crash / bug / fix paired with bug intent) PLUS repo-anchored evidence (file:line, commit SHA, log excerpt, stack trace, test name with failure verb) — the triage sub-agent emits Task shape: debug and the orchestrator inserts a new investigator specialist hop BEFORE the architect. The investigator is a read-only diagnostic sub-agent that fans out three parallel hypothesis lanes — cause-code (regression bisect / dependency analysis), cause-config (env / feature-flag / version drift), cause-measurement (instrumentation gap / flakiness / log gap) — collects evidence per lane (file:line / command output / log excerpt / commit SHA / config snippet — the five canonical evidence shapes), synthesises a single working root-cause hypothesis, and emits a slim summary whose Next step: line drives the post-investigator routing:
| Next step: | Action |
|---|---|
| direct-fix | Skip architect entirely. Dispatch builder with priorInvestigation envelope; builder reads investigation.md as plan-substitute; RED-before-GREEN against the cited symptom; fix(<scope>): commit prefix; bounded to the investigation's ## Fix scope file:line refs. |
| needs-plan | Dispatch architect with priorInvestigation envelope; architect's Frame phase copies the cited root cause verbatim ("given root cause X, frame the fix at design level"). plan-critic / plan-design gates fire as normal afterwards; full plan → build → review → critic → ship path. |
| more-investigation | Re-dispatch investigator with iteration 1 (capped at 1; second more-investigation triggers stop-and-report). |
| not-a-bug | Stop-and-report. The investigator's ## Next step recommendation paragraph (cited spec / docs / test that proves the symptom is intended behaviour) surfaces to the user verbatim. |
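The verdict table can be read as a four-way switch with a single retry. This is a minimal sketch, assuming the first investigator run counts as iteration 0; the nextHop function and the Hop names are illustrative, not cclaw's real API.

```typescript
type Verdict = "direct-fix" | "needs-plan" | "more-investigation" | "not-a-bug";
type Hop = "builder" | "architect" | "investigator" | "stop-and-report";

// The table caps re-dispatch at iteration 1.
const MAX_REINVESTIGATIONS = 1;

// iteration: 0 for the first investigator run, 1 for the single allowed re-dispatch.
function nextHop(verdict: Verdict, iteration: number): Hop {
  switch (verdict) {
    case "direct-fix":
      return "builder"; // architect is skipped entirely
    case "needs-plan":
      return "architect";
    case "more-investigation":
      return iteration < MAX_REINVESTIGATIONS ? "investigator" : "stop-and-report";
    case "not-a-bug":
      return "stop-and-report";
  }
}
```

A second more-investigation verdict therefore ends the flow with a report instead of looping.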
Orthogonality. taskShape is independent of complexity — a debug task can be any complexity tier. The investigator hop inserts BEFORE architect regardless of complexity / ceremonyMode (the existing complexity classifier is NOT modified). The investigator writes investigation.md to the same flow dir as plan.md; on needs-plan both artifacts coexist; on direct-fix only investigation.md exists. Pre-v8.77 flows lack the taskShape field; the validator accepts absent values as build (default), so legacy flows run the pre-v8.77 path verbatim. Full procedure (gate, dispatch envelope, verdict matrix, iteration cap, priorInvestigation propagation, builder direct-fix protocol, architect priorInvestigation read protocol, legacy migration, anti-rationalization) lives in runbooks/debug-branch.md. Reference patterns: obra-superpowers deep-dive (3 parallel trace lanes), everyinc-compound ce-debug (code-path / config / measurement partition with synthesis), gstack /investigate (hypothesis-before-probe + verdict-driven routing).
Investigator v2 (v8.81; three conditional disciplines layered onto v8.77). The v8.77 three-lane discipline is unchanged; v8.81 layers three additional sections onto investigation.md to harden the investigator's diagnostic reach:
- Phase 0.5 — Assumption audit (ALWAYS runs; BEFORE the three lanes). Investigator catalogues the "this must be true" beliefs the symptom rests on — framework behaves as expected here, function returns what its name implies, config loads before this runs, caller passes a non-null value, database is in the state the test implies, error message points at the actual failure, symptom description itself is correct. Each row marked verified (cited file:line / command output / commit SHA / config snippet) OR assumed (with a one-line probe command to run in Phase 1). When the audit's probe output unambiguously proves the symptom is misread (e.g. reported "endpoint returns 500" but actual response is 200), the investigator may short-circuit to Next step: not-a-bug with the audit row as the reframe evidence. The audit section is written even when short-circuiting — silent skips are forbidden. Borrowed from everyinc-compound ce-debug Phase 2 + obra-superpowers systematic-debugging Phase 1.
- Phase 4 — Defense-in-depth tier (CONDITIONAL; fires on recurring or catastrophic patterns). Fires when either signal is true: (a) the root-cause pattern appears in ≥3 OTHER files (verified via a literal rg count probe — strict, not approximate), OR (b) the symptom is catastrophic-if-prod (data loss / security breach / payment failure / data integrity). When fired, the investigator writes a ## Defense-in-depth (4 layers) section naming each layer's What: / Where: / How it catches the class:: Layer 1 Entry validation (reject obviously invalid input at the API boundary) / Layer 2 Invariant check (enforce that data makes sense for THIS operation) / Layer 3 Environment guard (refuse dangerous operations in contexts where they make no sense) / Layer 4 Diagnostic breadcrumb (capture forensic context before the risky operation; rarely truly n/a — the breadcrumb earns its keep for the NEXT bug). The slim summary gains a Defense-in-depth: yes line; the orchestrator copies it onto the builder dispatch envelope as defense-in-depth: yes (persisted on flow-state.json > builderEnvelope.defenseInDepth; absent defaults to no); when yes the builder implements all named (non-n/a) layers as part of the root-cause fix commit (NOT a follow-up commit). Borrowed from everyinc-compound/ce-debug/references/defense-in-depth.md + obra-superpowers/systematic-debugging/defense-in-depth.md.
- Phase 5 — Post-mortem (CONDITIONAL; fires on prod-discovered symptoms). Fires when the symptom source includes a production / live / users-reported / incident keyword in the original bug report — canonical vocabulary: production / prod / live / shipped / deployed / users reported / customer reported / incident / outage / P0 / P1 / SEV-1 / SEV-2 / pager / paged / rollback / hotfix / emergency. When fired, the investigator writes a ## Post-mortem section covering four questions: How was this introduced? (commit SHA from git log / git blame, author, date — evidence-only, no motive speculation) / How did this survive review? (cite the introducing commit's review.md / critic.md paths + which axis flagged or missed it) / What review axis would have caught it? (exactly ONE axis from the reviewer's 14-axis surface + the specific finding text the axis should have produced) / Prevent-recurrence (one specific testable reviewer-axis check to add, framed as "the <axis> axis MUST scan for <pattern> when <gate>"). The post-mortem is advisory — it surfaces for human pattern-recognition and does NOT change the orchestrator's routing decision (the Next step: verdict is still chosen per the canonical v8.77 rubric).
The three v8.81 sections are additive — the v8.77 three-lane discipline, four-verdict vocabulary, slim-summary shape, and back-compat with pre-v8.77 state files are all preserved. Pre-v8.81 state files lack builderEnvelope.defenseInDepth; the validator defaults to "no" on absent. References: everyinc-compound/plugins/compound-engineering/skills/ce-debug/SKILL.md lines 104-130 (assumption audit + 4-layer defense-in-depth + conditional post-mortem) + obra-superpowers/skills/systematic-debugging/SKILL.md Phase 1 (assumption audit pattern) + obra-superpowers/skills/systematic-debugging/defense-in-depth.md (the four layers).
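The two conditional triggers (Phase 4 and Phase 5) are simple predicates. The sketch below is an assumption-heavy illustration: the function names are hypothetical, and matching the canonical vocabulary with case-insensitive whole-word regex matching is a guess at how keyword detection might work, not a description of cclaw's implementation.

```typescript
// Canonical Phase 5 vocabulary, copied from the list above.
const POST_MORTEM_KEYWORDS = [
  "production", "prod", "live", "shipped", "deployed",
  "users reported", "customer reported", "incident", "outage",
  "P0", "P1", "SEV-1", "SEV-2", "pager", "paged",
  "rollback", "hotfix", "emergency",
];

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Assumption: whole-word, case-insensitive matching.
const POST_MORTEM_PATTERN = new RegExp(
  `\\b(?:${POST_MORTEM_KEYWORDS.map(escapeRegExp).join("|")})\\b`,
  "i",
);

// Phase 5 fires when the original bug report mentions a prod-discovery keyword.
function postMortemFires(bugReport: string): boolean {
  return POST_MORTEM_PATTERN.test(bugReport);
}

// Phase 4 fires on either signal: the pattern recurs in >= 3 OTHER files
// (count taken from a literal rg probe), or the symptom is catastrophic-if-prod.
function defenseInDepthFires(
  otherFilesWithPattern: number,
  catastrophicIfProd: boolean,
): boolean {
  return otherFilesWithPattern >= 3 || catastrophicIfProd;
}
```

Whole-word matching keeps "live" from firing on words such as "delivering", at the cost of false positives on phrases like "live reload".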
Anti-slop graded reviewer axis (v8.86). The fourteenth reviewer axis, anti-slop, is the cclaw projection of Andrej Karpathy's Simplicity First principle (forrestchang/andrej-karpathy-skills > CLAUDE.md): "Minimum code that solves the problem. Nothing speculative. No features beyond what was asked. No abstractions for single-use code. No 'flexibility' or 'configurability' that wasn't requested." Karpathy's litmus test — "Would a senior engineer say this is overcomplicated? If yes, simplify." — is operationalised as a four-dimension rubric the reviewer grades 0-10 on every diff: senior-test (would a senior engineer call this overcomplicated?), speculative-flexibility (extension points / config layers / abstractions added without a current consumer), single-use-abstraction (helpers used exactly once but parameterized as if they had ≥2 callers), orphan-cleanup-discipline (orphans the diff created not removed; pre-existing dead code drive-by-deleted). Below-6 grades become AS-N: <dimension> at <grade>: <description> findings (severity ladder: 5/10 → consider; 3-4/10 → required; ≤2/10 → required with one-tier escalation to critical on critical-complexity slugs); ≥ medium blocks ship in strict. The axis is default-on: the orchestrator stamps walkAntiSlopAxis: true on every reviewer dispatch unless the user / project config explicitly disables it via walkAntiSlopAxis: false. Unlike the surface-driven gated axes (qa-evidence / design-quality / scope-drift / assumption-coverage), anti-slop fires once per slug regardless of triage surface — Karpathy simplicity is a per-diff aesthetic, not a per-surface concern. The rubric lives in a new shared TypeScript const (src/content/anti-slop-rubric.ts) mirroring the v8.75 design-quality and v8.82 devex-quality rubric shape; the full grading protocol, severity ladder, finding shape, and anti-rationalizations live in the companion skill reviewer-axis-anti-slop.md (~13k chars). 
reviewer.ts retains only a 5-line stub naming the skill — same v8.83 companion-skill pattern as the other gated reviewer axes. The axis sits alongside the existing complexity-budget axis but is distinct: complexity-budget asks "is this change pulling its weight?" (per-AC ROI), anti-slop asks "is the shape of this change Karpathy-simple?" (per-diff aesthetic). A clean diff that earns full marks on complexity-budget can still fail anti-slop (the implementation overshoots — extension points, single-use abstractions, leftover scaffolding); the two axes catch different failure modes and the orchestrator never collapses them.
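The severity ladder can be sketched as a pure function of the grade. This is an illustrative reading, not the contents of src/content/anti-slop-rubric.ts: the function names are hypothetical, grades are assumed to be integers from 0 to 10, and rendering the grade as "4/10" inside the finding is an assumption about the finding shape.

```typescript
type Severity = "consider" | "required" | "critical";

type AntiSlopDimension =
  | "senior-test"
  | "speculative-flexibility"
  | "single-use-abstraction"
  | "orphan-cleanup-discipline";

// Returns null when the grade is 6 or above (no finding is filed).
function antiSlopSeverity(grade: number, criticalComplexity: boolean): Severity | null {
  if (grade >= 6) return null;
  if (grade >= 5) return "consider"; // 5/10
  if (grade >= 3) return "required"; // 3-4/10
  // <= 2/10: required, escalated one tier on critical-complexity slugs.
  return criticalComplexity ? "critical" : "required";
}

// Assumed rendering of the "AS-N: <dimension> at <grade>: <description>" shape.
function formatFinding(
  n: number,
  dimension: AntiSlopDimension,
  grade: number,
  description: string,
): string {
  return `AS-${n}: ${dimension} at ${grade}/10: ${description}`;
}
```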
Assumption-validation lite (v8.85). v8.80 introduced ## Key assumptions to validate as a first-class plan-template section, but the loop was open-ended: rows landed at Status: unvalidated and nothing in the post-build pipeline closed them. v8.85 closes the loop with three additive moves. First, every bullet now leads with a stable KA-N id (Key Assumption N — KA-1, KA-2, ..., monotonically numbered) so the builder, reviewer, and ship template can cross-reference rows by id; legacy bullets without ids silently disable the closure but the plan-critic's new key-assumptions-no-id class surfaces the gap. Second, the builder's verify(AC-N): passing commits MAY carry an optional validates: KA-N [KA-M ...] payload in the commit message body when the AC's verification evidence ALSO proves the matching KA bullet's bet; the post-build flow-state validator (src/assumption-validation.ts) scans the build range's verify commits, extracts the payload, and rewrites matching rows to Status: validated by <sha> (idempotent; first validation wins; unknown ids silently dropped; manual flips to validated / invalidated are never overridden). Third, the reviewer's new gated assumption-coverage axis (the thirteenth axis; gated on walkAssumptionCoverageAxis: true — set when plan.md > ## Key assumptions to validate carries ≥1 KA-N-shaped bullet) walks each KA-N row and files KA-N: not validated by any commit despite high-stakes label (severity=required on rows carrying the (high-stakes) label after the assumption clause; severity=consider otherwise). The axis ALSO catches false-positive payloads (validates: KA-N on a verify commit whose diff doesn't touch the validation method's anchor; severity=required) and unknown-id payloads (validates: KA-99 against a 3-row section; severity=consider). 
The ship template grows a new ## Unvalidated assumptions section that lists every KA-N row whose status is still unvalidated at ship time, surfacing known-unmeasured bets so the user signs off knowingly rather than silently. Reference pattern: gstack /devex-review boomerang (a verify commit can "validate" an earlier assumption row, flipping status atomically). Legacy pre-v8.80 plans without the section skip the gate; inline ceremonies skip structurally.
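The closure rules (first validation wins, unknown ids dropped, manual flips never overridden) can be illustrated with a small reducer. This is a sketch under stated assumptions, not the code in src/assumption-validation.ts: the types, the function names, and the idea that the payload sits on its own "validates:" line in the commit body are all hypothetical.

```typescript
type Status = "unvalidated" | "validated" | "invalidated";

interface KeyAssumption {
  id: string; // e.g. "KA-1"
  status: Status;
  validatedBy?: string; // commit SHA once validated
}

interface VerifyCommit {
  sha: string;
  body: string; // commit message body
}

// Extracts KA ids from lines shaped like "validates: KA-1 KA-3".
function parseValidates(body: string): string[] {
  const ids: string[] = [];
  for (const line of body.split("\n")) {
    const m = /^\s*validates:\s*(.+)$/.exec(line);
    if (m) ids.push(...(m[1].match(/KA-\d+/g) ?? []));
  }
  return ids;
}

// commits must be in build-range order so that the first validation wins.
function applyValidations(rows: KeyAssumption[], commits: VerifyCommit[]): KeyAssumption[] {
  const out = rows.map((r) => ({ ...r }));
  for (const commit of commits) {
    for (const id of parseValidates(commit.body)) {
      const row = out.find((r) => r.id === id);
      if (!row) continue; // unknown id: silently dropped
      if (row.status !== "unvalidated") continue; // first wins; manual flips preserved
      row.status = "validated";
      row.validatedBy = commit.sha;
    }
  }
  return out;
}
```

Running applyValidations twice over the same commits yields the same rows, which is the idempotence the paragraph above describes.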
Not-doing and key assumptions to validate (v8.80). Every plan that ships excludes something, and every plan rests on bets — but pre-v8.80 cclaw plans had no first-class home for either. v8.80 promotes two new sections to first-class status in both plan.md and research.md: ## Not Doing (and why) (3-5 bullets, each pairing a scope exclusion with a one-sentence rationale) and ## Key assumptions to validate (2-5 bullets, each pairing a bet — latency budget, user behaviour, market state, downstream-system behaviour — with a validation method and an unvalidated | validated | invalidated status). The architect's new Phase 7.5 (Bets and exclusions) populates both sections after Decisions land; the plan-critic's new §6.5 Bets and exclusions audit gates that both are present and non-empty before build dispatches (block-ship on strict; iterate on per-bullet rationale / validation-method / status gaps). The two sections are distinct from ## Assumptions (correct me now) (v8.67 surface-area inferences — which library X over Y); the v8.80 sections capture bets-that-need-validation and exclusions-with-rationale, which are different concepts viewed through different lenses. Research-mode synthesis populates both sections too, so the follow-up /cc <task> flow's architect can carry the bets and exclusions forward as load-bearing context. The v8.85 follow-up adds stable KA-N ids on every bullet, a validates: KA-N builder-commit payload, and a new reviewer axis (assumption-coverage) that requires high-stakes bets to have a closing commit before ship (see above). Reference patterns: addyosmani idea-refine (lines 113-135 — "Not Doing" and "Key Assumptions to Validate" called out as the most valuable two sections) and everyinc-compound ce-brainstorm (Phase 3 — Deferred for later / Outside this product's identity / Not Doing). Legacy pre-v8.80 plans silently pass the audit (detected via absence of both sections); inline ceremony has no plan.md, so the audit skips structurally.
One-way door gate (v8.79). After the architect's slim summary returns AND before plan-critic dispatch, the orchestrator scans the freshly-written plan.md > ## Decisions block for any D-N marked Reversibility: one-way (v8.74 vocabulary — data migration / public-API removal / schema rewrite / destructive auth / payment-side commit). When ≥1 hit is found, the orchestrator surfaces a structured user-facing pause with three options: confirm (proceed to plan-critic + build), edit (open plan.md so you can soften reversibility or split the decision), cancel (abort the flow). The ask payload lists every one-way D-N's title + rationale + the User Sovereignty principle that motivates the pause. The gate fires ONLY when the scan returns ≥1 hit — two-way / mostly-two-way plans pass through silently (0 hits = no gate, no pause); soft-ceremony plans without a Decisions section silently pass too; inline (lite) ceremony skips the gate structurally (the path is just ["build"], no plan stage). Architect's slim-summary Recommended next field returns the new awaiting-one-way-confirmation value when one-way D-Ns are present (replacing build / plan-critic); the orchestrator's hard-gate logic routes that signal through the gate. Flow-state transition: architect-complete → awaiting-one-way-confirmation → (plan-critic | architect-revision | aborted); the user's choice + decision IDs + timestamp persist on flow-state.json > oneWayDoorConfirmation. Complementary to the v8.74 cross-model critic — both fire on the same condition (Reversibility: one-way), but the v8.79 gate is the human-in-the-loop counterpart firing before build burns context, and the v8.74 cross-model critic is the model-in-the-loop counterpart firing after build lands. 
Reference patterns: gstack ETHOS.md > User Sovereignty (the principle is already in cclaw's v8.74 ethos preamble; v8.79 wires the user-facing surface that puts the principle into practice); Bezos's one-way / two-way door framing (cited in src/types.ts > Reversibility).
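A rough sketch of the gate's scan and the three-way choice follows. The plan.md layout is not specified in this README, so the parser assumes a hypothetical shape: a "## Decisions" section in which each decision mentions its D-N id before a "Reversibility: one-way" line. The function names are illustrative as well.

```typescript
type GateChoice = "confirm" | "edit" | "cancel";
type NextState = "plan-critic" | "architect-revision" | "aborted";

// Returns the ids of decisions marked one-way; an empty array means no gate, no pause.
function findOneWayDecisions(planMd: string): string[] {
  // Split on level-2 headings and keep only the Decisions section.
  const section = planMd.split(/^## /m).find((s) => s.startsWith("Decisions"));
  if (!section) return []; // e.g. soft-ceremony plan without a Decisions block

  const hits: string[] = [];
  let current: string | null = null;
  for (const line of section.split("\n")) {
    const id = /\b(D-\d+)\b/.exec(line);
    if (id) current = id[1];
    if (current && /Reversibility:\s*one-way\b/.test(line) && !hits.includes(current)) {
      hits.push(current);
    }
  }
  return hits;
}

// Maps the user's answer onto the flow-state transition named above.
function resolveGate(choice: GateChoice): NextState {
  switch (choice) {
    case "confirm":
      return "plan-critic";
    case "edit":
      return "architect-revision";
    case "cancel":
      return "aborted";
  }
}
```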
Iterative-clarify per-dimension scoring (v8.78). Both architect Phase −1 Clarify and research-mode Phase 1 discovery dialogue now re-score 4 orthogonal dimensions after every user answer: goal (weight 0.4), constraints (0.3), criteria (0.3), context (0.0 — informational, not gating). The scalar ambiguity = 1 - (goal*0.4 + constraints*0.3 + criteria*0.3 + context*0.0) drives a math-gated exit at ambiguity < 0.25; the weakest dimension targets the next question; the per-round table (Score / Weight / Why / Next target) surfaces to the user before each follow-up so the dialogue's progress is visible. Round caps: 5 (architect Clarify) / 8 (research-mode discovery). Challenge-mode rotation: round 4 = Contrarian ("what if the opposite were true?"); round 5 = Simplifier ("what's the simplest version that still ships value?"). Earlier exit (math-gated or in-prose "ready" signal or /cc research go force-exit) skips the rotation. Per-round audit trail lands on flow-state.json > clarifyRounds[] (append-only; pre-v8.78 state files lack the field and validate unchanged). Reference patterns: oh-my-claudecode deep-interview (mathematical scoring + challenge-mode rotation); everyinc-compound ce-brainstorm Phase 1.2 gap lenses (specificity / evidence / counterfactual / attachment — re-projected as the four canonical dimensions).
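The scoring formula is small enough to state in code. In this minimal sketch the weights and the 0.25 threshold come from the paragraph above, while the function names are hypothetical. It also assumes each dimension is scored as a clarity value in [0, 1] and that only gating dimensions (weight > 0) compete to be the next question's target.

```typescript
type Dimension = "goal" | "constraints" | "criteria" | "context";
type Scores = Record<Dimension, number>;

const WEIGHTS: Scores = { goal: 0.4, constraints: 0.3, criteria: 0.3, context: 0.0 };
const EXIT_THRESHOLD = 0.25;
const DIMENSIONS = Object.keys(WEIGHTS) as Dimension[];

// ambiguity = 1 - (goal*0.4 + constraints*0.3 + criteria*0.3 + context*0.0)
function ambiguity(scores: Scores): number {
  const clarity = DIMENSIONS.reduce((sum, d) => sum + scores[d] * WEIGHTS[d], 0);
  return 1 - clarity;
}

function shouldExit(scores: Scores): boolean {
  return ambiguity(scores) < EXIT_THRESHOLD;
}

// Assumption: the informational context dimension never becomes the target.
function weakestDimension(scores: Scores): Dimension {
  return DIMENSIONS.filter((d) => WEIGHTS[d] > 0).reduce((worst, d) =>
    scores[d] < scores[worst] ? d : worst,
  );
}
```

For example, scores of goal 0.9, constraints 0.8, criteria 0.7 give a weighted clarity of 0.36 + 0.24 + 0.21 = 0.81 and an ambiguity of 0.19, which is below the threshold, so the dialogue exits.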
/cc research <topic> — research mode
A separate entry point for pre-task uncertainty: brainstorming, scope exploration, architecture comparison. Runs as a main-context orchestrator in four phases. Output: .cclaw/flows/<slug>/research.md. No build / review / critic / ship.
/cc research storage strategy for shared agent memory
/cc --research auth library trade-offs # equivalent
/cc research --light is fastify still maintained # 2 lenses, fast clarification
/cc research --deep-product should we replace our calendar # 5-6 lenses + extra probes (design lens on UI topics)
/cc research --lens=design rework the settings modal # force-include design lens
/cc research --lens=-design pure backend cache substrate # force-exclude design lens

Depth tiers (v8.69). The orchestrator picks one of three tiers at the Detect-hop research-mode fork — either explicitly via --light / --standard / --deep-product or auto-classified from topic wording (clarification → light; technical exploration → standard; greenfield / pivot wording → deep-product). Mutually-exclusive flags collapse last-wins.
| Depth | Lenses | Extra probes | When to use |
|---|---|---|---|
| --light | research-engineer + research-skeptic (2) | none | Clarification — "is X still maintained?", "which library does Y?", "what does the team currently use?" |
| --standard (default) | all 5 lenses + research-design when topic touches UI/UX (6 lenses total) | none | Technical exploration — "evaluate Redis vs in-memory cache", "should we move auth to JWT?" |
| --deep-product | all 5 lenses + research-design when topic touches UI/UX | Thesis + Adjacent product (product), Durability (skeptic), Adjacent design surfaces (design), Founder mode — premise challenge (right problem? actual outcome? what if we did nothing? inversion) + strategic consequences (trajectory / identity / adoption / opportunity cost / compounding) + 10-star reframing (product) | Greenfield / pivot — "should we build...", "what if we replace...", "evaluate switching from..." |
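The depth table and the --lens flags combine into a simple selection rule. The sketch below is illustrative: selectLenses and resolveDepth are hypothetical names, and whether the topic touches UI/UX is taken as an input because the detection heuristic is not described here.

```typescript
type Depth = "light" | "standard" | "deep-product";
type LensFlag = "design" | "-design" | undefined;

const CORE_LENSES = [
  "research-engineer",
  "research-product",
  "research-architecture",
  "research-history",
  "research-skeptic",
];

// Mutually-exclusive depth flags collapse last-wins; otherwise use the auto-classified tier.
function resolveDepth(flags: Depth[], autoClassified: Depth): Depth {
  return flags[flags.length - 1] ?? autoClassified;
}

function selectLenses(depth: Depth, touchesUi: boolean, lensFlag: LensFlag): string[] {
  // The design lens never runs on light depth, even when force-included.
  if (depth === "light") return ["research-engineer", "research-skeptic"];

  const wantsDesign = lensFlag === "-design" ? false : lensFlag === "design" || touchesUi;
  return wantsDesign ? [...CORE_LENSES, "research-design"] : [...CORE_LENSES];
}
```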
Phase 1 — open-ended discovery dialogue. The orchestrator opens with "Hi. What are you researching? Tell me what you know and what you don't." and runs an uncapped dialogue (no fixed question budget; no auto-advance). You refine the topic, name constraints, surface prior attempts, name stakeholders, mark scope edges. The orchestrator proceeds only when you signal ready / go ahead / finalize.
Phase 1.5 — Approaches Gate (v8.76). Between the discovery dialogue and lens dispatch, the orchestrator distils 2-3 candidate framings of the research question (NOT 2-3 implementation candidates — each framing changes WHICH dimensions every lens emphasises). You pick a single framing (A), multiple (A B), substring-match a framing title, or accept all / every / silent default (every framing flows to every lens — the canonical broad-coverage case). Selected framings carry forward into every lens envelope as a new Framing: field. Reframings mid-research route through the existing /cc research push-back machinery. Reference patterns: obra-superpowers brainstorming Phase 2-3, addyosmani idea-refine Phase 1.3 Cluster + Stress-test.
Phase 2 — parallel lens dispatch. The orchestrator distils the dialogue into a 5-15 bullet summary, stamps the selected framings, and dispatches the depth-determined lens subset in parallel:
| Lens | What it covers |
|---|---|
| research-engineer | Technical feasibility, stack fit, blockers, implementation paths, risks, effort estimate |
| research-product | User / product value, who benefits, alternatives (always including "do nothing"), market context. Deep-product depth folds in Thesis + Adjacent product probes plus Founder mode — premise challenge + strategic consequences + 10-star reframing |
| research-architecture | System fit, surface impact, coupling, boundaries, scalability, in-repo precedents |
| research-history | Prior attempts via .cclaw/knowledge.jsonl + git log; lessons; outcome signals |
| research-skeptic | Failure modes, edge cases, abuse cases, hidden costs, don't-proceed triggers. Deep-product depth folds in Durability probe |
| research-design (v8.76; standard+ only) | Seven-dimension design-quality rubric (shared with v8.75 plan-design + v8.70 reviewer.design-quality): grades each dimension for relevance (load-bearing / relevant / tangential / out-of-scope), surfaces existing patterns to study, anti-patterns to avoid, open design questions for the follow-up architect. Fires when the topic touches UI / UX / design / frontend / accessibility / a11y (or the user passes --lens=design); suppressed by --lens=-design; never on --light depth. Deep-product depth folds in Adjacent design surfaces probe |
Engineer + architecture lenses may dispatch the repo-research helper for brownfield context. Engineer / product / architecture / skeptic / design lenses dispatch first-class web search via MCP (v8.69 for the first four; v8.76 for design) — user-exa for general web search, user-context7 for library / framework / API docs — and cite hits inline in their ### Sources block. The design lens treats web search as default-on (the design space — shadcn, Radix, Material 3 — is high-churn). When no MCP tool is wired, lenses fall back to training knowledge with a Notes: tag stamping the fallback. History lens is memory-only (web search is out of scope). Lenses run independently — no lens cites or chains into another.
Phase 3 — synthesis + self-review. The orchestrator pastes each lens's findings verbatim into the matching ## <Lens> lens section of research.md, then runs a cross-lens synthesis covering convergence (where 2+ lenses agree), divergence, the trade-off space, and confidence + coverage gaps. Before research.md is written to disk, the orchestrator walks the draft through a four-scan self-review (placeholder / contradiction / scope drift / ambiguity) and fixes findings inline; cleanups land in ## Synthesis > ### Self-review notes. It then writes a recommended next step: exactly one of plan with /cc <task>, more research needed (specific area), or don't proceed (skeptic blocked: <reason>).
Phase 3.5 — awaiting user review (v8.71). After Phase 3 lands research.md, the orchestrator stamps flow-state.json > researchState: "awaiting-user-review" and surfaces three follow-ups in plain prose. The user picks one:
- `/cc research revise <area>` — re-dispatches the lens(es) covering `<area>` (`engineer` / `product` / `architecture` / `history` / `skeptic` / `design` / `synthesis` / `all`), re-runs synthesis + self-review, rewrites `research.md`. Cycles back to `awaiting-user-review` so the user can iterate again.
- `/cc research push-back <claim>` — re-dispatches `research-skeptic` plus the lens that authored the cited claim. The skeptic surfaces counter-arguments under `### Counter-arguments to <claim>`; the authoring lens reaffirms / qualifies / retracts. Synthesis re-runs focused on the divergence paragraph. Same cycle-back.
- `/cc research accept` — terminal. Appends an `accept` row to `## Revision history`, runs Phase 4 finalize, emits the handoff.
Each revise / push-back / accept appends a row to research.md > ## Revision history (and a matching entry to flow-state.json > revisions[]). The table is the append-only audit trail — five columns: timestamp, kind, area-or-claim, lenses re-dispatched, change. Lifecycle states stamped at every Phase boundary: discovery → lens-dispatch → synthesis → awaiting-user-review ⇄ revising → accepted. Reference patterns: obra-superpowers brainstorming User Review Gate, addyosmani idea-refine divergent-then-converge, everyinc-compound ce-brainstorm Phase 2.5 confirmation gate. Full procedure in runbooks/research-revision.md.
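The lifecycle above reads naturally as a small state machine. The sketch below is illustrative only: the state names come from the text, but the transition table is one reading of the arrow notation (in particular, it assumes `accepted` is reached from `awaiting-user-review`, since that is where the user picks `/cc research accept`), and `canTransition` and `RevisionRow` are hypothetical names.

```typescript
// State names are from the README; the transition table is an assumed reading.
type ResearchState =
  | "discovery"
  | "lens-dispatch"
  | "synthesis"
  | "awaiting-user-review"
  | "revising"
  | "accepted";

const NEXT_STATES: Record<ResearchState, ResearchState[]> = {
  "discovery": ["lens-dispatch"],
  "lens-dispatch": ["synthesis"],
  "synthesis": ["awaiting-user-review"],
  "awaiting-user-review": ["revising", "accepted"],
  "revising": ["awaiting-user-review"],
  "accepted": [], // terminal
};

function canTransition(from: ResearchState, to: ResearchState): boolean {
  return NEXT_STATES[from].indexOf(to) !== -1;
}

// The five columns of the revision-history table, as a hypothetical row type.
interface RevisionRow {
  timestamp: string;
  kind: "revise" | "push-back" | "accept";
  areaOrClaim: string;
  lensesRedispatched: string[];
  change: string;
}

// Append-only: returns a new array and never edits earlier rows.
function appendRevision(history: readonly RevisionRow[], row: RevisionRow): RevisionRow[] {
  return history.concat([row]);
}
```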
Phase 4 — finalize. Fires only after /cc research accept. git mv the artifact into .cclaw/flows/shipped/<slug>/research.md and emit a plain-prose handoff. The next /cc <task> invocation on the same project reads flow-state.json > priorResearch and consumes the most-recent shipped research (including the ## Revision history block) as input to its plan stage. The handoff is optional — if research finalises and you never run a follow-up /cc, nothing else fires.
Research mode skips the router entirely — no triage gate, no complexity / ceremonyMode heuristic. The six lenses live in src/content/research-lenses/, install to .cclaw/lib/research-lenses/<lens>.md, and are NOT in the core SPECIALISTS array.
/cc extend <slug> <task> — continuation mode
Initialises a new flow that explicitly extends a previously-shipped slug. The orchestrator loads the parent's plan.md, build.md, learnings.md, and (when present) review.md / critic.md / qa.md as flowState.parentContext and surfaces them to architect / reviewer / critic as load-bearing context. Things already settled by the parent are not re-decided.
/cc extend 20260514-auth-flow add SAML login # canonical
/cc extend 20260514-auth-flow --strict refactor session store # ceremony override wins over inheritance
/cc extend 20260514-cli-help fix typo in --help # inheritance
The orchestrator runs the same pipeline as /cc <task>; the only difference is at init:
- Parent validation. `loadParentContext` confirms the slug is shipped + has a non-empty `plan.md`. Four failure modes are explicit: `in-flight`, `cancelled`, `corrupted`, `missing`. Each surfaces a one-line error and ends the turn.
- State stamp. `flow-state.json > parentContext` carries the parent's slug + status + shippedAt + structured artifact paths. Plan.md frontmatter carries `parent_slug: <parent>` and `refines: <parent>`.
- Triage inheritance. `ceremonyMode` / `surfaces` default to the parent's values. Explicit `--strict` / `--soft` / `--inline` flags override. A security-keyword heuristic (`security` / `auth` / `migration` / `schema` / `payment` / `gdpr` / `pci`) auto-escalates a soft/inline parent → strict for the new flow.
- Specialist consumption. The architect's Bootstrap reads the parent's `## Spec` / `## Decisions` / `## Selected Direction` and authors a mandatory `## Extends` section at the top of `plan.md`. The reviewer runs a parent-contradictions cross-check (silent reversals of a parent decision are `required` findings). The critic adds a skeptic question on parent decision contradictions.
- Knowledge-store integration. When `parentContext` is set, `findNearKnowledge` prepends the parent's `knowledge.jsonl` entry to the top of the prior-learnings result (load-bearing context overrides Jaccard ranking).
Only the immediate parent is auto-loaded. Specialists may use findRefiningChain on demand when transitive context is needed.
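The triage-inheritance rule for extend mode can be sketched as a single resolver. This is a hedged illustration rather than the orchestrator's code: `inheritCeremonyMode` is a hypothetical name, the substring keyword match is an assumption, and so is the precedence (an explicit flag is assumed to win over both inheritance and the security escalation).

```typescript
type CeremonyMode = "inline" | "soft" | "strict";

// Keyword list as given in the README.
const SECURITY_KEYWORDS = ["security", "auth", "migration", "schema", "payment", "gdpr", "pci"];

// Hypothetical resolver: explicit flag > security escalation > parent's mode.
function inheritCeremonyMode(
  parentMode: CeremonyMode,
  task: string,
  explicitFlag?: CeremonyMode
): CeremonyMode {
  // An explicit --strict / --soft / --inline flag overrides inheritance.
  if (explicitFlag !== undefined) return explicitFlag;
  // Assumed matching strategy: case-insensitive substring search on the task text.
  const text = task.toLowerCase();
  const sensitive = SECURITY_KEYWORDS.some((k) => text.indexOf(k) !== -1);
  // A soft or inline parent escalates to strict on a security keyword.
  if (sensitive && parentMode !== "strict") return "strict";
  return parentMode;
}
```

Under these assumptions, `inheritCeremonyMode("soft", "rotate auth tokens")` escalates to `"strict"`, while `inheritCeremonyMode("inline", "fix typo in --help")` keeps the parent's `"inline"`.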
flowchart LR
A[/cc input/] -->|"research <topic>" or --research| R[Research mode]
A -->|"extend <slug> <task>"| E[Extend mode]
A -->|"<task>"| T[Task mode]
R --> RD[Open-ended discovery dialogue]
RD --> RA[Approaches Gate: pick framings]
RA --> RL[Dispatch up to 6 lenses in parallel]
RL --> RM["research.md (synthesis + recommendation)"]
RM --> H{Handoff?}
H -->|"accept research"| END[Finalize]
H -->|next /cc| T
E --> LC[loadParentContext<br/>shipped + plan.md required]
LC -->|ok| EI[Stamp parentContext<br/>+ refines + parent_slug]
EI --> EHE[Triage inheritance<br/>ceremonyMode/surfaces]
EHE --> T
LC -->|in-flight/cancelled<br/>/missing/corrupted| EERR[Surface error<br/>end turn]
T --> RT[Triage sub-agent<br/>complexity × ceremonyMode × path]
RT -->|inline| INL[Build inline]
RT -->|soft| PL[architect Bootstrap+Frame]
RT -->|strict| DS[architect full ceremony]
PL --> BD[plan → build → review → critic → ship]
DS --> BD
Worked example
You type:
/cc add caching to the search endpoint
The orchestrator runs through these stages, chaining automatically. Slim-summary blocks land under ## Triage, ## Plan, ## Build, ## QA, ## Review, ## Critic, ## Ship headers in chat. Artifacts land on disk.
- Triage. The orchestrator dispatches `triage` with the raw `/cc` argument. Triage returns an 8-field decision in one slim summary — complexity: small-medium · ceremony mode: soft · path: plan → build → review → critic → ship · runMode: auto · mode: task · taskShape: build · designSurface: false · devexSurface: false — plus an `ambiguity_score: <0-100>` line computed from four signals (vague verbs, missing AC, multiple interpretations, no concrete names). Slug: `20260515-search-caching`. Zero clarifying asks at the triage Hop itself. Persisted to `flow-state.json > triage`; immutable for the slug.
- Plan. The `architect` opens a Clarify phase (v8.67) when `triage.ambiguityScore` crosses `config.clarify.ambiguity_threshold` (default 60) AND `ceremonyMode != "inline"` — one question per turn, max 5, early-exit on `go` / `ready` / `proceed`. Below threshold or on inline path: skipped silently. Either way, the architect writes `plan.md` opening with a mandatory `## Assumptions (correct me now)` block — Clarify answers are bare bullets, architect-silent inferences carry the `(architect inference)` tag. After plan.md lands, the orchestrator emits a one-line ack-window prose pointing the user at that section by name (`/cc` to continue with assumptions as-is, `/cc-cancel` to discard, or edit the plan in place). The rest of the plan body — Spec, Frame, Plan / Slices (SL-N work units — how we build), Acceptance Criteria (AC-N verification rows back-referencing slices), Edge cases, Topology, Feasibility, Traceability — is unchanged from pre-v8.67. 4 slices, 3 AC, 2 prior lessons surfaced via `learnings-research`. Confidence: high.
- Build. `builder` runs one TDD cycle per slice: RED → GREEN → REFACTOR. Each work commit carries an `SL-N` prefix (`red(SL-1):` / `green(SL-1):` / `refactor(SL-1):`) the reviewer reads via `git log --grep="(SL-N):"`. Slices are dispatched in topological layers with independent slices running in parallel by default; a task with N independent slices finishes in the time of the longest slice, not Σ. Single-slice layers run inline (zero overhead); plan-critic verifies independence claims against surface overlap so parallel sub-builders never race. On strict mode the builder runs a two-stage per-slice review (spec-compliance first, then code-quality) inside its own context after each slice — issues surface where the diff is one slice wide, not where the chain has compounded. Two-attempt cap per stage; on persistent failure the slice's status is `BLOCKED`. The builder emits a structured `Status:` (`DONE` / `DONE_WITH_CONCERNS` / `NEEDS_CONTEXT` / `BLOCKED`) the orchestrator routes deterministically — `DONE_WITH_CONCERNS` logs to `build.md > ## Concerns`; `NEEDS_CONTEXT` / `BLOCKED` stop and report. After all slices land, `builder` writes one `verify(AC-N): passing` commit per AC (empty diff when slice tests already cover the AC; test-files-only diff when the AC needs broader verification — perf budget, integration, contract). Tests: 14 passing (was 11). Coverage delta: +2.3%. Build failures trigger an auto-fix loop (up to 3 iterations); failure after 3 stops and reports.
- Review. Fourteen-axis reviewer opens 2 findings on the first iteration: cache-key collision on case-sensitive queries (`correctness`, `required`) and missing TTL refresh on stale entries (`architecture`, `consider`). Reviewer `critical` / `required-no-fix` triggers an auto-dispatch fix-only loop (up to 3 iterations); failure after 3 stops and reports. On `security_flag: true` slugs the reviewer walks the threat-model checklist verbatim in the `security` axis.
- Critic. Adversarial falsificationist pass — predictions, gap analysis, Criterion check across AC + Edge cases + NFR rows, goal-backward verification, realist check. Verdict: pass. `block-ship` stops immediately (no auto-iteration — re-running on unchanged code returns the same verdict).
- Ship. All 3 AC committed. `ship.md` carries the release-notes draft and the AC↔commit map. Chains automatically to push.
After ship, the orchestrator moves the artifacts to .cclaw/flows/shipped/<slug>/ and (when the slug earned capture) appends one row to .cclaw/state/knowledge.jsonl.
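The commit-prefix convention from the Build stage (`red(SL-1):`, `green(SL-1):`, `refactor(SL-1):`, `verify(AC-N): passing`) is regular enough to parse mechanically. The sketch below shows one way a tool could read those subjects; `parseCommitSubject` and the exact regular expression are assumptions for illustration and are not taken from the cclaw reviewer.

```typescript
type Phase = "red" | "green" | "refactor" | "verify";

interface ParsedCommit {
  phase: Phase;
  id: string;      // "SL-N" for TDD phases, "AC-N" for verify commits
  summary: string; // everything after the prefix
}

// Assumed grammar: <phase>(<SL-N | AC-N>): <summary>
const COMMIT_SUBJECT = /^(red|green|refactor|verify)\((SL-\d+|AC-\d+)\):\s*(.*)$/;

function parseCommitSubject(subject: string): ParsedCommit | null {
  const m = COMMIT_SUBJECT.exec(subject);
  if (m === null) return null;
  const phase = m[1] as Phase;
  const id = m[2];
  const isAcId = id.indexOf("AC-") === 0;
  // verify commits pair with AC ids; red/green/refactor pair with slice ids.
  if ((phase === "verify") !== isAcId) return null;
  return { phase, id, summary: m[3] };
}
```

A subject that mixes the two chains, such as `verify(SL-1): passing`, is rejected here; whether the real reviewer treats that as a finding is not stated in this README.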
Failure handling
cclaw stops at hard failures per a fixed matrix. The recovery loop is always the same: read the status block, decide, type /cc (continue) or /cc-cancel (discard). No in-chat picker, no [y/n] ask, no "approve this?" gate.
| Failure | Behaviour |
|---|---|
| Build failure | Auto-fix loop, up to 3 iterations. After 3 unresolved → stop and report. |
| Reviewer `critical` / `required-no-fix` | Auto-dispatch fix-only loop, up to 3 iterations. After 3 unresolved → stop and report. |
| Critic `block-ship` | Stop immediately and report. No auto-iteration. |
| Catastrophic (git op fail, dispatch fail, missing tool) | Stop and report. |
| `Confidence: low` from any specialist | Stop and report. The specialist's `Notes:` line is surfaced verbatim. |
| v8.79 One-way Door Gate (≥1 `D-N` marked `Reversibility: one-way` in plan.md after architect) | Stop and surface a structured user-facing pause: confirm (proceed to plan-critic + build) / edit (open plan.md so you can soften reversibility or split the decision) / cancel (abort the flow). User decision persists to `flow-state.json > oneWayDoorConfirmation`. |
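The two auto-fix rows in the matrix share one shape: retry up to a cap, then stop and report. A minimal sketch of that loop follows; `runFixLoop`, `AttemptResult`, and the report wording are hypothetical and only mirror the behaviour described in the table.

```typescript
interface AttemptResult {
  ok: boolean;
  reason?: string; // why the attempt failed, when ok is false
}

interface LoopResult {
  status: "passed" | "stopped";
  iterations: number;
  report?: string; // plain-prose status block, present only when stopped
}

// Hypothetical helper: runs `attempt` until it succeeds or the cap is hit.
function runFixLoop(
  stage: string,
  attempt: (iteration: number) => AttemptResult,
  maxIterations: number = 3
): LoopResult {
  let lastReason = "unspecified";
  for (let i = 1; i <= maxIterations; i++) {
    const result = attempt(i);
    if (result.ok) return { status: "passed", iterations: i };
    lastReason = result.reason || lastReason;
  }
  // Cap reached: no further retries, hand control back to the user.
  return {
    status: "stopped",
    iterations: maxIterations,
    report: `Stopped at ${stage} (iteration ${maxIterations}). Reason: ${lastReason}`,
  };
}
```

The remaining rows (critic `block-ship`, catastrophic failures, low confidence) skip the loop entirely and go straight to the stop-and-report branch.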
The status block names the stage, the reason, and the recovery options. Example:
Stopped at review (iteration 3). Reason: reviewer returned 2 critical findings after fix-only loop hit the 3-iteration cap.
To proceed: /cc to continue (continues from the saved state), or /cc-cancel to discard the slug.
What you get
cclaw ships a single-source-of-truth ethos preamble at .cclaw/lib/cclaw-ethos.md — five cross-cutting principles (Boil the Lake: gather evidence before deciding; Search Before Building: check existing patterns first; Surgical Edits: smallest diff that delivers; User Sovereignty: user decisions overrule defaults; Three knowledge layers: Layer 1 tried-and-true, Layer 2 popular, Layer 3 first-principles — question Layer 1/2 when stakes warrant). The preamble is auto-prepended to every specialist dispatch envelope as the Required ethos read, one position above the agent contract. Specialist contracts refine HOW each principle applies to their stage; they do not restate the principles themselves. When a specialist's local rule appears to contradict an ethos principle, the ethos wins and the local rule is the bug.
Strict-mode plans classify every D-N decision by reversibility — one of one-way (irreversible-or-effectively-so: data migration, public-API removal, schema rewrite, destructive auth/cryptography, payment commit), two-way (cheaply reversible: feature flag, internal-API behind compat shim, behaviour tweak behind kill switch), or mostly-two-way (the middle ground: schema column add, new dependency, UI surface shipped to users). The classification is borrowed from Bezos's one-way/two-way door framing. plan-critic §A blocks ship on a missing field; the cross-model critic's §3.5 second opinion auto-fires on any one-way decision regardless of triage.securityFlag (keyword detection on Blast-radius prose is preserved as a fallback for plans with no ## Decisions section). Both critic (adversarial mode) and plan-critic open with an explicit force-stance clause that flips the default cognitive posture from balanced review to "find disqualifying evidence first".
| Surface | Count + detail |
|---|---|
| Specialists | 10 sub-agents: triage (routing dispatch at Hop 2 of every fresh /cc <task>; emits an 8-field slim summary the orchestrator parses — adds taskShape, designSurface, devexSurface over the pre-v8.77 5-field shape; detects design surface from ui/design/frontend/ux keywords for the reviewer's design-quality gate AND the v8.75 plan-design pre-build gate), investigator (v8.77 — debug-branch read-only diagnostic dispatched BEFORE architect when triage.taskShape == "debug"; runs five phases — Phase 0.5 assumption audit + three parallel hypothesis lanes (cause-code / cause-config / cause-measurement) + Phase 4 conditional defense-in-depth tier on recurring or catastrophic patterns + Phase 5 conditional post-mortem on prod-discovered symptoms; writes investigation.md; emits a Next step: verdict — direct-fix (skip architect, dispatch builder with priorInvestigation envelope) / needs-plan (dispatch architect with priorInvestigation envelope) / more-investigation (iteration cap 1) / not-a-bug (stop-and-report). Capped at 2 investigator dispatches per slug), architect (the only plan-stage authoring specialist; runs as a single on-demand dispatch on every non-inline path, covers Bootstrap → Frame → Approaches → Decisions → Pre-mortem → Compose silently; absorbs classification work — assumption capture, surface detection, prior-learnings dispatch, interpretation forks; authors both ## Plan / Slices and ## Acceptance Criteria tables), builder (per-slice RED → GREEN → REFACTOR cycles on strict with <type>(SL-N): prefixes plus one verify(AC-N): passing commit per AC after slices land; single-cycle on soft), plan-critic (pre-implementation gate on strict + complexity≠trivial + AC≥2; checks slice-AC separation, slice quality, AC verifiability, coverage gaps), plan-design (v8.75 — pre-implementation design-coherence pass on UI / design / frontend / UX surfaces in soft + strict ceremony; walks plan.md against the same 7-dimension rubric the reviewer's design-quality axis applies post-build; below-6 grades become PD-N findings appended to plan.md's ## Plan-design findings section; severity ≥ medium blocks ship in strict mode — design bets are challenged before the build runs, not after), plan-devex (v8.82 — pre-implementation DevEx audit on SDK / API / CLI / library surfaces in soft + strict ceremony; walks plan.md against a 6-dimension rubric (Getting Started / API ergonomics / Error messages / Docs / Upgrade path / Measurement); below-6 grades become DX-N findings appended to plan.md's ## Plan-devex findings section; severity ≥ medium blocks ship in strict mode; runs sequentially AFTER plan-critic and plan-design when their gates fire), qa-runner (UI/web surfaces, ceremonyMode≠inline), reviewer (fourteen-axis review with both slice + AC traceability chains on strict; walks the threat-model checklist verbatim in the security axis on security_flag: true; activates the gated design-quality axis on UI / design / frontend / ux surfaces — grades the same 7 design dimensions 0-10 against the rendered diff with explicit "what a 10 looks like" references and runs the AI-slop check; cross-references plan.md's open PD-N rows when plan-design ran), critic (post-implementation adversarial pass with slice + AC coverage check). Each runs in isolation with a mandatory contract read. The seven-dimension design-quality rubric lives in src/content/design-quality-rubric.ts as a single source of truth — both plan-design (pre-build) and reviewer.design-quality (post-build) embed the same rendered markdown so the two surfaces never drift. |
| Research helpers | repo-research (brownfield scan) and learnings-research (prior shipped lessons) dispatched in parallel before every plan. |
| Research lenses | 6 research-only sub-agents dispatched in parallel by the main-context research orchestrator on /cc research <topic> after the open-ended discovery dialogue + Approaches Gate (v8.76) complete: research-engineer (feasibility + paths + risks), research-product (user value + alternatives + market context), research-architecture (system fit + coupling + boundaries + scalability), research-history (prior attempts via knowledge.jsonl + git log; outcome signals), research-skeptic (failure modes + edge cases + abuse cases + hidden costs), research-design (v8.76 — seven-dimension design-quality rubric shared with plan-design + reviewer.design-quality; activates on standard+ depth when the topic touches UI/UX, force-toggle via --lens=design / --lens=-design). NOT in SPECIALISTS; install to .cclaw/lib/research-lenses/<lens>.md. |
| Ceremony modes | strict (per-slice RED → GREEN → REFACTOR with <type>(SL-N): prefixes + per-AC verify(AC-N): passing commits — dual-chain reviewer cross-check), soft (single feature-level TDD cycle, plain commit), inline (one commit, no plan). Triage picks the mode. |
| Plan template | 17 sections strict (Frame, Non-functional, Approaches, Selected Direction, Decisions, Pre-mortem, Assumptions (correct me now) (v8.67), Not Doing (and why) (v8.80 — renamed from Not Doing), Key assumptions to validate (v8.80; stable KA-N ids on every bullet in v8.85), Plan, Spec, Plan / Slices, Acceptance Criteria (verification), Feasibility stamp, Edge cases, Topology, Traceability block) — plus two conditionally-appended sections Plan-design findings (v8.75; appended by plan-design when its gate fires) and Plan-devex findings (v8.82; appended by plan-devex when its gate fires) bringing the strict count to 19 when both gates fire; plus the extend-mode Extends section at the top of plan.md on every /cc extend <slug> <task> flow. 8 sections soft (Plan, Spec, Testable conditions, Verification, Touch surface, Notes, Assumptions (correct me now), Not Doing (and why)). Work-units (slices) are separate from verification (AC). |
| Postures | 6 per-criterion postures (test-first, characterization-first, tests-as-deliverable, refactor-only, docs-only, bootstrap). Each maps to a fixed commit-shape recipe the reviewer enforces ex-post. |
| Review | 14 axes — 8 base (correctness, readability, architecture, security, perf, test-quality, complexity-budget, edit-discipline) plus 6 gated (qa-evidence when qa-runner ran, nfr-compliance when ## Non-functional is non-empty, design-quality (v8.70) when triage detects a UI / design / frontend / UX surface, scope-drift (v8.84) when plan.md > ## Not Doing (and why) is non-empty — always true post-v8.80 since plan-critic §6.5 blocks ship on empty, assumption-coverage (v8.85) when plan.md > ## Key assumptions to validate carries ≥1 KA-N-shaped bullet, anti-slop (v8.86) default-on / fires on every reviewer iteration unless explicitly disabled via walkAntiSlopAxis: false — Karpathy "Simplicity First" check per diff). Append-only findings table, convergence detector, severity-aware ship gate. |
| Critic step | Falsificationist pass after review clears: §1 predictions, §2 gap analysis, §3 four adversarial techniques + 6 human-perspective lenses (executor / stakeholder / skeptic for plan-stage, security / new-hire / ops for code-stage), §4 Criterion check (AC + Edge cases + NFR), §5 goal-backward, §6 realist check, §7 verdict, §8 summary. |
| Auto-trigger skills | 35 skills (triage-gate, plan-authoring, tdd-and-verification, review-discipline, design-quality-discipline, devex-quality-discipline, investigation-discipline, commit-hygiene, completion-discipline, pre-edit-investigation, qa-and-browser, debug-and-browser, ac-discipline, source-driven, summary-format, documentation-and-adrs, parallel-build, refinement, flow-resume, receiving-feedback, anti-slop, conversation-language, api-evolution, pre-flight-assumptions, slice-discipline, ambiguity-discipline, structured-status, plus the five v8.83 reviewer-axis companion skills reviewer-axis-qa-evidence, reviewer-axis-design-quality, reviewer-axis-security, reviewer-axis-nfr-compliance, reviewer-axis-edit-discipline, plus the v8.84 sixth reviewer-axis companion skill reviewer-axis-scope-drift, plus the v8.85 seventh reviewer-axis companion skill reviewer-axis-assumption-coverage, plus the v8.86 eighth reviewer-axis companion skill reviewer-axis-anti-slop). Auto-applied per stage. design-quality-discipline (v8.75) is shared by plan-design (plan stage) and reviewer.design-quality (review stage); devex-quality-discipline (v8.82) is shared by plan-devex and a future post-build reviewer devex axis; investigation-discipline (v8.77) wraps the investigator's read-only diagnostic discipline; the seven reviewer-axis-* skills (v8.83 five + v8.84 scope-drift + v8.85 assumption-coverage) carry the lifted bodies of the gated reviewer axes (load only when the axis's gate fires). |
| On-demand runbooks | 16 runbooks loaded by trigger (dispatch-envelope, parallel-build, finalize, cap-reached-recovery, adversarial-rerun, handoff-gates, handoff-artifacts, compound-refresh, pause-resume, critic-steps, qa-stage, extend-mode, always-auto-failure-handling, research-depth-and-self-review, research-revision, debug-branch). Kept out of the orchestrator body to hold the prompt budget. |
| Anti-rationalization catalog | .cclaw/lib/anti-rationalizations.md carries the cross-cutting rebuttal table (posture-bypass, completion-discipline, edit-discipline, verification rows). Each specialist's prompt cites the catalog and adds specialist-specific rows. |
| Outcome signals | 5-value enum (good, unknown, manual-fix, follow-up-bug, reverted) recorded on knowledge.jsonl rows. Three capture paths (orchestrator scans on every /cc for follow-up-bug references; compound time scans for revert commits and same-touch-surface manual-fix commits). Prior-learnings lookup multiplies similarity by signal weight before threshold filtering. |
| Discipline skills | completion-discipline (no ✅ complete without paired fresh evidence), pre-edit-investigation (three-probe gate before any edit), receiving-feedback (builder fix-only response protocol), structured-status (builder emits one of four canonical statuses — DONE / DONE_WITH_CONCERNS / NEEDS_CONTEXT / BLOCKED — orchestrator routes deterministically), plus edit-discipline as a reviewer axis. |
| Harness-embedded rules | Every supported harness installs cclaw's Iron Laws + anti-rationalizations + antipatterns into its own ambient surface (.cursor/rules/, .claude/, .codex/, .opencode/). cclaw never touches root AGENTS.md, CLAUDE.md, or GEMINI.md. |
| Parallel build | Independent slices run in parallel by default, each in its own sibling git worktree (../<projectName>-<slug>-<sliceId> on disposable branch cclaw/<slug>-<sliceId>). Sub-builders TDD + commit in isolation; the parent fast-forward merges each slice back in topological-layer order with a typecheck + test CI gate between merges. Refused fast-forwards land in flow-state.slice_merge_failures[] and contaminate the dispatch-level status to BLOCKED. Triggered automatically when a layer has ≥2 slices with independent: true; single-slice layers run inline. ceremonyMode: strict required. Plan-critic blocks-ship any independence claim that fails literal zero-file-overlap across Surface columns. |
| Multi-harness install | Claude Code, Cursor, OpenCode, Codex — same .cclaw/ runtime, different harness adapters. |
Harnesses supported
| Harness | Detection | Status |
|---|---|---|
| Claude Code | `CLAUDE.md` or `.claude/` | Supported |
| Cursor | `.cursor/` | Supported |
| OpenCode | `opencode.json[c]` or `.opencode/` | Supported |
| Codex | `.codex/` or `.agents/skills/` | Supported |
Run npx cclaw-cli@latest and the TUI auto-detects whatever you have. For CI / scripted installs, pass --non-interactive install --harness=<id>[,<id>] (comma-separated; supported ids: claude, cursor, opencode, codex).
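The detection column above amounts to a lookup from marker paths to harness ids. The sketch below illustrates that mapping as a pure function over a list of paths already known to exist; `detectHarnesses` and the `MARKERS` table are hypothetical, and the installer's real detection logic may differ.

```typescript
type HarnessId = "claude" | "cursor" | "opencode" | "codex";

// Marker paths per harness, transcribed from the table above
// ("opencode.json[c]" is expanded to both file names).
const MARKERS: { id: HarnessId; paths: string[] }[] = [
  { id: "claude", paths: ["CLAUDE.md", ".claude/"] },
  { id: "cursor", paths: [".cursor/"] },
  { id: "opencode", paths: ["opencode.json", "opencode.jsonc", ".opencode/"] },
  { id: "codex", paths: [".codex/", ".agents/skills/"] },
];

// Returns every harness with at least one marker present, in table order.
function detectHarnesses(existingPaths: string[]): HarnessId[] {
  return MARKERS
    .filter((m) => m.paths.some((p) => existingPaths.indexOf(p) !== -1))
    .map((m) => m.id);
}
```

Taking the path list as input keeps the function free of filesystem calls, so it can be exercised without a real project directory.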
Configuration
.cclaw/config.yaml is optional. Defaults are good. Common knobs:
harnesses: [claude, cursor]
reviewerTwoPass: false # opt-in: spec-review + code-quality-review split
compoundRefreshEvery: 5 # how often to dedup knowledge.jsonl
compoundRefreshFloor: 10 # minimum entries before refresh kicks in
captureLearningsBypass: false # true = silent skip on non-trivial slugs
legacy-artifacts: false # true brings back legacy extra artifacts
architect:
  ambiguity_threshold: 0.2     # ambiguity soft-warning threshold
critic:
  cross_model: false           # opt-in (v8.72): second adversarial pass via a
                               # different model through MCP on high-stakes
                               # slugs. Always available on demand via the
                               # `/cc --critic-cross-model` flag regardless
                               # of this knob. Graceful fallback when no
                               # MCP cross-model tool is wired.
Synthesis confidence + priorResearch cite-back
v8.88 makes the research-→-plan handoff load-bearing in two new ways:
Per-finding numeric confidence at the lens layer. Each of the six research lenses (engineer, product, architecture, history, skeptic, design) now ships a top-level ### Findings (with confidence) block: 3-7 distilled findings, each tagged #### F-N (confidence: 0.0-1.0). Reference: obra-style numeric confidence per finding from gsd-research-synthesizer.
Confidence summary in synthesis. Phase 3 of the research orchestrator aggregates the per-lens findings into a mandatory ### Confidence summary subsection of ## Synthesis:
- Weighted averages per finding-equivalent — when 2+ lenses cover the same underlying claim, weighted average of their confidences (weight = 1/lens-count contributing); contributing `F-N` ids cited inline.
- Confidence cliffs — every finding-equivalent where two lenses disagree by ≥0.5 (e.g. engineer 0.9 vs. skeptic 0.2) is surfaced as a dedicated `**Cliff:**` bullet so the follow-up architect can see lens disagreement at a glance.
- Per-lens rollup — mean confidence per dispatched lens (rounded to two decimals) so lenses whose top-line confidence is misleading vs. the per-finding distribution become visible.
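The three parts of the confidence summary can be sketched as two small aggregations. This is an illustration under stated assumptions, not the orchestrator's procedure: grouping findings by a shared `claim` key is a stand-in for however finding-equivalents are actually matched, and all type and function names are hypothetical. With every contributing lens weighted 1/lens-count, the weighted average reduces to the arithmetic mean.

```typescript
interface Finding {
  lens: string;       // e.g. "engineer", "skeptic"
  id: string;         // e.g. "F-1"
  claim: string;      // hypothetical key identifying the finding-equivalent
  confidence: number; // 0.0-1.0
}

interface ClaimSummary {
  claim: string;
  average: number;
  cites: string[];
  cliff: boolean;
}

const round2 = (x: number): number => Math.round(x * 100) / 100;

function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

function groupBy(findings: Finding[], key: (f: Finding) => string): { [k: string]: Finding[] } {
  const groups: { [k: string]: Finding[] } = {};
  findings.forEach((f) => {
    const k = key(f);
    (groups[k] = groups[k] || []).push(f);
  });
  return groups;
}

// Per-claim mean plus a cliff flag when confidences are spread by >= 0.5.
function summarizeClaims(findings: Finding[]): ClaimSummary[] {
  const groups = groupBy(findings, (f) => f.claim);
  return Object.keys(groups).map((claim) => {
    const values = groups[claim].map((f) => f.confidence);
    // Round the spread so float noise (0.7 - 0.2) cannot hide a cliff.
    const spread = round2(Math.max(...values) - Math.min(...values));
    return {
      claim,
      average: round2(mean(values)),
      cites: groups[claim].map((f) => `${f.lens} ${f.id}`),
      cliff: spread >= 0.5,
    };
  });
}

// Mean confidence per lens, rounded to two decimals.
function perLensRollup(findings: Finding[]): { [lens: string]: number } {
  const groups = groupBy(findings, (f) => f.lens);
  const rollup: { [lens: string]: number } = {};
  Object.keys(groups).forEach((lens) => {
    rollup[lens] = round2(mean(groups[lens].map((f) => f.confidence)));
  });
  return rollup;
}
```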
Cites: research.md §<section> on every D-N when priorResearch was loaded. The architect's Phase 0 Bootstrap reads flowState.priorResearch (the pointer wired in v8.65 / v8.76 / v8.78 / v8.81 when a prior /cc research <topic> shipped). When that pointer is non-null, every plan-stage Decision (D-N) in plan.md > ## Decisions MUST carry a Cites: research.md §<section> field naming 1-3 sections of the loaded research.md that grounded the choice. Examples: Cites: research.md §Engineer lens > Implementation paths, Cites: research.md §Synthesis > Confidence summary, Cites: research.md §Recommended next step.
plan-critic §A blocks ship on missing citations in that mode (finding class decision-missing-research-cite), flags malformed cites without the § anchor (decision-bad-research-cite), and flags orphan cites authored on cold-start flows where no research was loaded (decision-orphan-research-cite). On cold-start /cc <task> flows (priorResearch null), the Cites: field is omitted entirely — its absence is the expected shape and plan-critic emits no finding.
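The three finding classes above depend on two inputs: whether research was loaded, and what the decision's `Cites:` field contains. A minimal sketch of that classification follows. The `Decision` shape and `checkCites` are hypothetical, and the well-formedness test (the value must start with `research.md §` followed by a section name) is an assumption about what plan-critic accepts.

```typescript
type CiteFinding =
  | "decision-missing-research-cite"
  | "decision-bad-research-cite"
  | "decision-orphan-research-cite";

interface Decision {
  id: string;     // e.g. "D-1"
  cites?: string; // raw value of the Cites: field, if present
}

// Returns the finding class for one decision, or null when nothing is wrong.
function checkCites(decision: Decision, priorResearchLoaded: boolean): CiteFinding | null {
  const cites = decision.cites;
  if (!priorResearchLoaded) {
    // Cold start: the field is expected to be absent.
    return cites === undefined ? null : "decision-orphan-research-cite";
  }
  if (cites === undefined || cites.trim() === "") {
    return "decision-missing-research-cite";
  }
  // Assumed shape: "research.md §<section>", with a non-empty section name.
  const wellFormed = /^research\.md §\S/.test(cites.trim());
  return wellFormed ? null : "decision-bad-research-cite";
}
```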
Model-tier policy
Every specialist dispatch carries a Model tier: hint (v8.87) so harnesses that route on tier (custom OpenCode profiles, Claude Code agent.toml, etc.) can pick the right model per stage. Harnesses that don't route on the hint ignore the line and fall back to their own default model. Tier values are the literal union fast | balanced | powerful — anything else is dropped at resolve time and the default tier survives.
Defaults shipped with v8.87 (reference: obra's subagent-driven-development model-selection block):
| Specialist | Default tier |
|---|---|
| `builder` (formerly `slice-builder` pre-v8.62) | `fast` |
| `learnings-research` / `repo-research` | `fast` |
| `triage` / `investigator` / `architect` | `balanced` |
| `plan-critic` / `plan-design` / `plan-devex` | `balanced` |
| `qa-runner` / `reviewer` | `balanced` |
| `critic` | `powerful` |
Override any of them per project via .cclaw/config.yaml > modelPreferences. User entries are merged onto the defaults field-by-field:
modelPreferences:
  critic: balanced     # downgrade from powerful for cheaper iterations
  builder: balanced    # bump builder from fast on a perf-sensitive codebase
  reviewer: powerful   # bump reviewer for adversarial codebases
The defaults map lives in src/config.ts > DEFAULT_MODEL_PREFERENCES; the merge helper is resolveModelPreferences(config).
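The merge semantics described above (field-by-field override, invalid tier values dropped so the default survives) can be sketched as follows. This is not the body of `resolveModelPreferences`; `mergeTiers` and `SKETCH_DEFAULTS` are hypothetical stand-ins, the defaults shown are a subset of the table, and letting overrides for unlisted specialist names pass through is an assumption.

```typescript
type Tier = "fast" | "balanced" | "powerful";
type TierMap = { [specialist: string]: Tier };

// Subset of the defaults table above, for illustration only.
const SKETCH_DEFAULTS: TierMap = {
  builder: "fast",
  architect: "balanced",
  reviewer: "balanced",
  critic: "powerful",
};

function isTier(value: unknown): value is Tier {
  return value === "fast" || value === "balanced" || value === "powerful";
}

// Field-by-field merge: valid overrides win, anything else is ignored.
function mergeTiers(defaults: TierMap, overrides: { [specialist: string]: unknown }): TierMap {
  const merged: TierMap = {};
  Object.keys(defaults).forEach((name) => {
    merged[name] = defaults[name];
  });
  Object.keys(overrides).forEach((name) => {
    const value = overrides[name];
    if (isTier(value)) merged[name] = value; // invalid tiers leave the default in place
  });
  return merged;
}
```

For example, `mergeTiers(SKETCH_DEFAULTS, { critic: "balanced", reviewer: "turbo" })` downgrades `critic` and leaves `reviewer` at `balanced`, because `"turbo"` is not one of the three tier values.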
Architecture deep dive
The runtime is under 1 KLOC. The prompt content is where the work lives. To understand how /cc actually works, read the source under src/content/:
- `src/content/start-command.ts` — orchestrator body (detect, dispatch, always-auto chain, critic step, ship, compound, finalize).
- `src/content/specialist-prompts/` — 10 specialist contracts.
- `src/content/skills/` — 32 auto-trigger skill bodies.
- `src/content/runbooks-on-demand.ts` — 16 on-demand runbooks loaded by trigger.
- `src/content/artifact-templates.ts` — plan / build / qa / review / critic / plan-critic / ship / learnings templates.
- `src/content/anti-rationalizations.ts` — cross-cutting rebuttal catalog.
- `CHANGELOG.md` — release history.
Artifact tree (after install)
.cclaw/
  config.yaml                flow defaults
  state/
    flow-state.json          active flow state (~500 bytes)
    knowledge.jsonl          compound learnings index
    triage-audit.jsonl       routing audit log
  flows/
    <slug>/                  one folder per active task
      plan.md
      build.md
      qa.md                  (UI/web slugs only)
      review.md
      critic.md
      plan-critic.md         (strict + complexity≠trivial + AC≥2)
      ship.md
      research.md            (/cc research <topic> only)
                             (plan-design findings appear inline in plan.md
                             under `## Plan-design findings` when the v8.75
                             design-surface gate fires; no separate file)
    shipped/<slug>/          finalized tasks (including research-mode flows)
    cancelled/<slug>/        /cc-cancel destination
  lib/
    agents/                  10 specialist contracts + 2 read-only research helpers (learnings-research / repo-research)
    research-lenses/         6 research-only lens contracts
    skills/                  32 auto-trigger skill bodies
    templates/               artifact templates
    runbooks/                16 on-demand runbooks
    patterns/                reference patterns
    anti-rationalizations.md
    antipatterns.md
CLI surface
Two invocations cover every use case. There is no cclaw plan / cclaw status / cclaw build / cclaw ship — flow control lives inside /cc.
# Interactive (humans): opens a TUI menu — Install / Uninstall / Quit
npx cclaw-cli@latest
# Non-interactive (CI / scripts): explicit command, no TUI
npx cclaw-cli@latest --non-interactive install [--harness=<id>[,<id>]]
npx cclaw-cli@latest --non-interactive uninstall
npx cclaw-cli@latest --non-interactive knowledge [--tag=<tag>] [--surface=<sub>] [--type=<kind>] [--all] [--json]
npx cclaw-cli@latest --version
npx cclaw-cli@latest --help
install is idempotent and runs orphan cleanup, so it handles first-time setup, re-sync after a package upgrade, and stale-file cleanup in one command. The TUI menu and the --non-interactive install path share the same installer code — they are byte-for-byte identical in write behaviour.
Contributing
cclaw is dogfooded — every release is shipped via /cc against itself. To contribute:
- Fork and clone.
- `npm install && npm run build && npm test` (the test suite is the spec; PRs without test updates are rare).
- Run `/cc <your change>` inside a cclaw-installed harness, or write tests + code directly.
- Open a PR. CI runs lint, typecheck, unit tests, integration tests, and a smoke runtime test.
The runtime stays under 1 KLOC; new behaviour usually means new prompt content under src/content/, not new code under src/.
License
MIT. See LICENSE.