JSPM

  • Downloads 1411
  • License MIT

A structured framework for AI-driven artefact creation with deterministic routing, quality gates, and iterative refinement cycles.

Package Exports

  • @really-knows-ai/foundry
  • @really-knows-ai/foundry/.opencode/plugins/foundry.js

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (@really-knows-ai/foundry) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

Foundry

A skill-driven framework for governed artefact generation with AI coding tools. Define your own artefact types, laws, and flows — Foundry handles the forge → quench → appraise pipeline with deterministic routing, quality gates, and iterative refinement.


Why Foundry?

LLMs are excellent at producing artefacts — code, specs, docs, tests — but they are erratic about governing that production. They skip checks, silently ignore feedback, drift from constraints, and forget what stage they're in. Foundry is an opinionated framework that separates creative work (handled by LLMs via skills) from process work (handled by deterministic tools):

  • The pipeline is code, not prose. Routing, state transitions, commit discipline, and write invariants live inside tested plugin tools. LLMs can't rationalise their way past them.
  • Every artefact is governed by laws. Global and per-type pass/fail criteria are evaluated by a panel of independent appraisers before anything is considered done.
  • Nothing is silent. Feedback has a full lifecycle (open → actioned/wont-fix → approved/rejected). Wont-fix requires appraiser approval. Validation is non-negotiable.
  • Writes are enforced. Each stage is allowed to modify a specific, narrow set of files. Violations halt the cycle.
  • Humans can step in. Human-in-the-loop gates can run every iteration or only when LLM appraisers deadlock.

Compatibility

  • OpenCode — full support. Multi-model routing via file-based foundry-* agents. This is the primary target.
  • Other skill-aware AI tools — the skills and tools are portable. Multi-model stage routing is OpenCode-specific today because it relies on .opencode/agents/ files generated by refresh-agents.

Installation

Add @really-knows-ai/foundry to your OpenCode config:

// opencode.json
{
  "$schema": "https://opencode.ai/config.json",
  "plugin": ["@really-knows-ai/foundry"]
}

Quick start

  1. Install the package (above).
  2. Initialize: run the init-foundry skill to scaffold a foundry/ directory and generate foundry-* agent files.
  3. Define artefact types: add-artefact-type walks you through identity, file patterns, output directory, laws, and optional CLI validation.
  4. Add laws: add-law creates subjective pass/fail criteria, globally or per-type.
  5. Add appraisers: add-appraiser creates appraiser personalities with conflict detection.
  6. Define cycles: add-cycle wires artefact types into a forge/quench/appraise loop with targets and input contracts.
  7. Define a flow: add-flow groups cycles and declares entry points.
  8. Run: invoke the flow skill with your goal. It creates a work branch, picks the right cycle, and hands off to orchestrate.

How it works

                     ┌─────────────────────────────┐
                     │  Flow  (entry points + set) │
                     └──────────────┬──────────────┘
                                    │ starting cycle picked
                                    ▼
 ┌────────────────────────────────────────────────────────────────┐
 │  Cycle  (outputs exactly one artefact type)                    │
 │                                                                │
 │    ┌─────────┐    ┌─────────┐    ┌─────────────┐               │
 │    │  forge  │ → │ quench  │ → │  appraise   │  ──┐           │
 │    └─────────┘    └─────────┘    └─────────────┘    │  loop    │
 │          ▲                                           │  until  │
 │          └───── unresolved feedback ─────────────────┘  clean  │
 │                                                                │
 │    [ optional: human-appraise  — every iter or on deadlock ]  │
 └──────────────┬─────────────────────────────────────────────────┘
                │ targets (may branch)
                ▼
          next cycle  →  …  →  done

  • A flow defines the set of cycles and their entry points.
  • A cycle produces exactly one artefact type and declares its own targets — Foundry follows a dependency graph, not a linear list.
  • Each cycle loops through forge → quench → appraise until there is no unresolved feedback, or an iteration limit is hit.
  • All inter-stage communication goes through WORK.md on a dedicated work branch; every stage ends with a micro-commit.

Core concepts

Flow

A flow lives in foundry/flows/. It declares:

  • starting-cycles — hints about where the flow can be entered.
  • The set of cycles it contains (routing between them is owned by cycles, not by the flow).

Starting a flow creates a work branch and a fresh WORK.md.

Cycle

A cycle lives in foundry/cycles/. It declares:

  • output — the artefact type the cycle produces (read-write).
  • inputs — a contract (any-of or all-of) over artefact types from other cycles. Inputs are discovered on disk by filesystem scan against each input type's file-patterns; they are read-only.
  • targets — which cycle(s) may run next after this one completes.
  • human-appraise / deadlock-appraise / deadlock-iterations — human-in-the-loop configuration.
  • models — optional per-stage model overrides for multi-model diversity.
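
The inputs contract can be read as a predicate over which input types have at least one file on disk. The following is a minimal sketch, assuming a hypothetical contract shape ({ mode, types }) and a precomputed map from type id to discovered files; neither shape is taken from Foundry's code.

```javascript
// Hypothetical shapes, for illustration only:
//   contract   = { mode: 'any-of' | 'all-of', types: ['spec', 'haiku'] }
//   discovered = { spec: ['specs/a.md'], haiku: [] }
function inputsSatisfied(contract, discovered) {
  // A type counts as present when the filesystem scan found at least one file.
  const present = (type) => (discovered[type] || []).length > 0;

  if (contract.mode === 'all-of') return contract.types.every(present);
  if (contract.mode === 'any-of') return contract.types.some(present);
  throw new Error(`unknown contract mode: ${contract.mode}`);
}
```

With one input type present out of two, an any-of contract is satisfied and an all-of contract is not.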

Stage

A single step within a cycle. Stages are identified as base:alias (e.g. forge:write-haiku, quench:check-syllables). The base is one of:

  • forge — produce or revise the artefact.
  • quench — run deterministic CLI checks (skipped if the artefact type has no validation.md).
  • appraise — subjective evaluation by multiple independent appraiser sub-agents.
  • human-appraise — human quality gate, either every iteration or only on deadlock.
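
As an illustration of the base:alias convention, a stage identifier can be split and validated in a few lines. This is a sketch only; parseStageId is a hypothetical helper, not a Foundry function.

```javascript
// The four stage bases listed above.
const STAGE_BASES = ['forge', 'quench', 'appraise', 'human-appraise'];

// Hypothetical helper: split "base:alias" and reject unknown bases.
function parseStageId(id) {
  const sep = id.indexOf(':');
  if (sep === -1) throw new Error(`expected base:alias, got "${id}"`);

  const base = id.slice(0, sep);
  const alias = id.slice(sep + 1);
  if (!STAGE_BASES.includes(base)) throw new Error(`unknown stage base "${base}"`);
  if (alias === '') throw new Error(`missing alias in "${id}"`);
  return { base, alias };
}
```

For example, parseStageId('forge:write-haiku') yields the base forge and the alias write-haiku.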

Artefact type

Defined in foundry/artefacts/<type>/:

  • definition.md — id, name, file patterns, output directory, appraiser configuration, prose description.
  • laws.md (optional) — type-specific subjective criteria.
  • validation.md (optional) — CLI commands with a {file} placeholder; non-zero exit = failure.
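
To make the {file} placeholder and the exit-code rule concrete, here is a rough sketch of how one validation command might be expanded and run. The helper names and the POSIX-style quoting are assumptions for illustration; Foundry's own runner (foundry_validate_run) may work differently.

```javascript
const { spawnSync } = require('node:child_process');

// POSIX-style single-quote escaping so the path survives the shell intact.
// (Assumption: commands run through a POSIX shell.)
function shellQuote(value) {
  return "'" + String(value).replace(/'/g, "'\\''") + "'";
}

// Replace every {file} placeholder with the quoted artefact path.
function expandCommand(template, file) {
  return template.split('{file}').join(shellQuote(file));
}

// Run one command; a non-zero exit status counts as a failure.
function runValidation(template, file) {
  const command = expandCommand(template, file);
  const result = spawnSync(command, { shell: true, encoding: 'utf8' });
  // A null status (killed by a signal, or failed to spawn) is also a failure.
  const output = (result.stdout || '') + (result.stderr || '');
  return { command, passed: result.status === 0, output };
}
```

A failed command would then become a #validation feedback item on the artefact.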

Laws

Subjective pass/fail criteria evaluated by appraisers.

  • foundry/laws/*.md — global laws (all files concatenated, apply everywhere).
  • foundry/artefacts/<type>/laws.md — type-specific laws.

Each law is a ## heading (its identifier, referenced in feedback as #law:<id>) with a description, passing criteria, and failing criteria.
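
Because each law is identified by its ## heading, the identifiers can be listed with a simple line scan. The sketch below assumes the heading text is used verbatim as the id and ignores edge cases such as headings inside code fences; extractLawIds and lawTag are hypothetical names.

```javascript
// Collect the text of every level-2 heading ("## ...") in a laws file.
function extractLawIds(markdown) {
  const ids = [];
  for (const line of markdown.split(/\r?\n/)) {
    // "###" and deeper headings do not match: "##" must be followed by whitespace.
    const match = /^##\s+(.+?)\s*$/.exec(line);
    if (match) ids.push(match[1]);
  }
  return ids;
}

// The tag under which feedback for a law is filed.
function lawTag(id) {
  return `#law:${id}`;
}
```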

Appraisers

Defined in foundry/appraisers/. Each appraiser is a named personality with an optional model override. Artefact types pick which appraisers may evaluate them:

appraisers:
  count: 3                       # how many appraisers (default: 3)
  allowed: [pedantic, pragmatic] # which personalities (default: all)

Appraisers are distributed evenly across the allowed set for maximum diversity.
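
One way to read "distributed evenly" is a round-robin over the allowed personalities. The sketch below shows that interpretation only; the actual selection logic (foundry_appraisers_select) may differ, for example by shuffling.

```javascript
// Assumed interpretation: cycle through the allowed set until `count` is reached.
function distributeAppraisers(allowed, count = 3) {
  if (allowed.length === 0) throw new Error('no appraisers allowed');
  const picks = [];
  for (let i = 0; i < count; i++) {
    picks.push(allowed[i % allowed.length]);
  }
  return picks;
}
```

With count: 3 and allowed: [pedantic, pragmatic], this picks pedantic twice and pragmatic once.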

WORK.md

Transient shared state on the work branch. Created when the flow starts, deleted before the branch is squash-merged. It contains:

  • Frontmatter — current position (flow, cycle, stage list, max iterations, model map, human-appraise config).
  • Goal — the prose request that kicked off the flow.
  • Artefacts — a table of every file produced by the flow and its status (draft, done, blocked).
  • Feedback — grouped by artefact file, every feedback item with its full lifecycle.

A sibling file WORK.history.yaml is an append-only log of every stage execution. See docs/work-spec.md.


The pipeline in depth

Stages run inside a token-gated lifecycle

Every dispatched stage (forge, quench, appraise, human-appraise) runs under a single-use HMAC token:

  1. The orchestrate tool mints a token and hands it to the sub-agent in the dispatch prompt.
  2. The sub-agent's first call must be foundry_stage_begin({stage, cycle, token}). The token is redeemed; mutation tools now check that the active stage matches.
  3. The sub-agent does its work (reads WORK.md, writes artefact files / feedback, etc.).
  4. The sub-agent's last call is foundry_stage_end({summary}).
  5. The orchestrator then calls foundry_stage_finalize, which:
    • Scans the git diff against the stage's allowed file-patterns.
    • Registers any new files matching the output artefact type as draft artefacts.
    • Returns {error: 'unexpected_files'} if the stage wrote anywhere it shouldn't have.
  6. The stage's changes are committed (via foundry_git_commit internally) and routing advances.
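
The single-use token idea in steps 1 and 2 can be sketched with Node's built-in crypto module. The token format, the in-memory redeemed set, and the function names below are assumptions for illustration; they are not Foundry's token.js.

```javascript
const crypto = require('node:crypto');

// Sketch of a mint/redeem pair bound to one stage and cycle.
function createTokenAuthority(secret) {
  const redeemed = new Set(); // nonces that have already been used

  const sign = (payload) =>
    crypto.createHmac('sha256', secret).update(payload).digest('hex');

  function mint(stage, cycle) {
    const nonce = crypto.randomBytes(16).toString('hex');
    return `${nonce}.${sign(`${cycle}|${stage}|${nonce}`)}`;
  }

  function redeem(token, stage, cycle) {
    const [nonce, mac] = String(token).split('.');
    if (!nonce || !mac) return false;

    const expected = Buffer.from(sign(`${cycle}|${stage}|${nonce}`), 'hex');
    const given = Buffer.from(mac, 'hex');
    // Fail closed on forgery or cross-stage reuse (the MAC will not match).
    if (given.length !== expected.length) return false;
    if (!crypto.timingSafeEqual(given, expected)) return false;
    // Fail closed on replay.
    if (redeemed.has(nonce)) return false;

    redeemed.add(nonce);
    return true;
  }

  return { mint, redeem };
}
```

Binding the stage and cycle into the signed payload is what makes a forge token useless to a quench sub-agent in this sketch.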

Per-stage write rules:

| Stage | May write |
| --- | --- |
| forge | Files matching the output artefact type's file-patterns, plus WORK.md / WORK.history.yaml |
| quench | WORK.md / WORK.history.yaml only (feedback) |
| appraise | WORK.md / WORK.history.yaml only (feedback) |
| human-appraise | WORK.md / WORK.history.yaml only (feedback) |

Input artefacts are read-only. Files outside any artefact type's patterns are read-only. Violations hard-stop the cycle.
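
The write rules amount to filtering the list of changed files against what the stage is allowed to touch. Below is a simplified sketch, assuming the changed paths have already been read from the git diff and using a deliberately small glob matcher (only * and ** are supported); Foundry's finalize.js is not shown here and may differ.

```javascript
// Minimal glob support: "*" stays within one path segment, "**/" spans directories.
function globToRegExp(glob) {
  let source = '';
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch !== '*') {
      source += ch.replace(/[.+?^${}()|[\]\\]/g, '\\$&'); // escape regex metacharacters
    } else if (glob[i + 1] !== '*') {
      source += '[^/]*';
    } else if (glob[i + 2] === '/') {
      source += '(?:.*/)?'; // "**/" matches zero or more directories
      i += 2;
    } else {
      source += '.*'; // bare "**" matches anything
      i += 1;
    }
  }
  return new RegExp(`^${source}$`);
}

const ALWAYS_WRITABLE = ['WORK.md', 'WORK.history.yaml'];

// Return the changed files that the given stage was not allowed to write.
function unexpectedFiles(stageBase, changedFiles, outputPatterns) {
  // Only forge may write artefact files; every other stage writes feedback only.
  const allowed = stageBase === 'forge' ? outputPatterns.map(globToRegExp) : [];
  return changedFiles.filter(
    (file) => !ALWAYS_WRITABLE.includes(file) && !allowed.some((re) => re.test(file))
  );
}
```

A non-empty result corresponds to the {error: 'unexpected_files'} case described above.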

Deterministic orchestration

The orchestrate skill is thin — a 3-line loop:

call foundry_orchestrate({lastResult})
switch on action:
  dispatch        → task tool (subagent) → report back
  human_appraise  → run human-appraise inline → report back
  done / blocked / violation → terminate the loop

foundry_orchestrate owns sort routing, history, commits, finalize, deadlock detection, and violation handling. Because the protocol lives in a plugin tool, the LLM can't skip steps, reorder them, or silently drop a commit.
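
Expressed as code, the driver loop looks roughly like the sketch below. The action names and the lastResult parameter come from the pseudocode above; the injected functions and the shape of their return values are assumptions, since in practice the loop is carried out by the orchestrate skill rather than by a script.

```javascript
// orchestrate, dispatch and humanAppraise are injected so the loop stays testable.
async function runLoop({ orchestrate, dispatch, humanAppraise }) {
  let lastResult = null;
  for (;;) {
    const step = await orchestrate({ lastResult });
    switch (step.action) {
      case 'dispatch':
        lastResult = await dispatch(step); // run a sub-agent, report back
        break;
      case 'human_appraise':
        lastResult = await humanAppraise(step); // run inline, report back
        break;
      case 'done':
      case 'blocked':
      case 'violation':
        return step; // terminal states end the loop
      default:
        throw new Error(`unknown action: ${step.action}`);
    }
  }
}
```

The loop itself makes no routing decisions; it only relays results, which is why the skill can stay thin.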


Feedback lifecycle

Feedback is markdown checklists under each artefact in WORK.md, tagged to indicate source.

- [ ] issue #tag                                    → open          — needs forge action
- [x] issue #tag                                    → actioned      — needs appraise approval
- [~] issue #tag | wont-fix: <reason>               → wont-fix      — needs appraise approval
- [x] issue #tag | approved                         → resolved
- [~] issue #tag | wont-fix: <reason> | approved    → resolved
- [x] issue #tag | rejected: <reason>               → re-opened
- [~] issue #tag | wont-fix: <reason> | rejected    → re-opened

Tags:

| Tag | Source | Notes |
| --- | --- | --- |
| #validation | quench (CLI command failed) | Cannot be wont-fixed. Deterministic rules are not negotiable. |
| #law:<id> | appraise (subjective law) | May be wont-fixed with justification; an appraiser must approve. |
| #human | human-appraise | Takes absolute priority. Forge MUST address it; it cannot be wont-fixed. |

Feedback is append-only: items are never deleted, only resolved. Re-opened items show their full history.
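
Since the lifecycle is encoded inline, the state of an item can be derived from the checkbox and the last |-separated segment. The following sketch mirrors the seven line shapes listed above; classifyFeedback is a hypothetical helper, and the real parsing in feedback.js may be stricter.

```javascript
// Derive { state, tag } from one feedback checklist line, or null if it is not one.
function classifyFeedback(line) {
  const match = /^- \[([ x~])\] (.*)$/.exec(line.trim());
  if (!match) return null;

  const [, box, rest] = match;
  const segments = rest.split('|').map((s) => s.trim());
  const last = segments[segments.length - 1];
  const tagMatch = /#[\w:-]+/.exec(segments[0]);
  const tag = tagMatch ? tagMatch[0] : null;

  let state;
  if (segments.length > 1 && last === 'approved') state = 'resolved';
  else if (segments.length > 1 && /^rejected\b/.test(last)) state = 're-opened';
  else if (box === ' ') state = 'open';
  else if (box === 'x') state = 'actioned';
  else state = 'wont-fix';

  return { state, tag };
}
```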

Deadlock handling

If forge and appraise ping-pong on the same items for deadlock-iterations iterations (default 5) and the cycle has deadlock-appraise: true (the default), the router inserts a human-appraise stage. If deadlock-appraise is false, the cycle is marked blocked and control returns to the human.
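
The rule reduces to a small decision function. In the sketch below, stalledIterations (how many consecutive iterations have bounced on the same items) is an assumed input; how the router actually counts a deadlock is not specified here.

```javascript
// Decide what happens next once forge and appraise have stalled.
function deadlockDecision({ stalledIterations, deadlockIterations = 5, deadlockAppraise = true }) {
  if (stalledIterations < deadlockIterations) return 'continue';
  // Threshold reached: escalate to a human, or give up and mark the cycle blocked.
  return deadlockAppraise ? 'human-appraise' : 'blocked';
}
```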


Enforcement model

Foundry is designed around "trust the tool, not the LLM". The following guarantees are enforced in plugin code, not prose:

  • Stage-locked mutations. foundry_feedback_*, foundry_artefacts_*, and foundry_workfile_* tools require the caller's role to match the active stage. A forge sub-agent cannot add feedback; a quench sub-agent cannot register artefacts.
  • Single-use tokens. foundry_stage_begin verifies an HMAC token minted at dispatch time. Replays, forgery, and cross-stage reuse all fail closed. Keys live in .foundry/.secret (mode 0600, gitignored, one per worktree).
  • Commit-per-stage contract. At the start of a sort call, if history is non-empty, foundry_orchestrate refuses to proceed while there are uncommitted changes to WORK.md, WORK.history.yaml, or anything under .foundry/.
  • Write invariants. foundry_stage_finalize scans the git diff and rejects stray writes with {error: 'unexpected_files'}.
  • Feedback state machine. Only legal transitions are accepted: approved is terminal; quench cannot approve/reject a wont-fix; validation cannot be wont-fixed.
  • Artefact-type glob uniqueness. add-artefact-type refuses to create a type whose file patterns overlap with an existing type; the enforcer can't determine file ownership otherwise.
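
The three feedback rules named above can be written as a guard that returns a reason when a transition is refused. This is a partial sketch: it encodes only those three rules plus the #human restriction from the tag notes, uses the state names from the feedback lifecycle, and is not the full table in feedback-transitions.js.

```javascript
// Returns null when none of the sketched rules object, otherwise the reason for refusal.
function checkTransition({ actor, tag, from, to }) {
  // "approved is terminal": nothing leaves the resolved state.
  if (from === 'resolved') return 'approved is terminal';
  // Only subjective (#law:<id>) feedback may be wont-fixed.
  if (to === 'wont-fix' && !tag.startsWith('#law:')) return `${tag} cannot be wont-fixed`;
  // Approving or rejecting a wont-fix belongs to appraisers, not quench.
  if (actor === 'quench' && from === 'wont-fix' && (to === 'resolved' || to === 're-opened')) {
    return 'quench cannot approve/reject a wont-fix';
  }
  return null;
}
```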

Multi-model routing

Different stages can run on different models for genuine cognitive diversity (mitigating shared blind spots):

  • Cycle definitions can declare a models map, e.g. models: { forge: anthropic/claude-opus-4.7, appraise: openai/gpt-5 }.
  • Individual appraisers can override the cycle-level appraise model via a model field in their personality definition.
  • refresh-agents generates a foundry-<provider>-<model>.md agent file in .opencode/agents/ for every model available in the session. orchestrate picks the matching agent when dispatching.

Resolution order for a given stage: appraiser model → cycle models.<stage> → session default.
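
The resolution order is a simple fallback chain. A minimal sketch, with hypothetical parameter names:

```javascript
// First defined value wins: appraiser override, then the cycle's per-stage model, then the session default.
function resolveModel({ appraiserModel, cycleModels = {}, stageBase, sessionDefault }) {
  return appraiserModel ?? cycleModels[stageBase] ?? sessionDefault;
}
```

For a forge stage there is no appraiser override, so the lookup starts at the cycle's models map.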

Run list-agents to see what's available.


Skills

Foundry is a collection of skills. Skills are either atomic (do one thing) or composite (orchestrate other skills).

Pipeline

| Skill | Type | Purpose |
| --- | --- | --- |
| flow | composite | Entry point. Picks a starting cycle, creates the work branch, invokes orchestrate, follows targets between cycles. |
| orchestrate | atomic | Thin driver around foundry_orchestrate. Dispatches sub-agents, runs human-appraise inline, reports terminal states. |
| forge | atomic | Produce or revise the artefact. Discovers inputs by filesystem scan. |
| quench | atomic | Run the artefact type's CLI validation commands; write #validation feedback. |
| appraise | atomic | Dispatch the selected appraiser personalities as parallel sub-agents; consolidate #law:<id> feedback (union + dedup). |
| human-appraise | atomic | Human quality gate. Presents the artefact, collects #human feedback. |

Authoring

| Skill | Purpose |
| --- | --- |
| init-foundry | Scaffold the foundry/ directory and generate agent files. |
| add-artefact-type | Create a new artefact type, with conflict and glob-overlap checks. |
| add-law | Create a new law with conflict detection. |
| add-appraiser | Create an appraiser personality with semantic-overlap checks. |
| add-cycle | Create a cycle and validate its targets and input contract against the flow. |
| add-flow | Create a flow definition with cycle-graph reachability checks. |

Utility

| Skill | Purpose |
| --- | --- |
| list-agents | List available foundry-* sub-agents (for multi-model routing). |
| refresh-agents | Regenerate foundry-* agent files from the currently available models. |
| upgrade-foundry | Analyse and migrate foundry/ config to the current version. |

All authoring skills are interactive and conflict-aware — they explain what they're about to write and ask before writing.


Custom tools

The plugin registers 24 custom tools. Skills call these rather than manipulating files directly, which keeps format-parsing and state transitions out of LLM hands.

| Category | Tools |
| --- | --- |
| Orchestration | foundry_orchestrate |
| Stage lifecycle | foundry_stage_begin, foundry_stage_end |
| Workfile | foundry_workfile_create, foundry_workfile_get, foundry_workfile_delete |
| Artefacts | foundry_artefacts_set_status, foundry_artefacts_list |
| Feedback | foundry_feedback_add, foundry_feedback_action, foundry_feedback_wontfix, foundry_feedback_resolve, foundry_feedback_list |
| History | foundry_history_list |
| Config | foundry_config_cycle, foundry_config_artefact_type, foundry_config_laws, foundry_config_validation, foundry_config_appraisers, foundry_config_flow |
| Validation | foundry_validate_run, foundry_appraisers_select |
| Git | foundry_git_branch, foundry_git_finish |

A handful of internal tools (foundry_sort, foundry_history_append, foundry_stage_finalize, foundry_git_commit, foundry_workfile_set, foundry_workfile_configure_from_cycle) are intentionally not registered — they exist only inside foundry_orchestrate so they cannot be called out of band.

Tools are backed by shared modules in scripts/lib/ with injectable I/O for testability (see tests/).


Project layout

Package (this repo)

@really-knows-ai/foundry
├── .opencode/
│   └── plugins/
│       └── foundry.js          # plugin: skills + 24 custom tools
├── skills/                     # skill definitions
│   ├── flow/                   # pipeline
│   ├── orchestrate/
│   ├── forge/
│   ├── quench/
│   ├── appraise/
│   ├── human-appraise/
│   ├── init-foundry/           # authoring
│   ├── add-artefact-type/
│   ├── add-law/
│   ├── add-appraiser/
│   ├── add-cycle/
│   ├── add-flow/
│   ├── list-agents/            # utility
│   ├── refresh-agents/
│   └── upgrade-foundry/
├── scripts/
│   ├── lib/                    # shared libraries (injectable I/O)
│   │   ├── workfile.js         # WORK.md frontmatter
│   │   ├── artefacts.js        # artefact table ops
│   │   ├── history.js          # WORK.history.yaml ops
│   │   ├── feedback.js         # feedback lifecycle
│   │   ├── feedback-transitions.js
│   │   ├── finalize.js         # stage_finalize implementation
│   │   ├── stage-guard.js      # stage-lock preconditions
│   │   ├── token.js            # HMAC token mint/verify
│   │   ├── secret.js           # .foundry/.secret handling
│   │   ├── pending.js          # active-stage state
│   │   ├── state.js            # .foundry state dir
│   │   ├── config.js           # foundry/ config readers
│   │   ├── tags.js             # feedback tag extraction
│   │   └── slug.js
│   ├── orchestrate.js          # orchestration loop (exports runOrchestrate)
│   └── sort.js                 # routing engine (exports runSort)
├── tests/                      # node:test suite
├── docs/                       # concepts, getting-started, work-spec
├── CHANGELOG.md
└── README.md

User project (after init-foundry)

your-project/
├── foundry/
│   ├── flows/                  # flow definitions
│   ├── cycles/                 # cycle definitions
│   ├── artefacts/              # artefact type definitions
│   │   └── <type>/
│   │       ├── definition.md
│   │       ├── laws.md         # optional
│   │       └── validation.md   # optional
│   ├── laws/                   # global laws
│   └── appraisers/             # appraiser personalities
├── .foundry/                   # runtime state (gitignored)
│   └── .secret                 # per-worktree HMAC key (mode 0600)
├── .opencode/
│   └── agents/
│       └── foundry-*.md        # generated by refresh-agents
├── opencode.json
└── ...

During a flow, a work branch also contains WORK.md and WORK.history.yaml at the repo root. Both are ephemeral — delete them before squash-merging.


Design decisions

Everything is markdown

Flows, cycles, artefact types, laws, appraiser personalities, skills — all markdown with YAML frontmatter. Readable by humans, consumable by LLMs, diff-able in git. No bespoke formats, no databases.

Skills are the pipeline, tools are the machinery

Composition happens at the skill layer. flow reads a definition and invokes orchestrate. orchestrate calls foundry_orchestrate in a loop. The hard guarantees — routing, commits, state transitions, enforcement — live inside the plugin's custom tools and the libraries under scripts/lib/. Skills handle creative and subjective work; tools handle everything else.

WORK.md as shared state

All inter-stage communication goes through WORK.md via the foundry_workfile_*, foundry_artefacts_*, foundry_feedback_*, and foundry_history_* tools. No stage passes output directly to another. This gives a complete audit trail, makes flows resumable after a crash, and lets any stage be re-run independently.

Cycles own their routing

A flow declares starting points; individual cycles declare targets and input contracts. The flow skill walks the resulting graph. This keeps cycles composable across flows and prevents the flow file from becoming a procedural monolith.

Feedback as checklists

Markdown checkboxes with #validation, #law:<id>, or #human tags. Human-readable, trivially parseable, lifecycle encoded inline. Feedback is append-only; history is part of the artefact's story.

Wont-fix requires approval

A forge sub-agent can decline subjective feedback with a justification, but an appraiser must approve or reject that decision on the next iteration. Validation and human feedback cannot be wont-fixed.

Multi-model diversity

Cycle definitions specify per-stage models; individual appraisers may override. Different models catch different issues; consolidation is a union. One appraiser flagging an issue is enough to raise it.

Input artefacts are read-only

When a cycle reads from another cycle's output, those files cannot be modified. Enforced via stage_finalize and sort's diff check. Downstream cycles cannot corrupt upstream work.

Glob patterns must not overlap

Two artefact types cannot have file patterns that match the same files. Hard-blocked at creation time; the file-ownership rule doesn't have a meaningful answer otherwise.


License

MIT