Harness Evolver
Point at any LLM agent codebase. Harness Evolver will autonomously improve it — prompts, routing, tools, architecture — using multi-agent evolution with LangSmith as the evaluation backend.
Install
Claude Code Plugin (recommended)
/plugin marketplace add raphaelchristi/harness-evolver-marketplace
/plugin install harness-evolver
npx (first-time setup or non-Claude Code runtimes)
npx harness-evolver@latest
Works with Claude Code, Cursor, Codex, and Windsurf.
Quick Start
cd my-llm-project
export LANGSMITH_API_KEY="lsv2_pt_..."
claude
/harness:setup # explores project, configures LangSmith
/harness:health # check dataset quality (auto-corrects issues)
/harness:evolve # runs the optimization loop
/harness:status # check progress (rich ASCII chart)
/harness:deploy # tag, push, finalize
How It Works
| Feature | Description |
|---|---|
| LangSmith-Native | No custom scripts. Uses LangSmith Datasets, Experiments, and LLM-as-judge. Everything visible in the LangSmith UI. |
| Real Code Evolution | Proposers modify actual code in isolated git worktrees. Winners merge automatically. |
| Self-Organizing Proposers | Two-wave spawning, dynamic lenses from failure data, archive branching from losing candidates. Self-abstention when redundant. |
| Rubric-Based Evaluation | LLM-as-judge with justification-before-score, rubrics, few-shot calibration, pairwise comparison. |
| Smart Gating | Constraint gates, efficiency gate (cost/latency pre-merge), regression guards, Pareto selection, holdout enforcement, rate-limit early abort, stagnation detection. |
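To make "Pareto selection" concrete, here is a minimal sketch of picking the non-dominated candidates across quality, cost, and latency. It is an illustration only, not the package's implementation: the Candidate class, its field names, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # Hypothetical record for one proposer's result; field names are assumptions.
    name: str
    score: float      # higher is better (e.g. mean judge score)
    cost_usd: float   # lower is better
    latency_s: float  # lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on every axis and strictly better on at least one."""
    no_worse = (
        a.score >= b.score
        and a.cost_usd <= b.cost_usd
        and a.latency_s <= b.latency_s
    )
    strictly_better = (
        a.score > b.score
        or a.cost_usd < b.cost_usd
        or a.latency_s < b.latency_s
    )
    return no_worse and strictly_better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Made-up numbers, purely for illustration.
candidates = [
    Candidate("baseline", 0.70, 0.010, 2.0),
    Candidate("prompt-tweak", 0.78, 0.010, 2.0),
    Candidate("bigger-model", 0.85, 0.040, 3.5),
    Candidate("extra-retries", 0.77, 0.030, 4.0),
]
front = pareto_front(candidates)
print([c.name for c in front])
```

A front like this leaves the final choice to the later gates: a candidate that scores higher but costs more survives Pareto selection and can still be rejected by an efficiency or constraint gate.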
Evolution Loop
/harness:evolve
|
+- 1. Preflight (validate state + dataset health + baseline scoring)
+- 2. Analyze (trace insights + failure clusters + strategy synthesis)
+- 3. Propose (spawn N proposers in git worktrees, two-wave)
+- 4. Evaluate (canary → run target → auto-spawn LLM-as-judge → rate-limit abort)
+- 5. Select (held-out comparison → Pareto front → efficiency gate → constraint gate → merge)
+- 6. Learn (archive candidates + regression guards + evolution memory)
+- 7. Gate (plateau → target check → critic/architect → continue or stop)
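The Gate step (step 7) can be pictured with a small sketch of plateau detection plus a target check. This is one plausible reading of the step, not the tool's actual logic: the function names, the patience and min_delta parameters, and the three decision labels are assumptions made for illustration.

```python
def has_plateaued(best_scores: list[float], patience: int = 3, min_delta: float = 0.01) -> bool:
    """True when the last `patience` iterations did not beat the earlier best by `min_delta`."""
    if len(best_scores) <= patience:
        return False  # not enough history to judge
    earlier_best = max(best_scores[:-patience])
    recent_best = max(best_scores[-patience:])
    return recent_best < earlier_best + min_delta


def gate_decision(best_scores: list[float], target_score: float) -> str:
    """Return 'stop', 'escalate', or 'continue' (hypothetical labels).

    'escalate' stands in for handing off to the critic/architect agents
    when progress stalls; that mapping is an assumption of this sketch.
    """
    if best_scores and best_scores[-1] >= target_score:
        return "stop"
    if has_plateaued(best_scores):
        return "escalate"
    return "continue"


history = [0.70, 0.75, 0.75, 0.755, 0.752]  # made-up per-iteration best scores
print(gate_decision(history, target_score=0.90))
```

Comparing the recent window against the earlier best, rather than against the previous iteration alone, keeps a single noisy evaluation from triggering or masking a plateau.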
Agents
| Agent | Role |
|---|---|
| Proposer | Self-organizing — investigates a data-driven lens, decides own approach, may abstain |
| Evaluator | LLM-as-judge — rubric-aware scoring via langsmith-cli, few-shot calibration |
| Architect | ULTRAPLAN mode — deep topology analysis with Opus model |
| Critic | Active — detects evaluator gaming, implements stricter evaluators |
| Consolidator | Cross-iteration memory — anchored summarization, garbage collection |
| TestGen | Generates test inputs with rubrics + adversarial injection |
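As a rough illustration of the Evaluator's justification-before-score pattern, the sketch below builds a rubric prompt and rejects any verdict that gives the score before the reasoning. The prompt wording, the JSON reply shape, the 1 to 5 scale, and both function names are assumptions for this example and are not taken from the package.

```python
import json


def build_judge_prompt(rubric: str, task_input: str, output: str) -> str:
    """Assemble a judge prompt that asks for reasoning first, then a score."""
    return "\n".join([
        "You are grading an agent's output against a rubric.",
        f"Rubric: {rubric}",
        f"Input: {task_input}",
        f"Output: {output}",
        "First explain your reasoning, then give an integer score from 1 to 5.",
        'Reply as JSON: {"justification": "...", "score": <1-5>}',
    ])


def parse_verdict(raw: str) -> tuple[str, int]:
    """Parse a judge reply, insisting that the justification comes before the score."""
    data = json.loads(raw)
    # json.loads keeps key order, so the order the judge wrote them in is visible here.
    if list(data)[:2] != ["justification", "score"]:
        raise ValueError("expected 'justification' followed by 'score'")
    justification, score = data["justification"], data["score"]
    if not isinstance(justification, str) or not justification.strip():
        raise ValueError("justification must be a non-empty string")
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 5:
        raise ValueError("score must be an integer from 1 to 5")
    return justification, score


reply = '{"justification": "Cites the refund policy correctly.", "score": 4}'
print(parse_verdict(reply))
```

Asking for the justification first means the score is generated after the reasoning, which is the point of the pattern; the order check simply enforces that at parse time.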
Requirements
- LangSmith account + LANGSMITH_API_KEY
- Python 3.10+ · Git · Claude Code (or Cursor/Codex/Windsurf)
Dependencies installed automatically by the plugin hook or npx installer.
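Before running setup, the requirements above can be checked by hand with a short script like the following. It is a standalone sketch, not something the plugin ships: missing_requirements and its parameters are hypothetical, and the lookups are injectable only so the function is easy to test.

```python
import os
import shutil
import sys
from typing import Callable, Mapping, Optional, Tuple


def missing_requirements(
    env: Optional[Mapping[str, str]] = None,
    python_version: Optional[Tuple[int, int]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list:
    """Return human-readable problems; an empty list means the basics are in place."""
    env = os.environ if env is None else env
    version = tuple(sys.version_info[:2]) if python_version is None else python_version
    problems = []
    if not env.get("LANGSMITH_API_KEY"):
        problems.append("LANGSMITH_API_KEY is not set")
    if version < (3, 10):
        problems.append(f"Python 3.10+ required, found {version[0]}.{version[1]}")
    if which("git") is None:
        problems.append("git is not on PATH")
    return problems


# Check the current environment and report anything missing.
for problem in missing_requirements():
    print("missing:", problem)
```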
LangSmith traces any AI framework: LangChain/LangGraph (auto), OpenAI/Anthropic SDK (wrap_*, 2 lines), CrewAI/AutoGen (OpenTelemetry), any Python (@traceable).
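For the "any Python" case, a minimal sketch of the @traceable decorator from the langsmith package looks like this. The classify_ticket function is a hypothetical stand-in for an agent step, and the ImportError fallback exists only so the sketch runs without langsmith installed. Whether a trace is actually sent depends on your LangSmith environment configuration (API key and tracing settings), which this example does not set.

```python
try:
    from langsmith import traceable
except ImportError:
    # Fallback for this sketch only: a no-op decorator with the same call shape,
    # so the example still runs when the langsmith package is not installed.
    def traceable(*args, **kwargs):
        def decorator(func):
            return func
        return decorator


@traceable(run_type="chain", name="classify_ticket")
def classify_ticket(text: str) -> str:
    """Hypothetical agent step; replace the body with your own LLM call."""
    return "billing" if "invoice" in text.lower() else "general"


print(classify_ticket("Where is my invoice?"))
```

Decorating the entry point of the agent is usually enough to get one run per call; nested decorated functions show up as child runs of the caller.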
References
- Meta-Harness: End-to-End Optimization of Model Harnesses — Lee et al., 2026
- Self-Organizing LLM Agents Outperform Designed Structures — Dochkina, 2026
- Hermes Agent Self-Evolution — NousResearch
- Agent Skills for Context Engineering — Koylan
- A-Evolve: Automated Agent Evolution — Amazon (5-stage evolution loop, git-tagged mutations)
- Meta Context Engineering via Agentic Skill Evolution — Ye et al., Peking University, 2026
- EvoAgentX: Evolving Agentic Workflows — Wang et al., 2026
- Darwin Godel Machine — Sakana AI
- AlphaEvolve — DeepMind
- LangSmith Evaluation — LangChain
- Harnessing Claude's Intelligence — Martin, Anthropic, 2026
- Traces Start the Agent Improvement Loop — LangChain
License
MIT