

    Harness Evolver

    npm · License: MIT · Paper · Built by Raphael Valdetaro

    LangSmith-native autonomous agent optimization. Point at any LLM agent codebase, and Harness Evolver will evolve it — prompts, routing, tools, architecture — using multi-agent evolution with LangSmith as the evaluation backend.

    Inspired by Meta-Harness (Lee et al., 2026), which found that the scaffolding around the same LLM can produce a 6x performance gap on the same benchmark. This plugin automates the search for better scaffolding.


    Install

    npx harness-evolver@latest

    Works with Claude Code, Cursor, Codex, and Windsurf. Requires a LangSmith account and a LANGSMITH_API_KEY.


    Quick Start

    cd my-llm-project
    export LANGSMITH_API_KEY="lsv2_pt_..."
    claude
    
    /evolver:setup      # explores project, configures LangSmith
    /evolver:evolve     # runs the optimization loop
    /evolver:status     # check progress
    /evolver:deploy     # tag, push, finalize

    How It Works

    • LangSmith-Native: No custom eval scripts or task files. Uses LangSmith Datasets for test inputs, Experiments for results, and Evaluators (openevals LLM-as-judge) for scoring. Everything is visible in the LangSmith UI.
    • Real Code Evolution: Proposers modify your actual agent code, not a wrapper. Each candidate works in an isolated git worktree. Winners are merged automatically.
    • 5 Adaptive Proposers: Each iteration spawns 5 parallel agents: exploit, explore, crossover, and 2 failure-targeted. Strategies adapt based on per-task analysis. Quality-diversity selection preserves per-task champions.
    • Production Traces: Auto-discovers existing LangSmith production projects. Uses real user inputs for test generation and real error patterns for targeted optimization.
    • Critic: Auto-triggers when scores jump suspiciously fast and checks whether evaluators are being gamed.
    • Architect: Auto-triggers on stagnation and recommends topology changes (single-call to RAG, chain to ReAct, etc.).
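    Quality-diversity selection can be sketched in a few lines. The data shapes and the `select` function below are illustrative, not the plugin's actual internals:

```python
# Illustrative sketch of quality-diversity selection (data shapes and
# function name are hypothetical, not the plugin's internals).

def select(candidates):
    """Return (overall winner, per-task champions).

    candidates: {candidate name: {task name: score in [0, 1]}}
    """
    # Overall winner: highest mean score across tasks.
    winner = max(
        candidates,
        key=lambda c: sum(candidates[c].values()) / len(candidates[c]),
    )
    # Per-task champions: best candidate on each task, preserved even
    # when they lose on average -- this is the "diversity" part.
    tasks = next(iter(candidates.values()))
    champions = {t: max(candidates, key=lambda c: candidates[c][t]) for t in tasks}
    return winner, champions

scores = {
    "exploit":   {"summarize": 0.9, "route": 0.5},
    "explore":   {"summarize": 0.4, "route": 0.8},
    "crossover": {"summarize": 0.7, "route": 0.6},
}
winner, champions = select(scores)
print(winner, champions)  # exploit {'summarize': 'exploit', 'route': 'explore'}
```

    Keeping per-task champions alongside the mean-score winner is what lets a candidate that excels on one task survive into later iterations.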

    Commands

    Command          What it does
    /evolver:setup   Explore project, configure LangSmith (dataset, evaluators), run baseline
    /evolver:evolve  Run the optimization loop (5 parallel proposers in worktrees)
    /evolver:status  Show progress, scores, history
    /evolver:deploy  Tag, push, clean up temporary files
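    The worktree isolation behind /evolver:evolve boils down to standard git commands. A minimal sketch of one candidate's lifecycle (branch and path names are made up, and the plugin's real flow is driven by the agent, not this script):

```shell
# One candidate's worktree lifecycle (branch/path names are made up).
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q -b main
git -c user.email=dev@example.com -c user.name=dev commit -q --allow-empty -m baseline

wt=$(mktemp -d)/candidate-1
git worktree add -q -b candidate-1 "$wt"            # isolated checkout for one proposer
git -C "$wt" -c user.email=dev@example.com -c user.name=dev \
    commit -q --allow-empty -m "proposer patch"     # proposer edits and commits here
git merge -q candidate-1                            # winner merged back into main
git worktree remove "$wt"                           # losing worktrees are simply discarded
git branch -q -d candidate-1
```

    Because each proposer gets its own checkout and branch, five candidates can edit the same files in parallel without stepping on each other.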

    Agents

    Agent      Role                                                                Color
    Proposer   Modifies agent code in isolated worktrees based on trace analysis  Green
    Architect  Recommends multi-agent topology changes                             Blue
    Critic     Validates evaluator quality, detects gaming                         Red
    TestGen    Generates test inputs for LangSmith datasets                        Cyan

    Evolution Loop

    /evolver:evolve
      |
      +- 1.  Read state (.evolver.json + LangSmith experiments)
      +- 1.5 Gather trace insights (cluster errors, tokens, latency)
      +- 1.8 Analyze per-task failures (adaptive briefings)
      +- 2.  Spawn 5 proposers in parallel (each in a git worktree)
      +- 3.  Evaluate each candidate (client.evaluate() -> LangSmith experiments)
      +- 4.  Compare experiments -> select winner + per-task champion
      +- 5.  Merge winning worktree into main branch
      +- 5.5 Test suite growth (add regression examples to dataset)
      +- 6.  Report results
      +- 6.5 Auto-trigger Critic (if score jumped >0.3)
      +- 7.  Auto-trigger Architect (if stagnation or regression)
      +- 8.  Check stop conditions
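    The auto-triggers in steps 6.5 and 7 reduce to threshold checks. A sketch with hypothetical helper names (the 0.3 jump threshold comes from the loop above; the 3-round stagnation window is an assumption):

```python
def should_trigger_critic(prev_score, new_score, jump=0.3):
    """Critic fires when a score jumps suspiciously fast (step 6.5)."""
    return new_score - prev_score > jump

def should_trigger_architect(history, window=3):
    """Architect fires on regression or stagnation (step 7): the latest
    score dropped, or no new best was set in the last `window` rounds."""
    if len(history) < 2:
        return False
    regression = history[-1] < history[-2]
    stagnation = len(history) > window and max(history[-window:]) <= max(history[:-window])
    return regression or stagnation

print(should_trigger_critic(0.40, 0.78))                       # True: +0.38 jump
print(should_trigger_architect([0.5, 0.6, 0.59, 0.58, 0.57]))  # True: stagnating
```

    Gating the Critic on sudden jumps keeps the expensive evaluator audit out of the common path; the Architect only weighs in when ordinary proposers have stopped making progress.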

    Requirements

    • LangSmith account + LANGSMITH_API_KEY
    • Python 3.10+ with langsmith and openevals packages
    • Git (for worktree-based isolation)
    • Claude Code (or Cursor/Codex/Windsurf)

    export LANGSMITH_API_KEY="lsv2_pt_..."
    pip install langsmith openevals

    Framework Support

    LangSmith traces any AI framework. The evolver works with all of them:

    Framework              LangSmith Tracing
    LangChain / LangGraph  Auto (env vars only)
    OpenAI SDK             wrap_openai() (2 lines)
    Anthropic SDK          wrap_anthropic() (2 lines)
    CrewAI / AutoGen       OpenTelemetry (~10 lines)
    Any Python code        @traceable decorator
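    For the last row, the @traceable decorator is the only instrumentation a plain Python function needs. A minimal sketch; the ImportError fallback is ours so the snippet also runs where langsmith is not installed, and the canned reply stands in for a real model call:

```python
try:
    from langsmith import traceable  # real decorator when langsmith is installed
except ImportError:
    def traceable(fn):  # no-op stand-in so the sketch runs anywhere
        return fn

@traceable  # with tracing enabled and an API key set, each call becomes a LangSmith run
def answer(question: str) -> str:
    # ...call your model here; a canned reply keeps the sketch offline
    return f"echo: {question}"

print(answer("hello"))  # echo: hello
```

    Once functions are traced this way, the evolver can read their runs from LangSmith without any further integration work.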

    License

    MIT