Agent Harness
The first agent framework where the agent gets measurably better as you use it. You write markdown, not code. Tools come from MCP.
Agent Harness is a file-first framework for building agents by editing markdown. It's designed for anyone who wants to build an agent in 2026 — coder or non-coder — and its differentiator is a built-in learning loop: every interaction is journaled, patterns become instincts, and the agent improves without retraining or fine-tuning.
The code layer exists as an escape hatch, not the entry point. The folder IS the agent.
Why this is different
- Self-learning by default: every interaction is journaled, patterns become instincts, agents get measurably better with use — no retraining, no fine-tuning.
- File-first authoring: edit markdown, not code. The folder IS the agent.
- MCP-native tools: the entire MCP ecosystem is your toolbox. No custom adapter layer.
Quick Start
# Install globally
npm install -g @agntk/agent-harness
# Create a new agent
harness init my-agent
cd my-agent
# Set your API key
export OPENROUTER_API_KEY=sk-or-...
# Ask your agent to do something useful
harness run "Help me decide between two options: A or B"
# Or start an interactive chat
harness chat
# See what's loaded
harness info
# Watch for file changes
harness dev
The Learning Loop
Agent Harness agents learn from experience through an automated pipeline:
Interaction → Session recorded → Journal synthesized → Instincts proposed → Auto-installed
- Every interaction is saved as a session in `memory/sessions/`
- Run `harness journal` to synthesize sessions into a daily journal entry
- Run `harness learn --install` to detect behavioral patterns and install new instincts
- On the next run, the agent loads its new instincts and behaves differently
This means your agent gets better over time — without you writing any code.
How It Works
When you run harness init my-agent, you get this directory:
my-agent/
├── CORE.md # Agent identity (frozen — who am I?)
├── SYSTEM.md # Boot instructions (how do I operate?)
├── config.yaml # Model, runtime, memory settings
├── state.md # Live state (mode, goals, last interaction)
├── rules/ # Human-authored operational boundaries
│ └── operations.md
├── instincts/ # Agent-learned reflexive behaviors
│ ├── lead-with-answer.md
│ ├── read-before-edit.md
│ └── search-before-create.md
├── skills/ # Capabilities with embedded expertise
│ └── research.md
├── playbooks/ # Adaptive guidance for outcomes
│ └── ship-feature.md
├── workflows/ # Cron-driven automations
├── tools/ # External service integrations
├── agents/ # Sub-agent roster
└── memory/
    ├── sessions/ # Auto-captured interaction records
    ├── journal/ # Daily synthesized reflections
    └── scratch.md # Ephemeral working memory
Every file is markdown with YAML frontmatter. Every file has three disclosure levels:
- L0 (~5 tokens): One-line summary in an HTML comment
- L1 (~50-100 tokens): Paragraph summary in an HTML comment
- L2 (full body): Complete content
The harness loads files intelligently based on token budget — L0 to decide relevance, L1 to work with, L2 only when actively needed.
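To make the three disclosure levels concrete, here is a minimal sketch of how the L0 and L1 summaries could be pulled out of a primitive file. Only the `<!-- L0: ... -->` and `<!-- L1: ... -->` comment convention comes from this README; the `Disclosure` type and the `parseDisclosure` helper are hypothetical names for illustration and are not part of the package's API.

```typescript
// Hypothetical helper: not the harness's real parser, only an illustration
// of the L0/L1/L2 comment convention described above.
interface Disclosure {
  l0: string | null; // one-line summary, if present
  l1: string | null; // paragraph summary, if present
  l2: string;        // remaining content with the summary comments removed
}

function parseDisclosure(markdown: string): Disclosure {
  // Grab the text inside <!-- L0: ... --> or <!-- L1: ... -->.
  // L1 may span several lines, so internal whitespace is collapsed.
  const grab = (level: "L0" | "L1"): string | null => {
    const m = markdown.match(
      new RegExp(`<!--\\s*${level}:\\s*([\\s\\S]*?)\\s*-->`)
    );
    return m ? (m[1] ?? "").replace(/\s+/g, " ").trim() : null;
  };
  // Everything except the two summary comments counts as the L2 body here.
  const l2 = markdown.replace(/<!--\s*L[01]:[\s\S]*?-->\n?/g, "").trim();
  return { l0: grab("L0"), l1: grab("L1"), l2 };
}

const exampleRule = [
  "<!-- L0: Never deploy to production on Fridays. -->",
  "<!-- L1: No production deployments on Fridays.",
  "Staging is fine. -->",
  "# Rule: No Friday Deploys",
  "Never deploy to production on Fridays.",
].join("\n");

const parsedRule = parseDisclosure(exampleRule);
console.log(parsedRule.l0); // prints "Never deploy to production on Fridays."
```

This sketch leaves any YAML frontmatter inside the L2 body; a real loader would presumably strip and parse it separately.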
The 7 Primitives
| Primitive | Owner | Purpose | Example |
|---|---|---|---|
| Rules | Human | Operational boundaries that don't change | "Never deploy on Fridays" |
| Instincts | Agent | Learned behaviors that evolve over time | "Lead with the answer, not reasoning" |
| Skills | Mixed | Capabilities with embedded judgment | "How to do research" |
| Playbooks | Mixed | Adaptive guidance for achieving outcomes | "How to ship a feature" |
| Workflows | Infra | Cron-driven deterministic automations | "Hourly health check" |
| Tools | External | Service integration knowledge | "GitHub API patterns" |
| Agents | External | Sub-agent roster and capabilities | "Code reviewer agent" |
Ownership matters
Every file has exactly one owner — human, agent, or infrastructure:
- Human writes rules and CORE.md. The agent respects these.
- Agent writes instincts and sessions. It learns from experience.
- Infrastructure writes indexes and journals. Bookkeeping is automated.
Customizing Your Agent
Change identity
Edit CORE.md — this is who your agent is, its values, and its ethics.
Add a rule
Create a file in rules/:
---
id: no-friday-deploys
tags: [rule, safety, deployment]
created: 2026-04-06
author: human
status: active
---
<!-- L0: Never deploy to production on Fridays. -->
<!-- L1: No production deployments on Fridays. Three incidents were traced to Friday
deploys with reduced weekend monitoring. Staging is fine. -->
# Rule: No Friday Deploys
Never deploy to production on Fridays. Staging deployments are acceptable.
If an emergency hotfix is needed, require explicit human approval.
Add an instinct
Create a file in instincts/. Instincts are behaviors the agent learns — they have provenance (where the learning came from).
Add a skill
Create a file in skills/. Skills include not just what the agent can do, but how it thinks about doing it — judgment, red flags, when NOT to use the skill.
Add a playbook
Create a file in playbooks/. Playbooks are step-by-step guidance that the agent interprets and adapts, not rigid scripts.
Installing Content
Install skills, rules, agents, and more from the community:
# Install from any source — file, URL, or name
harness install https://raw.githubusercontent.com/.../skill.md
# Search community sources
harness discover search "code review"
# Browse available sources
harness sources list
The installer auto-detects format (Claude Code skills, raw markdown, bash hooks, MCP configs) and normalizes to harness convention with proper frontmatter.
Share your own primitives as bundles:
harness bundle my-skills.tar --types skills,rules
harness bundle-install my-skills.tar
CLI Commands
Core
| Command | Description |
|---|---|
| `harness init [name]` | Create a new agent (interactive if no name given) |
| `harness run <prompt>` | Run a single prompt |
| `harness run <prompt> --stream` | Stream the response |
| `harness run <prompt> -m claude` | Use a model alias |
| `harness chat` | Interactive REPL with conversation memory |
| `harness chat --fresh` | Start a fresh conversation (clear history) |
| `harness info` | Show loaded context and token budget |
| `harness prompt` | Display the full assembled system prompt |
| `harness status` | Show primitives, sessions, config, state |
| `harness validate` | Validate harness structure and configuration |
| `harness doctor` | Validate and auto-fix all fixable issues |
Development
| Command | Description |
|---|---|
| `harness dev` | Watch mode + auto-index + scheduler + web dashboard |
| `harness dev --port 8080` | Custom dashboard port (default 3000) |
| `harness index` | Rebuild all index files |
| `harness process` | Auto-fill missing frontmatter and L0/L1 summaries |
| `harness search <query>` | Search primitives by text and tags |
| `harness graph` | Analyze primitive dependency graph |
| `harness serve` | Start HTTP API server for webhooks and integrations |
Learning
| Command | Description |
|---|---|
| `harness journal` | Synthesize today's sessions into a journal entry |
| `harness learn` | Analyze sessions and propose new instincts |
| `harness learn --install` | Auto-install proposed instincts |
| `harness harvest --install` | Extract and install instinct candidates from journals |
| `harness auto-promote` | Promote instinct patterns appearing 3+ times |
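The `harness auto-promote` entry mentions a threshold of three occurrences. The sketch below shows one way such a threshold could work: count how often each candidate pattern shows up and keep those seen at least three times. How the harness actually matches patterns is not documented here, so the exact-match normalization and the `promotable` function name are assumptions made purely for illustration.

```typescript
// Hypothetical sketch: the real matching logic is not documented in this
// README. Patterns are compared after trimming and lowercasing (an assumption).
function promotable(observations: string[], minCount: number = 3): string[] {
  const counts = new Map<string, number>();
  for (const observation of observations) {
    const key = observation.trim().toLowerCase();
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  // Keep only the patterns that reached the threshold, in first-seen order.
  return Array.from(counts.entries())
    .filter(([, count]) => count >= minCount)
    .map(([pattern]) => pattern);
}

const candidates = promotable([
  "Lead with the answer",
  "lead with the answer ",
  "Read before edit",
  "LEAD WITH THE ANSWER",
]);
console.log(candidates); // prints [ 'lead with the answer' ]
```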
Intelligence
| Command | Description |
|---|---|
| `harness suggest` | Suggest skills/playbooks for frequent uncovered topics |
| `harness contradictions` | Detect conflicts between rules and instincts |
| `harness dead-primitives` | Find orphaned primitives not used in 30+ days |
| `harness enrich` | Add topics, token counts, and references to sessions |
| `harness gate run` | Run verification gates (pre-boot, pre-run, post-session, pre-deploy) |
| `harness check-rules <action>` | Check an action against loaded rules |
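As an illustration of the 30-day window that `harness dead-primitives` describes, the sketch below flags primitives whose last recorded use is at least 30 days old. The `PrimitiveUsage` shape, the `deadPrimitives` name, and the choice to treat never-used primitives as dead are all assumptions; the README does not say how usage is tracked.

```typescript
// Hypothetical sketch of a "not used in 30+ days" check.
interface PrimitiveUsage {
  id: string;
  lastUsed: Date | null; // null means no recorded use (assumed to count as dead)
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

function deadPrimitives(
  usage: PrimitiveUsage[],
  now: Date,
  maxIdleDays: number = 30
): string[] {
  const cutoff = now.getTime() - maxIdleDays * MS_PER_DAY;
  return usage
    .filter((u) => u.lastUsed === null || u.lastUsed.getTime() <= cutoff)
    .map((u) => u.id);
}

const today = new Date("2026-04-06T00:00:00Z");
const stale = deadPrimitives(
  [
    { id: "read-before-edit", lastUsed: new Date("2026-04-01T00:00:00Z") },
    { id: "ship-feature", lastUsed: new Date("2026-03-01T00:00:00Z") },
    { id: "research", lastUsed: null },
  ],
  today
);
console.log(stale); // prints [ 'ship-feature', 'research' ]
```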
MCP (Model Context Protocol)
| Command | Description |
|---|---|
| `harness mcp list` | List configured MCP servers and status |
| `harness mcp test` | Test server connections and list tools |
| `harness mcp discover` | Scan for servers from Claude Desktop, Cursor, VS Code, etc. |
| `harness mcp search <query>` | Search the MCP registry |
| `harness mcp install <query>` | Install an MCP server from the registry |
Installing and Sharing
| Command | Description |
|---|---|
| `harness install <source>` | Install from file, URL, or source name (auto-detects format) |
| `harness discover search <query>` | Search all community sources for content |
| `harness discover env` | Scan .env files for API keys, suggest MCP servers |
| `harness discover project` | Detect tech stack, suggest rules/skills |
| `harness bundle <output>` | Pack primitives into a shareable bundle |
| `harness bundle-install <source>` | Install from a bundle |
| `harness export [output]` | Export harness to a portable JSON bundle |
| `harness import <bundle>` | Import a harness bundle |
| `harness sources list` | List configured community content sources |
Monitoring
| Command | Description |
|---|---|
| `harness dashboard` | Unified view of health, costs, sessions, workflows |
| `harness health` | System health status and metrics |
| `harness costs` | View API spending |
| `harness analytics` | Session analytics and usage patterns |
| `harness metrics` | Workflow execution metrics |
Model Aliases
Use -m with a shorthand instead of full OpenRouter model IDs:
| Alias | Model |
|---|---|
| `gemma` | `google/gemma-4-26b-a4b-it` |
| `gemma-31b` | `google/gemma-4-31b-it` |
| `qwen` | `qwen/qwen3.5-35b-a3b` |
| `glm` | `z-ai/glm-4.7-flash` |
| `claude` | `anthropic/claude-sonnet-4` |
| `gpt4o` | `openai/gpt-4o` |
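The alias table amounts to a simple lookup. The sketch below mirrors it in code; the alias-to-model pairs are taken from the table, while the `resolveModel` name and the pass-through behaviour for names that are not aliases are assumptions for illustration, not the package's documented behaviour.

```typescript
// Alias pairs copied from the table above. The lookup function itself is
// a hypothetical illustration, not part of the package's API.
const MODEL_ALIASES = new Map<string, string>([
  ["gemma", "google/gemma-4-26b-a4b-it"],
  ["gemma-31b", "google/gemma-4-31b-it"],
  ["qwen", "qwen/qwen3.5-35b-a3b"],
  ["glm", "z-ai/glm-4.7-flash"],
  ["claude", "anthropic/claude-sonnet-4"],
  ["gpt4o", "openai/gpt-4o"],
]);

// Assumption: anything that is not a known alias is treated as a full model ID.
function resolveModel(nameOrAlias: string): string {
  return MODEL_ALIASES.get(nameOrAlias) ?? nameOrAlias;
}

console.log(resolveModel("claude"));        // prints "anthropic/claude-sonnet-4"
console.log(resolveModel("openai/gpt-4o")); // prints "openai/gpt-4o"
```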
Tools
Agent Harness uses three tool layers, in priority order:
1. MCP servers (the primary tool layer). Anything an agent reaches beyond its own files comes from an MCP server: web search, browsers, databases, file systems, code execution. Search the registry with `harness mcp search <query>` or detect what's already on your machine with `harness mcp discover`.
2. Markdown HTTP tools, for trivial REST APIs where spinning up an MCP server is overkill. Drop a markdown file in `tools/` with frontmatter, an `## Authentication` section, and an `## Operations` section. The harness's HTTP executor calls them directly.
3. Programmatic tools (the escape hatch), for latency-critical or harness-internal access where you need a real JS function. Register via `defineAgent().withTool(...)`.
You almost always want option 1.
MCP Integration
Agent Harness connects to MCP servers to give your agent tools — file access, APIs, databases, and more.
# config.yaml
mcp:
  servers:
    filesystem:
      transport: stdio
      command: npx
      args: ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/dir"]
    my-api:
      transport: http
      url: https://example.com/mcp
Auto-discover servers already on your machine:
harness mcp discover # Scans Claude Desktop, Cursor, VS Code, Cline, etc.
harness mcp search github # Search the MCP registry
harness mcp install github # Install from registry into config.yaml
During harness init, MCP servers are auto-discovered and added to your config.
Dev Mode and Dashboard
harness dev starts everything at once:
- File watcher: auto-rebuilds indexes when you edit primitives
- Auto-processor: fills missing frontmatter and L0/L1 summaries on save
- Scheduler: runs cron-based workflows
- Web dashboard: browse primitives, chat, and view sessions at localhost:3000
harness dev # Start everything
harness dev --port 8080 # Custom port
harness dev --no-web # Skip dashboard
harness dev --no-auto-process # Skip auto-processing
The dashboard includes: agent status, health checks, spending, sessions, workflows, primitives browser, file editor, MCP status, settings editor, and a chat interface.
Using as a Library
import { createHarness } from '@agntk/agent-harness';
const agent = createHarness({
  dir: './my-agent',
  apiKey: process.env.OPENROUTER_API_KEY,
});
await agent.boot();
// One-shot
const result = await agent.run('What should I work on today?');
console.log(result.text);
// Streaming
for await (const chunk of agent.stream('Explain this codebase')) {
  process.stdout.write(chunk);
}
await agent.shutdown();
Fluent Builder API
import { defineAgent } from '@agntk/agent-harness';
const agent = defineAgent('./my-agent')
  .model('anthropic/claude-sonnet-4')
  .provider('openrouter')
  .onBoot(({ config }) => console.log(`Booted ${config.agent.name}`))
  .onError(({ error }) => console.error(error))
  .build();
Configuration
config.yaml:
agent:
  name: my-agent
  version: "0.1.0"
model:
  provider: openrouter # openrouter | anthropic | openai
  id: anthropic/claude-sonnet-4 # Any model ID for your provider
  max_tokens: 200000
  # summary_model: google/gemma-4-26b-a4b-it # Cheap model for auto-generation
  # fast_model: google/gemma-4-26b-a4b-it # Fast model for validation
runtime:
  scratchpad_budget: 10000
  timezone: America/New_York
  auto_process: true # Auto-fill frontmatter/summaries on file changes
memory:
  session_retention_days: 7
  journal_retention_days: 365
# rate_limits:
#   per_minute: 10
#   per_hour: 100
#   per_day: 500
# budget:
#   daily_limit_usd: 5.00
#   monthly_limit_usd: 100.00
#   enforce: true
Environment Variables
| Variable | Required | Description |
|---|---|---|
| `OPENROUTER_API_KEY` | For OpenRouter | Your OpenRouter API key |
| `ANTHROPIC_API_KEY` | For Anthropic | Direct Anthropic API key |
| `OPENAI_API_KEY` | For OpenAI | Direct OpenAI API key |
Set one based on your model.provider in config.yaml. The .env file in your harness directory is auto-loaded.
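The relationship between `model.provider` and the required variable can be sketched as a small lookup. The three provider names and variable names come from this README; the `requireApiKey` helper and its error message are hypothetical and only illustrate the idea. The environment is passed in as a plain object so the sketch does not depend on any particular runtime.

```typescript
// Provider and variable names come from the table above. The helper is
// a hypothetical illustration, not the package's own startup check.
type Provider = "openrouter" | "anthropic" | "openai";

const API_KEY_VARIABLE: Record<Provider, string> = {
  openrouter: "OPENROUTER_API_KEY",
  anthropic: "ANTHROPIC_API_KEY",
  openai: "OPENAI_API_KEY",
};

function requireApiKey(
  provider: Provider,
  env: Record<string, string | undefined>
): string {
  const variable = API_KEY_VARIABLE[provider];
  const value = env[variable];
  if (!value) {
    throw new Error(`Set ${variable} to use the "${provider}" provider`);
  }
  return value;
}

// In Node you would typically pass process.env as the second argument.
const key = requireApiKey("openrouter", { OPENROUTER_API_KEY: "sk-or-example" });
console.log(key.length > 0); // prints true
```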
How Context Loading Works
On every run, the harness:
- Loads CORE.md (always, full content)
- Loads state.md (current goals and mode)
- Loads SYSTEM.md (boot instructions)
- Scans all primitive directories, loading files at the appropriate disclosure level based on remaining token budget
- Loads scratch.md if it has content
Total harness overhead is typically ~1,000-3,000 tokens depending on how many primitives you have.
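The budget-based choice of disclosure level can be pictured as a greedy pass over the primitives. The sketch below is one possible reading, assuming each primitive has a desired level and known token costs per level, and that a file is downgraded to a cheaper level when the remaining budget is too small. The types, the `planLoad` name, and the greedy strategy are all assumptions; the README does not specify the actual algorithm.

```typescript
// Hypothetical sketch of budget-aware loading. Not the harness's real algorithm.
type Level = "L0" | "L1" | "L2";

interface PrimitiveCost {
  id: string;
  want: Level;                   // richest level this run would like to load
  tokens: Record<Level, number>; // token cost at each disclosure level
}

interface LoadDecision {
  id: string;
  level: Level;
  tokens: number;
}

// Levels from richest to cheapest, used when downgrading.
const RICHEST_FIRST: Level[] = ["L2", "L1", "L0"];

function planLoad(primitives: PrimitiveCost[], budget: number): LoadDecision[] {
  let remaining = budget;
  const plan: LoadDecision[] = [];
  for (const p of primitives) {
    // Start at the desired level and fall back to cheaper ones if needed.
    const options = RICHEST_FIRST.slice(RICHEST_FIRST.indexOf(p.want));
    const level = options.find((l) => p.tokens[l] <= remaining);
    if (level === undefined) continue; // nothing fits, skip this primitive
    remaining -= p.tokens[level];
    plan.push({ id: p.id, level, tokens: p.tokens[level] });
  }
  return plan;
}

const costs = { L0: 5, L1: 80, L2: 400 };
const plan = planLoad(
  [
    { id: "research", want: "L2", tokens: costs },
    { id: "ship-feature", want: "L1", tokens: costs },
    { id: "operations", want: "L0", tokens: costs },
  ],
  450
);
console.log(plan.map((d) => `${d.id}:${d.level}`).join(" "));
// prints "research:L2 ship-feature:L0 operations:L0"
```

In this example the second primitive is downgraded from L1 to L0 because only 50 tokens remain after the first one is loaded in full.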
Tested Models
These local-capable models work well with the harness via OpenRouter:
| Model | Speed | Quality | Best For |
|---|---|---|---|
| google/gemma-4-26b-a4b-it | Fast (2s) | Excellent | Default local model |
| google/gemma-4-31b-it | Medium (8s) | Good | Complex reasoning |
| qwen/qwen3.5-35b-a3b | Slow (30s) | Good | Very concise responses |
| z-ai/glm-4.7-flash | Fast (5s) | Good | Natural conversation |
Philosophy
- The agent is the filesystem. Not the code.
- Ownership is law. Every file has exactly one owner.
- You shouldn't need to write code to build a capable agent. But you can.
- Progressive disclosure. Load what you need, at the level you need.
- Agents learn. Instincts evolve. Sessions become journals.
- Infrastructure does bookkeeping. The agent does thinking.
License
MIT