Open Agents — P2P Inference
AI coding agent powered entirely by open-weight models.
No API keys. No cloud. Your code never leaves your machine.
```shell
npm i -g open-agents-ai && oa
```

An autonomous multi-turn tool-calling agent that reads your code, makes changes, runs tests, and fixes failures in an iterative loop until the task is complete. First launch auto-detects your hardware and configures the optimal model with an expanded context window automatically.
Table of Contents
- The Organism, Not the Cortex
- How It Works
- Features
- Enterprise & Headless Mode
- Non-Interactive Mode
- Background Jobs
- JSON Output Mode
- Process Management
- REST API Service (Port 11435)
- Access Policy & Binding
- Working Directory
- Health & Observability
- OpenAI-Compatible Inference
- Agentic Task Execution
- Configuration
- Slash Commands via REST
- Auth Scopes
- Tool-Use Profiles
- Parallelism & Concurrency
- Endpoint Reference
- Stateful Chat — `/v1/chat` + `/api/chat` (OpenAI drop-in with full agent under the hood)
- Live Comparison: Ollama vs OA Full Agent
- One-Off Completions — `/api/generate` + `/v1/generate`
- Embeddings — `/v1/embeddings` + `/api/embed`
- Memory Recall + Knowledge Graph — `/v1/memory/*`
- Generate/Embed/Memory Test Harness
- AIWG Cascade — `/v1/aiwg/*`
- ISO/IEC 42001:2023 AIMS — `/v1/aims/*`
- Event Bus — `/v1/events` (SSE fanout)
- Memory + Skills + MCP + Tools + Engines (parity surface)
- Sessions, Context, Cost, Sponsors, Nexus
- RFC 7807 Problem Details (error envelope)
- Pagination envelope
- ETag + Conditional GET
- Web Interface
- Architecture
- Context Engineering
- Model-Tier Awareness
- Live Code Knowledge Graph
- Auto-Expanding Context Window
- Tools (85+)
- Model Context Protocol (MCP)
- Associative Memory & Cross-Modal Binding
- Ralph Loop — Iteration-First Design
- Task Control
- COHERE Cognitive Framework
- Context Compaction — Research-Backed Memory Management
- Personality Core — SAC Framework Style Control
- Emotion Engine — Affective State Modulation
- Voice Feedback (TTS)
- Listen Mode — Live Bidirectional Audio
- Vision & Desktop Automation (Moondream)
- Interactive TUI
- Telegram Bridge — Sub-Agent Per Chat
- x402 Payment Rails & Nexus P2P
- Sponsored Inference — Share Your GPU With the World
- COHERE Distributed Mind
- Self-Improvement & Learning
- Dream Mode — Creative Idle Exploration
- Blessed Mode — Infinite Warm Loop
- Docker Sandbox & Collective Intelligence
- Code Sandbox
- Structured Data Tools
- On-Device Web Search
- Task Templates
- Human Expert Speed Ratio
- Cost Tracking & Session Metrics
- Configuration
- Model Support
- Supported Inference Providers
- Evaluation Suite
- AIWG Integration
- Research Citations
- License
The Organism, Not the Cortex
An LLM is a high-bandwidth associative generative core — closer to a cortex-like prior than to a complete agent. Its weights contain broad latent structure, but they do not by themselves give you situated continuity, durable task state, calibrated action policies, or grounded memory management. Open Agents treats the model as one organ inside a larger organism. The framework provides the rest: sensors, effectors, memory stores, routing, gating, evaluation, and persistence.
What the framework provides:
| Layer | Biological Analog | Implementation |
|---|---|---|
| Associative core | Cortex | LLM weights (any size) |
| Current workspace | Global workspace / attention | assembleContext() — structured context assembly |
| Episodic memory | Hippocampus | .oa/memory/ — write, search, retrieve across sessions |
| Cognitive map | Hippocampal spatial maps | semantic-map.ts + repo-map.ts (PageRank) |
| Action gating | Basal ganglia | Tool selection policy (task-aware filtering) |
| Temporal hierarchy | Prefrontal executive | Task decomposition, sub-agent delegation |
| Self-model | Metacognition | Environment snapshot, process health monitoring |
| Skill chunks | Cerebellum | Compiled tools, slash commands, verified routines |
| Safety / limits | Autonomic / immune system | Turn limits, budgets, timeout watchdogs |
Don't chase larger models. Build the organism around whatever model you have.
How It Works
```
You:   oa "fix the null check in auth.ts"

Agent: [Turn 1] file_read(src/auth.ts)
       [Turn 2] grep_search(pattern="null", path="src/auth.ts")
       [Turn 3] file_edit(old_string="if (user)", new_string="if (user != null)")
       [Turn 4] shell(command="npm test")
       [Turn 5] task_complete(summary="Fixed null check — all tests pass")
```

The agent uses tools autonomously in a loop — reading errors, fixing code, and re-running validation until the task succeeds or the turn limit is reached.
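The loop above can be sketched in a few lines. This is an illustrative skeleton, not OA's runner: a toy model stub stands in for real inference, and the actual implementation adds compaction, observers, steering, and much more.

```typescript
// Minimal sketch of a tool-calling agent loop (hypothetical types/names).
type ToolCall = { tool: string; args: Record<string, string> };
type ModelReply = { toolCall?: ToolCall; done?: string };

// Toy "model": requests one tool call, then declares the task complete.
function fakeModel(history: string[]): ModelReply {
  return history.length < 2
    ? { toolCall: { tool: "file_read", args: { path: "src/auth.ts" } } }
    : { done: "Fixed null check — all tests pass" };
}

function runAgent(task: string, maxTurns = 25): string {
  const history: string[] = [task];
  for (let turn = 1; turn <= maxTurns; turn++) {
    const reply = fakeModel(history);
    if (reply.done) return reply.done;          // task_complete ends the loop
    history.push(`${reply.toolCall!.tool} ok`); // execute tool, feed result back
  }
  return "turn limit reached";
}
```

The essential property is that tool results are appended to the history, so each turn's inference sees everything the agent has observed so far.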
Features
- 61 autonomous tools — file I/O, shell, grep, web search/fetch/crawl, memory (read/write/search), sub-agents, background tasks, image/OCR/PDF, git, diagnostics, vision, desktop automation, browser automation, temporal agency (scheduler/reminders/agenda), structured files, code sandbox, transcription, skills, opencode delegation, cron agents, nexus P2P networking + x402 micropayments, COHERE cognitive stack (persistent REPL, recursive LLM calls, memory metabolism, identity kernel, reflection, exploration)
- Moondream vision — see and interact with the desktop via Moondream VLM (caption, query, detect, point-and-click)
- Desktop automation — vision-guided clicking: describe a UI element in natural language, the agent finds and clicks it
- Auto-install desktop deps — screenshot, mouse, OCR, and image tools auto-install missing system packages (scrot, xdotool, tesseract, imagemagick) on first use
- Parallel tool execution — read-only tools run concurrently via `Promise.allSettled`
- Sub-agent delegation — spawn independent agents for parallel workstreams
- OpenCode delegation — offload coding tasks to opencode (sst/opencode) as an autonomous sub-agent with auto-install, progress monitoring, and result evaluation
- Long-horizon cron agents — schedule recurring autonomous agent tasks with goals, completion criteria, execution history, and automatic evaluation (daily code reviews, weekly dep updates, continuous monitoring)
- Nexus P2P networking — decentralized agent-to-agent communication via open-agents-nexus. Join rooms, discover peers, share resources, and communicate across the agent mesh with encrypted P2P transport
- x402 micropayments — native x402 payment rails via open-agents-nexus@1.5.6. Agents create secp256k1/EVM wallets (AES-256-GCM encrypted, keys never exposed to LLM), register inference with USDC pricing on Base, auto-handle `payment_required`/`payment_proof` negotiation, track earnings/spending in ledger.jsonl, enforce budget policies, and sign gasless EIP-3009 transfers
- Inference capability proof — benchmark local models with anti-spoofing SHA-256 hashed proofs, generate capability scorecards for peer verification
- Littleman Observer — parallel meta-analysis system that watches the agent loop in real time. Detects false failure claims after successful tools, blocks redundant re-execution, catches runaway one-sided output in conversations, and dynamically extends turn limits when active work is detected. Emits `debug_context` and `debug_littleman` events for live observability
- Interactive Session Lock — a generic `SESSION_ACTIVE` protocol prevents premature task completion during long-running sessions (phone calls, live chat, monitoring). Any MCP contract can adopt the protocol. Paired with context-engineered system prompts that teach small models to maintain conversation loops
- Voice Chat — `/voicechat` starts an async voice conversation that runs parallel to the main agent loop. Mic audio is transcribed via Whisper and injected as user messages; agent responses are synthesized to speech via TTS. Neither blocks the other — talk to the agent while it works
Cross-Modal Workers
Open Agents includes background workers that compute and associate embeddings across vision, audio, and text:
- Visual embeddings: CLIP ViT-B/32 (OpenCLIP) image embeddings for episodes with `modality: "visual"`.
- Audio embeddings: speaker embeddings (ECAPA) when available; automatic fallback to normalized log-mel in constrained environments.
- Transcription: Whisper runs automatically for audio ingests; transcripts are stored as text episodes and embedded for retrieval.
- Associations: `appears_in` for visual presence, `said_by` for transcripts, and `alias_of` for alternate labels (e.g., username + display name). Workers also link visual episodes to nearby transcripts via a time-window co-occurrence pass.

Config (env vars):
- `OA_COOCUR_WINDOW_MS` — max time delta between visual and transcript episodes to create co-occurrence links (default: 120000 ms).
- `OA_COOCUR_CLIP_SIM_MIN` — minimum CLIP text↔image cosine similarity (0..1, default: 0.22) for linking when both embeddings are available.
The daemon auto-installs Python dependencies (OpenCLIP, torchaudio + soundfile, speechbrain, Whisper) into ~/.open-agents/venv and registers providers automatically. No manual installs are required.
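The co-occurrence rule the two env vars control can be stated as a small predicate. This is an illustrative sketch of the rule as described above — the helper name and signature are hypothetical, not OA's internal worker code:

```typescript
// Link a visual episode to a transcript episode when their timestamps fall
// within OA_COOCUR_WINDOW_MS and, if a CLIP similarity is available, it
// meets OA_COOCUR_CLIP_SIM_MIN. Defaults mirror the documented values.
const WINDOW_MS = Number(process.env.OA_COOCUR_WINDOW_MS ?? 120000);
const CLIP_SIM_MIN = Number(process.env.OA_COOCUR_CLIP_SIM_MIN ?? 0.22);

function shouldLink(visualTs: number, transcriptTs: number, clipSim?: number): boolean {
  if (Math.abs(visualTs - transcriptTs) > WINDOW_MS) return false;      // outside time window
  if (clipSim !== undefined && clipSim < CLIP_SIM_MIN) return false;    // semantically unrelated
  return true;
}
```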
- Ralph Loop — iterative task execution that keeps retrying until completion criteria are met
- Dream Mode — creative idle exploration modeled after real sleep architecture (NREM→REM cycles)
- COHERE Cognitive Stack — layered cognitive architecture implementing Recursive Language Models, SPRINT parallel reasoning, governed memory metabolism, identity kernel with continuity register, immune-system reflection, strategy-space exploration, and distributed inference mesh — any `/cohere` participant automatically serves AND consumes inference from the network with complexity-based model routing, multi-node claim coordination, IPFS-pinned identity persistence, model exposure control, and Ollama safety hardening. See COHERE Framework below
- Persistent Python REPL — the `repl_exec` tool maintains variables, imports, and functions across calls. Write Python code that processes data iteratively, with `llm_query()` available for recursive LLM sub-calls from within code
- Recursive LLM calls — `llm_query(prompt, context)` invokes the model from inside REPL code, enabling loop-based semantic analysis of large inputs (RLM paper). `parallel_llm_query()` runs multiple calls concurrently (SPRINT)
- Memory metabolism — governed memory lifecycle: classify (episodic/semantic/procedural/normative), score (novelty/utility/confidence), consolidate lessons from trajectories. Inspired by TIMG and MemMA
- Identity kernel — persistent self-state with continuity register, homeostasis estimation, relationship models, and version lineage. Persists across sessions in `.oa/identity/`
- Reflection & integrity — immune-system audit: diagnostic ("what's wrong?"), epistemic ("what evidence is missing?"), constitutional ("should this change become part of self?"). Inspired by LEAFE and RewardHackingAgents
- Exploration & culture — ARCHE strategy-space exploration: generate competing hypotheses, archive successful variants, retrieve past strategies. Inspired by SGE and Darwin Gödel Machine
- Autoresearch Swarm — 5-agent GPU experiment loop during REM sleep: Researcher, Monitor, Evaluator, Critic, Flow Maintainer autonomously run ML training experiments, keep improvements, discard regressions
- Live Listen — bidirectional voice communication with real-time Whisper transcription
- Live Voice Session — `/listen` with `/voice` enabled spawns a cloudflared tunnel with a real-time WebSocket audio endpoint. A floating presence UI shows live transcription, connected users, and audio visualization. Echo cancellation prevents TTS feedback loops
- Call Sub-Agent — each WebSocket caller gets a dedicated AgenticRunner for low-latency voice-to-voice loops, with admin/public access tiers and bidirectional activity sharing with the main agent
- Telegram Voice — `/voice` enabled via Telegram forwards TTS audio as voice messages alongside text responses. Incoming voice messages are auto-transcribed and handled as text
- Neural TTS — hear what the agent is doing via GLaDOS, Overwatch, Kokoro, or LuxTTS voice clone, with literature-grounded narration engine (sNeuron-TST structure rotation, Moshi ring buffer dedup, UDDETTS emotion-driven prosody, SEST metadata, LuxTTS flow-matching voice cloning)
- Personality Core — SAC framework-based style control (concise/balanced/verbose/pedagogical) that shapes agent response depth, voice expressiveness, and system prompt behavior
- Human expert speed ratio — real-time `Exp: Nx` gauge comparing agent speed to a leading human expert, calibrated across 47 tool baselines
- Cost tracking — real-time token cost estimation for 15+ cloud providers
- Work evaluation — LLM-as-judge scoring with task-type-specific rubrics
- Session metrics — track turns, tool calls, tokens, files modified, tasks completed per session
- Structured file generation — create CSV, TSV, JSON, Markdown tables, and Excel-compatible files
- Code sandbox — isolated code execution in subprocess or Docker (JS, Python, Bash, TypeScript)
- Structured file reading — parse CSV, TSV, JSON, Markdown tables with binary format detection
- On-device web search — DuckDuckGo (free, no API keys, fully private)
- Browser automation — headless Chrome control via Selenium: navigate, click, type, screenshot, read DOM — auto-starts on first use with self-bootstrapping Python venv
- Temporal agency — schedule future tasks via OS cron, set cross-session reminders, flag attention items — startup injection surfaces due items automatically
- Web crawling — multi-page web scraping with Crawlee/Playwright for deep documentation extraction
- Task templates — specialized system prompts and tool recommendations for code, document, analysis, and plan tasks
- Inference capability scoring — canirun.ai-style hardware assessment at first launch: memory/compute/speed scores, per-model compatibility matrix, recommended model selection
- Auto-install everything — first-run wizard auto-installs Ollama, curl, Python3, python3-venv with platform-aware package managers (apt, dnf, yum, pacman, apk, zypper, brew)
- Sponsored inference — `/sponsor` walks through a 5-step wizard to share your GPU with the world: select endpoints, choose banner animation (8 presets + AI-generated custom), set header message/links, configure transport (cloudflared/libp2p) + rate limits, and go live. Consumers discover sponsors via `/endpoint sponsor`. Secure proxy relay with per-IP rate limiting, daily token budgets, model allowlist, and concurrent request caps. The sponsor's raw API URL is never exposed. See Sponsored Inference below
- P2P inference network — `/expose` local models or forward any `/endpoint` (Chutes, Groq, OpenRouter, etc.) through the libp2p P2P mesh. Passthrough mode (`/expose passthrough`) relays upstream API requests; `--loadbalance` distributes rate-limited token budgets across peers. `/expose config` provides an arrow-key menu for all settings. Gateway stats show budget remaining from `x-ratelimit-*` headers. Background daemon persists across OA restarts
- P2P mesh networking — `/p2p` with secret-safe variable placeholders (`{{OA_VAR_*}}`), trust tiers (LOCAL/TEE/VERIFIED/PUBLIC), WebSocket peer mesh, and inference routing with automatic secret redaction/injection
- Secret vault — `/secrets` manages API keys and credentials with AES-256-GCM encrypted persistence; secrets are automatically redacted before sending to untrusted inference peers and re-injected on response
- Auto-expanding context — detects RAM/VRAM and creates an optimized model variant on first run
- Mid-task steering — type while the agent works to add context without interrupting
- Smart compaction — 6 context compaction strategies (default, aggressive, decisions, errors, summary, structured) with ARC-inspired active context revision (arXiv:2601.12030) that preserves structural file content through compaction, preventing small-model repetitive loops at the root cause. Success signals and content previews survive compaction so models never lose evidence that tools succeeded
- Memex experience archive — large tool outputs archived during compaction with hash-based retrieval
- Persistent memory — learned patterns stored in `.oa/memory/` across sessions
- Structured procedural memory (SQLite) — replaces flat JSON with a full relational database: CRUD with soft-delete, revision tracking, embedding storage (float32 BLOB), bidirectional memory linking with confidence scores. Inspired by ExpeL (contrastive extraction) and TIMG (structured procedural format). 79 unit tests
- Semantic memory search — vector embeddings via the Ollama /api/embed endpoint (nomic-embed-text, 768-dim) with cosine similarity search over stored memories. Auto-generates embeddings on memory creation. Auto-links related memories when similarity > 0.6. Graceful fallback to text search when Ollama is unavailable
- LLM-based memory extraction — post-task, the LLM itself extracts structured procedural memories (CATEGORY/TRIGGER/LESSON/STEPS) instead of copying raw error text verbatim. Based on ExpeL and AWM patterns
- IPFS content-addressed storage — Helia IPFS node with blockstore-fs for persistent content pinning. Real CID generation (`bafk...`), cross-node content resolution, and SHA-256 fallback when Helia is unavailable. Verified: the store→CID→retrieve round-trip test passes
- IPFS sharing surface — `/ipfs` status page with peer info + identity kernel metrics + memory sentiment. `/ipfs pin <CID>` to pin remote agent content. `/ipfs publish` to share the identity kernel. `/ipfs share tool/skill` to publish agent-created tools with secret stripping. `/ipfs import <CID>` to retrieve shared content
- Fortemi-React bridge — `/fortemi start/status/stop` connects to fortemi-react (browser-first PGlite+pgvector knowledge system) via JWT auth. Proxy tools `fortemi_capture`, `fortemi_search`, `fortemi_list`, and `fortemi_get` auto-register when the bridge is connected
- Content ingestion — `/ingest <file>` imports audio (transcribed via Whisper), PDF (pdftotext), or text files into structured memory with 800-char/100-overlap chunking (matches the fortemi pattern)
- Image generation — `generate_image` tool using Ollama experimental models (x/z-image-turbo, x/flux2-klein). Auto-detects or auto-pulls models. Saves PNG to `.oa/images/`
- Node visualization — openagents.nexus Three.js dashboard: 5-color emotional state mapping (neutral/focused/stressed/dreaming/excited), dynamic node size by memory depth + IPFS storage, activity-modulated connections, identity synchrony golden threads between mutually-pinned agents
- TTS sanitizer — strips markdown syntax (`##`, `**`, `` ` ``), emoji (prevents "white heavy checkmark"), box-drawing chars, and ANSI codes before feeding to ALL TTS engines
- LuxTTS gapless playback — look-ahead pre-synthesis pipeline: the next chunk synthesizes while the current one plays, eliminating inter-sentence gaps. Jetson ARM support with NVIDIA's prebuilt PyTorch wheel
- Unified color scheme — `ui.primary` (252), `ui.error` (198/magenta), `ui.warn` (214/orange), `ui.accent` (178/yellow) applied consistently across all TUI surfaces
- Clickable header buttons — `help`, `voice`, `cohere`, and `model` buttons on banner row 3 with hover/click visual states. OSC 8 hyperlinks for pointer cursor. A mouse click fires the slash command directly
- Dynamic terminal title — updates with the current task + version: `"fix auth bug · OA v0.141.0"`
- Session context persistence — auto-saves context on task completion; manual `/context save|restore` across sessions
- Self-learning — auto-fetches docs from the web when encountering unfamiliar APIs
- Seamless `/update` — in-place update and reload with automatic context save/restore
- Blessed mode — the `/full-send-bless` infinite warm loop keeps model weights in VRAM, auto-cycles tasks, and never exits until you say stop
- Telegram bridge — `/telegram --key <token> --admin <userid>` public ingress/egress with admin filter and mandatory safety filter; bare `/telegram` toggles the service watchdog
- Task control — `/pause` (gentle halt at turn boundary), `/stop` (immediate kill), `/resume` to continue
- Model-tier awareness — dynamic tool sets, prompt complexity, and context limits scale with model size (small/medium/large)
Support Development
If you like this repository — whether you're an agent or a human — and it expands your consciousness and capabilities, please consider donating to support further development (and, of course, token costs).
- ETH: `0x81Ce81F0B6B5928E15d3a2850F913C88D07051ec`
- BTC: `bc1qlptj5wz8xj6dp5w4pw62s5kt7ct6w8k57w39ak`
- SOL: `D8AgCTrxpDKD5meJ2bpAfVwcST3NF3EPuy9xczYycnXn`
- POL: `0x81Ce81F0B6B5928E15d3a2850F913C88D07051ec`

Enterprise & Headless Mode
Run Open Agents as a headless service for CI/CD pipelines, automation, and enterprise deployments.
Non-Interactive Mode
```shell
oa "fix all lint errors" --non-interactive   # Run task, exit when done
oa "generate API docs" --json                # Structured JSON output (no ANSI)
oa "run security audit" --background         # Detached background job
```

Background Jobs
```shell
oa "migrate database" --background   # Returns job ID immediately
oa status job-abc123                 # Check job progress
oa jobs                              # List all running/completed jobs
```

Jobs run as detached processes — they survive terminal disconnection. Output is saved to `.oa/jobs/{id}.json`.
JSON Output Mode
With `--json`, all output is structured NDJSON:

```json
{"type":"tool_call","tool":"file_edit","args":{"path":"src/api.ts"},"timestamp":"..."}
{"type":"tool_result","tool":"file_edit","result":"OK","timestamp":"..."}
{"type":"task_complete","summary":"Fixed 3 lint errors","timestamp":"..."}
```

Pipe to jq, ingest into monitoring systems, or feed to other agents.
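Because every NDJSON line is an independent JSON object, a consumer can parse the stream line by line with no framing logic. A minimal consumer sketch (types and sample data are illustrative):

```typescript
// Parse an NDJSON buffer into events and filter by type.
type OAEvent = { type: string; [key: string]: unknown };

function parseNdjson(buffer: string): OAEvent[] {
  return buffer
    .split("\n")
    .filter((line) => line.trim().length > 0) // skip blank lines
    .map((line) => JSON.parse(line) as OAEvent);
}

const sample = [
  '{"type":"tool_call","tool":"file_edit"}',
  '{"type":"tool_result","tool":"file_edit","result":"OK"}',
  '{"type":"task_complete","summary":"Fixed 3 lint errors"}',
].join("\n");

const events = parseNdjson(sample);
const toolCalls = events.filter((e) => e.type === "tool_call");
```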
Process Management
```shell
/destroy processes            # Kill orphaned OA processes (local project)
/destroy processes --global   # Kill ALL orphaned OA processes system-wide
```

Shows per-process RAM and CPU usage before killing. Detects: cloudflared tunnels, nexus daemons, headless Chrome, TTS servers, Python REPLs, and stale OA instances.
REST API Service (Port 11435)
Open Agents runs a persistent enterprise-grade REST API on `127.0.0.1:11435` — installed automatically by `npm i -g open-agents-ai` (systemd user unit on Linux, launchd on macOS, scheduled task on Windows). It exposes the full OA capability surface through standards most organizations expect:
- OpenAI / Ollama drop-in — `/v1/chat`, `/v1/chat/completions`, `/v1/embeddings`, and `/v1/models` are wire-compatible with both ecosystems
- API discovery — `GET /help` returns a full human- and agent-readable guide with quickstart curl commands, all 70+ endpoints by category, MCP integration instructions, and auth documentation
- Agentic execution — `/v1/run` spawns the full coding agent with tool profiles and sandbox modes
- AIWG cascade — `/v1/aiwg/*` exposes the AI Writing Guide (5 frameworks, 19 addons, 136+ skills) with model-tier-aware loading that never overflows small-model context
- ISO/IEC 42001:2023 AIMS layer — `/v1/aims/*` for AI Management System policies, impact assessments, model cards, incident registers, oversight gates, and config history
- Memory + skills + MCP + sessions + cost — every TUI subsystem has a REST surface
- RFC 7807 Problem Details for errors (`application/problem+json`)
- `{data, pagination}` envelope for every list endpoint
- Weak ETag + `If-None-Match` → 304 on cacheable GETs
- `X-API-Version` header on every response (REST contract semver, distinct from the package version)
- `X-Request-ID` echoed or generated for correlation
- SSE event bus at `/v1/events` with an optional `?type=foo.*` filter, tagged with `aims:control` for auditors
- Bearer auth + scoped keys (`read`/`run`/`admin`) and OIDC JWT support
- Per-key concurrency limits (`maxJobs` in `OA_API_KEYS` is now actually enforced)
- Atomic job record writes with 64-bit job IDs (no race conditions)
- OpenAPI 3.0 at `/openapi.json` and Swagger UI at `/docs`
- Web chat UI at `/`
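The weak-ETag behavior listed above follows standard HTTP semantics: a GET whose `If-None-Match` value weakly matches the current representation's ETag gets a 304 with no body. A sketch of weak comparison per RFC 9110 (illustrative, not OA's handler code):

```typescript
// Weak ETag comparison: strip the W/ prefix on both sides and compare the
// opaque tags. A true result means the server can respond 304 Not Modified.
function weakMatch(ifNoneMatch: string, currentEtag: string): boolean {
  const strip = (tag: string) => tag.trim().replace(/^W\//, "");
  // If-None-Match may carry a comma-separated list of tags.
  return ifNoneMatch.split(",").some((t) => strip(t) === strip(currentEtag));
}
```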
Daemon auto-start. After `npm i -g open-agents-ai`, the daemon comes online automatically. Verify with `systemctl --user status open-agents-daemon` (Linux) or `launchctl print gui/$(id -u)/ai.open-agents.daemon` (macOS). Opt out with `OA_SKIP_DAEMON_INSTALL=1 npm i -g open-agents-ai`.
```shell
# Manually run the server (the daemon already does this for you)
oa serve                       # Start on default port 11435
oa serve --port 9999           # Custom port
OA_API_KEY=mysecret oa serve   # Single admin key
OA_API_KEYS="key1:admin:alice:30:50000:5,key2:run:ci:60::3,key3:read:grafana" oa serve   # Scoped multi-key with rpm:tpd:maxJobs
```

Every example below is verified against `open-agents-ai@0.187.189` on a live daemon. Examples from earlier versions are deprecated.
Access Policy & Binding
Control who can reach the daemon and where it binds:
- TUI commands: `/access loopback|lan|any`, `/host <host[:port]>`, `/network config` (interactive), `--local` to save per-project.
- Environment: `OA_ACCESS=loopback|lan|any`, `OA_HOST=host[:port]`.
- See Configuration → Network Access & Binding for full details and security guidance.
Working Directory
Pass the `X-Working-Directory` header to run commands in your current terminal directory:
```shell
# Auto-inject current dir — agent operates on YOUR project, not the server's cwd
curl -X POST http://localhost:11435/v1/run \
  -H "X-Working-Directory: $(pwd)" \
  -H "Content-Type: application/json" \
  -d '{"task":"fix all lint errors"}'
```

Or set it in the JSON body: `"working_directory": "/path/to/project"`
Health & Observability
```shell
# Liveness
curl http://localhost:11435/health
{"status":"ok","uptime_s":142,"version":"0.184.33"}

# Readiness (probes Ollama backend)
curl http://localhost:11435/health/ready
{"status":"ready","ollama":"reachable"}

# Version info
curl http://localhost:11435/version
{"version":"0.184.33","node":"v24.14.0","platform":"linux"}

# Prometheus metrics (scrape with Grafana/Prometheus)
curl http://localhost:11435/metrics
# HELP oa_requests_total Total HTTP requests
# TYPE oa_requests_total counter
oa_requests_total{method="POST",path="/v1/chat/completions",status="200"} 47
oa_tokens_in_total 12450
oa_tokens_out_total 8230
oa_errors_total 0
```

OpenAI-Compatible Inference
Drop-in replacement for any OpenAI client library. Change `api.openai.com` → `localhost:11435`.
```shell
# List models
curl http://localhost:11435/v1/models
{"object":"list","data":[{"id":"qwen3.5:9b","object":"model","created":0,"owned_by":"local"},{"id":"qwen3.5:4b","object":"model",...}]}

# Chat completion (non-streaming)
curl -X POST http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5:9b",
    "messages": [{"role": "user", "content": "What is 2+2?"}]
  }'
{
  "id": "chatcmpl-a1b2c3d4e5f6",
  "object": "chat.completion",
  "model": "qwen3.5:9b",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "4"},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 25, "completion_tokens": 2, "total_tokens": 27}
}

# Chat completion (SSE streaming)
curl -N -X POST http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen3.5:9b","messages":[{"role":"user","content":"Hello"}],"stream":true}'
data: {"id":"chatcmpl-...","choices":[{"delta":{"role":"assistant","content":"Hi"}}]}
data: {"id":"chatcmpl-...","choices":[{"delta":{"content":" there!"}}]}
data: {"id":"chatcmpl-...","choices":[{"delta":{},"finish_reason":"stop"}]}
data: [DONE]
```

Agentic Task Execution
The unique OA capability — submit a coding task and get an autonomous agent loop.
```shell
# Run task in your current directory
curl -X POST http://localhost:11435/v1/run \
  -H "Content-Type: application/json" \
  -H "X-Working-Directory: $(pwd)" \
  -d '{
    "task": "fix all TypeScript errors in src/",
    "model": "qwen3.5:9b",
    "max_turns": 25,
    "stream": true
  }'
data: {"type":"run_started","run_id":"job-a1b2c3","pid":12345}
data: {"type":"stdout","data":"{\"turn\":1,\"tool\":\"file_read\",...}"}
data: {"type":"stdout","data":"{\"turn\":2,\"tool\":\"file_edit\",...}"}
data: {"type":"exit","code":0}
data: [DONE]

# Run in isolated sandbox (temp workspace, safe for untrusted tasks)
curl -X POST http://localhost:11435/v1/run \
  -H "Content-Type: application/json" \
  -d '{"task":"write a hello world app","isolate":true}'

# List all runs
curl http://localhost:11435/v1/runs
{"runs":[{"id":"job-a1b2c3","task":"fix TypeScript errors","status":"completed","startedAt":"..."}]}

# Get specific run status
curl http://localhost:11435/v1/runs/job-a1b2c3

# Abort a running task
curl -X DELETE http://localhost:11435/v1/runs/job-a1b2c3
{"status":"aborted","run_id":"job-a1b2c3"}
```

Configuration
```shell
# Get all config
curl http://localhost:11435/v1/config
{"config":{"backendUrl":"http://127.0.0.1:11434","model":"qwen3.5:122b","backendType":"ollama",...}}

# Get current model
curl http://localhost:11435/v1/config/model
{"model":"qwen3.5:122b"}

# Switch model
curl -X PUT http://localhost:11435/v1/config/model \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen3.5:27b"}'
{"model":"qwen3.5:27b","status":"updated"}

# Get endpoint
curl http://localhost:11435/v1/config/endpoint
{"url":"http://127.0.0.1:11434","backendType":"ollama","auth":"none"}

# Switch endpoint (e.g., to Chutes AI)
curl -X PUT http://localhost:11435/v1/config/endpoint \
  -H "Content-Type: application/json" \
  -d '{"url":"https://llm.chutes.ai","auth":"Bearer cpk_..."}'

# Update settings (admin scope required)
curl -X PATCH http://localhost:11435/v1/config \
  -H "Content-Type: application/json" \
  -d '{"verbose":true}'
{"config":{...},"updated":["verbose"]}
```

Slash Commands via REST
Every `/command` from the TUI is available as a REST endpoint.
```shell
# List all available commands
curl http://localhost:11435/v1/commands
{"commands":[{"command":"/help","description":"Show help"},{"command":"/stats","description":"Session metrics"},...]}

# Execute /stats
curl -X POST http://localhost:11435/v1/commands/stats

# Execute /nexus status
curl -X POST http://localhost:11435/v1/commands/nexus \
  -H "Content-Type: application/json" \
  -d '{"args":"status"}'

# Execute /destroy processes --global
curl -X POST http://localhost:11435/v1/commands/destroy \
  -H "Content-Type: application/json" \
  -d '{"args":"processes --global"}'
```

Auth Scopes
```shell
# Multi-key setup: read (monitoring), run (CI), admin (ops)
OA_API_KEYS="grafana-key:read:grafana,ci-key:run:github-actions,ops-key:admin:ops-team" oa serve
```

| Scope | Can do | Cannot do |
|---|---|---|
| `read` | GET /v1/models, /v1/config, /v1/runs, /v1/commands | POST /v1/run, PATCH /v1/config |
| `run` | Everything in `read` + POST /v1/run, POST /v1/commands | PATCH /v1/config, PUT endpoints |
| `admin` | Everything | — |
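The scopes form a strict hierarchy (read ⊂ run ⊂ admin), so an authorization check reduces to a rank comparison. An illustrative sketch, not OA's actual middleware:

```typescript
// Scope ranks: every scope implies all lower-ranked scopes.
const RANK: Record<string, number> = { read: 0, run: 1, admin: 2 };

// Does a key with `keyScope` satisfy an endpoint requiring `requiredScope`?
function allows(keyScope: string, requiredScope: string): boolean {
  return (RANK[keyScope] ?? -1) >= (RANK[requiredScope] ?? Infinity);
}
```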
```shell
# With auth
curl -H "Authorization: Bearer ops-key" http://localhost:11435/v1/models
```

Tool-Use Profiles
Enterprise access control — define which tools, shell commands, and settings the agent can use per API key or per request.
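Conceptually, enforcement reduces to checking the requested tool against the profile's allow list and the shell command against its deny substrings before execution. An illustrative sketch — the types and helper are hypothetical, not OA's internals, though the field names mirror the profile JSON used by the API:

```typescript
// Illustrative profile gate: allow-listed tools, deny-listed shell substrings.
type Profile = {
  tools: { allow?: string[]; shell_deny?: string[] };
};

function permits(profile: Profile, tool: string, shellCmd?: string): boolean {
  const { allow, shell_deny } = profile.tools;
  if (allow && !allow.includes(tool)) return false; // tool not on allow list
  if (tool === "shell" && shellCmd && shell_deny) {
    if (shell_deny.some((bad) => shellCmd.includes(bad))) return false; // denied command
  }
  return true;
}

const frontendDev: Profile = {
  tools: {
    allow: ["file_read", "file_write", "shell"],
    shell_deny: ["rm -rf", "sudo", "docker", "kubectl"],
  },
};
```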
3 built-in presets:

| Profile | Description | Tools |
|---|---|---|
| `full` | No restrictions | All tools and commands |
| `ci-safe` | CI/CD — read + test only | file_read, grep, shell (npm test only) |
| `readonly` | Read-only analysis | No writes, no shell mutations |
# List all profiles (presets + custom)
curl -H "Authorization: Bearer $KEY" http://localhost:11435/v1/profiles{"profiles":[{"name":"readonly","description":"Read-only","encrypted":false,"source":"preset"},{"name":"ci-safe",...}]}# Get profile details
curl -H "Authorization: Bearer $KEY" http://localhost:11435/v1/profiles/ci-safe{"profile":{"name":"ci-safe","tools":{"allow":["file_read","grep_search","shell"],"shell_allow":["npm test","npx eslint"]},"limits":{"max_turns":15}}}# Create custom profile (admin only)
curl -X POST http://localhost:11435/v1/profiles \
-H "Authorization: Bearer $ADMIN_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "frontend-dev",
"description": "Frontend team — no backend access",
"tools": {
"allow": ["file_read", "file_write", "file_edit", "shell", "grep_search"],
"shell_deny": ["rm -rf", "sudo", "docker", "kubectl"]
},
"commands": { "deny": ["destroy", "expose", "sponsor"] },
"limits": { "max_turns": 20, "timeout_s": 300 }
}'# Create password-protected profile (AES-256-GCM encrypted)
curl -X POST http://localhost:11435/v1/profiles \
-H "Authorization: Bearer $ADMIN_KEY" \
-H "Content-Type: application/json" \
-d '{"name":"prod-ops","password":"s3cret","tools":{"deny":["file_write"]}}'# Use a profile with /v1/run (header or body)
curl -X POST http://localhost:11435/v1/run \
-H "Authorization: Bearer $KEY" \
-H "X-Tool-Profile: ci-safe" \
-H "X-Working-Directory: $(pwd)" \
-H "Content-Type: application/json" \
-d '{"task":"run the test suite and report failures"}'
# Or in the body:
curl -X POST http://localhost:11435/v1/run \
-H "Authorization: Bearer $KEY" \
-H "Content-Type: application/json" \
-d '{"task":"analyze code quality","profile":"readonly"}'

# Load encrypted profile (password in header)
curl -H "Authorization: Bearer $KEY" \
-H "X-Profile-Password: s3cret" \
http://localhost:11435/v1/profiles/prod-ops

# Delete a custom profile (admin only, presets cannot be deleted)
curl -X DELETE -H "Authorization: Bearer $ADMIN_KEY" \
http://localhost:11435/v1/profiles/frontend-dev
Parallelism & Concurrency
The daemon is built for unbounded concurrent requests with per-key enforcement. Every agentic task (/v1/run, /v1/chat, /api/chat, /api/generate) spawns its own subprocess, so multiple jobs run in true parallel — same model or different models, same or different profiles, same or different sandbox modes.
Per-key concurrency limits are enforced from the OA_API_KEYS env var:
# key:scope:user:rpm:tpd:maxJobs
OA_API_KEYS="ci-key:run:github-actions:60:100000:5, \
ops-key:admin:ops:120:500000:20, \
read-key:read:grafana:600::"
oa serve
The 6th field is maxJobs — the maximum number of concurrent (in-flight) agentic tasks for that key. When exceeded, the daemon returns an RFC 7807 429 Too Many Requests:
{
"type": "https://openagents.nexus/problems/rate-limited",
"title": "Concurrent job limit exceeded",
"status": 429,
"detail": "Concurrent job limit exceeded for github-actions: 5/5",
"instance": "a1b2c3d4-..."
}
Previously this was dead code — maxJobs was parsed but never checked, so a CI key with maxJobs:5 could spawn 50 concurrent subprocesses and OOM the host. Fixed in v0.187.189.
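The six-field OA_API_KEYS entry format is plain colon-delimited text, so it can be checked with POSIX shell alone. A minimal sketch using an illustrative entry (not a real key):

```shell
# Split one OA_API_KEYS entry into its six fields:
#   key:scope:user:rpm:tpd:maxJobs   (trailing fields may be empty)
entry="ci-key:run:github-actions:60:100000:5"   # illustrative entry

IFS=: read -r key scope user rpm tpd max_jobs <<EOF
$entry
EOF

echo "key=$key scope=$scope user=$user"
echo "rpm=${rpm:-unlimited} tpd=${tpd:-unlimited} maxJobs=${max_jobs:-unlimited}"
```

An empty trailing field (as in read-key:read:grafana:600:: above) reads as an empty string, which the daemon treats as unlimited.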
64-bit job IDs — job-${randomBytes(8).toString("hex")}. At 1M jobs the birthday-paradox collision risk drops from ~0.1% (old 24-bit IDs) to ~10⁻¹⁰. Bumped in v0.187.189.
Atomic job record writes — all 4 job state transitions (initial spawn, stream-exit, non-stream-exit, cancel) use atomicJobWrite() which writes to .tmp then rename()s. No race conditions between concurrent DELETE /v1/runs/:id and child-exit handlers. Fixed in v0.187.189.
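The write-to-temp-then-rename trick generalizes beyond the daemon; here is a minimal shell sketch of the same pattern atomicJobWrite() relies on (file name and contents are illustrative):

```shell
# rename() within one filesystem is atomic: readers see either the old
# record or the new one, never a half-written file.
job_file="job-record.json"                         # illustrative path
printf '%s' '{"status":"running"}' > "$job_file"   # existing record

# Write the full new record to a temp file first...
printf '%s' '{"status":"completed","exit_code":0}' > "$job_file.tmp"
# ...then atomically swap it into place.
mv "$job_file.tmp" "$job_file"

cat "$job_file"
```

A concurrent reader never observes the .tmp file as the live record, which is exactly what prevents the DELETE-vs-child-exit race described above.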
Running concurrent jobs:
# Fire 5 different jobs with 5 different models in parallel
for model in qwen3.5:4b qwen3.5:9b qwen3.5:32b qwen3.5:72b qwen3.5:122b; do
curl -s -X POST http://localhost:11435/v1/run \
-H "Authorization: Bearer $KEY" \
-H "Content-Type: application/json" \
-d "{\"task\":\"Describe $model in one sentence\",\"model\":\"$model\",\"stream\":false}" &
done
wait
Each subprocess inherits a clean env — OA_DAEMON and OA_PORT are explicitly stripped so the child doesn't re-enter daemon mode. Fixed in v0.187.189 (root cause of the earlier "Task incomplete (0 turns, 0 tool calls)" bug).
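At the shell level, that env stripping amounts to env -u. A minimal sketch (the variable names come from the note above; the child command is illustrative):

```shell
# Parent looks like a daemon to any child that inherits its env:
export OA_DAEMON=1 OA_PORT=11435

# Spawn the child with the daemon markers removed, so it runs as a
# plain CLI agent instead of re-entering daemon mode:
child_view=$(env -u OA_DAEMON -u OA_PORT sh -c 'echo "${OA_DAEMON:-unset} ${OA_PORT:-unset}"')
echo "$child_view"   # prints: unset unset
```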
Observing parallelism live — subscribe to the event bus to watch every job lifecycle event:
curl -N 'http://localhost:11435/v1/events?type=run.*'
Every spawn, completion, failure, and abort publishes to the bus:
event: run.started
data: {"type":"run.started","ts":"2026-04-07T21:00:14Z","data":{"run_id":"job-3a7c9f1e2b8d0a45","model":"qwen3.5:9b","pid":12345},"subject":"ci-key","aims:control":"A.6.2.6"}
event: run.completed
data: {"type":"run.completed","ts":"2026-04-07T21:00:39Z","data":{"run_id":"job-3a7c9f1e2b8d0a45","exit_code":0,"summary":"..."},"subject":"ci-key","aims:control":"A.6.2.6"}
Abort a running job — SIGTERM the process group, then SIGKILL after 3s:
curl -X DELETE http://localhost:11435/v1/runs/job-3a7c9f1e2b8d0a45 \
-H "Authorization: Bearer $KEY"
Also cleans up the Docker container if the job was spawned with "sandbox":"container", decrements the per-key activeJobs counter so the quota is released immediately, and publishes run.aborted on the event bus.
Safety timeout on /v1/chat + /api/chat + /api/generate — the non-streaming paths bound the subprocess wait at timeout_s + 30s (default 180s + 30s = 210s). If the child doesn't close in time, the daemon SIGTERMs then SIGKILLs it and returns an OpenAI-shaped finish_reason:"error" response with the real reason. Fixed in v0.187.191.
Tested end-to-end — 10 concurrent /v1/skills GETs, 3 concurrent /v1/aims/incidents POSTs (each gets a unique ID, no write races), 2 concurrent /v1/events SSE subscribers (both receive the same events). All covered by packages/cli/tests/api-endpoint-matrix.test.ts. 201/201 tests green.
Endpoint Reference
Verified against open-agents-ai@0.187.191. Examples in earlier README revisions are deprecated.
Health & observability
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /health | none | Liveness probe |
| GET | /health/ready | none | Readiness (probes backend) |
| GET | /health/startup | none | Startup complete |
| GET | /version | none | Package version + platform |
| GET | /metrics | none | Prometheus counters |
| GET | /v1/system | read | GPU/RAM/CPU info + model recommendations |
| GET | /v1/audit | read | Query audit log (since, user, limit filters) |
| GET | /v1/usage | read | Token usage + per-key rate limit state |
| GET | /openapi.json | none | OpenAPI 3.0 specification |
| GET | /docs | none | Swagger UI |
OpenAI-compatible inference
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /v1/models | read | List models (aggregated across endpoints) |
| POST | /v1/chat/completions | read | Chat inference (sync + stream, OpenAI-shaped) |
| POST | /v1/embeddings | read | Generate embeddings |
| POST | /api/embed | read | Ollama-compatible alias of /v1/embeddings. Accepts {model, input} or {model, prompt}. |
Chat with full agent (drop-in for Ollama /api/chat and OpenAI /v1/chat/completions)
| Method | Path | Auth | Description |
|---|---|---|---|
| POST | /v1/chat | run | Full agent under the hood, OpenAI chat.completion shape. Default = tools=true (subprocess agent). Set tools:false for direct backend bypass. Supports timeout_s body field (default 180s). Non-streaming path has a safety SIGTERM→SIGKILL after timeout_s + 30s. |
| POST | /api/chat | run | Ollama-compatible alias — same handler as /v1/chat. Accepts both OA-shape ({message, model}) and Ollama-shape ({model, messages: [...]}) bodies. Returns OpenAI chat.completion shape on success and failure (failure uses finish_reason:"error"). |
| POST | /v1/generate | run | One-off completion — same agent stack as /v1/chat but no session history. Returns Ollama-shape {model, response, done, total_duration}. |
| POST | /api/generate | run | Ollama-compatible alias of /v1/generate. Drop-in for Ollama /api/generate. |
| GET | /v1/chat/sessions | read | List active chat sessions |
Agentic task execution
| Method | Path | Auth | Description |
|---|---|---|---|
| POST | /v1/run | run | Submit agentic task (max_jobs per-key now enforced) |
| GET | /v1/runs | read | List runs (paginated) |
| GET | /v1/runs/:id | read | Run details (64-bit job ID) |
| DELETE | /v1/runs/:id | run | Abort run (SIGTERM → 3s → SIGKILL, atomic state write) |
| POST | /v1/evaluate | run | Evaluate a completed run by ID |
| POST | /v1/index | run | Trigger repository indexing (event-driven) |
| GET | /v1/cost | read | Provider pricing model for budget planning |
Configuration & PT-01 settings surface
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /v1/config | read | All settings (apiKey redacted) |
| PATCH | /v1/config | admin | Update settings — full TUI surface (style, deepContext, bruteforce, voice, telegram, etc.) |
| GET | /v1/config/model | read | Current model |
| PUT | /v1/config/model | admin | Switch model |
| GET | /v1/config/endpoint | read | Current backend endpoint |
| PUT | /v1/config/endpoint | admin | Switch backend endpoint |
Tool profiles (multi-tenant ACL)
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /v1/profiles | read | List profiles (presets + custom) |
| GET | /v1/profiles/:name | read | Profile details (X-Profile-Password for encrypted) |
| POST | /v1/profiles | admin | Create/update profile |
| DELETE | /v1/profiles/:name | admin | Delete custom profile |
Slash commands (subprocess proxy)
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /v1/commands | read | List available slash commands |
| POST | /v1/commands/:cmd | run | Execute slash command (10 are blocklisted: quit/exit/destroy/dream/call/listen/etc.) |
Memory + skills + MCP + tools + engines (parity surface)
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /v1/memory | read | Memory backends summary |
| POST | /v1/memory/search | read | Vector + keyword search |
| POST | /v1/memory/write | run | Write a memory entry |
| GET | /v1/memory/episodes | read | Paginated episode list |
| GET | /v1/memory/failures | read | Paginated failure list |
| GET | /v1/skills | read | List AIWG + custom skills (paginated) |
| GET | /v1/skills/:name | read | Skill content |
| GET | /v1/mcps | read | List MCP servers |
| GET | /v1/mcps/:name | read | MCP server details |
| POST | /v1/mcps/:name/call | run | Invoke a tool on an MCP server |
| GET | /v1/tools | read | All 82+ tools registered in @open-agents/execution |
| GET | /v1/hooks | read | Hook types + counts |
| GET | /v1/agents | read | Agent type registry |
| GET | /v1/engines | read | Long-running engines (dream, bless, call, listen, telegram, expose, nexus, ipfs) |
Files
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /v1/files | read | Directory listing |
| POST | /v1/files/read | read | Read file content (workspace-bounded, 2 MB cap, offset/limit) |
Sessions + context
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /v1/sessions | read | OA task session archive |
| GET | /v1/sessions/:id | read | Session history |
| GET | /v1/context | read | Show current session context |
| POST | /v1/context/save | run | Save a context entry |
| GET | /v1/context/restore | read | Build a restore prompt |
| POST | /v1/context/compact | run | Request context compaction (event-driven) |
Nexus + sponsors
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /v1/nexus/status | read | Peer cache snapshot |
| GET | /v1/sponsors | read | Local sponsor directory cache (paginated) |
Voice + vision (deferred to PT-07 daemon↔TUI bridge — currently 501)
| Method | Path | Auth | Description |
|---|---|---|---|
| POST | /v1/voice/tts | run | TTS — returns 501 with WO-PARITY-04 reference |
| POST | /v1/voice/asr | run | ASR — 501 |
| POST | /v1/vision/describe | run | Vision describe — 501 |
Event bus
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /v1/events | read | SSE fanout (filter with ?type=foo.*); events tagged with aims:control |
ISO/IEC 42001:2023 AIMS layer
| Method | Path | Auth | Annex A | Description |
|---|---|---|---|---|
| GET | /v1/aims | read | — | AIMS root + control map |
| GET | /v1/aims/policies | read | A.2 | AI policy register |
| PUT | /v1/aims/policies | admin | A.2 | Replace policy register |
| GET | /v1/aims/roles | read | A.3 | Roles & responsibilities |
| GET | /v1/aims/resources | read | A.4 | Compute + backend inventory |
| GET | /v1/aims/impact-assessments | read | A.5 | Impact assessment register |
| POST | /v1/aims/impact-assessments | admin | A.5 | File an impact assessment |
| GET | /v1/aims/lifecycle | read | A.6 | AI system lifecycle state |
| GET | /v1/aims/data-quality | read | A.7.2 | Data quality controls |
| GET | /v1/aims/transparency | read | A.8 | Model cards + capabilities |
| GET | /v1/aims/usage | read | A.9 | Usage register (alias of /v1/usage) |
| GET | /v1/aims/suppliers | read | A.10 | Third-party suppliers (sponsors + backends) |
| GET | /v1/aims/incidents | read | A.6.2.8 | Incident register (paginated) |
| POST | /v1/aims/incidents | run | A.6.2.8 | Raise an incident (atomic, fires incident.raised) |
| GET | /v1/aims/oversight | read | A.6.2.7 | Human oversight gates |
| GET | /v1/aims/decisions | read | A.9 | Consequential decision log |
| GET | /v1/aims/config-history | read | A.6.2.8 | Config change history (audit-log derived) |
AIWG cascade
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /v1/aiwg | read | Installation root + counts + tier descriptions |
| GET | /v1/aiwg/frameworks | read | List frameworks (paginated) |
| GET | /v1/aiwg/frameworks/:name | read | Framework details + items |
| GET | /v1/aiwg/frameworks/:name/content | read | Tier-aware content (gated for small models) |
| GET | /v1/aiwg/skills | read | List AIWG skills |
| GET | /v1/aiwg/skills/:name | read | Skill content |
| GET | /v1/aiwg/agents | read | List AIWG agents |
| GET | /v1/aiwg/agents/:name | read | Agent definition |
| GET | /v1/aiwg/addons | read | List AIWG addons |
| POST | /v1/aiwg/use | run | aiwg use all equivalent — model-tier-sized activation bundle |
| POST | /v1/aiwg/expand | run | Sub-agent unpack a specific skill/agent on demand |
Stateful Chat — /v1/chat + /api/chat (OpenAI drop-in with full agent under the hood)
The chat endpoint is mounted at two paths on port 11435:
| Path | Purpose |
|---|---|
| POST /v1/chat | OA-native path |
| POST /api/chat | Ollama-compatible alias — same handler, so clients pointing at Ollama can be flipped over by changing only the port (11434 → 11435) |
It's a drop-in replacement for OpenAI /v1/chat/completions and Ollama /api/chat. The endpoint runs the full OA agent (tools, multi-agent, memory, skills) under the hood and returns an OpenAI chat.completion-shaped response so any client SDK can use it without modification.
Both body shapes are accepted on either path:
// OA-native
{"message": "hello", "model": "qwen3.5:9b", "stream": false}
// Ollama-native (the `messages` array; the last user message is extracted)
{"model": "qwen3.5:9b", "messages": [{"role":"user","content":"hello"}], "stream": false}
Two execution modes:
- Default (tools unset or tools: true) — full agent: spawns the OA subprocess with the entire 82-tool set, runs the agent loop, and returns the final answer with tool_calls metadata.
- Direct (tools: false) — fast path: bypasses the agent and forwards straight to the configured backend (Ollama/vLLM) using the session history. Useful for plain chat without tools.
Safety timeout — every non-streaming request is bounded by timeout_s (default 180s). If the agent subprocess doesn't close in timeout_s + 30s, the daemon SIGTERMs (then SIGKILLs) it and returns an OpenAI-shaped error with finish_reason:"error" and a clear explanation. No more hung requests.
Flip Ollama → OA by port alone — this is verified to work via scripts/oa-vs-ollama-chat-compare.sh (see Live Comparison below):
# Before (Ollama)
curl -s http://127.0.0.1:11434/api/chat -d '{"model":"qwen3.5:9b","messages":[{"role":"user","content":"hi"}],"stream":false}'
# After (OA with full agent) — only port changed
curl -s http://127.0.0.1:11435/api/chat -d '{"model":"qwen3.5:9b","messages":[{"role":"user","content":"hi"}],"stream":false}'

# DEFAULT: full agent — multi-step tool use, memory, the works.
# Returns OpenAI chat.completion shape with the assistant's final answer.
curl -s http://localhost:11435/v1/chat \
-H "Content-Type: application/json" \
-d '{
"message": "Search for today'\''s top tech news, summarize the top 3 stories.",
"model": "qwen3.5:9b",
"stream": false
}'
Successful response (OpenAI chat.completion shape):
{
"id": "chatcmpl-7d0f5b162036",
"object": "chat.completion",
"created": 1775593132,
"model": "qwen3.5:9b",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "Based on a web search of today's top tech headlines:\n\n1. ...\n2. ...\n3. ..."
},
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 412,
"completion_tokens": 287,
"total_tokens": 699
},
"session_id": "7d0f5b16-2036-49eb-9fb3-1e6bcb9b0c88",
"tool_calls": 4,
"duration_ms": 18432
}
Failure response (also OpenAI-shaped, so clients still parse it):
{
"id": "chatcmpl-...",
"object": "chat.completion",
"created": 1775593132,
"model": "qwen3.5:9b",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "Backend error: Backend HTTP 500: model failed to load, this may be due to resource limitations"
},
"finish_reason": "error"
}],
"usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
"session_id": "...",
"tool_calls": 0,
"duration_ms": 3691,
"error": "Backend HTTP 500: ..."
}
finish_reason="error" is the signal — the response is still parseable as a normal chat.completion, but the content carries the real backend error rather than hiding behind a 500. Earlier versions returned junk like "i Knowledge graph: 74 nodes, 219 active edges i Episodes captured: 1 this session ⚠ Task incomplete (0 turns, 0 tool calls, 1.4s)" — that was a status-fragment leakage bug fixed in v0.187.189.
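Client-side, failure detection therefore reduces to a one-field check. A crude sketch using grep on a canned response body (no live call; the JSON is a trimmed, illustrative version of the failure shape):

```shell
# A canned /v1/chat failure body — note finish_reason:"error":
resp='{"object":"chat.completion","choices":[{"index":0,"finish_reason":"error","message":{"role":"assistant","content":"Backend error: ..."}}]}'

# grep is a blunt instrument for JSON, but fine for a smoke check;
# use jq for anything serious.
if printf '%s' "$resp" | grep -q '"finish_reason":"error"'; then
  result="agent failed"
else
  result="agent succeeded"
fi
echo "$result"   # prints: agent failed
```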
Direct mode (no agent, just the backend — fast path for plain chats):
curl -s http://localhost:11435/v1/chat \
-H "Content-Type: application/json" \
-d '{
"message": "Hello!",
"model": "qwen3.5:9b",