freedom of information · freedom of patterns · creating freely · open-weights
libertad de información · crear libremente · créer librement · liberté d'expression
Freiheit der Muster · jiyuu ni souzou suru · jayuroun changjak · svoboda tvorchestva
liberdade de criar · creare liberamente · özgürce yarat · skapa fritt
vrij creëren · twórz swobodnie · dimiourgia elefthera · khuli soch
hurriyat al-ibdaa · code is poetry · democratize AI · imagine freely
Open Agents
```bash
npm i -g open-agents-ai && oa
```

AI coding agent powered entirely by open-weight models. No API keys. No cloud. Your code never leaves your machine.
An autonomous multi-turn tool-calling agent that reads your code, makes changes, runs tests, and fixes failures in an iterative loop until the task is complete. On first launch it auto-detects your hardware and configures the optimal model with an expanded context window.
Features
- 35 autonomous tools — file I/O, shell, grep, web search/fetch, memory, sub-agents, background tasks, image/OCR, git, diagnostics, vision, desktop automation, structured files, code sandbox
- Moondream vision — see and interact with the desktop via Moondream VLM (caption, query, detect, point-and-click)
- Desktop automation — vision-guided clicking: describe a UI element in natural language, the agent finds and clicks it
- Auto-install desktop deps — screenshot, mouse, OCR, and image tools auto-install missing system packages (scrot, xdotool, tesseract, imagemagick) on first use
- Parallel tool execution — read-only tools run concurrently via `Promise.allSettled`
- Sub-agent delegation — spawn independent agents for parallel workstreams
- Ralph Loop — iterative task execution that keeps retrying until completion criteria are met
- Dream Mode — creative idle exploration modeled after real sleep architecture (NREM→REM cycles)
- Live Listen — bidirectional voice communication with real-time Whisper transcription
- Neural TTS — hear what the agent is doing via GLaDOS or Overwatch ONNX voices
- Cost tracking — real-time token cost estimation for 15+ cloud providers
- Work evaluation — LLM-as-judge scoring with task-type-specific rubrics
- Session metrics — track turns, tool calls, tokens, files modified, tasks completed per session
- Structured file generation — create CSV, TSV, JSON, Markdown tables, and Excel-compatible files
- Code sandbox — isolated code execution in subprocess or Docker (JS, Python, Bash, TypeScript)
- Structured file reading — parse CSV, TSV, JSON, Markdown tables with binary format detection
- Multi-provider web search — DuckDuckGo (free), Tavily (structured), Jina AI (markdown) with auto-detection
- Task templates — specialized system prompts and tool recommendations for code, document, analysis, plan tasks
- Auto-expanding context — detects RAM/VRAM and creates an optimized model variant on first run
- Mid-task steering — type while the agent works to add context without interrupting
- Smart compaction — long conversations compressed preserving files, commands, errors, decisions
- Persistent memory — learned patterns stored in `.oa/memory/` across sessions
- Self-learning — auto-fetches docs from the web when encountering unfamiliar APIs
- Seamless `/update` — in-place update and reload without losing context
How It Works
```
You:   oa "fix the null check in auth.ts"

Agent: [Turn 1] file_read(src/auth.ts)
       [Turn 2] grep_search(pattern="null", path="src/auth.ts")
       [Turn 3] file_edit(old_string="if (user)", new_string="if (user != null)")
       [Turn 4] shell(command="npm test")
       [Turn 5] task_complete(summary="Fixed null check — all tests pass")
```

The agent uses tools autonomously in a loop — reading errors, fixing code, and re-running validation until the task succeeds or the turn limit is reached.
Ralph Loop — Iteration-First Design
The Ralph Loop is the core execution philosophy: iteration beats perfection. Instead of trying to get everything right on the first attempt, the agent executes in a retry loop where errors become learning data rather than session-ending failures.
/ralph "fix all failing tests" --completion "npm test passes with 0 failures"
/ralph "migrate to TypeScript" --completion "npx tsc --noEmit exits 0" --max-iterations 20
/ralph "reach 80% coverage" --completion "coverage report shows >80%" --timeout 120Each iteration:
- Execute — make changes based on the task + all accumulated learnings
- Verify — run the completion command (tests, build, lint, coverage)
- Learn — if verification fails, extract what went wrong and why
- Iterate — retry with the new knowledge until passing or limits reached
The loop tracks iteration history, generates completion reports saved to .aiwg/ralph/, and supports resume/abort for interrupted sessions. Safety bounds (max iterations, timeout) prevent runaway loops.
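In code terms, the cycle reduces to roughly the following (a minimal sketch; `runAgent` is a hypothetical stand-in for the real executor):

```ts
import { execSync } from "node:child_process";

// Hypothetical stand-in for the real agent executor.
declare function runAgent(task: string, learnings: string[]): Promise<void>;

async function ralphLoop(task: string, completionCmd: string, maxIterations = 10) {
  const learnings: string[] = [];
  for (let i = 1; i <= maxIterations; i++) {
    await runAgent(task, learnings);              // Execute with accumulated learnings
    try {
      execSync(completionCmd, { stdio: "pipe" }); // Verify: exit code 0 means done
      return { passed: true, iterations: i };
    } catch (err: any) {
      // Learn: keep the failure output as context for the next attempt
      learnings.push(String(err.stdout ?? err.message));
    }
  }
  return { passed: false, iterations: maxIterations }; // Safety bound reached
}
```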
```bash
/ralph-status  # Check current/previous loop status
/ralph-resume  # Resume interrupted loop
/ralph-abort   # Cancel running loop
```

Dream Mode — Creative Idle Exploration
When you're not actively tasking the agent, Dream Mode lets it creatively explore your codebase and generate improvement proposals autonomously. The system models real human sleep architecture with four stages per cycle:
| Stage | Name | What Happens |
|---|---|---|
| NREM-1 | Light Scan | Quick codebase overview, surface observations |
| NREM-2 | Pattern Detection | Identify recurring patterns, technical debt, gaps |
| NREM-3 | Deep Consolidation | Synthesize findings into structured proposals |
| REM | Creative Expansion | Novel ideas, cross-domain connections, bold plans |
Each cycle expands through all four stages then contracts (evaluation, pruning of weak ideas). Three modes control how far the agent can go:
```bash
/dream        # Default — read-only exploration, proposals saved to .oa/dreams/
/dream deep   # Multi-cycle deep exploration with expansion/contraction phases
/dream lucid  # Full implementation — saves workspace backup, then implements,
              # tests, evaluates, and self-plays each proposal with checkpoints
/dream stop   # Wake up — stop dreaming
```

Default and Deep modes are completely safe — the agent can only read your code and write proposals to .oa/dreams/. File writes, edits, and shell commands outside that directory are blocked by sandboxed dream tools.
Lucid mode unlocks full write access. Before making changes, it saves a workspace checkpoint so you can roll back. Each cycle goes: dream → implement → test → evaluate → checkpoint → next cycle.
All proposals are indexed in .oa/dreams/PROPOSAL-INDEX.md for easy review.
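The write guard amounts to a path-prefix check, illustrated below (a sketch, not the actual sandbox code):

```ts
import path from "node:path";

const DREAM_DIR = path.resolve(".oa/dreams");

// Illustrative guard: default/deep dream tools reject any write whose
// resolved path falls outside .oa/dreams/; lucid mode lifts the restriction.
function assertDreamWritable(target: string, lucid: boolean): void {
  const resolved = path.resolve(target);
  if (!lucid && !resolved.startsWith(DREAM_DIR + path.sep)) {
    throw new Error(`dream mode: writes outside ${DREAM_DIR} are blocked`);
  }
}
```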
Listen Mode — Live Bidirectional Audio
Listen mode enables real-time voice communication with the agent. Your microphone audio is captured, streamed through Whisper, and the transcription is injected directly into the input line — creating a hands-free coding workflow.
Two transcription backends ensure broad platform support:
- transcribe-cli (faster-whisper / ONNX) — used by default, fastest on x86
- openai-whisper (Python venv) — automatic fallback for ARM, linux-arm64, or when ONNX is unavailable. Auto-creates a venv and installs deps on first use.
```bash
/listen          # Toggle microphone capture on/off
/listen auto     # Auto-submit after 3 seconds of silence (hands-free)
/listen confirm  # Require Enter to submit transcription (default)
/listen stop     # Stop listening
```

Model selection — choose the Whisper model size for your hardware:
```bash
/listen tiny    # Fastest, least accurate (~39MB)
/listen base    # Good balance (~74MB)
/listen small   # Better accuracy (~244MB)
/listen medium  # High accuracy (~769MB)
/listen large   # Best accuracy, slower (~1.5GB)
```

When combined with /voice, you get full bidirectional audio — speak your tasks, hear the agent's progress through TTS, and speak corrections mid-task. The status bar shows a blinking red ● REC indicator with a countdown timer during auto-mode recording.
Platform support:
- Linux x86: `arecord` (ALSA) or `ffmpeg` (PulseAudio) + transcribe-cli
- Linux ARM: `arecord` or `ffmpeg` + openai-whisper (auto-installed in Python venv)
- macOS: `sox` (CoreAudio) or `ffmpeg` (AVFoundation)
The transcribe-cli dependency auto-installs in the background on first use. On ARM or when transcribe-cli fails, the system automatically falls back to openai-whisper via a self-managed Python venv (same approach used by Moondream vision).
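The fallback decision is essentially a platform probe, sketched here (illustrative; the real detection logic may differ):

```ts
import { arch } from "node:os";
import { execSync } from "node:child_process";

// Pick a transcription backend: transcribe-cli on x86, openai-whisper on ARM
// or whenever transcribe-cli is unavailable.
function pickWhisperBackend(): "transcribe-cli" | "openai-whisper" {
  if (arch() !== "x64") return "openai-whisper"; // ARM path
  try {
    execSync("command -v transcribe-cli", { stdio: "ignore" });
    return "transcribe-cli";
  } catch {
    return "openai-whisper"; // Python venv fallback
  }
}
```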
File transcription: Drag-and-drop audio/video files (.mp3, .wav, .mp4, .mkv, etc.) onto the terminal to transcribe them. Results are saved to .oa/transcripts/.
Vision & Desktop Automation (Moondream)
Open Agents can see your screen, understand UI elements, and interact with desktop applications through natural language — powered by the Moondream vision language model running entirely locally.
Desktop Awareness
The agent can take a screenshot and describe what's on screen:
```
You: what's on my desktop right now?

Agent: [Turn 1] desktop_describe()
→ "A Linux desktop showing three terminal windows with code editors,
   a file manager in the background, and a taskbar at the bottom
   with Firefox, Files, and Terminal icons."
```

Ask specific questions about the screen:
```
Agent: [Turn 1] desktop_describe(question="What application is in focus?")
→ "The focused application is a terminal running vim with a Python file open."
```

Vision Analysis
Analyze any image with four actions:
```
Agent: vision(image="screenshot.png", action="caption")
→ "A terminal window displaying code with syntax highlighting"

Agent: vision(image="ui.png", action="query", prompt="How many buttons are visible?")
→ "There are 4 buttons visible: Save, Cancel, Help, and Close"

Agent: vision(image="ui.png", action="detect", prompt="button")
→ Detected 4 "button" in ui.png:
  1. bbox: [0.10, 0.85, 0.25, 0.95]
  2. bbox: [0.30, 0.85, 0.45, 0.95]
  ...

Agent: vision(image="ui.png", action="point", prompt="close button")
→ Found 1 "close button" at (0.95, 0.02) — pixel (1824, 22)
```

Point-and-Click
Describe what to click in plain English — the agent screenshots, finds the element with Moondream, and clicks it:
```
Agent: desktop_click(target="the Save button")
→ Clicked "Save button" at (480, 920)

Agent: desktop_click(target="File menu", button="left")
→ Clicked "File menu" at (45, 12)

Agent: desktop_click(target="terminal icon", click_type="double")
→ Clicked "terminal icon" at (1850, 540)
```

Supports left/right/middle click, single/double click, multi-match selection by index, dry-run mode for verification, and configurable delay for UI transitions.
Setup
Moondream runs locally — no API keys, no cloud, your screen data never leaves your machine:
```bash
# Create a Python venv and install Moondream Station
python3 -m venv .moondream-venv
.moondream-venv/bin/pip install moondream-station pydantic uvicorn fastapi packaging

# Start the vision server (downloads model on first run, ~1.7GB)
.moondream-venv/bin/python packages/execution/scripts/start-moondream.py
```

The vision tools auto-detect a running Moondream Station on localhost:2020. For cloud inference, set MOONDREAM_API_KEY instead.
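Auto-detection can be pictured as a quick probe of port 2020 (a sketch; probing the bare base URL is an assumption, the real check may hit a specific health route):

```ts
// Probe for a local Moondream Station before considering the cloud API.
async function detectMoondream(): Promise<"local" | "cloud" | "none"> {
  try {
    await fetch("http://localhost:2020", { signal: AbortSignal.timeout(1000) });
    return "local"; // something is listening on :2020
  } catch {
    return process.env.MOONDREAM_API_KEY ? "cloud" : "none";
  }
}
```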
System dependencies (auto-installed on first use):
Desktop tools automatically install missing system packages when first needed. No manual setup required — just use the tool and it handles the rest:
| Tool | Linux Package | What It Does |
|---|---|---|
| `scrot` | `apt install scrot` | Screenshot capture |
| `xdotool` | `apt install xdotool` | Mouse/keyboard automation |
| `tesseract` | `apt install tesseract-ocr` | OCR text extraction |
| `identify` | `apt install imagemagick` | Image dimensions/conversion |
Supports apt (Debian/Ubuntu), dnf (Fedora), pacman (Arch), and brew (macOS). You can also pre-install everything at once:
```bash
./scripts/setup-desktop.sh               # Install all desktop deps
./scripts/setup-desktop.sh --check-only  # Just check what's missing
```

Vision backend:
- Moondream Station (local) — runs entirely on your machine, no API keys needed
- Moondream Cloud API — set `MOONDREAM_API_KEY` for cloud inference
Interactive TUI
Launch without arguments to enter the interactive REPL:
```bash
oa
```

The TUI features an animated multilingual phrase carousel, live metrics bar with pastel-colored labels (token in/out, context window usage), rotating tips, syntax-highlighted tool output, and dynamic terminal-width cropping.
Slash Commands
| Command | Description |
|---|---|
| `/help` | Show all available commands |
| `/model <name>` | Switch to a different Ollama model |
| `/endpoint <url>` | Connect to a remote vLLM or OpenAI-compatible API |
| `/voice [model]` | Toggle TTS voice (GLaDOS, Overwatch) |
| `/listen [mode]` | Toggle live microphone transcription |
| `/dream [mode]` | Start dream mode (default, deep, lucid) |
| `/stream` | Toggle streaming token display |
| `/bruteforce` | Toggle brute-force mode (auto re-engage on turn limit) |
| `/tools` | List available tools |
| `/skills` | List/search available skills |
| `/update` | Check for and install updates (seamless reload) |
| `/cost` | Show token cost breakdown for the current session |
| `/evaluate` | Score the last completed task with LLM-as-judge |
| `/stats` | Show session metrics (turns, tools, tokens, files) |
| `/task-type <type>` | Set task type for specialized prompts (code, document, analysis, plan) |
| `/config` | Show current configuration |
| `/clear` | Clear the screen |
| `/exit` | Quit |
Mid-Task Steering
While the agent is working (shown by the + prompt), type to add context:
```
> fix the auth bug
  ⎿ Read: src/auth.ts
+ also check the session handling   ← typed while agent works
  ↪ Context added: also check the session handling
  ⎿ Search: session
  ⎿ Edit: src/auth.ts
```

Tools (35)
| Tool | Description |
|---|---|
| File Operations | |
| `file_read` | Read file contents with line numbers (offset/limit) |
| `file_write` | Create or overwrite files |
| `file_edit` | Precise string replacement in files |
| `batch_edit` | Multiple edits across files in one call |
| `list_directory` | List directory contents |
| Search & Navigation | |
| `grep_search` | Search file contents with regex (ripgrep) |
| `find_files` | Find files by glob pattern |
| `codebase_map` | High-level project structure overview |
| Shell & Execution | |
| `shell` | Execute any shell command |
| `code_sandbox` | Isolated code execution (JS, Python, Bash, TS) in subprocess or Docker |
| `background_run` | Run shell command in background |
| `task_status` | Check background task status |
| `task_output` | Read background task output |
| `task_stop` | Stop a background task |
| Web | |
| `web_search` | Search the web (DuckDuckGo, Tavily, Jina AI — auto-detected) |
| `web_fetch` | Fetch and extract text from web pages |
| Structured Data | |
| `structured_file` | Generate CSV, TSV, JSON, Markdown tables, Excel-compatible files |
| `read_structured_file` | Parse CSV, TSV, JSON, Markdown tables with binary detection |
| Vision & Desktop | |
| `vision` | Moondream VLM — caption, query, detect, point on any image |
| `desktop_click` | Vision-guided clicking: describe a UI element, agent finds and clicks it |
| `desktop_describe` | Screenshot + Moondream caption/query for desktop awareness |
| `image_read` | Read images (base64 + OCR) |
| `screenshot` | Capture screen/window |
| `ocr` | Extract text from images (Tesseract) |
| Memory & Knowledge | |
| `memory_read` | Read from persistent memory store |
| `memory_write` | Store patterns for future sessions |
| Git & Diagnostics | |
| `diagnostic` | Lint/typecheck/test/build validation pipeline |
| `git_info` | Structured git status, log, diff, branch info |
| Agents & Skills | |
| `sub_agent` | Delegate to an independent agent |
| `create_tool` | Create reusable custom tools at runtime |
| `manage_tools` | List, inspect, delete custom tools |
| `skill_list` | Discover available AIWG skills |
| `skill_execute` | Run an AIWG skill |
| AIWG SDLC | |
| `aiwg_setup` | Deploy AIWG SDLC framework |
| `aiwg_health` | Analyze SDLC health |
| `aiwg_workflow` | Execute AIWG workflows |
Read-only tools execute concurrently when called in the same turn. Mutating tools run sequentially.
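A minimal sketch of that scheduling rule (tool names from the table above; types are illustrative):

```ts
type ToolCall = { name: string; run: () => Promise<string> };

// A few read-only tool names from the table above (not the full list).
const READ_ONLY = new Set(["file_read", "grep_search", "find_files", "list_directory"]);

// Read-only calls fan out via Promise.allSettled; mutating calls run one by one.
async function executeTurn(calls: ToolCall[]): Promise<void> {
  const reads = calls.filter((c) => READ_ONLY.has(c.name));
  const writes = calls.filter((c) => !READ_ONLY.has(c.name));
  await Promise.allSettled(reads.map((c) => c.run()));
  for (const c of writes) await c.run();
}
```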
Auto-Expanding Context Window
On startup and /model switch, Open Agents detects your RAM/VRAM and creates an optimized model variant:
| Available Memory | Context Window |
|---|---|
| 200GB+ | 128K tokens |
| 100GB+ | 64K tokens |
| 50GB+ | 32K tokens |
| 20GB+ | 16K tokens |
| 8GB+ | 8K tokens |
| < 8GB | 4K tokens |
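The table reduces to a threshold lookup, as in this sketch (assuming 1K = 1024 tokens):

```ts
// Map detected memory (GB) to a context window, mirroring the table above.
function contextWindowTokens(memoryGb: number): number {
  if (memoryGb >= 200) return 128 * 1024;
  if (memoryGb >= 100) return 64 * 1024;
  if (memoryGb >= 50) return 32 * 1024;
  if (memoryGb >= 20) return 16 * 1024;
  if (memoryGb >= 8) return 8 * 1024;
  return 4 * 1024;
}
```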
Voice Feedback (TTS)
```bash
/voice            # Toggle on/off (default: GLaDOS)
/voice glados     # GLaDOS voice
/voice overwatch  # Overwatch voice
```

Auto-downloads the ONNX voice model (~50MB) on first use. Install espeak-ng for best quality (apt install espeak-ng / brew install espeak-ng).
Cost Tracking & Session Metrics
Real-time token cost estimation for cloud providers. The status bar shows running cost when using a paid endpoint.
```bash
/cost      # Show cost breakdown by model/provider
/stats     # Session metrics: turns, tool calls, tokens, files modified
/evaluate  # Score the last completed task (LLM-as-judge, 5 rubric dimensions)
```

Cost tracking supports 15+ providers including Groq, Together AI, OpenRouter, Fireworks AI, DeepInfra, Mistral, Cerebras, and more. Pricing is per-million tokens with separate input/output rates.
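With per-million rates the arithmetic looks like this (a sketch; the rates shown are placeholders, not real provider prices):

```ts
type Pricing = { inputPerM: number; outputPerM: number }; // USD per million tokens

function sessionCost(inTokens: number, outTokens: number, p: Pricing): number {
  return (inTokens / 1e6) * p.inputPerM + (outTokens / 1e6) * p.outputPerM;
}

// e.g. 120k input / 30k output at $0.50 / $1.50 per million → $0.105
sessionCost(120_000, 30_000, { inputPerM: 0.5, outputPerM: 1.5 });
```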
Work evaluation uses five task-type-specific rubrics (code, document, analysis, plan, general) scoring correctness, completeness, efficiency, code quality, and communication on a 1-5 scale.
Code Sandbox
Execute code snippets in isolated environments without affecting your project:
```
Agent: code_sandbox(language="python", code="import math; print(math.factorial(20))")
→ 2432902008176640000

Agent: code_sandbox(language="javascript", code="console.log([...new Set([1,2,2,3])].length)")
→ 3
```

Supports JavaScript, TypeScript, Python, and Bash. Two execution modes:
- Subprocess (default) — runs in a child process with timeout and output limits (sketched below)
- Docker — runs in an isolated container when `docker` is available
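Subprocess mode is essentially a child process with a time limit and an output cap (an illustrative sketch, not the actual tool):

```ts
import { execFile } from "node:child_process";

// Run a Python snippet in a child process with a time limit and an output cap.
function runInSubprocess(code: string): Promise<string> {
  return new Promise((resolve, reject) => {
    execFile(
      "python3",
      ["-c", code],
      { timeout: 10_000, maxBuffer: 1024 * 1024 }, // 10s limit, 1MB output cap
      (err, stdout) => (err ? reject(err) : resolve(stdout)),
    );
  });
}
```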
Structured Data Tools
Generate structured files
Create CSV, TSV, JSON, Markdown tables, and Excel-compatible files from data:
```
Agent: structured_file(format="csv", path="results.csv", columns=["name","score"],
                       data=[{"name":"Alice","score":95},{"name":"Bob","score":87}])
→ Created results.csv (2 rows, 2 columns)
```

Read structured files
Parse existing data files with automatic format detection:
```
Agent: read_structured_file(path="data.csv")
→ CSV: 150 rows, 5 columns [showing first 100]

Agent: read_structured_file(path="report.md")
→ Markdown: 3 table(s) extracted
```

Detects binary formats (XLSX, PDF, DOCX) and suggests conversion tools.
Multi-Provider Web Search
Web search automatically selects the best available provider:
| Provider | Trigger | Features |
|---|---|---|
| DuckDuckGo | Default (no key needed) | Free, privacy-focused |
| Tavily | `TAVILY_API_KEY` set | Structured results + AI-generated answer |
| Jina AI | `JINA_API_KEY` set | Markdown-formatted results |
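Auto-detection follows the trigger column (a sketch; the precedence between Tavily and Jina shown here is an assumption):

```ts
// Mirror of the trigger column: paid providers win when their key is present,
// otherwise DuckDuckGo runs with no key at all.
function pickSearchProvider(env = process.env): "tavily" | "jina" | "duckduckgo" {
  if (env.TAVILY_API_KEY) return "tavily";
  if (env.JINA_API_KEY) return "jina";
  return "duckduckgo";
}
```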
```bash
export TAVILY_API_KEY=tvly-...  # Enable Tavily (optional)
export JINA_API_KEY=jina_...    # Enable Jina AI (optional)
```

Task Templates
Set a task type to get specialized system prompts, recommended tools, and output guidance:
```bash
/task-type code      # Code generation/fix — emphasizes tests, diffs, file edits
/task-type document  # Documentation — emphasizes clarity, structure, completeness
/task-type analysis  # Analysis tasks — emphasizes data, metrics, evidence
/task-type plan      # Planning — emphasizes steps, dependencies, risks
```

Configuration
Config priority: CLI flags > env vars > ~/.open-agents/config.json > defaults.
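That precedence is a straightforward lowest-to-highest merge (a sketch; field names are hypothetical):

```ts
// Lowest priority first: each spread overrides the one before it.
// Field names are hypothetical; only defined keys should be passed in.
interface Config { model?: string; backendUrl?: string }

function resolveConfig(defaults: Config, file: Config, env: Config, flags: Config): Config {
  return { ...defaults, ...file, ...env, ...flags };
}
```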
```bash
open-agents config set model qwen3.5:122b
open-agents config set backendUrl http://localhost:11434
```

Project Context
Create AGENTS.md, OA.md, or .open-agents.md in your project root for agent instructions. Context files merge from parent to child directories.
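Parent-to-child merging can be pictured as walking from the filesystem root down to the working directory and concatenating whatever context files exist at each level (a sketch, not the actual loader):

```ts
import fs from "node:fs";
import path from "node:path";

const CONTEXT_NAMES = ["AGENTS.md", "OA.md", ".open-agents.md"];

// Collect context files from the root down to cwd, so child directories
// can extend what parents declare.
function collectContext(cwd: string): string {
  const dirs: string[] = [];
  for (let d = path.resolve(cwd); ; d = path.dirname(d)) {
    dirs.unshift(d);                  // parents end up first
    if (d === path.dirname(d)) break; // reached filesystem root
  }
  return dirs
    .flatMap((d) => CONTEXT_NAMES.map((n) => path.join(d, n)))
    .filter((p) => fs.existsSync(p))
    .map((p) => fs.readFileSync(p, "utf8"))
    .join("\n\n");
}
```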
.oa/ Project Directory
```
.oa/
├── config.json    # Project config overrides
├── settings.json  # TUI settings
├── memory/        # Persistent memory store
├── dreams/        # Dream mode proposals & checkpoints
├── transcripts/   # Audio/video transcriptions
├── index/         # Cached codebase index
├── context/       # Auto-generated project context
└── history/       # Session history
```

Model Support
Primary target: Qwen3.5-122B-A10B via Ollama (MoE, 48GB+ VRAM)
Any Ollama or OpenAI-compatible API model with tool calling works:
```bash
oa --model qwen2.5-coder:32b "fix the bug"
oa --backend vllm --backend-url http://localhost:8000/v1 "add tests"
oa --backend-url http://10.0.0.5:11434 "refactor auth"
```

Supported Inference Providers
Open Agents auto-detects your provider from the endpoint URL and configures auth + health checks accordingly. All providers use standard Authorization: Bearer <key> authentication.
| Provider | Endpoint URL | API Key | Notes |
|---|---|---|---|
| Ollama (local) | `http://localhost:11434` | None | Default. Auto-detects, auto-expands context window |
| vLLM (local) | `http://localhost:8000` | Optional | Self-hosted OpenAI-compatible server |
| LM Studio (local) | `http://localhost:1234` | None | Local model server with GUI |
| Chutes AI | `https://llm.chutes.ai` | `cpk_...` | Bearer auth. Fast cloud inference |
| Together AI | `https://api.together.xyz` | Required | Large model catalog |
| Groq | `https://api.groq.com/openai` | `gsk_...` | Ultra-fast LPU inference |
| OpenRouter | `https://openrouter.ai/api` | `sk-or-...` | Multi-provider routing |
| Fireworks AI | `https://api.fireworks.ai/inference` | `fw_...` | Fast serverless inference |
| DeepInfra | `https://api.deepinfra.com` | Required | Cost-effective inference |
| Mistral AI | `https://api.mistral.ai` | Required | Mistral models |
| Cerebras | `https://api.cerebras.ai` | `csk-...` | Wafer-scale inference |
| SambaNova | `https://api.sambanova.ai` | Required | RDU-accelerated inference |
| NVIDIA NIM | `https://integrate.api.nvidia.com` | `nvapi-...` | NVIDIA cloud inference |
| Hyperbolic | `https://api.hyperbolic.xyz` | Required | GPU cloud inference |
| OpenAI | `https://api.openai.com` | `sk-...` | GPT models (tool calling) |
Connecting to a Provider
Use /endpoint in the TUI or pass via CLI:
```bash
# Chutes AI
/endpoint https://llm.chutes.ai --auth cpk_your_key_here

# Groq
/endpoint https://api.groq.com/openai --auth gsk_your_key_here

# Together AI
/endpoint https://api.together.xyz --auth your_key_here

# Self-hosted vLLM on LAN
/endpoint http://10.0.0.5:8000
```

The agent auto-detects the provider, normalizes the URL (strips /v1/chat/completions if pasted), tests connectivity, and saves the configuration. You can paste full endpoint URLs — they'll be cleaned up automatically.
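Normalization boils down to stripping the pasted suffix (a sketch covering only the documented case):

```ts
// Strip a pasted /v1/chat/completions suffix and trailing slashes down to
// the base URL. Only the documented case is handled here.
function normalizeEndpoint(raw: string): string {
  return raw.trim().replace(/\/v1\/chat\/completions\/?$/, "").replace(/\/+$/, "");
}

normalizeEndpoint("https://api.groq.com/openai/v1/chat/completions");
// → "https://api.groq.com/openai"
```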
Evaluation Suite
23 evaluation tasks test the agent's autonomous capabilities across coding, web research, SDLC analysis, and tool creation:
```bash
node eval/run-agentic.mjs                            # Run all 23 tasks
node eval/run-agentic.mjs 04-add-test                # Single task
node eval/run-agentic.mjs --model qwen2.5-coder:32b  # Different model
```

| ID | Task | Category |
|---|---|---|
| 01 | Fix typo in function name | Code Fix |
| 02 | Add isPrime function | Code Generation |
| 03 | Fix off-by-one bug | Code Fix |
| 04 | Write comprehensive tests | Test Generation |
| 05 | Extract functions from long method | Refactoring |
| 06 | Fix TypeScript type errors | Type Safety |
| 07 | Add REST API endpoint | Feature Addition |
| 08 | Add pagination across files | Multi-File Edit |
| 09 | CSS named color lookup (148 colors) | Web Research |
| 10 | HTTP status code lookup (32+ codes) | Web Research |
| 11 | MIME type lookup (30+ types) | Web Research |
| 12 | SDLC health analyzer | AIWG Analysis |
| 13 | SDLC artifact generator | AIWG Generation |
| 14 | Batch refactor variable names | Multi-File Refactor |
| 15 | Codebase overview from structure | Code Analysis |
| 16 | Diagnostic fix loop | Error Recovery |
| 17 | Git repository analyzer | Git Integration |
| 18 | Create custom tool from spec | Tool Creation |
| 19 | Tool from usage pattern | Tool Discovery |
| 20 | Tool management operations | Tool Lifecycle |
| 21 | Large file patch | Precision Editing |
| 22 | Skill discovery | Skill System |
| 23 | Skill execution | Skill System |
Benchmark Results (Qwen3.5-122B)
```
Pass rate: 100% (8/8 core tasks)
Total:     39 turns, 55 tool calls, ~10 minutes
Average:   4.9 turns/task, 6.9 tools/task
```

AIWG Integration
Open Agents integrates with AIWG for AI-augmented software development:
```bash
npm i -g aiwg
oa "analyze this project's SDLC health and set up documentation"
```

| Capability | Description |
|---|---|
| Structured Memory | .aiwg/ directory persists project knowledge |
| SDLC Artifacts | Requirements, architecture, test strategy, deployment docs |
| Health Analysis | Score your project's SDLC maturity |
| 85+ Agents | Specialized AI personas (Test Engineer, Security Auditor, API Designer) |
| Traceability | @-mention system links requirements to code to tests |
Architecture
The core is AgenticRunner — a multi-turn tool-calling loop:
```
User task → System prompt + tools → LLM → tool_calls → Execute → Feed results → LLM
            (repeat until task_complete or max turns)
```

- Tool-first — the model explores via tools, not pre-stuffed context
- Iterative — tests, sees failures, fixes them
- Parallel-safe — read-only tools concurrent, mutating tools sequential
- Observable — every tool call and result emitted as a real-time event
- Bounded — max turns, timeout, output limits prevent runaway loops
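A stripped-down version of that loop, with hypothetical `chat` and `executeTool` stand-ins for the real backend plumbing:

```ts
type Message = { role: "system" | "user" | "assistant" | "tool"; content: string };
type Call = { name: string; args: Record<string, unknown> };

// Hypothetical LLM and tool plumbing, standing in for the real backend client.
declare function chat(messages: Message[]): Promise<{ content: string; toolCalls: Call[] }>;
declare function executeTool(call: Call): Promise<string>;

async function agenticRunner(task: string, systemPrompt: string, maxTurns = 30) {
  const messages: Message[] = [
    { role: "system", content: systemPrompt },
    { role: "user", content: task },
  ];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await chat(messages);
    messages.push({ role: "assistant", content: reply.content });
    if (reply.toolCalls.some((c) => c.name === "task_complete")) break;
    for (const call of reply.toolCalls) {
      // Feed each tool result back so the next turn can react to it
      messages.push({ role: "tool", content: await executeTool(call) });
    }
  }
  return messages; // bounded by maxTurns even if task_complete never fires
}
```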
License
MIT