JSPM

open-agents-ai 0.24.0

    • Downloads: 26,260
    • License: MIT

    AI coding agent powered by open-source models (Ollama/vLLM) — interactive TUI with agentic tool-calling loop

    Package Exports

    • open-agents-ai
    • open-agents-ai/dist/index.js
    • open-agents-ai/dist/launcher.cjs

    This package does not declare an "exports" field, so the exports above were automatically detected and optimized by JSPM. If a package subpath is missing, consider filing an issue with the original package (open-agents-ai) to add an "exports" field; otherwise, create a JSPM override to customize the exports for this package.

    Readme


    freedom of information · freedom of patterns · creating freely · open-weights
    libertad de informacion · crear libremente · creer librement · liberte d'expression
    Freiheit der Muster · jiyuu ni souzou suru · jayuroun changjak · svoboda tvorchestva
    liberdade de criar · creare liberamente · ozgurce yarat · skapa fritt
    vrij creeren · tworz swobodnie · dimiourgia elefthera · khuli soch
    hurriyat al-ibdaa · code is poetry · democratize AI · imagine freely


    Open Agents

    npm i -g open-agents-ai && oa

    AI coding agent powered entirely by open-weight models. No API keys. No cloud. Your code never leaves your machine.

    An autonomous multi-turn tool-calling agent that reads your code, makes changes, runs tests, and fixes failures in an iterative loop until the task is complete. On first launch it auto-detects your hardware and configures an optimal model with an expanded context window.

    Features

    • 35 autonomous tools — file I/O, shell, grep, web search/fetch, memory, sub-agents, background tasks, image/OCR, git, diagnostics, vision, desktop automation, structured files, code sandbox
    • Moondream vision — see and interact with the desktop via Moondream VLM (caption, query, detect, point-and-click)
    • Desktop automation — vision-guided clicking: describe a UI element in natural language, the agent finds and clicks it
    • Auto-install desktop deps — screenshot, mouse, OCR, and image tools auto-install missing system packages (scrot, xdotool, tesseract, imagemagick) on first use
    • Parallel tool execution — read-only tools run concurrently via Promise.allSettled
    • Sub-agent delegation — spawn independent agents for parallel workstreams
    • Ralph Loop — iterative task execution that keeps retrying until completion criteria are met
    • Dream Mode — creative idle exploration modeled after real sleep architecture (NREM→REM cycles)
    • Live Listen — bidirectional voice communication with real-time Whisper transcription
    • Neural TTS — hear what the agent is doing via GLaDOS or Overwatch ONNX voices
    • Cost tracking — real-time token cost estimation for 15+ cloud providers
    • Work evaluation — LLM-as-judge scoring with task-type-specific rubrics
    • Session metrics — track turns, tool calls, tokens, files modified, tasks completed per session
    • Structured file generation — create CSV, TSV, JSON, Markdown tables, and Excel-compatible files
    • Code sandbox — isolated code execution in subprocess or Docker (JS, Python, Bash, TypeScript)
    • Structured file reading — parse CSV, TSV, JSON, Markdown tables with binary format detection
    • Multi-provider web search — DuckDuckGo (free), Tavily (structured), Jina AI (markdown) with auto-detection
    • Task templates — specialized system prompts and tool recommendations for code, document, analysis, plan tasks
    • Auto-expanding context — detects RAM/VRAM and creates an optimized model variant on first run
    • Mid-task steering — type while the agent works to add context without interrupting
    • Smart compaction — long conversations are compressed while preserving files, commands, errors, and decisions
    • Persistent memory — learned patterns stored in .oa/memory/ across sessions
    • Self-learning — auto-fetches docs from the web when encountering unfamiliar APIs
    • Seamless /update — in-place update and reload without losing context
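
    The parallel tool execution behavior can be sketched as follows (illustrative only; `READ_ONLY`, `executeToolCalls`, and `runTool` are assumed names, not the package's actual API):

```javascript
// Sketch: fan out read-only tool calls with Promise.allSettled so one
// failure doesn't abort the rest, then run mutating calls one at a time.
const READ_ONLY = new Set(["file_read", "grep_search", "find_files", "list_directory"]);

async function executeToolCalls(calls, runTool) {
  const readOnly = calls.filter((c) => READ_ONLY.has(c.name));
  const mutating = calls.filter((c) => !READ_ONLY.has(c.name));

  // Read-only calls run concurrently; each result is captured independently.
  const settled = await Promise.allSettled(readOnly.map((c) => runTool(c)));
  const results = settled.map((s, i) => ({
    call: readOnly[i],
    ok: s.status === "fulfilled",
    value: s.status === "fulfilled" ? s.value : String(s.reason),
  }));

  // Mutating calls run sequentially, preserving their order.
  for (const c of mutating) {
    try {
      results.push({ call: c, ok: true, value: await runTool(c) });
    } catch (err) {
      results.push({ call: c, ok: false, value: String(err) });
    }
  }
  return results;
}
```

    Keeping mutations sequential avoids write/write races, while reads get a free concurrency win.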

    How It Works

    You: oa "fix the null check in auth.ts"
    
    Agent: [Turn 1] file_read(src/auth.ts)
           [Turn 2] grep_search(pattern="null", path="src/auth.ts")
           [Turn 3] file_edit(old_string="if (user)", new_string="if (user != null)")
           [Turn 4] shell(command="npm test")
           [Turn 5] task_complete(summary="Fixed null check — all tests pass")

    The agent uses tools autonomously in a loop — reading errors, fixing code, and re-running validation until the task succeeds or the turn limit is reached.

    Ralph Loop — Iteration-First Design

    The Ralph Loop is the core execution philosophy: iteration beats perfection. Instead of trying to get everything right on the first attempt, the agent executes in a retry loop where errors become learning data rather than session-ending failures.

    /ralph "fix all failing tests" --completion "npm test passes with 0 failures"
    /ralph "migrate to TypeScript" --completion "npx tsc --noEmit exits 0" --max-iterations 20
    /ralph "reach 80% coverage" --completion "coverage report shows >80%" --timeout 120

    Each iteration:

    1. Execute — make changes based on the task + all accumulated learnings
    2. Verify — run the completion command (tests, build, lint, coverage)
    3. Learn — if verification fails, extract what went wrong and why
    4. Iterate — retry with the new knowledge until passing or limits reached

    The loop tracks iteration history, generates completion reports saved to .aiwg/ralph/, and supports resume/abort for interrupted sessions. Safety bounds (max iterations, timeout) prevent runaway loops.

    /ralph-status     # Check current/previous loop status
    /ralph-resume     # Resume interrupted loop
    /ralph-abort      # Cancel running loop
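
    The execute → verify → learn → iterate cycle can be sketched roughly like this (a minimal sketch with assumed names — `ralphLoop`, `execute`, `verify` — not the actual implementation):

```javascript
// Sketch: bounded retry loop where each failed verification becomes a
// learning fed back into the next execution attempt.
async function ralphLoop(task, { execute, verify, maxIterations = 10 }) {
  const learnings = [];
  for (let i = 1; i <= maxIterations; i++) {
    await execute(task, learnings);              // 1. Execute with accumulated learnings
    const { passed, error } = await verify();    // 2. Verify via the completion command
    if (passed) return { passed: true, iterations: i, learnings };
    learnings.push(`iteration ${i}: ${error}`);  // 3. Learn from the failure
  }                                              // 4. Iterate until passing or bounded
  return { passed: false, iterations: maxIterations, learnings };
}
```

    The `maxIterations` bound is what keeps a stubborn failure from becoming a runaway loop.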

    Dream Mode — Creative Idle Exploration

    When you're not actively tasking the agent, Dream Mode lets it creatively explore your codebase and generate improvement proposals autonomously. The system models real human sleep architecture with four stages per cycle:

    Stage | Name | What Happens
    NREM-1 | Light Scan | Quick codebase overview, surface observations
    NREM-2 | Pattern Detection | Identify recurring patterns, technical debt, gaps
    NREM-3 | Deep Consolidation | Synthesize findings into structured proposals
    REM | Creative Expansion | Novel ideas, cross-domain connections, bold plans

    Each cycle expands through all four stages, then contracts (evaluating and pruning weak ideas). Three modes control how far the agent can go:

    /dream              # Default — read-only exploration, proposals saved to .oa/dreams/
    /dream deep         # Multi-cycle deep exploration with expansion/contraction phases
    /dream lucid        # Full implementation — saves workspace backup, then implements,
                        #   tests, evaluates, and self-plays each proposal with checkpoints
    /dream stop         # Wake up — stop dreaming

    Default and Deep modes are completely safe — the agent can only read your code and write proposals to .oa/dreams/. File writes, edits, and shell commands outside that directory are blocked by sandboxed dream tools.

    Lucid mode unlocks full write access. Before making changes, it saves a workspace checkpoint so you can roll back. Each cycle goes: dream → implement → test → evaluate → checkpoint → next cycle.

    All proposals are indexed in .oa/dreams/PROPOSAL-INDEX.md for easy review.

    Listen Mode — Live Bidirectional Audio

    Listen mode enables real-time voice communication with the agent. Your microphone audio is captured, streamed through Whisper, and the transcription is injected directly into the input line — creating a hands-free coding workflow.

    Two transcription backends ensure broad platform support:

    • transcribe-cli (faster-whisper / ONNX) — used by default, fastest on x86
    • openai-whisper (Python venv) — automatic fallback for ARM, linux-arm64, or when ONNX is unavailable. Auto-creates a venv and installs deps on first use.

    /listen             # Toggle microphone capture on/off
    /listen auto        # Auto-submit after 3 seconds of silence (hands-free)
    /listen confirm     # Require Enter to submit transcription (default)
    /listen stop        # Stop listening

    Model selection — choose the Whisper model size for your hardware:

    /listen tiny        # Fastest, least accurate (~39MB)
    /listen base        # Good balance (~74MB)
    /listen small       # Better accuracy (~244MB)
    /listen medium      # High accuracy (~769MB)
    /listen large       # Best accuracy, slower (~1.5GB)

    When combined with /voice, you get full bidirectional audio — speak your tasks, hear the agent's progress through TTS, and speak corrections mid-task. The status bar shows a blinking red ● REC indicator with a countdown timer during auto-mode recording.

    Platform support:

    • Linux x86: arecord (ALSA) or ffmpeg (PulseAudio) + transcribe-cli
    • Linux ARM: arecord or ffmpeg + openai-whisper (auto-installed in Python venv)
    • macOS: sox (CoreAudio) or ffmpeg (AVFoundation)

    The transcribe-cli dependency auto-installs in the background on first use. On ARM or when transcribe-cli fails, the system automatically falls back to openai-whisper via a self-managed Python venv (same approach used by Moondream vision).

    File transcription: Drag-and-drop audio/video files (.mp3, .wav, .mp4, .mkv, etc.) onto the terminal to transcribe them. Results are saved to .oa/transcripts/.

    Vision & Desktop Automation (Moondream)

    Open Agents can see your screen, understand UI elements, and interact with desktop applications through natural language — powered by the Moondream vision language model running entirely locally.

    Desktop Awareness

    The agent can take a screenshot and describe what's on screen:

    You: what's on my desktop right now?
    
    Agent: [Turn 1] desktop_describe()
           → "A Linux desktop showing three terminal windows with code editors,
              a file manager in the background, and a taskbar at the bottom
              with Firefox, Files, and Terminal icons."

    Ask specific questions about the screen:

    Agent: [Turn 1] desktop_describe(question="What application is in focus?")
           → "The focused application is a terminal running vim with a Python file open."

    Vision Analysis

    Analyze any image with four actions:

    Agent: vision(image="screenshot.png", action="caption")
           → "A terminal window displaying code with syntax highlighting"
    
    Agent: vision(image="ui.png", action="query", prompt="How many buttons are visible?")
           → "There are 4 buttons visible: Save, Cancel, Help, and Close"
    
    Agent: vision(image="ui.png", action="detect", prompt="button")
           → Detected 4 "button" in ui.png:
             1. bbox: [0.10, 0.85, 0.25, 0.95]
             2. bbox: [0.30, 0.85, 0.45, 0.95]
             ...
    
    Agent: vision(image="ui.png", action="point", prompt="close button")
           → Found 1 "close button" at (0.95, 0.02) — pixel (1824, 22)

    Point-and-Click

    Describe what to click in plain English — the agent screenshots, finds the element with Moondream, and clicks it:

    Agent: desktop_click(target="the Save button")
           → Clicked "Save button" at (480, 920)
    
    Agent: desktop_click(target="File menu", button="left")
           → Clicked "File menu" at (45, 12)
    
    Agent: desktop_click(target="terminal icon", click_type="double")
           → Clicked "terminal icon" at (1850, 540)

    Supports left/right/middle click, single/double click, multi-match selection by index, dry-run mode for verification, and configurable delay for UI transitions.

    Setup

    Moondream runs locally — no API keys, no cloud, your screen data never leaves your machine:

    # Create a Python venv and install Moondream Station
    python3 -m venv .moondream-venv
    .moondream-venv/bin/pip install moondream-station pydantic uvicorn fastapi packaging
    
    # Start the vision server (downloads model on first run, ~1.7GB)
    .moondream-venv/bin/python packages/execution/scripts/start-moondream.py

    The vision tools auto-detect a running Moondream Station on localhost:2020. For cloud inference, set MOONDREAM_API_KEY instead.

    System dependencies (auto-installed on first use):

    Desktop tools automatically install missing system packages when first needed. No manual setup required — just use the tool and it handles the rest:

    Tool | Linux Package | What It Does
    scrot | apt install scrot | Screenshot capture
    xdotool | apt install xdotool | Mouse/keyboard automation
    tesseract | apt install tesseract-ocr | OCR text extraction
    identify | apt install imagemagick | Image dimensions/conversion

    Supports apt (Debian/Ubuntu), dnf (Fedora), pacman (Arch), and brew (macOS). You can also pre-install everything at once:

    ./scripts/setup-desktop.sh          # Install all desktop deps
    ./scripts/setup-desktop.sh --check-only  # Just check what's missing

    Vision backend:

    • Moondream Station (local) — runs entirely on your machine, no API keys needed
    • Moondream Cloud API — set MOONDREAM_API_KEY for cloud inference

    Interactive TUI

    Launch without arguments to enter the interactive REPL:

    oa

    The TUI features an animated multilingual phrase carousel, live metrics bar with pastel-colored labels (token in/out, context window usage), rotating tips, syntax-highlighted tool output, and dynamic terminal-width cropping.

    Slash Commands

    Command Description
    /help Show all available commands
    /model <name> Switch to a different Ollama model
    /endpoint <url> Connect to a remote vLLM or OpenAI-compatible API
    /voice [model] Toggle TTS voice (GLaDOS, Overwatch)
    /listen [mode] Toggle live microphone transcription
    /dream [mode] Start dream mode (default, deep, lucid)
    /stream Toggle streaming token display
    /bruteforce Toggle brute-force mode (auto re-engage on turn limit)
    /tools List available tools
    /skills List/search available skills
    /update Check for and install updates (seamless reload)
    /cost Show token cost breakdown for the current session
    /evaluate Score the last completed task with LLM-as-judge
    /stats Show session metrics (turns, tools, tokens, files)
    /task-type <type> Set task type for specialized prompts (code, document, analysis, plan)
    /config Show current configuration
    /clear Clear the screen
    /exit Quit

    Mid-Task Steering

    While the agent is working (shown by the + prompt), type to add context:

    > fix the auth bug
      ⎿  Read: src/auth.ts
    + also check the session handling        ← typed while agent works
      ↪ Context added: also check the session handling
      ⎿  Search: session
      ⎿  Edit: src/auth.ts

    Tools (35)

    Tool Description
    File Operations
    file_read Read file contents with line numbers (offset/limit)
    file_write Create or overwrite files
    file_edit Precise string replacement in files
    batch_edit Multiple edits across files in one call
    list_directory List directory contents
    Search & Navigation
    grep_search Search file contents with regex (ripgrep)
    find_files Find files by glob pattern
    codebase_map High-level project structure overview
    Shell & Execution
    shell Execute any shell command
    code_sandbox Isolated code execution (JS, Python, Bash, TS) in subprocess or Docker
    background_run Run shell command in background
    task_status Check background task status
    task_output Read background task output
    task_stop Stop a background task
    Web
    web_search Search the web (DuckDuckGo, Tavily, Jina AI — auto-detected)
    web_fetch Fetch and extract text from web pages
    Structured Data
    structured_file Generate CSV, TSV, JSON, Markdown tables, Excel-compatible files
    read_structured_file Parse CSV, TSV, JSON, Markdown tables with binary detection
    Vision & Desktop
    vision Moondream VLM — caption, query, detect, point on any image
    desktop_click Vision-guided clicking: describe a UI element, agent finds and clicks it
    desktop_describe Screenshot + Moondream caption/query for desktop awareness
    image_read Read images (base64 + OCR)
    screenshot Capture screen/window
    ocr Extract text from images (Tesseract)
    Memory & Knowledge
    memory_read Read from persistent memory store
    memory_write Store patterns for future sessions
    Git & Diagnostics
    diagnostic Lint/typecheck/test/build validation pipeline
    git_info Structured git status, log, diff, branch info
    Agents & Skills
    sub_agent Delegate to an independent agent
    create_tool Create reusable custom tools at runtime
    manage_tools List, inspect, delete custom tools
    skill_list Discover available AIWG skills
    skill_execute Run an AIWG skill
    AIWG SDLC
    aiwg_setup Deploy AIWG SDLC framework
    aiwg_health Analyze SDLC health
    aiwg_workflow Execute AIWG workflows

    Read-only tools execute concurrently when called in the same turn. Mutating tools run sequentially.

    Auto-Expanding Context Window

    On startup and /model switch, Open Agents detects your RAM/VRAM and creates an optimized model variant:

    Available Memory | Context Window
    200GB+ | 128K tokens
    100GB+ | 64K tokens
    50GB+ | 32K tokens
    20GB+ | 16K tokens
    8GB+ | 8K tokens
    < 8GB | 4K tokens
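
    The table above maps directly to a threshold lookup, roughly (a sketch; `contextWindowTokens` is an assumed name, not the package's API):

```javascript
// Sketch: choose a context window (in tokens) from available memory in GB,
// matching the tier thresholds listed above.
function contextWindowTokens(memoryGB) {
  if (memoryGB >= 200) return 128 * 1024;
  if (memoryGB >= 100) return 64 * 1024;
  if (memoryGB >= 50) return 32 * 1024;
  if (memoryGB >= 20) return 16 * 1024;
  if (memoryGB >= 8) return 8 * 1024;
  return 4 * 1024;
}
```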

    Voice Feedback (TTS)

    /voice              # Toggle on/off (default: GLaDOS)
    /voice glados       # GLaDOS voice
    /voice overwatch    # Overwatch voice

    Auto-downloads the ONNX voice model (~50MB) on first use. Install espeak-ng for best quality (apt install espeak-ng / brew install espeak-ng).

    Cost Tracking & Session Metrics

    Real-time token cost estimation for cloud providers. The status bar shows running cost when using a paid endpoint.

    /cost              # Show cost breakdown by model/provider
    /stats             # Session metrics: turns, tool calls, tokens, files modified
    /evaluate          # Score the last completed task (LLM-as-judge, 5 rubric dimensions)

    Cost tracking supports 15+ providers including Groq, Together AI, OpenRouter, Fireworks AI, DeepInfra, Mistral, Cerebras, and more. Pricing is per-million tokens with separate input/output rates.

    Work evaluation uses five task-type-specific rubrics (code, document, analysis, plan, general) scoring correctness, completeness, efficiency, code quality, and communication on a 1-5 scale.
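
    Per-million-token pricing with separate input/output rates reduces to a simple formula (illustrative rates only; `estimateCostUSD` is an assumed name, not the package's API):

```javascript
// Sketch: cost = (input tokens / 1M) * input rate + (output tokens / 1M) * output rate.
function estimateCostUSD(tokensIn, tokensOut, { inputPerM, outputPerM }) {
  return (tokensIn / 1e6) * inputPerM + (tokensOut / 1e6) * outputPerM;
}

// Example with made-up rates of $0.50/M input and $1.00/M output:
const cost = estimateCostUSD(1_000_000, 500_000, { inputPerM: 0.5, outputPerM: 1.0 });
```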

    Code Sandbox

    Execute code snippets in isolated environments without affecting your project:

    Agent: code_sandbox(language="python", code="import math; print(math.factorial(20))")
           → 2432902008176640000
    
    Agent: code_sandbox(language="javascript", code="console.log([...new Set([1,2,2,3])].length)")
           → 3

    Supports JavaScript, TypeScript, Python, and Bash. Two execution modes:

    • Subprocess (default) — runs in a child process with timeout and output limits
    • Docker — runs in an isolated container when docker is available

    Structured Data Tools

    Generate structured files

    Create CSV, TSV, JSON, Markdown tables, and Excel-compatible files from data:

    Agent: structured_file(format="csv", path="results.csv", columns=["name","score"],
             data=[{"name":"Alice","score":95},{"name":"Bob","score":87}])
           → Created results.csv (2 rows, 2 columns)

    Read structured files

    Parse existing data files with automatic format detection:

    Agent: read_structured_file(path="data.csv")
           → CSV: 150 rows, 5 columns [showing first 100]
    
    Agent: read_structured_file(path="report.md")
           → Markdown: 3 table(s) extracted

    Detects binary formats (XLSX, PDF, DOCX) and suggests conversion tools.
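
    CSV generation of the kind `structured_file` performs can be sketched minimally (assumed helper `toCSV`; the package's real quoting and format handling may differ):

```javascript
// Sketch: emit a header row plus data rows, quoting any field containing
// commas, quotes, or newlines per common CSV conventions.
function toCSV(columns, rows) {
  const esc = (v) => {
    const s = String(v);
    return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  const lines = [columns.map(esc).join(",")];
  for (const row of rows) lines.push(columns.map((c) => esc(row[c])).join(","));
  return lines.join("\n");
}
```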

    Web Search

    Web search automatically selects the best available provider:

    Provider | Trigger | Features
    DuckDuckGo | Default (no key needed) | Free, privacy-focused
    Tavily | TAVILY_API_KEY set | Structured results + AI-generated answer
    Jina AI | JINA_API_KEY set | Markdown-formatted results

    export TAVILY_API_KEY=tvly-...   # Enable Tavily (optional)
    export JINA_API_KEY=jina_...     # Enable Jina AI (optional)
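
    The provider auto-detection above amounts to checking which key is set (a sketch; `pickSearchProvider` is an assumed name):

```javascript
// Sketch: prefer keyed providers when their env var is present,
// otherwise fall back to the free default.
function pickSearchProvider(env) {
  if (env.TAVILY_API_KEY) return "tavily";
  if (env.JINA_API_KEY) return "jina";
  return "duckduckgo";
}
```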

    Task Templates

    Set a task type to get specialized system prompts, recommended tools, and output guidance:

    /task-type code       # Code generation/fix — emphasizes tests, diffs, file edits
    /task-type document   # Documentation — emphasizes clarity, structure, completeness
    /task-type analysis   # Analysis tasks — emphasizes data, metrics, evidence
    /task-type plan       # Planning — emphasizes steps, dependencies, risks

    Configuration

    Config priority: CLI flags > env vars > ~/.open-agents/config.json > defaults.

    open-agents config set model qwen3.5:122b
    open-agents config set backendUrl http://localhost:11434
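
    Because later sources win, the priority chain can be sketched as a spread-merge (hypothetical `resolveConfig`; the package's actual config code may differ):

```javascript
// Sketch: merge config sources left to right so CLI flags override env vars,
// which override the config file, which overrides defaults.
function resolveConfig({ defaults = {}, file = {}, env = {}, cli = {} }) {
  return { ...defaults, ...file, ...env, ...cli };
}
```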

    Project Context

    Create AGENTS.md, OA.md, or .open-agents.md in your project root for agent instructions. Context files merge from parent to child directories.

    .oa/ Project Directory

    .oa/
    ├── config.json        # Project config overrides
    ├── settings.json      # TUI settings
    ├── memory/            # Persistent memory store
    ├── dreams/            # Dream mode proposals & checkpoints
    ├── transcripts/       # Audio/video transcriptions
    ├── index/             # Cached codebase index
    ├── context/           # Auto-generated project context
    └── history/           # Session history

    Model Support

    Primary target: Qwen3.5-122B-A10B via Ollama (MoE, 48GB+ VRAM)

    Any Ollama or OpenAI-compatible API model with tool calling works:

    oa --model qwen2.5-coder:32b "fix the bug"
    oa --backend vllm --backend-url http://localhost:8000/v1 "add tests"
    oa --backend-url http://10.0.0.5:11434 "refactor auth"

    Supported Inference Providers

    Open Agents auto-detects your provider from the endpoint URL and configures auth + health checks accordingly. All providers use standard Authorization: Bearer <key> authentication.

    Provider | Endpoint URL | API Key | Notes
    Ollama (local) | http://localhost:11434 | None | Default. Auto-detects, auto-expands context window
    vLLM (local) | http://localhost:8000 | Optional | Self-hosted OpenAI-compatible server
    LM Studio (local) | http://localhost:1234 | None | Local model server with GUI
    Chutes AI | https://llm.chutes.ai | cpk_... | Bearer auth. Fast cloud inference
    Together AI | https://api.together.xyz | Required | Large model catalog
    Groq | https://api.groq.com/openai | gsk_... | Ultra-fast LPU inference
    OpenRouter | https://openrouter.ai/api | sk-or-... | Multi-provider routing
    Fireworks AI | https://api.fireworks.ai/inference | fw_... | Fast serverless inference
    DeepInfra | https://api.deepinfra.com | Required | Cost-effective inference
    Mistral AI | https://api.mistral.ai | Required | Mistral models
    Cerebras | https://api.cerebras.ai | csk-... | Wafer-scale inference
    SambaNova | https://api.sambanova.ai | Required | RDU-accelerated inference
    NVIDIA NIM | https://integrate.api.nvidia.com | nvapi-... | NVIDIA cloud inference
    Hyperbolic | https://api.hyperbolic.xyz | Required | GPU cloud inference
    OpenAI | https://api.openai.com | sk-... | GPT models (tool calling)

    Connecting to a Provider

    Use /endpoint in the TUI or pass via CLI:

    # Chutes AI
    /endpoint https://llm.chutes.ai --auth cpk_your_key_here
    
    # Groq
    /endpoint https://api.groq.com/openai --auth gsk_your_key_here
    
    # Together AI
    /endpoint https://api.together.xyz --auth your_key_here
    
    # Self-hosted vLLM on LAN
    /endpoint http://10.0.0.5:8000

    The agent auto-detects the provider, normalizes the URL (strips /v1/chat/completions if pasted), tests connectivity, and saves the configuration. You can paste full endpoint URLs — they'll be cleaned up automatically.
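
    The URL cleanup step can be sketched as (assumed `normalizeEndpoint`; the package may normalize more aggressively):

```javascript
// Sketch: strip a pasted /v1/chat/completions suffix and any trailing
// slashes, leaving the base endpoint URL.
function normalizeEndpoint(url) {
  return url.replace(/\/v1\/chat\/completions\/?$/, "").replace(/\/+$/, "");
}
```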

    Evaluation Suite

    23 evaluation tasks test the agent's autonomous capabilities across coding, web research, SDLC analysis, and tool creation:

    node eval/run-agentic.mjs                          # Run all 23 tasks
    node eval/run-agentic.mjs 04-add-test              # Single task
    node eval/run-agentic.mjs --model qwen2.5-coder:32b  # Different model

    ID Task Category
    01 Fix typo in function name Code Fix
    02 Add isPrime function Code Generation
    03 Fix off-by-one bug Code Fix
    04 Write comprehensive tests Test Generation
    05 Extract functions from long method Refactoring
    06 Fix TypeScript type errors Type Safety
    07 Add REST API endpoint Feature Addition
    08 Add pagination across files Multi-File Edit
    09 CSS named color lookup (148 colors) Web Research
    10 HTTP status code lookup (32+ codes) Web Research
    11 MIME type lookup (30+ types) Web Research
    12 SDLC health analyzer AIWG Analysis
    13 SDLC artifact generator AIWG Generation
    14 Batch refactor variable names Multi-File Refactor
    15 Codebase overview from structure Code Analysis
    16 Diagnostic fix loop Error Recovery
    17 Git repository analyzer Git Integration
    18 Create custom tool from spec Tool Creation
    19 Tool from usage pattern Tool Discovery
    20 Tool management operations Tool Lifecycle
    21 Large file patch Precision Editing
    22 Skill discovery Skill System
    23 Skill execution Skill System

    Benchmark Results (Qwen3.5-122B)

    Pass rate: 100% (8/8 core tasks)
    Total: 39 turns, 55 tool calls, ~10 minutes
    Average: 4.9 turns/task, 6.9 tools/task

    AIWG Integration

    Open Agents integrates with AIWG for AI-augmented software development:

    npm i -g aiwg
    oa "analyze this project's SDLC health and set up documentation"

    Capability Description
    Structured Memory .aiwg/ directory persists project knowledge
    SDLC Artifacts Requirements, architecture, test strategy, deployment docs
    Health Analysis Score your project's SDLC maturity
    85+ Agents Specialized AI personas (Test Engineer, Security Auditor, API Designer)
    Traceability @-mention system links requirements to code to tests

    Architecture

    The core is AgenticRunner — a multi-turn tool-calling loop:

    User task → System prompt + tools → LLM → tool_calls → Execute → Feed results → LLM
                                              (repeat until task_complete or max turns)

    • Tool-first — the model explores via tools, not pre-stuffed context
    • Iterative — tests, sees failures, fixes them
    • Parallel-safe — read-only tools concurrent, mutating tools sequential
    • Observable — every tool call and result emitted as a real-time event
    • Bounded — max turns, timeout, output limits prevent runaway loops
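
    The loop above can be sketched as (assumed interfaces — `llm`, `runTool`, and the message shapes are illustrative, not AgenticRunner's actual API):

```javascript
// Sketch: bounded multi-turn loop — the LLM proposes tool calls, results are
// fed back as messages, and the loop ends on task_complete or the turn limit.
async function agenticLoop(task, { llm, runTool, maxTurns = 25 }) {
  const messages = [{ role: "user", content: task }];
  for (let turn = 1; turn <= maxTurns; turn++) {
    const reply = await llm(messages);               // model proposes tool calls
    messages.push(reply);
    if (!reply.tool_calls || reply.tool_calls.length === 0) {
      return { done: true, turns: turn, messages };  // plain answer: finished
    }
    for (const call of reply.tool_calls) {
      if (call.name === "task_complete") return { done: true, turns: turn, messages };
      const result = await runTool(call);            // execute and feed back
      messages.push({ role: "tool", name: call.name, content: result });
    }
  }
  return { done: false, turns: maxTurns, messages }; // bounded: turn limit hit
}
```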

    License

    MIT