Multi-agent autonomous startup system for Claude Code, Codex CLI, and Gemini CLI

    Loki Mode

    The First Truly Autonomous Multi-Agent Startup System

    Documentation Website | Architecture | Research | Comparisons

    PRD → Deployed Product with Zero Human Intervention

    Loki Mode transforms a Product Requirements Document into a fully built, tested, deployed, and revenue-generating product while you sleep. No manual steps. No intervention. Just results.


    Demo

    Click to watch Loki Mode build a complete Todo App from PRD - zero human intervention


    Presentation

    9 slides: Problem, Solution, 41 Agents, RARV Cycle, Benchmarks, Multi-Provider, Full Lifecycle

    Download PPTX for offline viewing


    Usage

    Option 1: Claude Code Skill

    # Install
    git clone https://github.com/asklokesh/loki-mode.git ~/.claude/skills/loki-mode
    
    # Run
    claude --dangerously-skip-permissions
    
    # Then say:
    Loki Mode with PRD at ./my-prd.md

    Option 2: Shell Script

    # Clone repo
    git clone https://github.com/asklokesh/loki-mode.git
    cd loki-mode
    
    # Run directly
    ./autonomy/run.sh ./my-prd.md

    Option 3: npm

    npm install -g loki-mode
    loki start ./my-prd.md

    Option 4: Homebrew (macOS/Linux)

    brew install asklokesh/tap/loki-mode
    loki start ./my-prd.md

    Option 5: Docker

    docker run -v $(pwd):/workspace asklokesh/loki-mode:5.1.1 ./my-prd.md

    Option 6: VS Code Extension

    Install directly from the VS Code Marketplace for a visual interface:

    # From VS Code
    1. Open Extensions (Cmd+Shift+X / Ctrl+Shift+X)
    2. Search "loki-mode"
    3. Click Install
    
    # Or via command line
    code --install-extension asklokesh.loki-mode

    Important: Start the Loki Mode server before using the extension:

    loki start              # If using CLI
    # or
    ./autonomy/run.sh       # If running from source

    Extension Features:

    • Start/Stop/Pause/Resume sessions from the activity bar
    • Real-time task progress in the sidebar
    • Provider selection (Claude, Codex, Gemini)
    • Status bar showing current phase and progress
    • Quick actions menu (Cmd+Shift+L / Ctrl+Shift+L)

    View on Marketplace

    See Installation Guide for more details.

    Multi-Provider Support (v5.0.0)

    Loki Mode supports three AI providers:

    # Claude Code (default - full features)
    loki start --provider claude ./my-prd.md
    
    # OpenAI Codex CLI (degraded mode)
    loki start --provider codex ./my-prd.md
    
    # Google Gemini CLI (degraded mode)
    loki start --provider gemini ./my-prd.md
    
    # Or via environment variable
    LOKI_PROVIDER=codex loki start ./my-prd.md

    Provider Comparison:

    Provider | Features | Parallel Agents | Task Tool
    Claude | Full | Yes (10+) | Yes
    Codex | Degraded | No | No
    Gemini | Degraded | No | No

    See skills/providers.md for full provider documentation.
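
    As a rough illustration of how the `--provider` flag and the `LOKI_PROVIDER` variable could interact, here is a minimal sketch. The function names are hypothetical, and the flag-over-environment precedence is an assumption; the text above does not say which one wins.

```python
import os

SUPPORTED_PROVIDERS = ("claude", "codex", "gemini")
DEFAULT_PROVIDER = "claude"


def resolve_provider(cli_flag=None, env=None):
    """Pick a provider: explicit flag first, then LOKI_PROVIDER, then the default.

    The flag-over-environment precedence is an assumption for this sketch.
    """
    env = os.environ if env is None else env
    choice = cli_flag or env.get("LOKI_PROVIDER") or DEFAULT_PROVIDER
    choice = choice.strip().lower()
    if choice not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unknown provider {choice!r}; expected one of {SUPPORTED_PROVIDERS}")
    return choice


def is_degraded(provider):
    """Per the comparison table, only Claude runs with full features."""
    return provider != "claude"
```

    A wrapper script could call `resolve_provider()` once at startup and warn when `is_degraded()` is true, since parallel agents and the Task tool are listed as unavailable for Codex and Gemini.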


    Benchmark Results

    Three-Way Comparison (HumanEval)

    System | Pass@1 | Details
    Loki Mode (Multi-Agent) | 98.78% | 162/164 problems, RARV cycle recovered 2
    Direct Claude | 98.17% | 161/164 problems (baseline)
    MetaGPT | 85.9-87.7% | Published benchmark

    Loki Mode beats MetaGPT's published results by 11-13 percentage points, which the project attributes to the RARV (Reason-Act-Reflect-Verify) cycle.
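
    The percentages above follow directly from the problem counts. The short check below recomputes them from the figures in the table; the helper name `pass_rate` is made up for this example.

```python
def pass_rate(passed, total):
    """Percentage of problems solved, rounded to two decimals."""
    return round(100 * passed / total, 2)


loki = pass_rate(162, 164)      # Loki Mode on HumanEval
direct = pass_rate(161, 164)    # Direct Claude on HumanEval
metagpt_low, metagpt_high = 85.9, 87.7  # published MetaGPT range

# The gap to MetaGPT is a difference of percentages, i.e. percentage points.
gap_min = round(loki - metagpt_high, 2)
gap_max = round(loki - metagpt_low, 2)
print(loki, direct, gap_min, gap_max)
```

    The gap works out to roughly 11 to 13 percentage points, while the gap between Loki Mode and the direct Claude baseline is a single problem out of 164.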

    Full Results

    Benchmark | Score | Details
    Loki Mode HumanEval | 98.78% Pass@1 | 162/164 (multi-agent with RARV)
    Direct Claude HumanEval | 98.17% Pass@1 | 161/164 (single agent baseline)
    Direct Claude SWE-bench | 99.67% patch gen | 299/300 problems
    Loki Mode SWE-bench | 99.67% patch gen | 299/300 problems
    Model | Claude Opus 4.5 |

    Key Finding: Multi-agent RARV matches single-agent performance on both benchmarks after timeout optimization. The 4-agent pipeline (Architect->Engineer->QA->Reviewer) achieves the same 99.67% patch generation as direct Claude.

    See benchmarks/results/ for full methodology and solutions.


    What is Loki Mode?

    Loki Mode is a multi-provider AI skill that orchestrates 41 specialized AI agent types across 8 swarms to autonomously build, test, deploy, and scale complete startups. It works with Claude Code, OpenAI Codex CLI, and Google Gemini CLI. It dynamically spawns only the agents you need (5-10 for simple projects, 100+ for complex startups), working in parallel with continuous self-verification.

    PRD → Research → Architecture → Development → Testing → Deployment → Marketing → Revenue

    Just say "Loki Mode" and point to a PRD. Walk away. Come back to a deployed product.


    Why Loki Mode?

    Better Than Anything Out There

    What Others Do | What Loki Mode Does
    Single agent writes code linearly | 100+ agents work in parallel across engineering, ops, business, data, product, and growth
    Manual deployment required | Autonomous deployment to AWS, GCP, Azure, Vercel, Railway with blue-green and canary strategies
    No testing or basic unit tests | 7 automated quality gates: input/output guardrails, static analysis, blind review, anti-sycophancy, severity blocking, test coverage
    Code only - you handle the rest | Full business operations: marketing, sales, legal, HR, finance, investor relations
    Stops on errors | Self-healing: circuit breakers, dead letter queues, exponential backoff, automatic recovery
    No visibility into progress | Real-time dashboard with agent monitoring, task queues, and live status updates
    "Done" when code is written | Never "done": continuous optimization, A/B testing, customer feedback loops, perpetual improvement

    Core Advantages

    1. Truly Autonomous: RARV (Reason-Act-Reflect-Verify) cycle with self-verification achieves 2-3x quality improvement
    2. Massively Parallel: 100+ agents working simultaneously, not sequential single-agent bottlenecks
    3. Production-Ready: Not just code—handles deployment, monitoring, incident response, and business operations
    4. Self-Improving: Learns from mistakes, updates continuity logs, prevents repeated errors
    5. Zero Babysitting: Auto-resumes on rate limits, recovers from failures, runs until completion
    6. Efficiency Optimized: ToolOrchestra-inspired metrics track cost per task, reward signals drive continuous improvement

    Features & Documentation

    Feature | Description | Documentation
    VS Code Extension | Visual interface with sidebar, status bar | Marketplace
    Multi-Provider (v5.0.0) | Claude, Codex, Gemini support | Provider Guide
    CLI (v4.1.0) | loki command for start/stop/pause/status | CLI Commands
    Config Files | YAML configuration support | autonomy/config.example.yaml
    Dashboard | Realtime Kanban board, agent monitoring | Dashboard Guide
    41 Agent Types | Engineering, Ops, Business, Data, Product, Growth, Review, Orchestration | Agent Definitions
    RARV Cycle | Reason-Act-Reflect-Verify workflow | Core Workflow
    Quality Gates | 7-gate system: guardrails, static analysis, blind review, anti-sycophancy, severity blocking, test coverage | Quality Control
    Memory System (v5.15.0) | Complete 3-tier memory with progressive disclosure | Memory Architecture
    Parallel Workflows | Git worktree-based parallelism | Parallel Workflows
    GitHub Integration | Issue import, PR creation, status sync | GitHub Integration
    Distribution | npm, Homebrew, Docker installation | Installation Guide
    Research Foundation | OpenAI, DeepMind, Anthropic patterns | Acknowledgements
    Benchmarks | HumanEval 98.78%, SWE-bench 99.67% | Benchmark Results
    Comparisons | vs Auto-Claude, Cursor | Auto-Claude, Cursor

    Dashboard & Real-Time Monitoring

    Monitor your autonomous startup being built in real-time through the Loki Mode dashboard:

    Agent Monitoring

    Loki Mode Dashboard - Active Agents

    Track all active agents in real-time:

    • Agent ID and Type (frontend, backend, QA, DevOps, etc.)
    • Model Badge (Sonnet, Haiku, Opus) with color coding
    • Current Work being performed
    • Runtime and Tasks Completed
    • Status (active, completed)

    Task Queue Visualization

    Loki Mode Dashboard - Task Queue

    Four-column kanban view:

    • Pending: Queued tasks waiting for agents
    • In Progress: Currently being worked on
    • Completed: Successfully finished (shows last 10)
    • Failed: Tasks requiring attention

    Live Status Monitor

    # Watch status updates in terminal
    watch -n 2 cat .loki/STATUS.txt

    Example output:

    ╔════════════════════════════════════════════════════════════════╗
    ║                    LOKI MODE STATUS                            ║
    ╚════════════════════════════════════════════════════════════════╝
    
    Phase: DEVELOPMENT
    
    Active Agents: 47
      ├─ Engineering: 18
      ├─ Operations: 12
      ├─ QA: 8
      └─ Business: 9
    
    Tasks:
      ├─ Pending:     10
      ├─ In Progress: 47
      ├─ Completed:   203
      └─ Failed:      0
    
    Last Updated: 2026-01-04 20:45:32
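
    If you want to consume the status file from a script rather than watch it, a small parser is enough. The sketch below assumes the file keeps the exact layout of the sample above (top-level `Key: value` lines with tree-style children); `parse_status` is a hypothetical helper, not part of Loki Mode.

```python
import re

# Leading whitespace and box-drawing characters used for the tree branches.
TREE_PREFIX = re.compile(r"^[\s├└─│]+")


def parse_status(text):
    """Parse the STATUS.txt layout shown above into a nested dict."""
    status = {}
    section = None
    for raw in text.splitlines():
        is_child = raw.lstrip().startswith(("├", "└"))
        line = TREE_PREFIX.sub("", raw).strip()
        if ":" not in line:
            continue  # banner, borders, and blank lines carry no data
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if is_child and section is not None:
            status[section]["breakdown"][key] = int(value)
        elif key in ("Active Agents", "Tasks"):
            section = key
            status[key] = {"total": int(value) if value else None, "breakdown": {}}
        else:
            section = None
            status[key] = value
    return status
```

    For example, `parse_status(open(".loki/STATUS.txt", encoding="utf-8").read())["Tasks"]["breakdown"]` would give the per-column task counts, provided the real file matches the sample.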

    Access the dashboard:

    # Automatically opens when running autonomously
    ./autonomy/run.sh ./docs/requirements.md
    
    # Or open manually
    open .loki/dashboard/index.html

    Auto-refreshes every 3 seconds. Works with any modern browser.


    Autonomous Capabilities

    RARV Cycle: Reason-Act-Reflect-Verify

    Loki Mode doesn't just write code—it thinks, acts, learns, and verifies:

    1. REASON
       └─ Read .loki/CONTINUITY.md including "Mistakes & Learnings"
       └─ Check .loki/state/ and .loki/queue/
       └─ Identify next task or improvement
    
    2. ACT
       └─ Execute task, write code
       └─ Commit changes atomically (git checkpoint)
    
    3. REFLECT
       └─ Update .loki/CONTINUITY.md with progress
       └─ Update state files
       └─ Identify NEXT improvement
    
    4. VERIFY
       └─ Run automated tests (unit, integration, E2E)
       └─ Check compilation/build
       └─ Verify against spec
    
       IF VERIFICATION FAILS:
       ├─ Capture error details (stack trace, logs)
       ├─ Analyze root cause
       ├─ UPDATE "Mistakes & Learnings" in CONTINUITY.md
       ├─ Rollback to last good git checkpoint if needed
       └─ Apply learning and RETRY from REASON

    Result: 2-3x quality improvement through continuous self-verification.
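
    The control flow of the cycle can be sketched in a few lines. This is an illustration only: the four callables, the `max_attempts` limit, and the in-memory `learnings` list (standing in for the "Mistakes & Learnings" section of CONTINUITY.md) are all assumptions, not Loki Mode's actual implementation.

```python
def run_rarv(task, reason, act, reflect, verify, max_attempts=3):
    """Run one task through Reason-Act-Reflect-Verify, retrying from REASON on failure.

    All four callables and max_attempts are hypothetical stand-ins.
    """
    learnings = []  # stands in for "Mistakes & Learnings" in CONTINUITY.md
    for attempt in range(1, max_attempts + 1):
        plan = reason(task, learnings)   # 1. REASON: read context, pick the next step
        result = act(plan)               # 2. ACT: execute the step and checkpoint
        reflect(task, result)            # 3. REFLECT: record progress
        ok, error = verify(result)       # 4. VERIFY: tests, build, spec check
        if ok:
            return {"result": result, "attempts": attempt, "learnings": learnings}
        learnings.append(error)          # capture the failure, then retry from REASON
    raise RuntimeError(f"task {task!r} failed verification after {max_attempts} attempts")
```

    The key design point is that a failed verification feeds back into the next REASON step, so each retry starts with more information than the last.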

    Perpetual Improvement Mode

    There is NEVER a "finished" state. After completing the PRD, Loki Mode:

    • Runs performance optimizations
    • Adds missing test coverage
    • Improves documentation
    • Refactors code smells
    • Updates dependencies
    • Enhances user experience
    • Implements A/B test learnings

    It keeps going until you stop it.

    Auto-Resume & Self-Healing

    Rate limits? Exponential backoff and automatic resume. Errors? Circuit breakers, dead letter queues, retry logic. Interruptions? State checkpoints every 5 seconds—just restart.

    # Start autonomous mode
    ./autonomy/run.sh ./docs/requirements.md
    
    # Hit rate limit? Script automatically:
    # ├─ Saves state checkpoint
    # ├─ Waits with exponential backoff (60s → 120s → 240s...)
    # ├─ Resumes from exact point
    # └─ Continues until completion or max retries (default: 50)

    Quick Start

    1. Install

    # Option A: npm (recommended)
    npm install -g loki-mode
    
    # Option B: Homebrew (macOS/Linux)
    brew tap asklokesh/tap && brew install loki-mode
    loki-mode-install-skill  # Set up Claude Code integration
    
    # Option C: Docker
    docker pull asklokesh/loki-mode:5.0.0
    
    # Option D: Git clone
    git clone https://github.com/asklokesh/loki-mode.git ~/.claude/skills/loki-mode

    See Installation Guide for detailed instructions.

    2. Create a PRD

    # Product: AI-Powered Todo App
    
    ## Overview
    Build a todo app with AI-powered task suggestions and deadline predictions.
    
    ## Features
    - User authentication (email/password)
    - Create, read, update, delete todos
    - AI suggests next tasks based on patterns
    - Smart deadline predictions
    - Mobile-responsive design
    
    ## Tech Stack
    - Next.js 14 with TypeScript
    - PostgreSQL database
    - OpenAI API for suggestions
    - Deploy to Vercel

    Save as my-prd.md.

    3. Run Loki Mode

    # Using the CLI (v4.1.0)
    loki start ./my-prd.md
    
    # Or using run.sh directly
    ./autonomy/run.sh ./my-prd.md
    
    # Or manual mode in Claude Code
    claude --dangerously-skip-permissions
    > Loki Mode with PRD at ./my-prd.md

    4. Monitor Progress

    # Check status
    loki status
    
    # Open dashboard in browser
    loki dashboard
    
    # Or watch terminal output
    watch -n 2 cat .loki/STATUS.txt

    5. Walk Away

    Seriously. Go get coffee. It'll be deployed when you get back.

    That's it. No configuration. No manual steps. No intervention.


    CLI Commands (v4.1.0)

    The loki CLI provides easy access to all Loki Mode features:

    Command | Description
    loki start [PRD] | Start Loki Mode with optional PRD file
    loki stop | Stop execution immediately
    loki pause | Pause after current session
    loki resume | Resume paused execution
    loki status | Show current status
    loki dashboard | Open dashboard in browser
    loki import | Import GitHub issues as tasks
    loki config show | Show configuration
    loki config init | Create config file from template
    loki version | Show version

    Configuration File

    Create a YAML config file for persistent settings:

    # Initialize config
    loki config init
    
    # Or copy template manually
    cp ~/.claude/skills/loki-mode/autonomy/config.example.yaml .loki/config.yaml

    Config search order: .loki/config.yaml (project) -> ~/.config/loki-mode/config.yaml (global)
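
    The search order amounts to "first file that exists wins". A minimal sketch, assuming the two paths above are the only candidates; `find_config` is a hypothetical helper and does not parse the YAML:

```python
from pathlib import Path


def find_config(project_dir=".", home=None):
    """Return the first config file that exists, or None.

    Search order follows the text: project file first, then the global file.
    """
    home = Path.home() if home is None else Path(home)
    candidates = [
        Path(project_dir) / ".loki" / "config.yaml",
        home / ".config" / "loki-mode" / "config.yaml",
    ]
    for path in candidates:
        if path.is_file():
            return path
    return None
```

    Whether Loki Mode merges the two files or lets the project file fully replace the global one is not stated here; this sketch assumes replacement.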


    Agent Swarms (41 Types)

    Loki Mode has 41 predefined agent types organized into 8 specialized swarms. The orchestrator spawns only what you need: simple projects use 5-10 agents, while complex startups spawn 100+.

    Agent Swarms Visualization

    Engineering (8 types)

    eng-frontend eng-backend eng-database eng-mobile eng-api eng-qa eng-perf eng-infra

    Operations (8 types)

    ops-devops ops-sre ops-security ops-monitor ops-incident ops-release ops-cost ops-compliance

    Business (8 types)

    biz-marketing biz-sales biz-finance biz-legal biz-support biz-hr biz-investor biz-partnerships

    Data (3 types)

    data-ml data-eng data-analytics

    Product (3 types)

    prod-pm prod-design prod-techwriter

    Growth (4 types)

    growth-hacker growth-community growth-success growth-lifecycle

    Review (3 types)

    review-code review-business review-security

    Orchestration (4 types)

    orch-planner orch-sub-planner orch-judge orch-coordinator
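
    The listing above maps naturally onto a lookup table. The sketch below copies the agent type names verbatim and checks that the listed groups add up to 41; the `AGENT_TYPES` structure and `swarm_of` helper are illustrative, not Loki Mode's internal representation.

```python
SWARMS = {
    "Engineering": "eng-frontend eng-backend eng-database eng-mobile eng-api eng-qa eng-perf eng-infra",
    "Operations": "ops-devops ops-sre ops-security ops-monitor ops-incident ops-release ops-cost ops-compliance",
    "Business": "biz-marketing biz-sales biz-finance biz-legal biz-support biz-hr biz-investor biz-partnerships",
    "Data": "data-ml data-eng data-analytics",
    "Product": "prod-pm prod-design prod-techwriter",
    "Growth": "growth-hacker growth-community growth-success growth-lifecycle",
    "Review": "review-code review-business review-security",
    "Orchestration": "orch-planner orch-sub-planner orch-judge orch-coordinator",
}
AGENT_TYPES = {swarm: names.split() for swarm, names in SWARMS.items()}
TOTAL_TYPES = sum(len(names) for names in AGENT_TYPES.values())


def swarm_of(agent_type):
    """Return the swarm an agent type belongs to, e.g. 'ops-sre' -> 'Operations'."""
    for swarm, names in AGENT_TYPES.items():
        if agent_type in names:
            return swarm
    raise KeyError(agent_type)
```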

    View All 41 Agent Types with Capabilities

    Swarm | Agent | Capabilities
    Engineering | eng-frontend | React/Vue/Svelte, TypeScript, Tailwind, accessibility, responsive design
    Engineering | eng-backend | Node/Python/Go, REST/GraphQL, auth, business logic, middleware
    Engineering | eng-database | PostgreSQL/MySQL/MongoDB, migrations, query optimization, indexing
    Engineering | eng-mobile | React Native/Flutter/Swift/Kotlin, offline-first, push notifications
    Engineering | eng-api | OpenAPI specs, SDK generation, versioning, webhooks, rate limiting
    Engineering | eng-qa | Unit/integration/E2E tests, coverage, automation, test data
    Engineering | eng-perf | Profiling, benchmarking, optimization, caching, load testing
    Engineering | eng-infra | Docker, K8s manifests, IaC, networking, security hardening
    Operations | ops-devops | CI/CD pipelines, GitHub Actions, GitLab CI, Jenkins
    Operations | ops-sre | Reliability, SLOs/SLIs, capacity planning, runbooks
    Operations | ops-security | SAST/DAST, pen testing, vulnerability management
    Operations | ops-monitor | Observability, Datadog/Grafana, alerting, dashboards
    Operations | ops-incident | Incident response, RCA, post-mortems, communication
    Operations | ops-release | Versioning, changelogs, blue-green, canary, rollbacks
    Operations | ops-cost | Cloud cost optimization, right-sizing, FinOps
    Operations | ops-compliance | SOC2, GDPR, HIPAA, PCI-DSS, audit preparation
    Business | biz-marketing | Landing pages, SEO, content, email campaigns, social media
    Business | biz-sales | CRM setup, outreach, demos, proposals, pipeline
    Business | biz-finance | Billing (Stripe), invoicing, metrics, runway, pricing
    Business | biz-legal | ToS, privacy policy, contracts, IP protection
    Business | biz-support | Help docs, FAQs, ticket system, chatbot, knowledge base
    Business | biz-hr | Job posts, recruiting, onboarding, culture docs
    Business | biz-investor | Pitch decks, investor updates, data room, cap table
    Business | biz-partnerships | BD outreach, integrations, co-marketing, API partnerships
    Data | data-ml | Model training, MLOps, feature engineering, inference
    Data | data-eng | ETL pipelines, data warehousing, dbt, Airflow
    Data | data-analytics | Product analytics, A/B tests, dashboards, insights
    Product | prod-pm | Backlog grooming, prioritization, roadmap, specs
    Product | prod-design | Design system, Figma, UX patterns, prototypes
    Product | prod-techwriter | API docs, guides, tutorials, release notes
    Growth | growth-hacker | Growth experiments, viral loops, referral programs
    Growth | growth-community | Community building, Discord/Slack, ambassador programs
    Growth | growth-success | Customer success, health scoring, churn prevention
    Growth | growth-lifecycle | Email lifecycle, in-app messaging, re-engagement
    Review | review-code | Code quality, design patterns, SOLID, maintainability
    Review | review-business | Requirements alignment, business logic, edge cases
    Review | review-security | Vulnerabilities, auth/authz, OWASP Top 10
    Orchestration | orch-planner | Task decomposition, dependency analysis, work distribution
    Orchestration | orch-sub-planner | Domain-specific planning, recursive task breakdown
    Orchestration | orch-judge | Cycle continuation decisions, goal assessment, escalation
    Orchestration | orch-coordinator | Cross-stream coordination, merge decisions, conflict resolution

    See references/agent-types.md for complete agent type definitions.


    How It Works

    Skill Architecture (v3.0+)

    Loki Mode uses a progressive disclosure architecture to minimize context usage:

    SKILL.md (~190 lines)         # Always loaded: core RARV cycle, autonomy rules
    skills/
      00-index.md                  # Module routing table
      agents.md                    # Agent dispatch, A2A patterns
      production.md                # HN patterns, batch processing, CI/CD
      quality-gates.md             # Review system, severity handling
      testing.md                   # Playwright, E2E, property-based
      model-selection.md           # Task tool, parallelization
      artifacts.md                 # Code generation patterns
      patterns-advanced.md         # Constitutional AI, debate
      troubleshooting.md           # Error recovery, fallbacks
    references/                    # Deep documentation (23KB+ files)

    Why this matters:

    • Original 1,517-line SKILL.md consumed ~15% of context before any work began
    • Now only ~1% of context for core skill + on-demand modules
    • More room for actual code and reasoning

    Phase Execution

    Phase | Description
    0. Bootstrap | Create .loki/ directory structure, initialize state
    1. Discovery | Parse PRD, competitive research via web search
    2. Architecture | Tech stack selection with self-reflection
    3. Infrastructure | Provision cloud, CI/CD, monitoring
    4. Development | Implement with TDD, parallel code review
    5. QA | 7 quality gates, security audit, load testing
    6. Deployment | Blue-green deploy, auto-rollback on errors
    7. Business | Marketing, sales, legal, support setup
    8. Growth | Continuous optimization, A/B testing, feedback loops

    Parallel Code Review

    Every code change goes through 3 specialized reviewers simultaneously:

    IMPLEMENT → REVIEW (parallel) → AGGREGATE → FIX → RE-REVIEW → COMPLETE
                    │
                    ├─ code-reviewer (Sonnet) - Code quality, patterns, best practices
                    ├─ business-logic-reviewer (Sonnet) - Requirements, edge cases, UX
                    └─ security-reviewer (Sonnet) - Vulnerabilities, OWASP Top 10
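
    The REVIEW (parallel) and AGGREGATE steps in the diagram can be pictured as a fan-out followed by a merge. This is a sketch under assumptions: each reviewer is modeled as a plain callable returning `(severity, message)` tuples, and threads stand in for whatever mechanism Loki Mode actually uses to dispatch reviewer agents.

```python
from concurrent.futures import ThreadPoolExecutor


def review_in_parallel(change, reviewers):
    """Run every reviewer on the same change concurrently and merge their findings.

    `reviewers` maps a reviewer name to a callable returning a list of
    (severity, message) tuples; both shapes are assumptions for this sketch.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(reviewers))) as pool:
        futures = {name: pool.submit(fn, change) for name, fn in reviewers.items()}
        findings = []
        for name, future in futures.items():
            for severity, message in future.result():
                findings.append({"reviewer": name, "severity": severity, "message": message})
    return findings
```

    Because all reviewers receive the same unmodified change and do not see each other's output, the merge step is the first point where their findings meet.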

    Severity-based issue handling:

    • Critical/High/Medium: Block. Fix immediately. Re-review.
    • Low: Add // TODO(review): ... comment, continue.
    • Cosmetic: Add // FIXME(nitpick): ... comment, continue.
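
    The severity rules above reduce to a small triage function. The comment templates are taken from the list; the function name, the tuple shape, and the choice to reject unknown severities are assumptions made for this sketch.

```python
BLOCKING = {"critical", "high", "medium"}
COMMENT_TEMPLATES = {
    "low": "// TODO(review): {message}",
    "cosmetic": "// FIXME(nitpick): {message}",
}


def triage(findings):
    """Split findings into blockers (fix and re-review) and comments to leave in code."""
    blockers, comments = [], []
    for severity, message in findings:
        level = severity.lower()
        if level in BLOCKING:
            blockers.append((level, message))
        elif level in COMMENT_TEMPLATES:
            comments.append(COMMENT_TEMPLATES[level].format(message=message))
        else:
            raise ValueError(f"unknown severity: {severity!r}")
    return blockers, comments
```

    A change would proceed to COMPLETE only when `blockers` comes back empty; otherwise it loops through FIX and RE-REVIEW.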

    Directory Structure

    .loki/
    ├── state/          # Orchestrator and agent states
    ├── queue/          # Task queue (pending, in-progress, completed, dead-letter)
    ├── memory/         # Episodic, semantic, and procedural memory
    ├── metrics/        # Efficiency tracking and reward signals
    ├── messages/       # Inter-agent communication
    ├── logs/           # Audit logs
    ├── config/         # Configuration files
    ├── prompts/        # Agent role prompts
    ├── artifacts/      # Releases, reports, backups
    ├── dashboard/      # Real-time monitoring dashboard
    └── scripts/        # Helper scripts

    Memory System (v5.15.0)

    Complete 3-tier memory architecture with progressive disclosure:

    WORKING MEMORY (CONTINUITY.md)
            |
            v
    EPISODIC MEMORY (.loki/memory/episodic/)
            |
            v (consolidation)
    SEMANTIC MEMORY (.loki/memory/semantic/)
            |
            v
    PROCEDURAL MEMORY (.loki/memory/skills/)

    Key Features:

    • Progressive Disclosure: 3-layer loading (index ~100 tokens, timeline ~500 tokens, full details) reduces context usage by 60-80%
    • Token Economics: Track discovery vs read tokens, automatic threshold-based optimization
    • Vector Search: Optional embedding-based similarity search (sentence-transformers)
    • Consolidation Pipeline: Automatic episodic-to-semantic transformation
    • Task-Aware Retrieval: Different memory strategies for exploration, implementation, debugging, review, and refactoring
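
    To make the progressive disclosure idea concrete, here is a minimal sketch of budget-driven layer loading. The approximate layer sizes come from the list above; the stop-at-first-overflow policy, the function name, and the caller-supplied size of the full-detail layer are assumptions, not the documented behavior.

```python
LAYERS = [
    ("index", 100),     # approximate size from the text
    ("timeline", 500),  # approximate size from the text
    ("full", None),     # size depends on the memory; supplied by the caller
]


def layers_to_load(token_budget, full_detail_tokens):
    """Load layers in order, stopping before the first one that would exceed the budget."""
    loaded, used = [], 0
    for name, cost in LAYERS:
        cost = full_detail_tokens if cost is None else cost
        if used + cost > token_budget:
            break
        loaded.append(name)
        used += cost
    return loaded, used
```

    With a 700-token budget and a 4,000-token full record, only the index and timeline are loaded, which is where the context saving comes from.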

    CLI Commands:

    loki memory index           # View index layer
    loki memory timeline        # View compressed history
    loki memory consolidate     # Run consolidation pipeline
    loki memory economics       # View token usage metrics
    loki memory retrieve "query"  # Test task-aware retrieval

    API Endpoints:

    • GET /api/memory - Memory summary
    • POST /api/memory/retrieve - Query memories
    • POST /api/memory/consolidate - Trigger consolidation
    • GET /api/memory/economics - Token economics

    See references/memory-system.md for complete documentation.


    Example PRDs

    Test Loki Mode with these pre-built PRDs in the examples/ directory:

    PRD | Complexity | Est. Time | Description
    simple-todo-app.md | Low | ~10 min | Basic todo app - tests core functionality
    api-only.md | Low | ~10 min | REST API only - tests backend agents
    static-landing-page.md | Low | ~5 min | HTML/CSS only - tests frontend/marketing
    full-stack-demo.md | Medium | ~30-60 min | Complete bookmark manager - full test

    # Example: Run with simple todo app
    ./autonomy/run.sh examples/simple-todo-app.md

    Configuration

    Autonomy Settings

    Customize the autonomous runner with environment variables:

    LOKI_MAX_RETRIES=100 \
    LOKI_BASE_WAIT=120 \
    LOKI_MAX_WAIT=7200 \
    ./autonomy/run.sh ./docs/requirements.md

    Variable | Default | Description
    LOKI_PROVIDER | claude | AI provider: claude, codex, gemini
    LOKI_MAX_RETRIES | 50 | Maximum retry attempts before giving up
    LOKI_BASE_WAIT | 60 | Base wait time in seconds
    LOKI_MAX_WAIT | 3600 | Maximum wait time (1 hour)
    LOKI_SKIP_PREREQS | false | Skip prerequisite checks
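
    To see how the wait-time variables interact, the sketch below generates a retry schedule from the defaults. It assumes plain doubling from `LOKI_BASE_WAIT` capped at `LOKI_MAX_WAIT`, which matches the 60s → 120s → 240s pattern described under Auto-Resume; whether the real runner adds jitter or rounds differently is not stated, and `backoff_schedule` is a hypothetical name.

```python
def backoff_schedule(retries, base_wait=60, max_wait=3600):
    """Wait times in seconds: double on each retry, capped at max_wait."""
    return [min(base_wait * 2 ** attempt, max_wait) for attempt in range(retries)]
```

    Under these assumptions the cap is reached on the seventh retry with the default values, so most of a long retry budget is spent waiting the maximum interval.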

    Circuit Breakers

    # .loki/config/circuit-breakers.yaml
    defaults:
      failureThreshold: 5
      cooldownSeconds: 300
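
    For readers unfamiliar with the pattern, this is roughly what those two settings control. The class below is a generic circuit breaker sketch using the defaults from the YAML above; it is not Loki Mode's implementation, and the consecutive-failure counting and half-open behavior after cooldown are assumptions.

```python
import time


class CircuitBreaker:
    """Open after N consecutive failures; allow a trial call once the cooldown passes."""

    def __init__(self, failure_threshold=5, cooldown_seconds=300, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock  # injectable so the breaker can be tested without sleeping
        self.failures = 0
        self.opened_at = None

    def allow(self):
        """Return True if a call may proceed."""
        if self.opened_at is None:
            return True
        # Half-open: let a call through once the cooldown has elapsed.
        return self.clock() - self.opened_at >= self.cooldown_seconds

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()
```

    A caller checks `allow()` before each attempt and reports the outcome with `record_success()` or `record_failure()`.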

    External Alerting

    # .loki/config/alerting.yaml
    channels:
      slack:
        webhook_url: "${SLACK_WEBHOOK_URL}"
        severity: [critical, high]
      pagerduty:
        integration_key: "${PAGERDUTY_KEY}"
        severity: [critical]
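
    The routing implied by this config can be sketched as a filter over channels. The dictionary mirrors the YAML above; the helper names are hypothetical, and using `string.Template` for the `${VAR}` placeholders is one plausible approach rather than a description of how Loki Mode expands them.

```python
import os
import string

CHANNELS = {
    "slack": {"webhook_url": "${SLACK_WEBHOOK_URL}", "severity": ["critical", "high"]},
    "pagerduty": {"integration_key": "${PAGERDUTY_KEY}", "severity": ["critical"]},
}


def channels_for(severity, channels=CHANNELS):
    """Names of the channels configured to receive alerts of this severity."""
    return [name for name, cfg in channels.items() if severity in cfg["severity"]]


def expand(value, env=None):
    """Substitute ${VAR} placeholders; unknown variables are left untouched."""
    mapping = os.environ if env is None else env
    return string.Template(value).safe_substitute(mapping)
```

    With this config, a critical alert goes to both Slack and PagerDuty, a high alert goes to Slack only, and lower severities are not sent anywhere.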

    Requirements

    • Claude Code with --dangerously-skip-permissions flag
    • Internet access for competitive research and deployment
    • Cloud provider credentials (for deployment phase)
    • Python 3 (for test suite)

    Optional but recommended:

    • Git (for version control and checkpoints)
    • Node.js/npm (for dashboard and web projects)
    • Docker (for containerized deployments)

    Integrations

    Vibe Kanban (Visual Dashboard)

    Integrate with Vibe Kanban for a visual kanban board:

    # 1. Start Vibe Kanban (terminal 1)
    npx vibe-kanban
    
    # 2. Run Loki Mode (terminal 2)
    ./autonomy/run.sh ./prd.md
    
    # 3. Export tasks to see them in Vibe Kanban (terminal 3)
    ./scripts/export-to-vibe-kanban.sh
    
    # 4. Optional: Auto-sync for real-time updates
    ./scripts/vibe-sync-watcher.sh

    Important: Vibe Kanban integration requires manual export. Tasks don't automatically appear - you must run the export script to sync.

    Benefits:

    • Visual progress tracking of all active agents
    • Manual intervention/prioritization when needed
    • Code review with visual diffs
    • Multi-project dashboard

    See integrations/vibe-kanban.md for complete step-by-step setup guide and troubleshooting.


    Testing

    Run the comprehensive test suite:

    # Run all tests
    ./tests/run-all-tests.sh
    
    # Or run individual test suites
    ./tests/test-bootstrap.sh        # Directory structure, state init
    ./tests/test-task-queue.sh       # Queue operations, priorities
    ./tests/test-circuit-breaker.sh  # Failure handling, recovery
    ./tests/test-agent-timeout.sh    # Timeout, stuck process handling
    ./tests/test-state-recovery.sh   # Checkpoints, recovery

    Contributing

    Contributions welcome! Please:

    1. Read SKILL.md to understand the core architecture
    2. Review skills/00-index.md for module organization (v3.0+)
    3. Check references/agents.md for agent definitions
    4. Open an issue for bugs or feature requests
    5. Submit PRs with clear descriptions and tests

    License

    MIT License - see LICENSE for details.


    Acknowledgments

    Loki Mode incorporates research and patterns from leading AI labs and practitioners:

    Research Foundation

    Source | Key Contribution
    Anthropic: Building Effective Agents | Evaluator-optimizer pattern, parallelization
    Anthropic: Constitutional AI | Self-critique against principles
    DeepMind: Scalable Oversight via Debate | Debate-based verification
    DeepMind: SIMA 2 | Self-improvement loop
    OpenAI: Agents SDK | Guardrails, tripwires, tracing
    NVIDIA ToolOrchestra | Efficiency metrics, reward signals
    CONSENSAGENT (ACL 2025) | Anti-sycophancy, blind review
    GoalAct | Hierarchical planning

    Practitioner Insights

    • Boris Cherny (Claude Code creator) - Self-verification loop, extended thinking
    • Simon Willison - Sub-agents for context isolation, skills system
    • Hacker News Community - Production patterns from real deployments

    Inspirations

    Full Acknowledgements - Complete list of 50+ research papers, articles, and resources

    Built for the Claude Code ecosystem, powered by Anthropic's Claude models (Sonnet, Haiku, Opus).


    Ready to build a startup while you sleep?

    git clone https://github.com/asklokesh/loki-mode.git ~/.claude/skills/loki-mode
    ./autonomy/run.sh your-prd.md

    Keywords: claude-code, claude-skills, ai-agents, autonomous-development, multi-agent-system, sdlc-automation, startup-automation, devops, mlops, deployment-automation, self-healing, perpetual-improvement