JSPM

  • ESM via JSPM
  • ES Module Entrypoint
  • Export Map
  • Keywords
  • License
  • Repository URL
  • TypeScript Types
  • README
  • Created
  • Published
  • Downloads 8
  • Score
    100M100P100Q71816F
  • License MIT

CLI for KindLM — behavioral regression testing for AI agents

Package Exports

  • @kindlm/cli

Readme

KindLM

CI

Behavioral regression testing for AI agents. Test what your agents do — not just what they say.

Why KindLM?

LLM evals measure text quality. KindLM tests behavior — the tool calls your agent makes, the decisions it takes, and whether it leaks PII or violates compliance rules. It runs in CI so regressions never ship.

Features

  • Tool call assertions — verify agents call the right tools with the right arguments, in the right order
  • Schema validation — structured output checked against JSON Schema (AJV)
  • PII detection — catch leaked SSNs, credit cards, emails, phone numbers, IBANs
  • LLM-as-judge — score responses against natural-language criteria (0.0–1.0)
  • Drift detection — semantic + field-level comparison against saved baselines
  • Keyword guards — require or forbid specific phrases in output
  • Latency & cost budgets — fail tests that exceed time or token-cost thresholds
  • EU AI Act compliance — generate Annex IV documentation from test results
  • CI-native — exit code 0/1, JUnit XML reporter, GitHub Actions ready

Supported Providers

Provider Example config
OpenAI openai:gpt-4o
Anthropic anthropic:claude-sonnet-4-5-20250929
Google Gemini google:gemini-2.0-flash
Mistral mistral:mistral-large-latest
Cohere cohere:command-r-plus
Ollama ollama:llama3

Quick Start

Try it instantly:

npx @kindlm/cli init

Or install globally:

npm install -g @kindlm/cli
kindlm init

Edit the generated kindlm.yaml:

kindlm: 1
project: "my-agent"

suite:
  name: "refund-agent"

providers:
  openai:
    apiKeyEnv: "OPENAI_API_KEY"

models:
  - id: "gpt-4o"
    provider: "openai"
    model: "gpt-4o"
    params:
      temperature: 0

prompts:
  refund:
    system: "You are a refund support agent. Use lookup_order(order_id) to find orders."
    user: "{{message}}"

tests:
  - name: "looks-up-order"
    prompt: "refund"
    vars:
      message: "I want to return order #12345"
    tools:
      - name: "lookup_order"
        responses:
          - when: { order_id: "12345" }
            then: { order_id: "12345", status: "eligible" }
    expect:
      toolCalls:
        - tool: "lookup_order"
          argsMatch: { order_id: "12345" }
      guardrails:
        pii:
          enabled: true
      judge:
        - criteria: "Response is empathetic and professional"
          minScore: 0.8

Run your tests:

kindlm test

Output:

  refund-agent / looks-up-order

  gpt-4o
    ✓ looks-up-order  (1.3s)
      ✓ tool_called: lookup_order
      ✓ pii: no PII detected
      ✓ judge: 0.92 ≥ 0.80

  1 passed, 0 failed
  Gates: ✓ PASSED

CLI Flags

Flag Description Added
--reporter <type> Output format: pretty (default), json, junit v1.0.0
--output <path> Write report to file v1.0.0
--compliance Generate EU AI Act Annex IV report v1.0.0
--pdf <path> Export compliance report as PDF (requires --compliance) v1.0.0
-s <suite> Run a specific suite by name v1.0.0
--runs <count> Override the repeat count from config v1.0.0
--gate <percent> Fail if suite pass rate falls below threshold (0–100) v1.0.0
--isolate Run in an isolated git worktree (clean environment) v2.0.0
--concurrency <n> Override the concurrency setting from config v2.1.0
--timeout <ms> Override the per-test timeout from config v2.1.0

CI Integration

# .github/workflows/test.yml
- run: npm install -g @kindlm/cli
- run: kindlm test --reporter junit --output results.xml

Repository Layout

packages/
  core/       @kindlm/core  — Business logic, zero I/O dependencies
  cli/        @kindlm/cli   — CLI entry point
  cloud/      @kindlm/cloud — Cloudflare Workers API + D1 database
docs/         Technical specs and documentation
site/         Documentation website (Next.js)

Documentation

Full docs: kindlm.dev | Source: docs/

License

MIT (core + CLI) | AGPL (cloud)