Package Exports
This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (@kindlm/cli) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.
Readme
KindLM
Behavioral regression testing for AI agents. Test what your agents do — not just what they say.
Why KindLM?
LLM evals measure text quality. KindLM tests behavior — the tool calls your agent makes, the decisions it takes, and whether it leaks PII or violates compliance rules. It runs in CI so regressions never ship.
Features
- Tool call assertions — verify agents call the right tools with the right arguments, in the right order
- Schema validation — structured output checked against JSON Schema (AJV)
- PII detection — catch leaked SSNs, credit cards, emails, phone numbers, IBANs
- LLM-as-judge — score responses against natural-language criteria (0.0–1.0)
- Drift detection — semantic + field-level comparison against saved baselines
- Keyword guards — require or forbid specific phrases in output
- Latency & cost budgets — fail tests that exceed time or token-cost thresholds
- EU AI Act compliance — generate Annex IV documentation from test results
- CI-native — exit code 0/1, JUnit XML reporter, GitHub Actions ready
Supported Providers
| Provider | Example config |
|---|---|
| OpenAI | openai:gpt-4o |
| Anthropic | anthropic:claude-sonnet-4-5-20250929 |
| Google Gemini | google:gemini-2.0-flash |
| Mistral | mistral:mistral-large-latest |
| Cohere | cohere:command-r-plus |
| Ollama | ollama:llama3 |
Quick Start
Try it instantly:
npx @kindlm/cli initOr install globally:
npm install -g @kindlm/cli
kindlm initEdit the generated kindlm.yaml:
kindlm: 1
project: "my-agent"
suite:
name: "refund-agent"
providers:
openai:
apiKeyEnv: "OPENAI_API_KEY"
models:
- id: "gpt-4o"
provider: "openai"
model: "gpt-4o"
params:
temperature: 0
prompts:
refund:
system: "You are a refund support agent. Use lookup_order(order_id) to find orders."
user: "{{message}}"
tests:
- name: "looks-up-order"
prompt: "refund"
vars:
message: "I want to return order #12345"
tools:
- name: "lookup_order"
responses:
- when: { order_id: "12345" }
then: { order_id: "12345", status: "eligible" }
expect:
toolCalls:
- tool: "lookup_order"
argsMatch: { order_id: "12345" }
guardrails:
pii:
enabled: true
judge:
- criteria: "Response is empathetic and professional"
minScore: 0.8Run your tests:
kindlm testOutput:
refund-agent / looks-up-order
gpt-4o
✓ looks-up-order (1.3s)
✓ tool_called: lookup_order
✓ pii: no PII detected
✓ judge: 0.92 ≥ 0.80
1 passed, 0 failed
Gates: ✓ PASSEDCLI Flags
| Flag | Description | Added |
|---|---|---|
--reporter <type> |
Output format: pretty (default), json, junit |
v1.0.0 |
--output <path> |
Write report to file | v1.0.0 |
--compliance |
Generate EU AI Act Annex IV report | v1.0.0 |
--pdf <path> |
Export compliance report as PDF (requires --compliance) |
v1.0.0 |
-s <suite> |
Run a specific suite by name | v1.0.0 |
--runs <count> |
Override the repeat count from config |
v1.0.0 |
--gate <percent> |
Fail if suite pass rate falls below threshold (0–100) | v1.0.0 |
--isolate |
Run in an isolated git worktree (clean environment) | v2.0.0 |
--concurrency <n> |
Override the concurrency setting from config | v2.1.0 |
--timeout <ms> |
Override the per-test timeout from config | v2.1.0 |
CI Integration
# .github/workflows/test.yml
- run: npm install -g @kindlm/cli
- run: kindlm test --reporter junit --output results.xmlRepository Layout
packages/
core/ @kindlm/core — Business logic, zero I/O dependencies
cli/ @kindlm/cli — CLI entry point
cloud/ @kindlm/cloud — Cloudflare Workers API + D1 database
docs/ Technical specs and documentation
site/ Documentation website (Next.js)Documentation
Full docs: kindlm.dev | Source: docs/
License
MIT (core + CLI) | AGPL (cloud)