@dzhechkov/harness-cli
The dz CLI — the main entry point to the DZ Harness Hub. Install AI skills for Claude Code, Codex, OpenCode, Hermes, OpenClaude from a single command.
Install
npm install -g @dzhechkov/harness-cli
# Update to latest version (run from outside any workspace project):
cd /tmp && npm install -g @dzhechkov/harness-cli@latest
Note: If you get EUNSUPPORTEDPROTOCOL workspace:*, you're inside a pnpm/yarn workspace. Run the install from /tmp or ~ instead.
User Journey — from install to mastery
All 31 commands mapped to a real workflow:
DISCOVER → INSTALL → USE → CREATE → MAINTAIN → SHARE
Phase 1: Discover (what's available?)
npm install -g @dzhechkov/harness-cli # install the CLI
dz help # see all commands
dz pretrain # analyze project files → recommend by tech stack
dz recommend "build API and deploy to K8s" # keyword match → skills + toolkits
dz recommend "work on this project" # generic? → auto-runs pretrain → recommends by stack
dz stats # 31 packages, 115 skills, 5 targets, 11 presets
dz dashboard # visual panel — packages, adapters, skill packs
dz registry # browse all 115 skills by category
dz registry search kubernetes # find specific skills
dz registry --category devops # filter by domain
dz downloads # npm weekly download stats
Phase 2: Install (set up your workspace)
# Full setup with self-learning (recommended):
dz setup --target claude-code --preset devops # pretrain + hooks + JSONL memory
# With AgentDB vector memory (semantic search + self-learning):
dz setup --target claude-code --preset devops --memory agentdb # .rvf + 41 MCP tools
# Or just install skills (no learning):
dz init --target claude-code --preset devops # 27 DevOps skills
dz init --target openclaude --preset web3 # 12 DeFi skills for OpenClaude
dz init --target codex --preset mcp # 16 MCP skills for Codex
# Or pick individual skills:
dz init --target claude-code --select terraform,kubernetes,docker-compose
# Or install from any npm package:
dz install @dzhechkov/skills-devops # npm install + copy skills
# Verify everything is correct:
dz verify # structural validation
dz doctor # 7 health checks
dz list # show installed skills
dz info --id terraform # detailed info about a skill
Phase 3: Use (work with your agent)
# Now use Claude Code / Codex / OpenCode / Hermes normally.
# Skills are auto-discovered from the platform's skills directory.
# Example in Claude Code:
# "Review this PR" → pr-review skill activates
# "Design an API" → api-design skill activates
# "Fix this CI" → ci-fix skill activates
Phase 4: Create (build your own skills)
# Scaffold a new skill:
dz create-skill --name my-skill --description "What it does" --tier 2
# With BTO-compatible eval templates:
dz create-skill --name my-skill --bto
# Benchmark your skill (aim for Grade A):
dz benchmark .claude/skills/my-skill # single skill — 19 L0 checks
dz benchmark packages/@dzhechkov/skills-devops --all # batch all
dz benchmark skill-a --compare skill-b # A/B compare
# Find skills to canonicalize from the ecosystem:
dz scout # scan 9 sources (GitHub, npm+plugins, HN, ...)
dz scout --deep # deep analysis with SKILL.md parsing
dz auto-canonicalize --source github.com/user/repo --pack packages/@dzhechkov/skills-devops
Phase 5: Maintain (keep skills fresh)
# Check for upstream changes (canonicalized skills):
dz sync-upstream --list # which packages have external sources?
dz sync-upstream --all # check all against upstream
dz sync-upstream --package packages/@dzhechkov/skills-devops # check one
# Check installed skills vs canonical:
dz upgrade # shows which skills need update
dz upgrade --target openclaude # check specific platform
# Sync canonical to legacy layout:
dz sync # canonical → project skills
dz migrate # detect legacy installations
# Orchestrate dynamic workflows:
dz workflow --task coverage-lift # parallel coverage improvement
dz workflow --task security-audit # adversarial security scan
# Cross-host state sync:
dz roam --apply # sync agent state across machines
Phase 6: Share (publish to the world)
# Publish updated packages to npm:
dz publish --dry-run # preview
dz publish --filter skills-devops # publish specific package
dz publish # publish all changed packages
Three Ways to Install Skills
| | Individual Skill | Preset | npx Package |
|---|---|---|---|
| What | 1 SKILL.md file | Curated list of skill names | Full toolkit with orchestration |
| Contains | Instructions for 1 task | N skill references | Skills + commands + rules + shards + agents + memory |
| Pipeline | No | No | Yes (phases, checkpoints, governance) |
| Self-learning | No | dz setup adds it | Built-in |
| Install | dz init --select X | dz setup --preset X | npx @dzhechkov/X init |
| Example | terraform | devops (27 skills) | keysarium (7-phase research) |
# One skill:
dz init --target claude-code --select design-thinking
# Curated set by topic (recommended):
dz setup --target claude-code --preset meta # 15 development skills + self-learning
# Full toolkit with orchestrated pipeline:
npx @dzhechkov/keysarium init # 7-phase research + commands + memory
When to use which:
- Need 1 specific capability → --select
- Need a themed set that works together → --preset
- Need a full pipeline with commands and governance → npx
Available Presets (11)
| Preset | Skills | Description |
|---|---|---|
| meta | 15 | Development process (explore, goap-research, problem-solver, design-thinking, feature-adr, knowledge-extractor, understand-anything-bridge, agentshield-scan, adversarial-verifier) |
| qe-engineer | 20 | Quality engineering (test-gen, coverage, chaos, defect, ...) |
| bto | 1 | Build-Benchmark-Test-Optimize pipeline |
| health | 8 | Medical AI (diagnostics, drugs, labs, clinical decisions) |
| keysarium | 9 | Full research toolkit (feature-adr, presentation, reverse-eng) |
| p-replicator | 10 | AI product development (/replicate, SPARC PRD, pipeline-forge) |
| feature-adr | 5 | Feature pipeline (feature-adr, explore, frontend-design) |
| devops | 27 | DevOps skills (terraform, kubernetes, c4-architecture, risk-assessment, ...) |
| web3 | 12 | Web3/DeFi (quicknode, zerion, symbiosis, bankr, veil, neynar, ...) |
| mcp | 16 | MCP servers (agentdb, brave-search, gmail, gitlab, comfyui, notion, ...) |
| academic | 5 | Thesis defense (review, questions, doc-check, live defense + answer eval) |
Standalone Packages (install via npx, no dz CLI needed)
| Package | Install | What it does |
|---|---|---|
| @dzhechkov/keysarium | npx @dzhechkov/keysarium init | Full 7-phase research toolkit |
| @dzhechkov/design-thinking | npx @dzhechkov/design-thinking init | d.school 6-phase Design Thinking (8 skills) |
| @dzhechkov/p-replicator | npx @dzhechkov/p-replicator init | AI product development (/replicate pipeline) |
| @dzhechkov/health-advisor | npx @dzhechkov/health-advisor init | Medical AI (25 skills) |
| @dzhechkov/skills-bto | npx @dzhechkov/skills-bto init | BTO benchmarking (Build-Test-Optimize) |
| @dzhechkov/skills-feature-adr | npx @dzhechkov/skills-feature-adr init | 11-step feature pipeline |
| @dzhechkov/skills-edu-site | npx @dzhechkov/skills-edu-site init | Gamified edu site generator |
| @dzhechkov/skills-transcript-site | npx @dzhechkov/skills-transcript-site init | Transcript → interactive site |
| @dzhechkov/skills-analyst-manual | npx @dzhechkov/skills-analyst-manual init | 3-phase analyst composite |
Difference: dz init --preset installs individual skills from .claude/skills/ source into a target platform tree. Standalone npx packages have their own CLI and install a complete toolkit with commands, rules, shards, and agents — a richer but self-contained experience.
A skill and its npx toolkit are not duplicates — they're a graduation. Several skills (e.g. feature-adr, design-thinking) exist BOTH as a skill inside a dz preset AND as a standalone npx package. The preset's SKILL.md is fully functional on its own (the whole methodology — modules + references — travels with it, and it auto-activates by description), and it's the only way to compile that capability to the non-Claude platforms (Codex/OpenCode/Hermes/OpenClaude) via dz. The npx package adds project-level runtime governance around the same skill: a slash command, governance rules, a context shard, and (for feature-adr) reward-learning + /harvest. So: pick the skill/preset for a working capability across platforms; pick the npx toolkit when you want it as a governed, command-driven fixture of one project.
All Commands (31)
dz setup --target <name> [--preset <name>] [--memory agentdb] [--no-hooks] [--install-driver] [--force]
dz init --target <name> [--preset <name>] [--select id,id,...] [--force]
dz install <npm-pkg> [--target <name>] [--project <dir>]
dz teach "<pattern>" [--reward <0-1>] [--domain <name>]
dz pretrain [--project <dir>]
dz recommend "<task description>"
dz compose <preset1+preset2+...> [--target <name>]
dz diff <skill-dir>
dz upgrade [--target <name>] [--project <dir>]
dz verify [--skills-dir <dir>] [--target <name>]
dz sync [--canonical <dir>] [--project <dir>] [--dry-run] [--force]
dz update (alias for sync)
dz list [--skills-dir <dir>]
dz info --id <skill-id> [--skills-dir <dir>]
dz create-skill --name <id> [--description <text>] [--tier 1|2|3] [--bto]
dz registry [search <query>] [--category <cat>]
dz benchmark <skill-dir> [--compare <dir>] [--all]
dz publish [--filter <name>] [--dry-run] [--bump-only]
dz auto-canonicalize --source <github-url> --pack <skills-pack>
dz sync-upstream [--package <dir>] [--list] [--all]
dz scout [--topics <list>] [--since <date>] [--deep]
dz workflow --task <name> [--dry-run]
dz plugin [--version <ver>]
dz downloads
dz migrate [--project <dir>]
dz stats
dz dashboard
dz doctor [--project <dir>]
dz roam [--apply] [--slug <slug>]
dz import-ecc [--local-path <dir>] [--select id,id,...] [--limit N] [--output <dir>] [--force]
dz help
Targets (5 platforms)
All 5 platforms natively support the agentskills.io SKILL.md format:
| Target | Skills directory | Native SKILL.md? |
|---|---|---|
| claude-code | .claude/skills/ | Yes |
| codex | .agents/skills/ | Yes (docs) |
| opencode | .opencode/skills/ | Yes (also scans .claude/skills/) |
| hermes | .hermes/skills/ | Yes |
| openclaude | .openclaude/skills/ | Yes |
Same SKILL.md file, different directory — no format conversion needed.
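As an illustration of the table above, the target-to-directory mapping fits in one small shell function. `skills_dir_for_target` is a hypothetical helper for your own scripts, not a dz command; only the target names and directories come from the table.

```shell
# Hypothetical helper (not part of the dz CLI): map a target name to the
# skills directory listed in the Targets table.
skills_dir_for_target() {
  case "$1" in
    claude-code) echo ".claude/skills" ;;
    codex)       echo ".agents/skills" ;;
    opencode)    echo ".opencode/skills" ;;
    hermes)      echo ".hermes/skills" ;;
    openclaude)  echo ".openclaude/skills" ;;
    *)           echo "unknown target: $1" >&2; return 1 ;;
  esac
}

skills_dir_for_target codex   # prints .agents/skills
```

Because the SKILL.md format is the same everywhere, a script like this only has to pick the destination directory.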
Optional platform enrichment (skills work without these):
| Platform | Optional extra | What it adds |
|---|---|---|
| Codex | agents/openai.yaml | UI metadata (icons, display_name, MCP deps) |
| OpenCode | opencode.json + .opencode/agents/*.md | Config, custom agents |
| Hermes | cli-config.yaml | Agent config, persona, memory |
Workflows (Opus 4.8+ dynamic workflows)
dz workflow --task coverage-lift # parallel coverage improvement
dz workflow --task mutation-kill # kill surviving mutants
dz workflow --task canonicalize # canonicalize new packages
dz workflow --task security-audit # adversarial security scan
Scout (ecosystem intelligence)
dz scout # quick scan — radar mode
dz scout --deep # deep analysis — AI analyst mode
dz scout --topics mcp-server,ai-agent # custom topics
dz scout --since 2026-05-01 # only recent repos
Radar mode (dz scout) scans 9 sources in parallel (GitHub + npm + HN + MCP Registry + Glama + OSSInsight + Smithery + Semantic Scholar + arXiv):
- Detects skill format — SKILL.md, plugin.json, .claude/skills/, .claude-plugin/, MCP manifests
- Scores relevance — format (40%) + stars (30%) + recency (20%) + novelty (10%)
- Compares against our 31 packages — finds skills we don't have
- Recommends — integrate (score ≥70) / monitor (40-69 + ≥50 stars) / skip
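The weights and thresholds above can be sketched as follows. This is an illustration only: the function names are hypothetical, and it assumes each sub-score is already normalized to an integer from 0 to 100, which this README does not specify.

```shell
# scout_score FORMAT STARS RECENCY NOVELTY
# Each argument is an assumed 0-100 sub-score; weights are 40/30/20/10.
# Shell arithmetic truncates the result to an integer.
scout_score() {
  echo $(( (40 * $1 + 30 * $2 + 20 * $3 + 10 * $4) / 100 ))
}

# scout_decision SCORE STAR_COUNT
# integrate at 70 or more; monitor at 40-69 with at least 50 stars; else skip.
scout_decision() {
  if [ "$1" -ge 70 ]; then
    echo integrate
  elif [ "$1" -ge 40 ] && [ "$2" -ge 50 ]; then
    echo monitor
  else
    echo skip
  fi
}

score=$(scout_score 100 60 50 80)   # (4000 + 1800 + 1000 + 800) / 100 = 76
scout_decision "$score" 500         # prints integrate
```

Note that a mid-range score alone is not enough for "monitor": a repo scoring 47 with 10 stars falls through to "skip".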
Deep analyst mode (dz scout --deep) goes further for top-scored repos:
- Downloads SKILL.md from each repo, parses frontmatter + body
- Finds closest match in our inventory by keyword overlap
- Explains the delta — what the found skill adds that ours doesn't
- Recommends integration path:
  - canonicalize — high-signal novel skill → new @dzhechkov/skills-* pack
  - merge — similar to existing skill → add unique features to ours
  - new-preset — novel skill → add to preset or create new pack
  - skip — already in our inventory
- Gap analysis — identifies trending categories across the ecosystem that our harness lacks
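To make the frontmatter-parsing step above concrete, here is a minimal sketch that reads one field from the YAML frontmatter of a SKILL.md. It assumes the frontmatter is the block between the first two `---` lines and that the value is a plain single-line scalar; `frontmatter_field` is a hypothetical helper and says nothing about how dz scout parses files internally.

```shell
# frontmatter_field FILE KEY
# Print the value of KEY from the leading frontmatter block of FILE.
# Handles only simple "key: value" lines (no quoting or multi-line values).
frontmatter_field() {
  awk -v key="$2" '
    /^---[ \t]*$/ { fence++; next }       # count the --- delimiters
    fence == 1 {
      prefix = key ":"
      if (index($0, prefix) == 1) {       # line starts with "key:"
        value = substr($0, length(prefix) + 1)
        sub(/^[ \t]+/, "", value)
        print value
        exit
      }
    }
    fence >= 2 { exit }                   # body reached, key not found
  ' "$1"
}

demo=$(mktemp)
cat > "$demo" <<'EOF'
---
name: deploy-check
description: Pre-deploy validation gates
---
# Deploy Check
name: this line is body text and is ignored
EOF
frontmatter_field "$demo" name          # prints deploy-check
frontmatter_field "$demo" description   # prints Pre-deploy validation gates
rm -f "$demo"
```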
Example deep analysis output:
## 🔬 Deep Analysis
### cool/agent-toolkit (★500)
2/3 skills are novel
| Skill | Description | Closest match | Integration | Rationale |
|-------|------------|---------------|-------------|-----------|
| code-review | Automated OWASP-focused review | brutal-honesty-review | **merge** | Similar to ours — merge OWASP checklist |
| deploy-check | Pre-deploy validation gates | — | **canonicalize** | High-signal novel skill (500 stars) |
## 📊 Harness Gap Analysis
| Category | Frequency | Recommendation |
|----------|-----------|---------------|
| deploy-automation | 12 repos | Create @dzhechkov/skills-devops — high demand |
| data-pipeline | 5 repos | Monitor — emerging trend |
Powered by @dzhechkov/scout.
BTO integration (create-skill --bto)
# Scaffold a new skill with BTO-compatible 3-layer evaluation:
dz create-skill --name my-skill --bto
# What you get:
# evals/my-skill.yaml — BTO eval with L0/L1/L2 layers
# references/judge-rubrics.md — scoring rubrics for 3-judge panel
The --bto flag generates eval templates compatible with /bto-test:
| Layer | What | Gate |
|---|---|---|
| L0 | Deterministic checks (U1-U5 universal + S1-S14 skill-specific) | Pass rate >= 80% |
| L1 | Single LLM judge (Haiku) — 5 dimensions: Clarity, Completeness, Actionability, Quality, Anti-patterns | Average >= 7.0 |
| L2 | 3-judge panel (Sonnet) — Expert (0.40), Critic (0.30), Auditor (0.30) — 5 dimensions: Methodology, Depth, Correctness, Usability, Robustness | Weighted avg >= 7.0 |
After scaffolding, fill in the SKILL.md protocol and run /bto-test .claude/skills/my-skill to evaluate.
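The L2 gate in the table above is a weighted average, which the following sketch works through. The weights (0.40, 0.30, 0.30) and the 7.0 threshold come from the table; the function names and the sample scores are made up for illustration, and /bto-test may aggregate per-dimension scores differently.

```shell
# l2_weighted EXPERT CRITIC AUDITOR
# Weighted panel score on the 0-10 scale, using the weights from the table.
l2_weighted() {
  awk -v e="$1" -v c="$2" -v a="$3" \
    'BEGIN { printf "%.2f\n", 0.40 * e + 0.30 * c + 0.30 * a }'
}

# l2_gate SCORE
# Succeeds (exit status 0) when SCORE meets the 7.0 gate.
l2_gate() {
  awk -v s="$1" 'BEGIN { exit !(s >= 7.0) }'
}

score=$(l2_weighted 8 7 6)   # 3.2 + 2.1 + 1.8 = 7.10
if l2_gate "$score"; then echo "L2 pass ($score)"; else echo "L2 fail ($score)"; fi
```

Because the Expert judge carries the largest weight, a panel of 9, 5, 5 averages 6.60 and fails the gate even though its unweighted mean is close to the same value.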
dz install — install skills from any npm package
# Install skills from any npm package directly
dz install @dzhechkov/skills-devops
dz install @dzhechkov/skills-web3 --target openclaude
dz install @lythos/skill-curator --target claude-code
Runs npm install, discovers SKILL.md files in the package, copies them to the target platform directory. Works with any agentskills.io-compatible npm package.
dz sync-upstream — check for upstream updates
dz sync-upstream --list # show packages with external sources
dz sync-upstream --all # check ALL packages against upstream
dz sync-upstream --package packages/@dzhechkov/skills-devops # check one package
Discovers all skill packs with sources.json, fetches SKILL.md from origin repos, reports which skills have upstream changes.
dz upgrade — check installed skills for updates
dz upgrade # check .claude/skills/ against canonical
dz upgrade --target openclaude # check .openclaude/skills/
Compares installed skills with canonical source, reports which need dz init --force to update.
dz downloads — npm weekly download stats
dz downloads # fetch weekly downloads for all 30 packages
dz benchmark — L0 quality gate
dz benchmark packages/@dzhechkov/skills-devops/terraform # single skill
dz benchmark packages/@dzhechkov/skills-devops --all # batch all
dz benchmark skill-a --compare skill-b # A/B compare
19 deterministic checks (U1-U5 universal + S1-S14 skill-specific). Grade A = 95%+. For L1/L2 LLM judges, use /bto-test inside Claude Code.
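A short worked example of the L0 arithmetic, assuming the percentage is simply passed checks divided by total checks (this README does not say whether checks are weighted). The 80% gate is the L0 threshold from the BTO table and 95% is the Grade A bar; `l0_report` is a hypothetical helper, not dz benchmark output.

```shell
# l0_report PASSED TOTAL
# Print the pass rate and whether it clears the 80% gate and the 95% Grade A bar.
l0_report() {
  awk -v p="$1" -v t="$2" 'BEGIN {
    rate = 100 * p / t
    printf "%.1f%% gate=%s grade_a=%s\n", rate,
      (rate >= 80 ? "pass" : "fail"), (rate >= 95 ? "yes" : "no")
  }'
}

l0_report 19 19   # prints 100.0% gate=pass grade_a=yes
l0_report 18 19   # prints 94.7% gate=pass grade_a=no
l0_report 15 19   # prints 78.9% gate=fail grade_a=no
```

Under that assumption, a single failed check out of 19 already drops a skill below the Grade A bar, while up to three failures still clear the 80% gate.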
dz publish — automated npm publish
dz publish --dry-run # preview what would publish
dz publish --filter skills-devops # publish specific package
dz publish --filter skills-devops --bump-only # bump version only, no publish
dz auto-canonicalize — discover skills in GitHub repos
dz auto-canonicalize --source github.com/user/repo --pack packages/@dzhechkov/skills-devops
Scans a GitHub repo for SKILL.md files, generates dz create-skill commands.
dz registry — searchable skill index
dz registry # visual panel: 115 skills in 6 categories
dz registry search security # fuzzy search
dz registry --category mcp # filter by category
dz stats + dz dashboard
dz stats # Quick metrics: packages, skills, targets, presets
dz dashboard # Visual panel with all packages, adapters, skill packs
Example: Thesis Defense Preparation (Academic Preset)
# Install with AgentDB (remembers patterns across students):
dz setup --target claude-code --preset academic --memory agentdb
# Or lightweight:
# dz init --target claude-code --preset academic
Prepare: Create a folder per student with thesis.pdf + review.pdf + external-review.pdf + antiplagiat.pdf.
Pre-defense (open Claude Code in student folder):
"Check document package completeness" → document-checker
"Analyze this thesis" → dissertation-review (format, criteria, grade)
"Generate 6 defense questions" → question-generator (basic → critical, page refs)
During defense (feed live transcript via Whisper + VB-Cable):
"Analyze this defense transcript" → defense-evaluator (structure, coverage, delivery)
"Evaluate the student's answers" → answer-assessor (completeness, depth, reviewer alignment)
| When | Skill | What it does |
|---|---|---|
| Before | document-checker | Package completeness: thesis, reviews, plagiarism report (antiplagiat) |
| Before | dissertation-review | State Examination Commission (ГЭК) criteria, research/project format, grade 1-10, team project check |
| Before | question-generator | 4-6 questions with page refs and expected keywords |
| During | defense-evaluator | Live transcript → structure, coverage, delivery quality |
| During | answer-assessor | Q&A evaluation → completeness, depth, reviewer remarks |
Key features: Grade corridor, per-criterion 1-10 scoring, TO BE vs data detection, LTV/CAC > 10 warning, reviewer divergence, raise/lower conditions, compact mode (a 1-page summary report, "компактная справка" in Russian), summary table across all students. With AgentDB, patterns persist.
Skills contain only evaluation criteria and methodology — no student data.
Batch mode: S3 archive → agent swarm
# Download and extract: each student = subfolder with .zip
curl -o students.zip "https://s3.example.com/bucket/students.zip"
mkdir students && cd students && 7z x ../students.zip
# Extract each student's archive into a folder named after it.
# -o sets 7z's output directory, so a failed extraction cannot leave
# the loop stranded in the wrong directory.
for f in *.zip; do 7z x "$f" -o"${f%.zip}"; done
Then in Claude Code:
"For each student folder: run document-checker → dissertation-review → question-generator.
Save справка.md (a summary report) per student with clickable inline links to pages (p. 45, sec. 2.3)
and external sources ([JTBD](https://hbr.org/...)). Run all students in parallel."
With AgentDB, patterns persist across students — grading calibration improves with each analysis.
Example: Product Discovery with Design Thinking
# With self-learning (recommended — remembers HADI patterns, JTBD insights across sessions):
dz setup --target claude-code --preset meta
dz setup --target claude-code --preset meta --memory agentdb # + semantic search
# Or without self-learning:
dz init --target claude-code --preset meta
# Or individually:
dz init --target claude-code --select design-thinking
Then in Claude Code:
"Design a mobile app for booking coworking spaces"
→ design-thinking skill activates
→ 6-phase protocol runs with complexity tier auto-selection
6-Phase Protocol
Phase 1: EMPATHIZE → STOP gate: request user interview data + goap-research for market data
Phase 2: DEFINE → JTBD Canvas + CJM AS IS + Ishikawa root cause analysis
Phase 3: IDEATE → HADI hypotheses + Lean Canvas / Osterwalder BMC + GTM + Unit Economics
Phase 4: PROTOTYPE → MVP (fidelity spectrum) + CJM/VSM TO BE (labeled as hypotheses)
Phase 5: TEST → STOP gate: request usability test data + risk analysis + HADI validation
Phase 6: VALIDATE → Pilot with variance analysis: projected vs actual → Scale/Iterate/Pivot/Kill
Complexity Tiers (auto-selected)
| Tier | When | Phases | Integrations |
|---|---|---|---|
| S | Quick user insight | 1→2→5 | explore + goap-research |
| M | New feature | 1→2→3→4→5 | + frontend-design + six-thinking-hats |
| L | New product | 1→2→3→4→5→6 | + qcsd-swarm + reverse-engineering-unicorn |
| XL | Platform / ecosystem | All | All optional integrations (aqe init recommended) |
Key Safeguards
- Never fabricates data — STOP gates pause for real interview/survey/test data
- TO BE ≠ data — projections labeled as hypotheses, validated via pilot (Phase 6)
- LTV/CAC > 10 flagged as suspicious (Skok 2013)
- Loop-back protocol — Phase 5 can invalidate Phase 2 and return upstream
- 22 methodologies with academic validation tiers (Strong/Moderate/Practitioner/Weak)
- 12 validation rules (DT-001 through DT-012) enforce quality per tier
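The LTV/CAC safeguard listed above reduces to a ratio check, sketched here. The strict "greater than 10" comparison follows the wording of the bullet; the function name is hypothetical, and the sketch assumes CAC is greater than zero.

```shell
# ltv_cac_check LTV CAC
# Print the LTV/CAC ratio and flag it when it exceeds 10.
# Assumes CAC > 0; a zero CAC would make awk fail on the division.
ltv_cac_check() {
  awk -v ltv="$1" -v cac="$2" 'BEGIN {
    ratio = ltv / cac
    printf "%.1f %s\n", ratio, (ratio > 10 ? "suspicious" : "ok")
  }'
}

ltv_cac_check 1200 100   # prints 12.0 suspicious
ltv_cac_check 900 300    # prints 3.0 ok
```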
What's included vs what's optional
Core DT — the meta preset includes all required dependencies (15 skills):
dz setup --target claude-code --preset meta
# → explore, goap-research-ed25519, problem-solver-enhanced,
# design-thinking, feature-adr, knowledge-extractor,
# understand-anything-bridge, ... (15 total)
Full DT — for ALL optional integrations, install agentic-qe:
npm install -g agentic-qe && aqe init --auto
# → 94 QE skills + 55 agents in .claude/skills/ and .claude/agents/
# → six-thinking-hats, qcsd-ideation-swarm, frontend-design, brutal-honesty-review
Or cherry-pick: dz compose meta+keysarium for competitive analysis.
| Optional Skill | Source | What it adds |
|---|---|---|
| frontend-design | aqe init / keysarium | HTML/React prototypes (Phase 4) |
| six-thinking-hats | aqe init | Team ideation (Phase 3) |
| qcsd-ideation-swarm | aqe init | 9-agent quality risk (Phase 2-3) |
| reverse-engineering-unicorn | keysarium | Competitor CJM+JTBD (Phase 1) |
Without optional skills, design-thinking uses built-in fallbacks.
BTO benchmark: L0 Grade A (100%), L2 Opus weighted 7.58/10.
Example: Import Skills from ECC
dz install @dzhechkov/skills-ecc # 20 curated ECC skills
dz import-ecc --limit 50 # import 50 from GitHub
dz import-ecc --local-path /path/to/ECC # from local clone (fast)
dz import-ecc --select docker-patterns,tdd # cherry-pick
Example: Security Scan with AgentShield
# In Claude Code: "scan my agent config for security issues"
# → agentshield-scan skill activates (170 rules, 10 categories)
npx ecc-agentshield scan --format sarif # SARIF for GitHub Code Scanning
Example: 4-Axis Risk Scoring
dz init --target codex --preset meta --enrich
# → agents/openai.yaml includes risk_level per skill
# Axes: base_tool + file_sensitivity + blast_radius + irreversibility
Example: Understand & Develop an Existing Project
# 1. Analyze project → get recommendations
dz pretrain # detects stack, recommends presets
dz recommend "work on this Node.js API" # suggests skills + toolkits
# 2. Install skills (choose your level)
dz setup --target claude-code --preset meta --memory agentdb # 15 skills (includes feature-adr)
dz setup --target claude-code --preset qe-engineer # + 20 QE skills
# Want the full feature-adr toolkit with /feature-adr command + governance?
npx @dzhechkov/skills-feature-adr init # adds slash command + rules + shards
# See: https://www.npmjs.com/package/@dzhechkov/skills-feature-adr
# preset = SKILL.md only (auto-activates on matching tasks)
# npx = full toolkit (slash command + governance + rules)
Install Understand-Anything plugin, then in Claude Code:
# 3. Map the codebase
/understand # builds knowledge graph
# → understand-anything-bridge feeds architecture context to all skills
# 4. Develop with full context
"Add a payment module"
# → feature-adr runs with architecture awareness (layers, hot spots, dependencies)
# → see: https://www.npmjs.com/package/@dzhechkov/skills-feature-adr
# → code generation informed by real dependency graph
# → QE review targets tests at high-impact files
# → agentshield-scan checks new configs for security
# 5. Verify impact
"What files are affected by my changes?"
# → blast radius calculation → targeted test generation
Architecture-aware development: every skill knows the codebase structure.
Example: AI-Assisted Reasoning & Self-Improvement
# Auto-select reasoning strategy:
"Compare 3 architectures" → structured-reasoning: Tree-of-Thought (branches + scoring)
"Debug this test" → structured-reasoning: Chain-of-Thought (linear trace)
"We've been looping" → structured-reasoning: Reflection-Suppression (break loop)
# Self-review before delivering:
"Write a migration and verify" → reflection-loop: draft → critique → revise (max 3 rounds)
# Manage long sessions:
"Context is getting long" → context-window-management: checkpoint + prune + continue
# Learn from success:
"Extract this as a skill" → skill-crystallizer: trace → reusable SKILL.md
All included in meta preset.
Self-Learning: JSONL vs AgentDB
DZ Harness supports two memory backends for self-learning:
dz setup --target claude-code --preset devops # JSONL (default, lightweight)
dz setup --target claude-code --preset devops --memory agentdb # AgentDB (vector memory)
| Capability | JSONL (default) | AgentDB (--memory agentdb) |
|---|---|---|
| Session tracking | Append-only JSONL log | HNSW vector store (.rvf) |
| Pattern storage | dz teach → patterns.jsonl | dz teach → .rvf + agentdb_pattern_store |
| Search | Keyword (grep) | Semantic (HNSW nearest-neighbor, cosine similarity) |
| Retrieval | Sequential scan | O(log n) approximate nearest neighbor |
| Self-learning | Frequency-based | 9 RL algorithms + Thompson Sampling bandit |
| Memory tiers | Flat file | 3-tier (working → short-term → long-term) |
| Reflexion | Reward scores (0-1) | Episodic memory (task + outcome + self-critique) |
| Causal reasoning | No | Cypher-like graph queries (X caused Y) |
| Skill composition | Manual (presets) | Bandit-picked skill chains (A→B→C) |
| Audit trail | No | Cryptographic attestation log |
| Size | ~0 KB | 4.6 MB (agentdb) |
| MCP tools | 0 | 41 tools (pattern, reflexion, causal, skill, hierarchy) |
| Dependencies | None | agentdb (optional, via npx) |
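To show what keyword search over an append-only JSONL log looks like in practice, here is a small sketch. The record fields (`pattern`, `reward`, `domain`) mirror the `dz teach` flags but are an assumption; the real patterns.jsonl schema is not documented in this README, and `search_patterns` is a hypothetical helper.

```shell
# Hypothetical record shape; the real patterns.jsonl schema may differ.
patterns=$(mktemp)
printf '%s\n' \
  '{"pattern":"pin terraform provider versions","reward":0.9,"domain":"devops"}' \
  '{"pattern":"use readiness probes on every Deployment","reward":0.8,"domain":"devops"}' \
  '{"pattern":"prefer property-based tests for parsers","reward":0.7,"domain":"qe"}' \
  > "$patterns"

# search_patterns FILE KEYWORD
# Case-insensitive substring match, one JSON record per line (sequential scan).
search_patterns() {
  grep -i -- "$2" "$1"
}

search_patterns "$patterns" terraform                        # prints the first record
search_patterns "$patterns" kubernetes || echo "no keyword match"
rm -f "$patterns"
```

The second query finds nothing even though the readiness-probe record is about Kubernetes, which is the gap that semantic (vector) search in the AgentDB column is meant to close.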
AgentDB self-learning algorithms
When using --memory agentdb, the following algorithms automatically tune search quality:
- Thompson Sampling — multi-armed bandit for ranking search results
- UCB1 (Upper Confidence Bound) — exploration-exploitation balancing
- EXP3 — adversarial bandit for non-stationary environments
- Softmax — temperature-based action selection
- Epsilon-Greedy — simple exploration with decay
- Gradient Bandit — preference-based action selection
- Contextual Bandit — context-aware ranking using features
- REINFORCE — policy gradient for complex reward landscapes
- PPO-lite — proximal policy optimization for stable learning
The bandit automatically selects the best algorithm for your usage pattern — no manual tuning needed.
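For readers unfamiliar with bandit algorithms, the sketch below implements textbook UCB1, one of the algorithms listed above. It is not AgentDB's code, whose internals this README does not show; the input format ("arm pulls total_reward" per line) and the function name are assumptions made for the example.

```shell
# ucb1_pick
# Read "arm pulls total_reward" lines on stdin and print the arm with the
# highest UCB1 index: mean reward + sqrt(2 * ln(total pulls) / arm pulls).
# An arm that has never been pulled is chosen first.
ucb1_pick() {
  awk '
    { arm[NR] = $1; pulls[NR] = $2; reward[NR] = $3; total += $2 }
    END {
      best = ""; best_index = -1
      for (i = 1; i <= NR; i++) {
        if (pulls[i] == 0) { print arm[i]; exit }   # explore untried arms first
        idx = reward[i] / pulls[i] + sqrt(2 * log(total) / pulls[i])
        if (idx > best_index) { best_index = idx; best = arm[i] }
      }
      print best
    }
  '
}

# C has the worst mean (0/1) but the largest exploration bonus, so it wins.
printf 'A 10 7\nB 5 3\nC 1 0\n' | ucb1_pick   # prints C
```

With equal pull counts the bonus is identical for every arm, so the choice falls back to the best observed mean.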
How to enable AgentDB
# One command — everything is set up:
dz setup --target claude-code --preset devops --memory agentdb
This creates .dz/memory.rvf, registers the agentdb MCP server (41 tools), and configures session hooks. The agent can immediately use agentdb_pattern_store, agentdb_reflexion_recall, etc. — no additional dz init needed.
| Command | When to use |
|---|---|
| dz setup --memory agentdb | Recommended — full setup in one step |
| dz init --select agentdb-memory | Lightweight — only the SKILL.md guide (see below) |
What does dz init --select agentdb-memory actually do?
This is the lightweight path — it installs only the skill documentation, without configuring the backend:
Step 1: Auto-discovers agentdb-memory/ in skills-mcp package
Step 2: Copies to .claude/skills/agentdb-memory/
├── SKILL.md ← instructions for the agent
├── schemas/output.json
├── scripts/validate-config.json
└── evals/agentdb-memory.yaml
Step 3: Claude Code auto-discovers the skill from .claude/skills/
Step 4: When agent encounters a matching task, it reads SKILL.md
Step 5: SKILL.md teaches the agent WHICH tools to call and WHEN
What it does NOT do (unlike dz setup --memory agentdb):
- Does NOT create .dz/memory.rvf
- Does NOT register agentdb MCP server
- Does NOT configure session hooks
After dz init --select agentdb-memory, the user must manually add the MCP server:
claude mcp add agentdb -- npx agentdb@latest mcp start
When this is useful:
- You already have agentdb installed separately and just want the skill guide
- You want to teach the agent about agentdb tools without committing to the full .dz/ infrastructure
- You're in a team where agentdb is managed centrally but each developer needs the skill docs
How it works
- dz init compiles canonical skills from the agentskills.io standard into the target platform's layout
- Writing is additive — existing files are never overwritten without --force
- All 5 platform adapters produce byte-identical output (ADR-005)
- dz doctor runs 7 health checks (node version, adapters, config, SQLite, skills)
- dz migrate detects legacy keysarium/bto installations and recommends migration path
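The additive-write rule above (never overwrite without --force) can be sketched as a small copy helper. This is an illustration of the behavior as described, not dz's actual implementation; `install_file` and its messages are hypothetical.

```shell
# install_file SRC DEST [--force]
# Copy SRC to DEST only when DEST does not exist yet, unless --force is given.
install_file() {
  if [ -e "$2" ] && [ "${3:-}" != "--force" ]; then
    echo "skip $2 (exists; pass --force to overwrite)"
    return 0
  fi
  mkdir -p "$(dirname "$2")"
  cp "$1" "$2"
  echo "wrote $2"
}

work=$(mktemp -d)
echo "canonical" > "$work/SKILL.md"
install_file "$work/SKILL.md" "$work/.claude/skills/demo/SKILL.md"           # wrote ...
echo "local edit" > "$work/.claude/skills/demo/SKILL.md"
install_file "$work/SKILL.md" "$work/.claude/skills/demo/SKILL.md"           # skip ...
install_file "$work/SKILL.md" "$work/.claude/skills/demo/SKILL.md" --force   # wrote ...
rm -rf "$work"
```

The middle call is the important one: a locally edited skill survives a repeated install until the user explicitly opts in to overwriting it.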
Use Cases
1. Short-term product research (one-off study)
Goal: Quickly research a product idea, competitors, market — get a structured report.
# Option A: via dz CLI
dz init --target claude-code --preset meta
# Then in Claude Code:
# /explore "Research the market for AI-powered code review tools"
# /feature-adr "Summarize findings into an ADR"
# Option B: via keysarium (full 7-phase pipeline)
npx @dzhechkov/keysarium init
# Then in Claude Code:
# /casarium "AI-powered code review tools — market analysis"
# → Phase 0: Discovery → Phase 1: Exploration → Phase 2: Paranoid Research
# → Phase 3: Solution Design → Phase 4: Architecture → Phase 5: Presentation
What you get:
- meta preset: /explore clarifies the problem → /feature-adr structures findings as ADR decisions
- keysarium: full 7-phase pipeline with dream cycles, background workers, and presentation generation
Best for: Quick study (hours), competitive analysis, technology evaluation.
2. Long-term product research (evolving over time)
Goal: Continuously gather data, add new sources, and "recalculate" the product vision as insights accumulate.
# Install keysarium (research pipeline) + evidence-wiki (knowledge base)
npx @dzhechkov/keysarium init
# Copy evidence-wiki plugin into your project:
npx @dzhechkov/evidence-wiki # or git clone https://github.com/djd1m/evidence-wiki
npm install -g @dzhechkov/harness-cli
dz init --target claude-code --preset meta
Workflow — iterative research cycles with evidence wiki:
Week 1: /casarium "Product X — initial research"
→ researches/ directory created with findings
→ .keysarium/memory/ stores patterns + reward scores
/wiki-generate ← evidence-wiki
→ Scans researches/, ADRs, docs
→ Generates wiki/concepts/*.md (atomic pages with inline sources)
→ Builds wiki/graph.json (knowledge graph)
→ wiki/INDEX.md links everything
Week 2: Add new data → /casarium "Product X — update with Q2 metrics"
→ Memory recalls Week 1 patterns (reward-calibrated learning)
→ New findings merged with existing, conflicts resolved
/wiki-generate --check ← re-generates wiki
→ New concepts added, existing updated
→ Every claim verified: triple-pillar protocol requires N independent
typed sources (ADR + methodology + research)
→ Stale concepts flagged, broken evidence links detected
/triple-check wiki/concepts/pricing-model.md ← verify specific page
→ Checks that every factual claim has inline source citations
→ Flags unsupported statements
Week N: /casarium "Product X — pivot analysis after customer feedback"
→ Full history in memory layer + evidence wiki
→ /harvest extracts reusable knowledge patterns
→ /wiki-generate rebuilds the entire knowledge graph
→ Product vision "recalculated" — the wiki IS the living product model
The evidence-wiki advantage:
| Without evidence-wiki | With evidence-wiki |
|---|---|
| Research in markdown files | Atomic concept pages with inline sources |
| Findings scattered across researches/ | Interlinked knowledge graph (graph.json) |
| "I think we decided X" | Every claim has a cited source (triple-pillar) |
| Hard to see what changed | /wiki-generate --check diffs the knowledge base |
| No verification | /triple-check enforces evidence discipline |
Key features for long-term research:
- Evidence wiki (@dzhechkov/evidence-wiki): atomic concept pages where every factual claim carries inline sources; knowledge graph for cross-referencing; triple-pillar protocol (N independent typed sources per claim)
- Reward-calibrated memory (@dzhechkov/memory Reflexion): each checkpoint response trains the system: "ок" = excellent (1.0), feedback = good (0.7), rework = needs_work (0.3)
- Agent SDK Dreaming: between sessions, patterns are consolidated and distilled
- /harvest (knowledge-extractor skill): extracts reusable patterns from completed research into lib/templates
- SQLite + FTS5 backend: scales to 100k+ records with full-text search across all research sessions
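The reward calibration described in the list above can be sketched as a small mapping from checkpoint responses to scores. Only the three labels and their values (1.0, 0.7, 0.3) come from this README. The `classify` heuristic, the `rework_requested` flag, the `APPROVALS` set (including the Latin "ok"), and the averaging helper are hypothetical and may differ from how @dzhechkov/memory actually works.

```python
# Reward values come from the list above; everything else is an assumption.
REWARDS = {"excellent": 1.0, "good": 0.7, "needs_work": 0.3}

# Hypothetical set of replies treated as plain approval.
APPROVALS = {"ок", "ok"}


def classify(response, rework_requested=False):
    """Label a checkpoint response (illustrative heuristic only)."""
    if rework_requested:
        return "needs_work"
    if response.strip().lower() in APPROVALS:
        return "excellent"
    return "good"


def reward(response, rework_requested=False):
    """Return the reward score for one checkpoint response."""
    return REWARDS[classify(response, rework_requested)]


def mean_reward(responses):
    """Average reward over (response, rework_requested) pairs."""
    scores = [reward(text, rework) for text, rework in responses]
    return sum(scores) / len(scores)


# Made-up checkpoint history: approval, feedback, rework request.
history = [
    ("ок", False),
    ("add a pricing comparison", False),
    ("redo the competitor section", True),
]
print([reward(text, rework) for text, rework in history])
print(round(mean_reward(history), 2))
```

A running average like `mean_reward` is one plausible way to attach a score to a stored pattern; the README does not say how scores are aggregated.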
Best for: Product strategy over months, continuous market monitoring, evolving product vision with evidence-backed decisions.
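To show what full-text search over research sessions looks like with the SQLite + FTS5 backend listed among the key features, here is a minimal sketch using Python's standard `sqlite3` module. The table name, column names, and sample rows are hypothetical and do not reflect the package's real schema. The sketch also assumes the local SQLite build has FTS5 enabled, which is common but not guaranteed.

```python
import sqlite3

# In-memory database with a hypothetical FTS5 table of research findings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE findings USING fts5(session, body)")
conn.executemany(
    "INSERT INTO findings (session, body) VALUES (?, ?)",
    [
        ("week-1", "Initial pricing model uses per-seat billing"),
        ("week-2", "Q2 metrics show churn concentrated in small teams"),
        ("week-n", "Customer feedback suggests a pivot to usage-based pricing"),
    ],
)


def search(query):
    """Return the sessions whose findings match the full-text query."""
    rows = conn.execute(
        "SELECT session FROM findings WHERE findings MATCH ? ORDER BY rank",
        (query,),
    )
    return [row[0] for row in rows]


hits = search("pricing")
print(hits)
```

`ORDER BY rank` sorts matches by FTS5's built-in relevance score, so a query returns the most relevant sessions first without scanning every record.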
3. Product research + working prototype
Goal: Research the product AND build a functional prototype.
Option A: Sequential — research first, then code
# Step 1: Install research + development presets
npx @dzhechkov/keysarium init
# OR:
dz init --target claude-code --preset keysarium
# Step 2: Research phase
# /casarium "SaaS platform for team retrospectives"
# → Phase 0-2: Discovery, Exploration, Paranoid Research
# → Phase 3: Solution Design (with CJM prototype)
# → Result: researches/<slug>/ with full analysis
# Step 3: Switch to development
dz init --target claude-code --preset feature-adr
# Step 4: Build using research outputs
# /feature-adr "Build the retrospective platform based on research in researches/<slug>/"
# → Step 0: Router classifies as L/XL
# → Step 1-5: Requirements, ADRs, DDD, Architecture (informed by research)
# → Step 6: Implementation plan
# → Step 7: Code generation (with /frontend-design for UI)
# → Step 8-9: QE review + fleet assessment
What you get: Research artifacts in researches/, then code in features/<slug>/ + actual repository changes. Research directly feeds into ADR decisions.
Option B: Parallel — research and code simultaneously with p-replicator
# Install the full product development toolkit
npx @dzhechkov/p-replicator init
# Single pipeline: research → requirements → prototype
# /replicate "SaaS platform for team retrospectives"
# → Reverse-engineers similar products (reverse-engineering-unicorn)
# → Generates SPARC PRD (sparc-prd-mini)
# → Validates requirements (requirements-validator)
# → Creates the project structure (pipeline-forge)
# → Builds the prototype (cc-toolkit-generator-enhanced)
# → Reviews with brutal honesty (brutal-honesty-review)
What you get: A working prototype generated from research in a single /replicate pipeline run. Faster but less deep than Option A.
Comparison
| Aspect | Option A (Sequential) | Option B (p-replicator) |
|---|---|---|
| Research depth | Deep (7-phase keysarium) | Moderate (reverse-engineering) |
| Code quality | High (11-step feature-adr + QE) | Good (pipeline-forge + review) |
| Time | Days to weeks | Hours to days |
| Best for | Complex products, regulated domains | MVPs, hackathons, quick validation |
| Packages | keysarium + feature-adr preset | p-replicator |
| Research artifacts | researches/ directory | Embedded in PRD |
| Code artifacts | features/<slug>/ + repo changes | Generated project |
Tip: For maximum rigor, combine both — use p-replicator for a quick prototype, then run /feature-adr --full-qe-extended on the generated code for production-grade quality engineering.
Status
v0.3.85 — published on npm. Also available as Claude Plugin. Part of DZ Harness Hub.
Claude Plugin
DZ Harness Hub is available as a Claude Code plugin:
# Via marketplace (when published):
claude plugin marketplace add djd1m/dz-harness-hub
claude plugin install dz-harness-hub@dz-harness-hub
# Or test locally:
claude --plugin-dir /path/to/dz-harness-hub
# Generate plugin manifest from current inventory:
dz plugin --version 0.3.85
The .claude-plugin/ directory contains plugin.json + marketplace.json compatible with pi-claude-marketplace and skill-hub.
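After generating the manifest, a quick sanity check is to confirm that both files exist in .claude-plugin/ and parse as JSON. The sketch below does only that. The `check_plugin_dir` helper is hypothetical, it does not validate any manifest fields, and the demo runs against a throwaway directory with placeholder content, not a real plugin.

```python
import json
import tempfile
from pathlib import Path

# File names taken from the sentence above.
EXPECTED_FILES = ("plugin.json", "marketplace.json")


def check_plugin_dir(repo_root):
    """Return {filename: status} for the files expected in .claude-plugin/."""
    plugin_dir = Path(repo_root) / ".claude-plugin"
    status = {}
    for name in EXPECTED_FILES:
        path = plugin_dir / name
        if not path.is_file():
            status[name] = "missing"
            continue
        try:
            json.loads(path.read_text(encoding="utf-8"))
            status[name] = "ok"
        except json.JSONDecodeError:
            status[name] = "invalid JSON"
    return status


# Demo against a temporary directory; the file content is a placeholder.
with tempfile.TemporaryDirectory() as root:
    demo_dir = Path(root) / ".claude-plugin"
    demo_dir.mkdir()
    (demo_dir / "plugin.json").write_text('{"name": "example"}', encoding="utf-8")
    demo_status = check_plugin_dir(root)

print(demo_status)
```

In a real checkout, calling `check_plugin_dir(".")` from the repository root would report on the generated files.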
Related Projects
Skill sources
- agentic-qe — 20 QE skills + 55 agents (test generation, coverage, chaos, QCSD swarms)
- ECC — 20 curated skills (agent patterns, autonomous loops, docker, git workflows)
- AgentShield — Security scanning (170 rules for .claude/ configs)
- Understand-Anything — Codebase knowledge graph → architecture context
Platform & infrastructure
- AgentDB — Self-learning vector memory (--memory agentdb, 41 MCP tools)
- agentskills.io — Open standard for SKILL.md format (adopted by all 5 platforms)
- OpenAI Codex — 2nd target platform
- OpenCode — 3rd target platform (160K+ stars)
- Hermes Agent — 4th target platform
- OpenClaude — 5th target platform (28K+ stars)