@dzhechkov/harness-cli

The dz CLI is the main entry point to the DZ Harness Hub. It installs AI skills for Claude Code, Codex, OpenCode, Hermes, and OpenClaude from a single command.

Install

npm install -g @dzhechkov/harness-cli

User Journey — from install to mastery

The CLI's commands, mapped to a real workflow:

DISCOVER → INSTALL → USE → CREATE → MAINTAIN

Phase 1: Discover (what's available?)

npm install -g @dzhechkov/harness-cli    # install the CLI

dz help                                   # see all commands
dz pretrain                                # analyze project → auto-recommend skills/presets
dz recommend "build API and deploy to K8s" # task advisor — decomposes your task into skills
dz stats                                  # 28 packages, 59 skills, 5 targets, 10 presets
dz dashboard                              # visual panel — packages, adapters, skill packs
dz registry                               # browse all 59 skills by category
dz registry search kubernetes             # find specific skills
dz registry --category devops             # filter by domain
dz downloads                              # npm weekly download stats

Phase 2: Install (set up your workspace)

# Full setup with self-learning (recommended):
dz setup --target claude-code --preset devops  # pretrain + hooks + memory + skills

# Or just install skills (no learning):
dz init --target claude-code --preset devops   # 24 DevOps skills
dz init --target openclaude --preset web3      # 12 DeFi skills for OpenClaude
dz init --target codex --preset mcp            # MCP skills for Codex

# Or pick individual skills:
dz init --target claude-code --select terraform,kubernetes,docker-compose

# Or install from any npm package:
dz install @dzhechkov/skills-devops            # npm install + copy skills

# Verify everything is correct:
dz verify                                       # structural validation
dz doctor                                       # 7 health checks
dz list                                         # show installed skills
dz info --id terraform                          # detailed info about a skill

Phase 3: Use (work with your agent)

# Now use Claude Code / Codex / OpenCode / Hermes normally.
# Skills are auto-discovered from the platform's skills directory.
# Example in Claude Code:
#   "Review this PR" → pr-review skill activates
#   "Design an API" → api-design skill activates
#   "Fix this CI" → ci-fix skill activates

Phase 4: Create (build your own skills)

# Scaffold a new skill:
dz create-skill --name my-skill --description "What it does" --tier 2

# With BTO-compatible eval templates:
dz create-skill --name my-skill --bto

# Benchmark your skill (aim for Grade A):
dz benchmark .claude/skills/my-skill           # single skill — 19 L0 checks
dz benchmark packages/@dzhechkov/skills-devops --all   # batch all
dz benchmark skill-a --compare skill-b          # A/B compare

# Find skills to canonicalize from the ecosystem:
dz scout                                        # scan 9 sources (GitHub, npm, HN, ...)
dz scout --deep                                 # deep analysis with SKILL.md parsing
dz auto-canonicalize --source github.com/user/repo --pack packages/@dzhechkov/skills-devops

Phase 5: Maintain (keep skills fresh)

# Check for upstream changes (canonicalized skills):
dz sync-upstream --list                                 # which packages have external sources?
dz sync-upstream --all                                  # check all against upstream
dz sync-upstream --package packages/@dzhechkov/skills-devops  # check one

# Check installed skills vs canonical:
dz upgrade                                      # shows which skills need update
dz upgrade --target openclaude                  # check specific platform

# Sync canonical to legacy layout:
dz sync                                         # canonical → project skills
dz migrate                                      # detect legacy installations

# Orchestrate dynamic workflows:
dz workflow --task coverage-lift                 # parallel coverage improvement
dz workflow --task security-audit               # adversarial security scan

# Cross-host state sync:
dz roam --apply                                 # sync agent state across machines

# Publish updated packages to npm:
dz publish --dry-run                            # preview
dz publish --filter skills-devops               # publish specific package
dz publish                                      # publish all changed packages

Presets vs Individual Skills

| Approach | When to use | Example |
|----------|-------------|---------|
| Preset | Want a curated set of skills that work together | `dz init --target claude-code --preset keysarium` |
| --select | Want specific skills by name | `dz init --target claude-code --select explore,feature-adr` |
| Standalone npx | Want a full toolkit with its own CLI | `npx @dzhechkov/keysarium init` |

Available Presets (10)

| Preset | Skills | Description |
|--------|--------|-------------|
| `meta` | 3 | Development process (explore, feature-adr, knowledge-extractor) |
| `qe-engineer` | 8 | Quality engineering (test-gen, coverage, chaos, defect, ...) |
| `bto` | 1 | Build-Benchmark-Test-Optimize pipeline |
| `health` | 8 | Medical AI (diagnostics, drugs, labs, clinical decisions) |
| `keysarium` | 9 | Full research toolkit (feature-adr, presentation, reverse-eng) |
| `p-replicator` | 10 | AI product development (/replicate, SPARC PRD, pipeline-forge) |
| `feature-adr` | 5 | Feature pipeline (feature-adr, explore, frontend-design) |
| `devops` | 24 | DevOps skills (terraform, kubernetes, nginx, redis, graphql, playwright, ...) |
| `web3` | 12 | Web3/DeFi (quicknode, zerion, symbiosis, bankr, veil, neynar, ...) |
| `mcp` | 13 | MCP servers (brave-search, exa, gmail, notion, obsidian, git-mcp, ...) |

Standalone Packages (install via npx, no dz CLI needed)

npx @dzhechkov/keysarium init              # full research toolkit
npx @dzhechkov/p-replicator init           # AI product development
npx @dzhechkov/health-advisor init         # medical AI (25 skills)
npx @dzhechkov/skills-bto init             # BTO benchmarking
npx @dzhechkov/skills-feature-adr init     # 11-step feature pipeline
npx @dzhechkov/skills-edu-site init        # gamified edu site generator
npx @dzhechkov/skills-transcript-site init # transcript → interactive site
npx @dzhechkov/skills-analyst-manual init  # 3-phase analyst composite

Difference: dz init --preset installs individual skills from .claude/skills/ source into a target platform tree. Standalone npx packages have their own CLI and install a complete toolkit with commands, rules, shards, and agents — a richer but self-contained experience.

All Commands (28)

dz setup             --target <name> [--preset <name>] [--no-hooks] [--no-memory] [--force]
dz init              --target <name> [--preset <name>] [--select id,id,...] [--force]
dz install           <npm-pkg> [--target <name>] [--project <dir>]
dz pretrain          [--project <dir>]
dz recommend         "<task description>"
dz compose           <preset1+preset2+...> [--target <name>]
dz diff              <skill-dir>
dz upgrade           [--target <name>] [--project <dir>]
dz verify            [--skills-dir <dir>] [--target <name>]
dz sync              [--canonical <dir>] [--project <dir>] [--dry-run] [--force]
dz update            (alias for sync)
dz list              [--skills-dir <dir>]
dz info              --id <skill-id> [--skills-dir <dir>]
dz create-skill      --name <id> [--description <text>] [--tier 1|2|3] [--bto]
dz registry          [search <query>] [--category <cat>]
dz benchmark         <skill-dir> [--compare <dir>] [--all]
dz publish           [--filter <name>] [--dry-run] [--bump-only]
dz auto-canonicalize --source <github-url> --pack <skills-pack>
dz sync-upstream     [--package <dir>] [--list] [--all]
dz scout             [--topics <list>] [--since <date>] [--deep]
dz workflow          --task <name> [--dry-run]
dz downloads
dz migrate           [--project <dir>]
dz stats
dz dashboard
dz doctor            [--project <dir>]
dz roam              [--apply] [--slug <slug>]
dz help

Targets (5 platforms)

| Target | Skills directory |
|--------|------------------|
| `claude-code` | `.claude/skills/` |
| `codex` | `.agents/skills/` |
| `opencode` | `.opencode/skills/` |
| `hermes` | `.hermes/skills/` |
| `openclaude` | `.openclaude/skills/` |
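The target table can be read as a simple lookup from target name to project-relative directory. A minimal sketch in Python, for illustration only: the directory values come from the table, while the helper name `skills_dir` is hypothetical and not part of the CLI.

```python
# Sketch: resolve the skills directory for a target, using the table above.
# The mapping values come from the table; the helper name `skills_dir` is hypothetical.
from pathlib import Path

SKILLS_DIRS = {
    "claude-code": ".claude/skills",
    "codex": ".agents/skills",
    "opencode": ".opencode/skills",
    "hermes": ".hermes/skills",
    "openclaude": ".openclaude/skills",
}


def skills_dir(target: str, project: str = ".") -> Path:
    """Return the skills directory for a supported target inside `project`."""
    if target not in SKILLS_DIRS:
        raise ValueError(f"unknown target {target!r}; expected one of {sorted(SKILLS_DIRS)}")
    return Path(project) / SKILLS_DIRS[target]


print(skills_dir("codex", "my-project"))
```

Note that `codex` is the one target whose directory name does not match the target name.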

Workflows (Opus 4.8+ dynamic workflows)

dz workflow --task coverage-lift     # parallel coverage improvement
dz workflow --task mutation-kill     # kill surviving mutants
dz workflow --task canonicalize      # canonicalize new packages
dz workflow --task security-audit    # adversarial security scan

Scout (ecosystem intelligence)

dz scout                              # quick scan — radar mode
dz scout --deep                       # deep analysis — AI analyst mode
dz scout --topics mcp-server,ai-agent # custom topics
dz scout --since 2026-05-01           # only recent repos

Radar mode (dz scout) scans 9 sources in parallel (GitHub + npm + HN + MCP Registry + Glama + OSSInsight + Smithery + Semantic Scholar + arXiv):

Detects skill format — SKILL.md, plugin.json, .claude/skills/, MCP manifests
Scores relevance — format (40%) + stars (30%) + recency (20%) + novelty (10%)
Compares against our 24 packages — finds skills we don't have
Recommends — integrate (score ≥70) / monitor (40-69 + ≥50 stars) / skip
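The scoring and recommendation rules above can be sketched as follows. This is an illustrative Python sketch, not the CLI's implementation: it assumes each component score is already normalized to 0-100, and the names `relevance_score` and `recommend` are hypothetical.

```python
# Hypothetical sketch of the radar-mode scoring rules; not the CLI's actual code.
# Assumption: every component score is already normalized to the range 0-100.
WEIGHTS = {"format": 0.40, "stars": 0.30, "recency": 0.20, "novelty": 0.10}


def relevance_score(components: dict) -> float:
    """Weighted sum of the four component scores (0-100 each)."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


def recommend(score: float, stars: int) -> str:
    """Map a relevance score and a star count to integrate / monitor / skip."""
    if score >= 70:
        return "integrate"
    if 40 <= score < 70 and stars >= 50:
        return "monitor"
    return "skip"


repo = {"format": 100, "stars": 60, "recency": 80, "novelty": 50}
score = relevance_score(repo)  # 0.40*100 + 0.30*60 + 0.20*80 + 0.10*50
print(round(score, 1), recommend(score, stars=500))
```

A repo in the 40-69 band with fewer than 50 stars falls through to skip, which is how the "monitor" condition reads in the list above.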

Deep analyst mode (dz scout --deep) goes further for top-scored repos:

Downloads SKILL.md from each repo, parses frontmatter + body
Finds closest match in our inventory by keyword overlap
Explains the delta — what the found skill adds that ours doesn't
Recommends integration path:
- canonicalize — high-signal novel skill → new @dzhechkov/skills-* pack
- merge — similar to existing skill → add unique features to ours
- new-preset — novel skill → add to preset or create new pack
- skip — already in our inventory
Gap analysis — identifies trending categories across the ecosystem that our harness lacks
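The "closest match by keyword overlap" step could look roughly like this. The README only says "keyword overlap", so the metric here (Jaccard similarity over lowercase word sets) is an assumption, and the inventory descriptions are made up for the example.

```python
# Hypothetical sketch: closest inventory match by keyword overlap.
# Assumption: overlap is measured with Jaccard similarity on lowercase word sets;
# the source only says "keyword overlap", so the exact metric is a guess.
import re


def keywords(text: str) -> set:
    """Lowercase alphanumeric tokens of length >= 3."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) >= 3}


def closest_match(found: str, inventory: dict):
    """Return (skill_id, similarity) for the inventory entry most similar to `found`."""
    found_kw = keywords(found)
    best_id, best_sim = None, 0.0
    for skill_id, description in inventory.items():
        inv_kw = keywords(description)
        union = found_kw | inv_kw
        sim = len(found_kw & inv_kw) / len(union) if union else 0.0
        if sim > best_sim:
            best_id, best_sim = skill_id, sim
    return best_id, best_sim


inventory = {  # illustrative descriptions, not the real registry entries
    "brutal-honesty-review": "blunt code review of pull requests",
    "terraform": "infrastructure as code with terraform modules",
}
print(closest_match("Automated OWASP-focused code review", inventory))
```

A low best similarity would correspond to the "no close match" case shown as a dash in the example output.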

Example deep analysis output:

## 🔬 Deep Analysis

### cool/agent-toolkit (★500)
2/3 skills are novel

| Skill | Description | Closest match | Integration | Rationale |
|-------|------------|---------------|-------------|-----------|
| code-review | Automated OWASP-focused review | brutal-honesty-review | **merge** | Similar to ours — merge OWASP checklist |
| deploy-check | Pre-deploy validation gates | — | **canonicalize** | High-signal novel skill (500 stars) |

## 📊 Harness Gap Analysis

| Category | Frequency | Recommendation |
|----------|-----------|---------------|
| deploy-automation | 12 repos | Create @dzhechkov/skills-devops — high demand |
| data-pipeline | 5 repos | Monitor — emerging trend |

BTO integration (create-skill --bto)

# Scaffold a new skill with BTO-compatible 3-layer evaluation:
dz create-skill --name my-skill --bto

# What you get:
#   evals/my-skill.yaml       — BTO eval with L0/L1/L2 layers
#   references/judge-rubrics.md — scoring rubrics for 3-judge panel

The --bto flag generates eval templates compatible with /bto-test:

| Layer | What | Gate |
|-------|------|------|
| L0 | Deterministic checks (U1-U5 universal + S1-S10 skill-specific) | Pass rate >= 80% |
| L1 | Single LLM judge (Haiku), 5 dimensions: Clarity, Completeness, Actionability, Quality, Anti-patterns | Average >= 7.0 |
| L2 | 3-judge panel (Sonnet): Expert (0.40), Critic (0.30), Auditor (0.30); 5 dimensions: Methodology, Depth, Correctness, Usability, Robustness | Weighted avg >= 7.0 |

After scaffolding, fill in the SKILL.md protocol and run /bto-test .claude/skills/my-skill to evaluate.
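To make the three gates concrete, here is a small Python sketch of the arithmetic. It is not the /bto-test implementation: it assumes scores are on a 0-10 scale and that each judge reports one overall score (for example the mean of its five dimensions). The function names are hypothetical.

```python
# Hypothetical sketch of the three gates in the layer table; not the /bto-test implementation.
# Assumption: judge scores are on a 0-10 scale, one overall score per judge.
L2_WEIGHTS = {"expert": 0.40, "critic": 0.30, "auditor": 0.30}


def l0_gate(passed: int, total: int) -> bool:
    """L0: deterministic checks, pass rate >= 80%."""
    return passed / total >= 0.80


def l1_gate(dimension_scores: list) -> bool:
    """L1: single judge, average over its dimensions >= 7.0."""
    return sum(dimension_scores) / len(dimension_scores) >= 7.0


def l2_weighted_average(judge_scores: dict) -> float:
    """L2: weighted average over the three judges, rounded to avoid float noise at the threshold."""
    return round(sum(L2_WEIGHTS[j] * judge_scores[j] for j in L2_WEIGHTS), 6)


def l2_gate(judge_scores: dict) -> bool:
    return l2_weighted_average(judge_scores) >= 7.0


panel = {"expert": 8.0, "critic": 6.0, "auditor": 7.0}
print(l2_weighted_average(panel))  # 0.40*8 + 0.30*6 + 0.30*7
print(l0_gate(12, 15), l1_gate([7, 8, 6, 7, 8]), l2_gate(panel))
```

With U1-U5 and S1-S10 there are 15 L0 checks, so under this reading 12 passing checks is the minimum that clears the 80% gate.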

dz install — install skills from any npm package

# Install skills from any npm package directly
dz install @dzhechkov/skills-devops
dz install @dzhechkov/skills-web3 --target openclaude
dz install @lythos/skill-curator --target claude-code

Runs npm install, discovers SKILL.md files in the package, and copies them to the target platform directory. Works with any agentskills.io-compatible npm package.
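The discovery step can be pictured as a recursive search for SKILL.md files. A minimal Python sketch, assuming a skill is any directory that directly contains a SKILL.md; the real CLI may apply extra rules, and `discover_skills` is a hypothetical name.

```python
# Hypothetical sketch of the discovery step; the real CLI may locate skills differently.
# Assumption: a skill is any directory that directly contains a SKILL.md file.
import tempfile
from pathlib import Path


def discover_skills(package_root: Path) -> list:
    """Return the skill directories under package_root, sorted by path."""
    return sorted(p.parent for p in package_root.rglob("SKILL.md"))


# Demo on a throwaway directory that mimics an installed package.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("terraform", "kubernetes"):
        (root / "skills" / name).mkdir(parents=True)
        (root / "skills" / name / "SKILL.md").write_text("# " + name)
    print([d.name for d in discover_skills(root)])
```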

dz sync-upstream — check for upstream updates

dz sync-upstream --list                                    # show packages with external sources
dz sync-upstream --all                                     # check ALL packages against upstream
dz sync-upstream --package packages/@dzhechkov/skills-devops  # check one package

Discovers all skill packs with sources.json, fetches SKILL.md from origin repos, and reports which skills have upstream changes.

dz upgrade — check installed skills for updates

dz upgrade                           # check .claude/skills/ against canonical
dz upgrade --target openclaude       # check .openclaude/skills/

Compares installed skills with the canonical source and reports which ones need dz init --force to update.
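One way to picture the comparison is a content hash per skill. The sketch below is an assumption about the mechanism, not the CLI's code: it treats a skill as out of date when its SKILL.md differs from the canonical copy, and `outdated_skills` is a hypothetical name.

```python
# Hypothetical sketch: detect installed skills that differ from the canonical source.
# Assumption: a skill is "out of date" when its SKILL.md content hash differs;
# the CLI's real comparison may use versions or other files.
import hashlib
import tempfile
from pathlib import Path


def digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def outdated_skills(installed: Path, canonical: Path) -> list:
    """Return ids of installed skills whose SKILL.md differs from the canonical copy."""
    stale = []
    for skill_md in sorted(installed.glob("*/SKILL.md")):
        skill_id = skill_md.parent.name
        source = canonical / skill_id / "SKILL.md"
        if source.exists() and digest(source) != digest(skill_md):
            stale.append(skill_id)
    return stale


with tempfile.TemporaryDirectory() as tmp:
    installed, canonical = Path(tmp) / "installed", Path(tmp) / "canonical"
    for root, versions in ((installed, {"terraform": "v1", "docker-compose": "v1"}),
                           (canonical, {"terraform": "v2", "docker-compose": "v1"})):
        for skill_id, body in versions.items():
            (root / skill_id).mkdir(parents=True)
            (root / skill_id / "SKILL.md").write_text(body)
    print(outdated_skills(installed, canonical))
```

Skills that exist only locally are ignored in this sketch, since there is nothing canonical to compare them against.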

dz downloads — npm weekly download stats

dz downloads     # fetch weekly downloads for all 28 packages

dz benchmark — L0 quality gate

dz benchmark packages/@dzhechkov/skills-devops/terraform     # single skill
dz benchmark packages/@dzhechkov/skills-devops --all          # batch all
dz benchmark skill-a --compare skill-b                        # A/B compare

19 deterministic checks (U1-U5 universal + S1-S14 skill-specific). Grade A = 95%+. For L1/L2 LLM judges, use /bto-test inside Claude Code.
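A quick worked calculation of what "Grade A = 95%+" means for 19 checks. This assumes the grade is simply passed / total with equal weight per check, which the README does not state; the helper names are hypothetical.

```python
# Worked arithmetic, assuming the grade is passed / total with equal weights
# (an assumption; the source only states "19 checks" and "Grade A = 95%+").
TOTAL_CHECKS = 19            # U1-U5 + S1-S14
GRADE_A_PERCENT = 95


def pass_rate_percent(passed: int, total: int = TOTAL_CHECKS) -> float:
    return 100 * passed / total


def min_passes_for_grade_a(total: int = TOTAL_CHECKS) -> int:
    """Smallest number of passing checks whose rate reaches 95% (integer ceiling)."""
    return (GRADE_A_PERCENT * total + 99) // 100


print(round(pass_rate_percent(18), 1))  # 18 of 19 passing
print(min_passes_for_grade_a())
```

Under this assumption, 18 of 19 passing checks is about 94.7%, so a single skill would need all 19 checks to pass to reach Grade A.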

dz publish — automated npm publish

dz publish --dry-run                          # preview what would publish
dz publish --filter skills-devops             # publish specific package
dz publish --filter skills-devops --bump-only # bump version only, no publish

dz auto-canonicalize — discover skills in GitHub repos

dz auto-canonicalize --source github.com/user/repo --pack packages/@dzhechkov/skills-devops

Scans a GitHub repo for SKILL.md files and generates dz create-skill commands.

dz registry — searchable skill index

dz registry                    # visual panel: 59 skills in 5 categories
dz registry search security    # fuzzy search
dz registry --category mcp     # filter by category

dz stats + dz dashboard

dz stats        # Quick metrics: packages, skills, targets, presets
dz dashboard    # Visual panel with all packages, adapters, skill packs

How it works

dz init compiles canonical skills from the agentskills.io standard into the target platform's layout
Writing is additive — existing files are never overwritten without --force
All 5 platform adapters produce byte-identical output (ADR-005)
dz doctor runs 7 health checks (node version, adapters, config, SQLite, skills)
dz migrate detects legacy keysarium/bto installations and recommends migration path
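The additive-write rule (existing files are kept unless --force is given) can be sketched like this. It is an illustration in Python, not the CLI's code; `install_skill` and its `force` parameter are hypothetical stand-ins for the behaviour described above.

```python
# Hypothetical sketch of additive writing; not the CLI's actual code.
# Existing files are left untouched unless force=True, mirroring the --force flag.
import shutil
import tempfile
from pathlib import Path


def install_skill(src: Path, dest_root: Path, force: bool = False) -> list:
    """Copy every file of one skill directory; return the relative paths written."""
    written = []
    for file in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = file.relative_to(src)
        target = dest_root / src.name / rel
        if target.exists() and not force:
            continue  # additive: keep the user's existing file
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(file, target)
        written.append(rel.as_posix())
    return written


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "canonical" / "terraform"
    src.mkdir(parents=True)
    (src / "SKILL.md").write_text("v2")
    dest = Path(tmp) / ".claude" / "skills"
    (dest / "terraform").mkdir(parents=True)
    (dest / "terraform" / "SKILL.md").write_text("local edit")

    print(install_skill(src, dest))              # existing file is kept
    print(install_skill(src, dest, force=True))  # existing file is replaced
```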

Use Cases

1. Short-term product research (one-off study)

Goal: Quickly research a product idea, competitors, market — get a structured report.

# Option A: via dz CLI
dz init --target claude-code --preset meta
# Then in Claude Code:
#   /explore "Research the market for AI-powered code review tools"
#   /feature-adr "Summarize findings into an ADR"

# Option B: via keysarium (full 7-phase pipeline)
npx @dzhechkov/keysarium init
# Then in Claude Code:
#   /casarium "AI-powered code review tools — market analysis"
#   → Phase 0: Discovery → Phase 1: Exploration → Phase 2: Paranoid Research
#   → Phase 3: Solution Design → Phase 4: Architecture → Phase 5: Presentation

What you get:

meta preset: /explore clarifies the problem → /feature-adr structures findings as ADR decisions
keysarium: full 7-phase pipeline with dream cycles, background workers, and presentation generation

Best for: Quick study (hours), competitive analysis, technology evaluation.

2. Long-term product research (evolving over time)

Goal: Continuously gather data, add new sources, and "recalculate" the product vision as insights accumulate.

# Install keysarium (research pipeline) + evidence-wiki (knowledge base)
npx @dzhechkov/keysarium init
# Copy evidence-wiki plugin into your project:
npx @dzhechkov/evidence-wiki   # or git clone https://github.com/djd1m/evidence-wiki

npm install -g @dzhechkov/harness-cli
dz init --target claude-code --preset meta

Workflow — iterative research cycles with evidence wiki:

Week 1:  /casarium "Product X — initial research"
         → researches/ directory created with findings
         → .keysarium/memory/ stores patterns + reward scores

         /wiki-generate                              ← evidence-wiki
         → Scans researches/, ADRs, docs
         → Generates wiki/concepts/*.md (atomic pages with inline sources)
         → Builds wiki/graph.json (knowledge graph)
         → wiki/INDEX.md links everything

Week 2:  Add new data → /casarium "Product X — update with Q2 metrics"
         → Memory recalls Week 1 patterns (reward-calibrated learning)
         → New findings merged with existing, conflicts resolved

         /wiki-generate --check                      ← re-generates wiki
         → New concepts added, existing updated
         → Every claim verified: triple-pillar protocol requires N independent
           typed sources (ADR + methodology + research)
         → Stale concepts flagged, broken evidence links detected

         /triple-check wiki/concepts/pricing-model.md ← verify specific page
         → Checks that every factual claim has inline source citations
         → Flags unsupported statements

Week N:  /casarium "Product X — pivot analysis after customer feedback"
         → Full history in memory layer + evidence wiki
         → /harvest extracts reusable knowledge patterns
         → /wiki-generate rebuilds the entire knowledge graph
         → Product vision "recalculated" — the wiki IS the living product model

The evidence-wiki advantage:

| Without evidence-wiki | With evidence-wiki |
|-----------------------|--------------------|
| Research in markdown files | Atomic concept pages with inline sources |
| Findings scattered across `researches/` | Interlinked knowledge graph (`graph.json`) |
| "I think we decided X" | Every claim has a cited source (triple-pillar) |
| Hard to see what changed | `/wiki-generate --check` diffs the knowledge base |
| No verification | `/triple-check` enforces evidence discipline |
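As a rough picture of the triple-pillar idea, the sketch below checks that a claim cites at least one source of each type (ADR, methodology, research). This is an assumption about how the protocol could be checked, not the plugin's code: it does not test that sources are independent, and the references in the example are invented.

```python
# Hypothetical sketch of a triple-pillar check; the plugin's real rules are not shown in this README.
# Assumption: a claim is supported when it cites at least one source of each required type.
REQUIRED_TYPES = {"adr", "methodology", "research"}


def missing_pillars(sources: list) -> set:
    """sources: list of (type, reference) pairs; returns the source types still missing."""
    return REQUIRED_TYPES - {source_type for source_type, _ in sources}


# Illustrative references only.
claim_sources = [("adr", "ADR-012"), ("research", "researches/pricing/findings.md")]
print(missing_pillars(claim_sources))
```

An empty result would mean the claim has all three pillars; anything else names the source types a reviewer still needs to add.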

Key features for long-term research:

Evidence wiki (@dzhechkov/evidence-wiki): atomic concept pages where every factual claim carries inline sources; knowledge graph for cross-referencing; triple-pillar protocol (N independent typed sources per claim)
Reward-calibrated memory (@dzhechkov/memory Reflexion): each checkpoint response trains the system — "ок" = excellent (1.0), feedback = good (0.7), rework = needs_work (0.3)
Agent SDK Dreaming: between sessions, patterns are consolidated and distilled
/harvest (knowledge-extractor skill): extracts reusable patterns from completed research into lib/ templates
SQLite + FTS5 backend: scales to 100k+ records with full-text search across all research sessions

Best for: Product strategy over months, continuous market monitoring, evolving product vision with evidence-backed decisions.

3. Product research + working prototype

Goal: Research the product AND build a functional prototype.

Option A: Sequential — research first, then code

# Step 1: Install research + development presets
npx @dzhechkov/keysarium init
# OR:
dz init --target claude-code --preset keysarium

# Step 2: Research phase
#   /casarium "SaaS platform for team retrospectives"
#   → Phase 0-2: Discovery, Exploration, Paranoid Research
#   → Phase 3: Solution Design (with CJM prototype)
#   → Result: researches/<slug>/ with full analysis

# Step 3: Switch to development
dz init --target claude-code --preset feature-adr

# Step 4: Build using research outputs
#   /feature-adr "Build the retrospective platform based on research in researches/<slug>/"
#   → Step 0: Router classifies as L/XL
#   → Step 1-5: Requirements, ADRs, DDD, Architecture (informed by research)
#   → Step 6: Implementation plan
#   → Step 7: Code generation (with /frontend-design for UI)
#   → Step 8-9: QE review + fleet assessment

What you get: Research artifacts in researches/, then code in features/<slug>/ + actual repository changes. Research directly feeds into ADR decisions.

Option B: Parallel — research and code simultaneously with p-replicator

# Install the full product development toolkit
npx @dzhechkov/p-replicator init

# Single pipeline: research → requirements → prototype
#   /replicate "SaaS platform for team retrospectives"
#   → Reverse-engineers similar products (reverse-engineering-unicorn)
#   → Generates SPARC PRD (sparc-prd-mini)
#   → Validates requirements (requirements-validator)
#   → Creates the project structure (pipeline-forge)
#   → Builds the prototype (cc-toolkit-generator-enhanced)
#   → Reviews with brutal honesty (brutal-honesty-review)

What you get: A working prototype generated from research in a single /replicate pipeline run. Faster but less deep than Option A.

Comparison

| Aspect | Option A (Sequential) | Option B (p-replicator) |
|--------|-----------------------|-------------------------|
| Research depth | Deep (7-phase keysarium) | Moderate (reverse-engineering) |
| Code quality | High (11-step feature-adr + QE) | Good (pipeline-forge + review) |
| Time | Days to weeks | Hours to days |
| Best for | Complex products, regulated domains | MVPs, hackathons, quick validation |
| Packages | `keysarium` + `feature-adr` preset | `p-replicator` |
| Research artifacts | `researches/` directory | Embedded in PRD |
| Code artifacts | `features/<slug>/` + repo changes | Generated project |

Tip: For maximum rigor, combine both — use p-replicator for a quick prototype, then run /feature-adr --full-qe-extended on the generated code for production-grade quality engineering.

Status

v0.3.24 — published on npm. Part of DZ Harness Hub.