terminus-2-cli
Standalone Node.js port of Harbor's Terminus2 agent — runs entirely inside
the task container. Same loop semantics as the Python terminus-2, but the
agent process and the tmux it drives now live on the same side of the docker
boundary, eliminating per-command docker exec round-trips.
npm install -g @zhuerle/terminus-2-cli
Motivation
Harbor's terminus-2 agent (Python) orchestrates from the host: every shell
command the model issues is delivered into the task container via a
docker exec round-trip, and every tmux pane capture is another docker exec
round-trip. On reasoning-heavy benchmarks that means tens of round-trips per
trial, adding up to ~12-18 seconds of pure infra overhead per step
on top of the LLM call itself.
terminus-2-cli keeps the loop, parsers, prompts, and trajectory format
identical, but runs inside the task container as a single npm-installable
Node.js binary. Tool execution becomes direct tmux send-keys against the
local pane (no IPC, no docker socket), which brings pure infra overhead down to
~6.8 seconds per step, versus 18.8 seconds for terminus-2 in the
claude-opus-4-7 comparison below (~2.8× faster on the architecture layer).
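To make "direct tmux send-keys against the local pane" concrete, here is a minimal sketch, assuming a tmux session named demo already exists in the same container as the agent. It only builds standard tmux argument vectors (send-keys -l for literal text, send-keys Enter, capture-pane -p) and runs them with Node's child_process.execFileSync. The helper names are hypothetical; this is not the package's actual driver.

```javascript
"use strict";
const { execFileSync } = require("node:child_process");

// Hypothetical helpers illustrating local tmux control (no docker exec hop).
// `-l` makes send-keys treat the text as literal characters, not key names.
function sendKeysArgs(session, text) {
  return ["send-keys", "-t", session, "-l", text];
}

// Named keys such as Enter must be sent without -l.
function sendEnterArgs(session) {
  return ["send-keys", "-t", session, "Enter"];
}

// `-p` prints the captured pane contents to stdout instead of a buffer.
function capturePaneArgs(session) {
  return ["capture-pane", "-p", "-t", session];
}

// Spawns tmux directly; the agent and the tmux server share one container.
function runTmux(args) {
  return execFileSync("tmux", args, { encoding: "utf8" });
}

module.exports = { sendKeysArgs, sendEnterArgs, capturePaneArgs, runTmux };
```

For example, runTmux(sendKeysArgs("demo", "ls")) followed by runTmux(sendEnterArgs("demo")) types and submits a command, and runTmux(capturePaneArgs("demo")) returns the visible pane text. Each call is a local process spawn rather than a docker exec round-trip.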
The trajectory format is ATIF-v1.6 (forward-compatible superset of t2's
ATIF-v1.2), so any tooling that consumes a t2 trajectory.json reads the
cli's output unchanged.
Performance comparison vs terminus-2
Two model families; for each, both implementations ran the same tasks against the same endpoint with aligned thinking config.
claude-opus-4-7 on terminal-bench-2 (89 tasks, n=20 concurrent, k=1)
Anthropic Messages API via api-gateway.glm.ai; adaptive thinking with display: summarized and output_config.effort: max; max_new_tokens=32768, max_input_tokens=300000, max_turns=120, --agent-timeout 7200s; 4 CPUs / 8 GB per trial. The table below reports the 85 tasks that both implementations finished.
| Metric | t2-cli (this) | t2 (Python) | Δ |
|---|---|---|---|
| Accuracy | 70/85 = 0.824 | 61/85 = 0.718 | +10.6 pp |
| Per-trial wall | 828 s | 1091 s | cli −24 % |
| Per-step wall | 25.3 s | 31.0 s | cli −18 % |
| LLM-call wall (gateway-bound, identical for both) | 18.0 s | 15.8 s | tied |
| Pure infra wall / step (step − llm_mean) | 6.8 s | 18.8 s | cli 2.8× faster |
| Pure infra wall / trial | 303 s | 531 s | cli saves 228 s/trial |
| Cross-tab cli-only ✓ / t2-only ✓ | 10 / 1 | n/a | cli net +9 tasks |
GLM-4.8 on terminal-bench-2 (89 tasks, n=180, k=4 → 356 trials)
SGLang OpenAI-compatible endpoint, no thinking. Both implementations hit the same endpoint at the same time.
| Metric | t2-cli v0.0.3 | t2 (Python) | Δ |
|---|---|---|---|
| Accuracy | 0.500 (178/356) | 0.475 (169/356) | +2.5 pp |
| Per-trial wall | 1830 s | 2227 s | cli −18 % |
| Per-step wall | 51.3 s | 58.5 s | cli −14 % |
The architectural advantage is larger on opus-4-7 because its
adaptive-thinking LLM call is much heavier per turn: every saved
docker-exec round-trip translates more directly into wall-clock savings, and the
extra max_turns headroom converts directly into accuracy on tasks that
reach mark_task_complete near the turn cap.
Full report: /workspace/swe-data/zhuerle/perf_compare/REPORT_CLI_VS_T2_中文综合.md.
What's ported
- Prompt templates (prompts/terminus-{xml,json}-plain.txt, copied verbatim from the Python repo)
- Both parsers (XML plain + JSON plain), with the same warnings and auto-fixes
- tmux session driver (script(1)-allocated PTY → tmux new-session -d), with send-keys chunking under the ~16 KB tmux command-buffer limit
- Main agent loop with batched send + capture, observation feedback, task_complete confirmation, and parser-error re-prompt
- Per-step trajectory in ATIF-v1.6 shape, written to <logs-dir>/trajectory.json
- 3-subagent context summarization (writes sibling trajectory.summarization-N-{summary,questions,answers}.json files, reachable from the parent trajectory's compaction marker)
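The send-keys chunking in the list above can be sketched as splitting the keystroke string into pieces whose UTF-8 encoding stays under a byte budget, without cutting a character in half. The ~16 KB limit comes from the list; the 15000-byte budget (leaving room for the rest of the tmux command) and the name chunkUtf8 are assumptions for illustration, not the package's code.

```javascript
"use strict";

// Assumed budget: below tmux's ~16 KB command-buffer limit, with a margin.
const DEFAULT_MAX_BYTES = 15000;

// Split `text` into chunks whose UTF-8 byte length is <= maxBytes
// (assumes maxBytes >= 4, the largest UTF-8 character).
// for...of iterates by code point, so surrogate pairs (e.g. emoji)
// are never split across two chunks.
function chunkUtf8(text, maxBytes = DEFAULT_MAX_BYTES) {
  const chunks = [];
  let current = "";
  let currentBytes = 0;
  for (const ch of text) {
    const chBytes = Buffer.byteLength(ch, "utf8");
    if (currentBytes + chBytes > maxBytes && current !== "") {
      chunks.push(current);
      current = "";
      currentBytes = 0;
    }
    current += ch;
    currentBytes += chBytes;
  }
  if (current !== "") chunks.push(current);
  return chunks;
}

module.exports = { chunkUtf8 };
```

Each chunk would then go to its own send-keys -l call, in order, so the pane receives the original text unchanged.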
Anthropic Messages API support
The cli auto-detects Claude models by name (/^claude/i) and switches to
POST /v1/messages with:
- adaptive thinking (thinking: {type: "adaptive", display: "summarized"}), with output_config.effort honored
- multi-turn signature roundtrip (each thinking block's signature is threaded back into the next request, so multi-turn behaviour matches Anthropic's expectation)
- ephemeral prompt caching with a single breakpoint at messages[len-3] (OpenHands pattern; stays under Anthropic's 4-breakpoint cap)
- the anthropic-beta: interleaved-thinking-2025-05-14 header
- a per-call DEBUG dump to <logs-dir>/anthropic_raw_calls.jsonl (request body / SSE event histogram / response usage; toggle with debug_anthropic_raw_calls=false)
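A minimal sketch of how such a request body could be assembled, assuming the field names listed above (thinking, output_config.effort, cache_control) and Anthropic's content-block message shape. buildMessagesBody, addCacheBreakpoint and assistantTurn are hypothetical names; the real request builder may differ, for example in how it handles system prompts or headers.

```javascript
"use strict";

// Attach a single ephemeral cache breakpoint to the last content block of
// messages[len-3]. Returns copies; the caller's messages are not mutated.
function addCacheBreakpoint(messages) {
  const out = messages.map((m) => ({
    ...m,
    content: typeof m.content === "string"
      ? [{ type: "text", text: m.content }]
      : m.content.map((b) => ({ ...b })),
  }));
  const idx = out.length - 3;
  if (idx >= 0 && out[idx].content.length > 0) {
    const blocks = out[idx].content;
    blocks[blocks.length - 1] = {
      ...blocks[blocks.length - 1],
      cache_control: { type: "ephemeral" },
    };
  }
  return out;
}

// Signature roundtrip: store the assistant's response content verbatim,
// so thinking blocks (with their `signature`) go back in the next request.
function assistantTurn(responseContent) {
  return { role: "assistant", content: responseContent };
}

function buildMessagesBody({ model, maxTokens, messages, effort, display }) {
  const body = {
    model,
    max_tokens: maxTokens,
    thinking: { type: "adaptive", ...(display ? { display } : {}) },
    messages: addCacheBreakpoint(messages),
  };
  if (effort) body.output_config = { effort };
  return body;
}

module.exports = { addCacheBreakpoint, assistantTurn, buildMessagesBody };
```

Placing only one breakpoint, a couple of messages behind the newest turn, keeps the cached prefix stable across consecutive calls while staying well under the 4-breakpoint cap.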
Switch heuristic:
| model | use_anthropic (default) |
|---|---|
| claude-* | true (Anthropic Messages API) |
| claude-*-max | true, plus auto-switch to enabled thinking with budget_tokens=6144 |
| anything else | false (OpenAI/SGLang chat-completions) |
Force either way with --agent-kwarg use_anthropic=true|false.
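The routing table can be restated as a small function. This is a sketch under assumptions: routeModel is a hypothetical name, the -max rule is assumed to apply only when the Anthropic path is active, and the forced value is taken as an already-parsed boolean.

```javascript
"use strict";

// Hypothetical restatement of the model-name routing table above.
// forceUseAnthropic: true/false from use_anthropic, or undefined for auto.
function routeModel(modelName, forceUseAnthropic) {
  const isClaude = /^claude/i.test(modelName);
  const useAnthropic =
    typeof forceUseAnthropic === "boolean" ? forceUseAnthropic : isClaude;
  // claude-*-max: enabled thinking with a fixed 6144-token budget.
  const thinking =
    useAnthropic && /^claude-.*-max$/i.test(modelName)
      ? { type: "enabled", budget_tokens: 6144 }
      : null;
  return { useAnthropic, thinking };
}

module.exports = { routeModel };
```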
What's not ported
- Tokenizer-based exact token counting (we trust the API's usage block; the proactive summarize check uses a chars/4 estimate when the model doesn't expose a tokenizer)
- Asciinema recording, skill discovery, subagent metrics, linear-history splitting, output-length salvage
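The chars/4 fallback can be combined with the model_info.max_input_tokens and proactive_summarization_threshold settings from the config schema. In this sketch, reading the threshold as "remaining headroom below which summarization triggers" is an assumption based on the schema's "tokens of headroom" note, and both function names are hypothetical.

```javascript
"use strict";

// Rough estimate: about 4 characters (UTF-16 code units) per token.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Assumed rule: summarize once the estimated prompt leaves fewer than
// `threshold` tokens of headroom under `maxInputTokens`.
// Each message is assumed to carry plain-string `content`.
function shouldSummarize(messages, maxInputTokens, threshold = 8000) {
  const used = messages.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  return maxInputTokens - used < threshold;
}

module.exports = { estimateTokens, shouldSummarize };
```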
Requirements
- Node.js >= 18.17 (uses built-in fetch and node:util.parseArgs)
- tmux and script available on PATH inside the container
- An OpenAI-compatible chat-completions endpoint, or an Anthropic Messages-compatible endpoint (api.anthropic.com, api-gateway.glm.ai, any /v1/messages shim)
Standalone usage
cat > /tmp/cfg.json <<EOF
{
"model_name": "claude-opus-4-7",
"parser_name": "xml",
"max_turns": 120,
"max_new_tokens": 32768,
"model_info": {"max_input_tokens": 300000},
"interleaved_thinking": true,
"anthropic_output_effort": "max",
"anthropic_thinking_display": "summarized"
}
EOF
echo "Print 'hello world' to the terminal and stop." > /tmp/instruction.txt
mkdir -p /tmp/agent-logs
ANTHROPIC_BASE_URL=https://api.anthropic.com \
ANTHROPIC_AUTH_TOKEN=$ANTHROPIC_API_KEY \
terminus-2-cli run \
--config /tmp/cfg.json \
--instruction /tmp/instruction.txt \
--logs-dir /tmp/agent-logs \
--session-id demo
After the run:
- /tmp/agent-logs/trajectory.json: per-step ATIF-v1.6 trajectory
- /tmp/agent-logs/anthropic_raw_calls.jsonl: per-call request/response (Claude path only)
- /tmp/agent-logs/context.json: final token / cost counters
- /tmp/agent-logs/exception.txt: only present on failure
Config schema
The --config JSON accepts both Python-style snake_case and JS-style
camelCase keys; CLI flags override config values.
| Key | Type | Notes |
|---|---|---|
| model_name | string | required (or pass --model) |
| api_base | string | OpenAI-compatible base URL |
| api_key | string | falls back to OPENAI_API_KEY env |
| parser_name | "xml" \| "json" | default "xml" |
| max_turns | number | default 1,000,000 (effectively unbounded) |
| temperature | number | default 0.7; auto-omitted when Claude + thinking is on |
| top_p | number | optional |
| max_new_tokens | number | optional |
| reasoning_effort | string | OpenAI-style |
| max_thinking_tokens | number | Anthropic thinking.budget_tokens |
| interleaved_thinking | bool | default false; true enables adaptive thinking on Claude |
| use_anthropic | bool | force Anthropic path on/off (default: auto by model name) |
| anthropic_output_effort | string | "high" / "max" (sets output_config.effort) |
| anthropic_thinking_display | string | e.g. "summarized" (gateway redaction control) |
| anthropic_cache_control | bool | default true; ephemeral cache breakpoint at messages[len-3] |
| debug_anthropic_raw_calls | bool | default true on Claude; per-call request/response JSONL |
| llm_request_timeout_sec | number | default 600 |
| llm_extra_body | object | merged into the request body |
| llm_headers | object | extra HTTP headers |
| tmux_pane_width / tmux_pane_height | number | default 160×40 |
| model_info.max_input_tokens | number | proactive-summarize threshold |
| enable_summarize | bool | default true |
| summarization_mode | "truncate" \| "summarize" | default "summarize" |
| proactive_summarization_threshold | number | default 8000 (tokens of headroom) |
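The snake_case/camelCase acceptance and the "CLI flags override config values" rule can be sketched as a key normalization followed by a layered merge. This is an assumption-laden illustration: only top-level keys are normalized, DEFAULTS holds a subset of the defaults from the table, undefined CLI values are treated as "not passed", and resolveConfig is a hypothetical name.

```javascript
"use strict";

// camelCase -> snake_case ("maxTurns" -> "max_turns"); snake_case passes through.
function toSnakeCase(key) {
  return key.replace(/[A-Z]/g, (c) => "_" + c.toLowerCase());
}

// Subset of the defaults listed in the schema table.
const DEFAULTS = {
  parser_name: "xml",
  max_turns: 1000000,
  temperature: 0.7,
  llm_request_timeout_sec: 600,
  enable_summarize: true,
  summarization_mode: "summarize",
  proactive_summarization_threshold: 8000,
};

// Normalize top-level keys only; drop undefined values (flag not passed).
function normalize(obj) {
  return Object.fromEntries(
    Object.entries(obj)
      .filter(([, v]) => v !== undefined)
      .map(([k, v]) => [toSnakeCase(k), v])
  );
}

// Merge order: defaults < config file < CLI flags (CLI wins).
function resolveConfig(fileConfig, cliFlags) {
  return { ...DEFAULTS, ...normalize(fileConfig), ...normalize(cliFlags) };
}

module.exports = { toSnakeCase, resolveConfig };
```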
Running under harbor
The companion host wrapper at
src/harbor/agents/installed/terminus_2_cli.py registers this CLI as
harbor's terminus-2-cli agent. The host wrapper handles tokenizer
staging and, at install time, picks the best install path:
- Local tarball (set TERMINUS_2_CLI_LOCAL_DIR=/path/to/source for dev)
- Public npm registry (npm install -g @zhuerle/terminus-2-cli@<version>)
- git+https://github.com/... fallback (legacy)
Reference launcher: scripts/run_terminus_2_cli.sh in the harbor
repository — env-var-driven and copy-pasteable. All harbor-level kwargs
(--agent-kwarg max_turns=120, etc.) are forwarded into the config JSON.
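The host wrapper itself is Python; the JavaScript sketch below only illustrates the idea of turning key=value kwargs into config JSON. Coercing values with JSON.parse (so 120 becomes a number and false a boolean, while other text stays a string) is an assumption about the mapping, and parseKwarg / kwargsToConfig are hypothetical names.

```javascript
"use strict";

// Parse one "key=value" kwarg. Values that are valid JSON (numbers,
// true/false, objects) keep their JSON type; anything else stays a string.
function parseKwarg(kwarg) {
  const eq = kwarg.indexOf("=");
  if (eq <= 0) throw new Error(`expected key=value, got: ${kwarg}`);
  const key = kwarg.slice(0, eq);
  const raw = kwarg.slice(eq + 1);
  let value;
  try {
    value = JSON.parse(raw);
  } catch {
    value = raw;
  }
  return [key, value];
}

function kwargsToConfig(kwargs) {
  return Object.fromEntries(kwargs.map(parseKwarg));
}

module.exports = { parseKwarg, kwargsToConfig };
```

The resulting object could then be written with JSON.stringify to the file passed as --config.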
Tests
node --test test/
54+ unit tests cover: parsers, tmux session lifecycle, agent loop, the 3-subagent compaction flow, Anthropic SSE parsing, signature roundtrip, cache_control breakpoint placement, model-name auto-routing, and raw-call DEBUG logging.