JSPM

  • Created
  • Published
  • Downloads 68
  • Score
    100M100P100Q100217F
  • License MIT

OpenCode plugin that optimizes token usage by pruning obsolete tool outputs - Aggressive fork with Head-Tail Truncation, Read Consolidation, Prune Thinking, and Placeholder Compression strategies

Package Exports

  • @tuanhung303/opencode-acp
  • @tuanhung303/opencode-acp/dist/index.js

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (@tuanhung303/opencode-acp) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

OpenCode Agent Context Pruning (ACP)

npm version

Automatically reduces token usage in OpenCode by removing obsolete tools from conversation history.

⚡ Enhanced fork with aggressive default settings and four new pruning strategies: Prune Thinking, Head-Tail Truncation, Read Consolidation, and Placeholder Compression.

ACP in action

Installation

Add to your OpenCode config:

// opencode.jsonc
{
    "plugin": ["@tuanhung303/opencode-acp@latest"],
}

Using @latest ensures you always get the newest version automatically when OpenCode starts.

Restart OpenCode. The plugin will automatically start optimizing your sessions.

How Pruning Works

DCP uses multiple tools and strategies to reduce context size:

Tools

Discard — Exposes a discard tool that the AI can call to remove completed or noisy tool content from context.

Extract — Exposes an extract tool that the AI can call to distill valuable context into concise summaries before removing the tool content.

Strategies

Deduplication — Identifies repeated tool calls (e.g., reading the same file multiple times) and keeps only the most recent output. Runs automatically on every request with zero LLM cost.

Supersede Writes — Prunes write tool inputs for files that have subsequently been read. When a file is written and later read, the original write content becomes redundant since the current file state is captured in the read result. Runs automatically on every request with zero LLM cost. ⚡ Enabled by default in this fork.

Purge Errors — Prunes tool inputs for tools that returned errors after a configurable number of turns (default: 2). Error messages are preserved for context, but the potentially large input content is removed. Runs automatically on every request with zero LLM cost. ⚡ Reduced from 4 to 2 turns in this fork.

Prune Thinking ⚡ NEW — Removes extended thinking tokens (<thinking> blocks, OpenAI reasoning fields) from older assistant messages. Thinking tokens consume significant context but provide no utility after the response is generated. Preserves recent turns for cache efficiency. Runs automatically with zero LLM cost.

Read Consolidation ⚡ NEW — When the same tool+parameter is called multiple times (e.g., reading the same file), older outputs are replaced with pointers to the newer output. The newest call keeps its full content since it has the current state. Different from deduplication: keeps both calls visible as breadcrumbs. Runs automatically with zero LLM cost.

Example:

Older read: [📍 See later read at message #15. This output is stale.]
Newer read: [Full file content preserved here]

Head-Tail Truncation ⚡ NEW — Preserves the first 20% (head) and last 30% (tail) of tool outputs, removing the middle section. Based on the "Lost in the Middle" phenomenon where LLMs perform best with information at the beginning and end. Preserves real content unlike placeholder compression. Runs automatically with zero LLM cost.

Example:

Before: [1000 tokens of file content]
After:  [First 200 tokens...]

[...📄 500 tokens truncated...]

[...Last 300 tokens]

Tool-specific icons: 📄 File content, 🔍 Search results, 💻 Command output, 🌐 Web content, 📊 Data/Excel, 📋 Default

Placeholder Compression ⚡ (Disabled by default) — Replaces verbose tool outputs with actionable placeholder hints while preserving the tool call structure (name + input) as breadcrumbs. Superseded by Head-Tail Truncation which preserves real content instead of generic hints. Can be re-enabled if you prefer minimal placeholders.

Your session history is never modified—DCP replaces pruned content with placeholders before sending requests to your LLM.

Changes from Upstream

Setting Upstream This Fork Rationale
purgeErrors.turns 4 2 Errors rarely useful after 2 turns
nudgeFrequency 10 5 More frequent prune reminders
supersedeWrites.enabled false true Safe with read-after-write pattern
pruneThinking.enabled N/A true Strip ephemeral thinking tokens
readConsolidation.enabled N/A true Older reads point to newer reads
headTailTruncation.enabled N/A true Keep 50% of content (head+tail)
placeholderCompression.enabled N/A false Superseded by head-tail truncation

Impact on Prompt Caching

LLM providers like Anthropic and OpenAI cache prompts based on exact prefix matching. When DCP prunes a tool output, it changes the message content, which invalidates cached prefixes from that point forward.

Trade-off: You lose some cache read benefits but gain larger token savings from reduced context size and performance improvements through reduced context poisoning. In most cases, the token savings outweigh the cache miss cost—especially in long sessions where context bloat becomes significant.

Note: In testing, cache hit rates were approximately 65% with DCP enabled vs 85% without.

Best use case: Providers that count usage in requests, such as Github Copilot and Google Antigravity have no negative price impact.

Configuration

DCP uses its own config file:

  • Global: ~/.config/opencode/dcp.jsonc (or dcp.json), created automatically on first run
  • Custom config directory: $OPENCODE_CONFIG_DIR/dcp.jsonc (or dcp.json), if OPENCODE_CONFIG_DIR is set
  • Project: .opencode/dcp.jsonc (or dcp.json) in your project's .opencode directory
Default Configuration (click to expand)
{
    "$schema": "https://raw.githubusercontent.com/tuanhung303/opencode-agent-context-pruning/master/dcp.schema.json",
    // Enable or disable the plugin
    "enabled": true,
    // Enable debug logging to ~/.config/opencode/logs/dcp/
    "debug": false,
    // Notification display: "off", "minimal", or "detailed"
    "pruneNotification": "detailed",
    // Slash commands configuration
    "commands": {
        "enabled": true,
        // Additional tools to protect from pruning via commands (e.g., /dcp sweep)
        "protectedTools": [],
    },
    // Protect from pruning for <turns> message turns
    "turnProtection": {
        "enabled": false,
        "turns": 4,
    },
    // Protect file operations from pruning via glob patterns
    // Patterns match tool parameters.filePath (e.g. read/write/edit)
    "protectedFilePatterns": [],
    // LLM-driven context pruning tools
    "tools": {
        // Shared settings for all prune tools
        "settings": {
            // Nudge the LLM to use prune tools (every <nudgeFrequency> tool results)
            "nudgeEnabled": true,
            "nudgeFrequency": 5, // ⚡ Changed from 10
            // Additional tools to protect from pruning
            "protectedTools": [],
        },
        // Removes tool content from context without preservation (for completed tasks or noise)
        "discard": {
            "enabled": true,
        },
        // Distills key findings into preserved knowledge before removing raw content
        "extract": {
            "enabled": true,
            // Show distillation content as an ignored message notification
            "showDistillation": false,
        },
    },
    // Automatic pruning strategies
    "strategies": {
        // Remove duplicate tool calls (same tool with same arguments)
        "deduplication": {
            "enabled": true,
            // Additional tools to protect from pruning
            "protectedTools": [],
        },
        // Prune write tool inputs when the file has been subsequently read
        "supersedeWrites": {
            "enabled": true, // ⚡ Changed from false
        },
        // Prune tool inputs for errored tools after X turns
        "purgeErrors": {
            "enabled": true,
            // Number of turns before errored tool inputs are pruned
            "turns": 2, // ⚡ Changed from 4
            // Additional tools to protect from pruning
            "protectedTools": [],
        },
        // ⚡ NEW: Remove extended thinking tokens from older messages
        "pruneThinking": {
            "enabled": true,
            // Turns to wait before pruning (preserves cache)
            "delayTurns": 1,
        },
        // ⚡ NEW: Consolidate duplicate reads - older outputs point to newer
        "readConsolidation": {
            "enabled": true,
            // Tools to track for consolidation
            "tools": ["read", "glob", "grep", "webfetch", "bash"],
        },
        // ⚡ NEW: Keep first 20% + last 30% of outputs, truncate middle
        "headTailTruncation": {
            "enabled": true,
            // Turns to wait before truncating
            "delayTurns": 2,
            // Ratio of output to keep from beginning
            "headRatio": 0.2,
            // Ratio of output to keep from end
            "tailRatio": 0.3,
            // Additional tools to protect from truncation
            "protectedTools": [],
        },
        // Replace verbose outputs with placeholder hints (disabled by default)
        // Superseded by headTailTruncation which preserves real content
        "placeholderCompression": {
            "enabled": false, // ⚡ Disabled - use headTailTruncation instead
            // Turns to wait before compressing
            "delayTurns": 2,
            // Only compress outputs larger than this token count
            "minOutputTokens": 100,
            // Additional tools to protect from compression
            "protectedTools": [],
        },
    },
}

New Strategy: Prune Thinking

Removes extended thinking content from assistant messages after a configurable delay:

  • Anthropic: Removes type: "thinking" content blocks
  • OpenAI: Removes reasoning field from messages
  • Fallback: Strips <thinking>...</thinking> tags from text
"pruneThinking": {
    "enabled": true,
    "delayTurns": 1  // Keep current turn for cache, prune older
}

New Strategy: Placeholder Compression

Replaces verbose tool outputs with actionable hints while preserving breadcrumbs:

Tool Placeholder Example
read [File read previously. Read again if needed: /path/to/file.ts]
grep [Content search completed for: pattern. Search again if needed]
bash [Command executed: npm test. Re-run if needed]
webfetch [URL fetched: https://example.com. Fetch again if needed]

Protected tools (never compressed): write, edit, todowrite, todoread, discard, extract, task, question, batch, skill

"placeholderCompression": {
    "enabled": true,
    "delayTurns": 2,        // Wait 2 turns before compressing
    "minOutputTokens": 100, // Only compress large outputs
    "protectedTools": []    // Add custom protected tools
}

Commands

DCP provides a /dcp slash command:

  • /dcp — Shows available DCP commands
  • /dcp context — Shows a breakdown of your current session's token usage by category (system, user, assistant, tools, etc.) and how much has been saved through pruning.
  • /dcp stats — Shows cumulative pruning statistics across all sessions.
  • /dcp sweep — Prunes all tools since the last user message. Accepts an optional count: /dcp sweep 10 prunes the last 10 tools. Respects commands.protectedTools.

Turn Protection

When enabled, turn protection prevents tool outputs from being pruned for a configurable number of message turns. This gives the AI time to reference recent tool outputs before they become prunable. Applies to both discard and extract tools, as well as automatic strategies.

Protected Tools

By default, these tools are always protected from pruning across all strategies: task, todowrite, todoread, discard, extract, batch, write, edit

The protectedTools arrays in each section add to this default list.

Config Precedence

Settings are merged in order: Defaults → Global (~/.config/opencode/dcp.jsonc) → Config Dir ($OPENCODE_CONFIG_DIR/dcp.jsonc) → Project (.opencode/dcp.jsonc). Each level overrides the previous, so project settings take priority over config-dir and global, which take priority over defaults.

Restart OpenCode after making config changes.

Limitations

Subagents — DCP is disabled for subagents. Subagents are not designed to be token efficient; what matters is that the final message returned to the main agent is a concise summary of findings. DCP's pruning could interfere with this summarization behavior.

References

License

MIT