Package Exports
- @compresr/sdk
- @compresr/sdk/agents
- @compresr/sdk/integrations/langchain
- @compresr/sdk/integrations/langgraph
- @compresr/sdk/integrations/llamaindex
Compresr TypeScript SDK
Query-aware LLM context compression — reduce LLM API costs by 30-70%.
Install
npm install @compresr/sdk
Get an API key at compresr.ai.
Quick start
import { CompressionClient } from '@compresr/sdk';
const client = new CompressionClient({ apiKey: 'cmp_...' });
const result = await client.compress({
context: 'Long passage to compress...',
query: 'What is the main conclusion?',
targetCompressionRatio: 0.5,
});
console.log(`Saved ${result.data?.tokens_saved} tokens`);
console.log(result.data?.compressed_context);
The default model is latte_v1 (query-aware). To use another model your
account has access to, pass compressionModelName: '...'; the backend
validates the name.
Batch
const batch = await client.compressBatch({
contexts: ['Doc 1...', 'Doc 2...', 'Doc 3...'],
queries: 'What is self-attention?',
targetCompressionRatio: 0.5,
});
console.log(`Total saved: ${batch.data?.total_tokens_saved} tokens`);
Streaming
for await (const chunk of client.compressStream({
context: '...',
query: '...',
})) {
process.stdout.write(chunk.content);
}
Compression options
| Param | Purpose |
|---|---|
| query | Question the LLM is trying to answer; drives latte_v1 compression |
| targetCompressionRatio | 0-1 strength, or >1 for an Nx factor (max 200) |
| coarse | true = paragraph-level (default, faster); false = token-level (fine-grained) |
| heuristicChunking | Structure-preserving chunking |
| disablePlaceholders | Disable placeholder tokens in output |
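The two ranges of targetCompressionRatio can be checked on the client before a request is sent. The sketch below is a hypothetical helper, not part of @compresr/sdk: the name classifyRatio, the RatioKind type, and the choice to treat exactly 1 as a strength are all assumptions made for illustration. The backend remains the authority on what it accepts.

```typescript
// Hypothetical helper (not an SDK export): classifies a
// targetCompressionRatio value using the ranges from the options table.
type RatioKind =
  | { kind: 'strength'; value: number } // value in [0, 1]
  | { kind: 'factor'; value: number };  // value in (1, 200], read as "Nx"

const MAX_FACTOR = 200; // "max 200" from the options table

function classifyRatio(ratio: number): RatioKind {
  if (!Number.isFinite(ratio) || ratio < 0) {
    throw new RangeError(
      `targetCompressionRatio must be a non-negative finite number, got ${ratio}`,
    );
  }
  if (ratio > MAX_FACTOR) {
    throw new RangeError(
      `targetCompressionRatio must not exceed ${MAX_FACTOR}, got ${ratio}`,
    );
  }
  // Assumption: exactly 1 counts as a strength, since only values > 1 are factors.
  return ratio <= 1
    ? { kind: 'strength', value: ratio }
    : { kind: 'factor', value: ratio };
}
```

Calling classifyRatio before client.compress surfaces an out-of-range value locally rather than as a ValidationError from the API.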
Error handling
import {
AuthenticationError,
RateLimitError,
ValidationError,
CompresrError,
} from '@compresr/sdk';
try {
await client.compress({ context, query });
} catch (err) {
if (err instanceof AuthenticationError) console.error('Invalid API key');
else if (err instanceof RateLimitError) console.error('Rate limited');
else if (err instanceof ValidationError) console.error('Bad request:', err);
else if (err instanceof CompresrError) console.error('API error:', err);
}
Drop-in agent client
@compresr/sdk/agents ships a tiny provider-shape facade layer on top of
LangChain.js. One CompressionClient exposes three call surfaces — Anthropic
messages.create, OpenAI chat.completions.create, and a native run — all
backed by the same engine (createAgent + CompresrToolMiddleware). Every
tool output above minTokens is auto-compressed before it re-enters the
model's context.
- Pass temperature, topP, maxTokens, stopSequences, presencePenalty, frequencyPenalty, seed, etc. to any facade. They are forwarded to the underlying chat model per call via .bind(...) (no cache pollution).
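The minTokens threshold described above acts as a gate: short tool outputs pass through untouched and only long ones are compressed. A minimal sketch of that decision follows. The function names and the 4-characters-per-token estimate are assumptions for illustration; the SDK's actual token counting is not documented here and may differ.

```typescript
// Rough token estimate. Assumption: ~4 characters per token, a common
// rule of thumb for English text; the SDK's real tokenizer may differ.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Hypothetical gate: only outputs strictly above `minTokens` are compressed.
function shouldCompress(toolOutput: string, minTokens: number): boolean {
  return estimateTokens(toolOutput) > minTokens;
}

// Short outputs pass through; long ones would be sent to the compressor.
const shortOutput = 'ok';
const longOutput = 'x'.repeat(2000); // ~500 estimated tokens
console.log(shouldCompress(shortOutput, 300)); // false
console.log(shouldCompress(longOutput, 300));  // true
```

A higher minTokens avoids spending a compression call on outputs that are already cheap to include verbatim.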
import { CompressionClient } from '@compresr/sdk';
import { WebSearchTool } from '@compresr/sdk/agents';
const client = new CompressionClient({
apiKey: process.env.COMPRESR_API_KEY!,
llm: 'anthropic', // bare provider — model lives at the call site
llmApiKey: process.env.ANTHROPIC_API_KEY!,
compression: { targetCompressionRatio: 0.5, minTokens: 300 },
});
Use llm: 'anthropic:claude-haiku-4-5' if you want a default model, but the call-site model: always wins.
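The precedence between the constructor's llm string and the call-site model can be pictured as follows. This is a sketch under assumptions, not the SDK's implementation: parseLlmSpec, resolveModel, and the LlmSpec type are hypothetical names, and the error thrown when no model is available anywhere is a guess at reasonable behavior.

```typescript
interface LlmSpec {
  provider: string;
  defaultModel?: string; // absent for a bare provider such as 'anthropic'
}

// Hypothetical parser for 'provider' and 'provider:model' strings.
function parseLlmSpec(spec: string): LlmSpec {
  const idx = spec.indexOf(':');
  if (idx === -1) return { provider: spec };
  return { provider: spec.slice(0, idx), defaultModel: spec.slice(idx + 1) };
}

// The call-site model takes precedence over the constructor default.
function resolveModel(spec: LlmSpec, callSiteModel?: string): string {
  const model = callSiteModel ?? spec.defaultModel;
  if (!model) {
    // Assumption: with a bare provider and no call-site model there is nothing to run.
    throw new Error(`No model given for provider '${spec.provider}'`);
  }
  return model;
}

const spec = parseLlmSpec('anthropic:claude-haiku-4-5');
console.log(spec.provider);      // anthropic
console.log(resolveModel(spec)); // claude-haiku-4-5
```

With a bare provider, every call has to name its model, which keeps the model choice visible at the call site.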
Anthropic shape
// `search` is any LangChain.js tool, e.g. one built with WebSearchTool (see Tools).
const msg = await client.messages.create({
model: 'claude-haiku-4-5',
maxTokens: 512,
messages: [{ role: 'user', content: "What's the latest AI news?" }],
tools: [search],
});
OpenAI shape
const completion = await client.chat.completions.create({
model: 'gpt-4o-mini',
messages: [{ role: 'user', content: "Summarize today's top story." }],
tools: [search],
});
Native shape
const result = await client.run({ prompt: 'Hello', tools: [search], maxTokens: 512 });
console.log(result.text, result.citations, result.compresrStats);
Tools
WebSearchTool returns a LangChain.js BaseTool — usable across all three
facades.
import { WebSearchTool } from '@compresr/sdk/agents';
const tavily = await WebSearchTool.tavily({
apiKey: process.env.TAVILY_API_KEY!,
maxResults: 5,
allowedDomains: ['arxiv.org'],
});
const brave = await WebSearchTool.brave({
apiKey: process.env.BRAVE_API_KEY!,
maxResults: 5,
});
WebSearchTool is also re-exported from @compresr/sdk for convenience, but
@compresr/sdk/agents is the recommended import path: the heavy LangChain peers
are loaded dynamically, so the subpath only pulls them in when you use that surface.
Custom tools
Any LangChain.js tool works. Use the standard tool({...}) helper and
Compresr compresses its output the same way:
import { tool } from '@langchain/core/tools';
import { z } from 'zod';
const fetchDocs = tool(
async ({ query }) => fetchInternalDocs(query),
{
name: 'fetch_docs',
description: 'Search the internal docs corpus',
schema: z.object({ query: z.string() }),
},
);
await client.run({ prompt: 'How does billing work?', tools: [fetchDocs] });
Provider switching
Swap providers by changing one line — the facades stay the same:
new CompressionClient({ apiKey, llm: 'anthropic:claude-haiku-4-5', llmApiKey });
new CompressionClient({ apiKey, llm: 'openai:gpt-4o-mini', llmApiKey });
new CompressionClient({ apiKey, llm: 'google_genai:gemini-2.5-flash', llmApiKey });
Streaming on the agent facades (messages.stream, chat.completions.stream) is a
Phase-2 item and currently throws CompresrError('streaming not yet implemented').
Why no provider-native server search?
Anthropic web_search_20250305, OpenAI web_search_preview, and Gemini
google_search execute server-side and return encrypted or opaque content
that Compresr can't read — so it can't compress them either. Use
WebSearchTool.tavily / WebSearchTool.brave if you want compressible
search output in the agent loop.
Framework integrations
Optional peer dependencies — install only what you use:
npm install langchain @langchain/core # engine for the agents layer
npm install @langchain/anthropic # for anthropic:... models
npm install @langchain/openai # for openai:... models
npm install @langchain/google-genai # for google_genai:... models
npm install @langchain/tavily # for WebSearchTool.tavily
npm install @langchain/community # for WebSearchTool.brave
npm install @langchain/langgraph # for LangGraph integration
npm install llamaindex # for LlamaIndex integration
LangChain: middleware, tool wrapper, retriever
import { createAgent } from 'langchain';
import {
compresrToolMiddleware,
wrapToolWithCompression,
CompresrExtractor,
} from '@compresr/sdk/integrations/langchain';
const agent = createAgent({
model,
tools: [webSearch],
middleware: [
compresrToolMiddleware({
apiKey: process.env.COMPRESR_API_KEY!,
queryArg: 'query',
}),
],
});
LangGraph: compression as a graph node
import { makeCompresrNode } from '@compresr/sdk/integrations/langgraph';
graph.addNode(
'compress',
makeCompresrNode<State>({
apiKey: process.env.COMPRESR_API_KEY!,
contextKey: 'retrieved_text',
queryKey: 'user_question',
})
);
LlamaIndex: node postprocessor for RAG
import { CompresrNodePostprocessor } from '@compresr/sdk/integrations/llamaindex';
const queryEngine = index.asQueryEngine({
nodePostprocessors: [
new CompresrNodePostprocessor({
apiKey: process.env.COMPRESR_API_KEY!,
}),
],
});
Unified query API
Every integration that accepts a query exposes the same three knobs:
| Param | Purpose |
|---|---|
| query | Static query, the same for every call |
| queryExtractor | Callable that derives the query from the call context |
| queryArg / queryKey | Name of the tool arg / state key to use as the query |
Priority: query > queryExtractor > queryArg/queryKey > smart-pick
from common arg keys (query, question, search_query, ...) > last user
message in history.
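The priority chain above can be sketched as a single resolver function. Everything in this block is illustrative: QueryOptions, CallContext, and resolveQuery are hypothetical names, the smart-pick list contains only the three keys named in the text (the source list continues), and falling through to the next source when a higher-priority one yields nothing is an assumption.

```typescript
interface QueryOptions {
  query?: string;
  queryExtractor?: (ctx: CallContext) => string | undefined;
  queryArg?: string; // queryKey plays the same role for state-based integrations
}

interface CallContext {
  args: Record<string, unknown>; // tool args or graph state
  userMessages: string[];        // user messages, oldest first
}

// Only the keys named in the text; the source list continues ("...").
const COMMON_KEYS = ['query', 'question', 'search_query'];

function asNonEmptyString(v: unknown): string | undefined {
  return typeof v === 'string' && v.length > 0 ? v : undefined;
}

function resolveQuery(opts: QueryOptions, ctx: CallContext): string | undefined {
  // 1. A static query wins outright.
  if (opts.query) return opts.query;
  // 2. Next, the extractor callable.
  const extracted = opts.queryExtractor?.(ctx);
  if (extracted) return extracted;
  // 3. Next, the explicitly named arg / state key.
  if (opts.queryArg) {
    const named = asNonEmptyString(ctx.args[opts.queryArg]);
    if (named) return named;
  }
  // 4. Next, smart-pick from common arg keys.
  for (const key of COMMON_KEYS) {
    const picked = asNonEmptyString(ctx.args[key]);
    if (picked) return picked;
  }
  // 5. Finally, the last user message in history (undefined if there is none).
  return ctx.userMessages[ctx.userMessages.length - 1];
}
```

Setting query is the most predictable option; queryArg or queryKey suits tools whose argument already carries the user's question.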
Tutorials
Runnable end-to-end examples under tutorial/:
- 01-quickstart.ts: core CompressionClient.
- 02-langchain.ts: middleware + tool wrapper + retriever.
- 03-langgraph.ts: compression node in a 3-node graph.
- 04-llamaindex.ts: node postprocessor + tool wrapper.
- 05-agents.md: drop-in agent client (Anthropic / OpenAI / native facades).
Run any of the .ts tutorials with npx tsx --env-file=../.env tutorial/01-quickstart.ts.
Requirements
- Node.js 20+ (uses native fetch)
- TypeScript 5.0+ (optional, recommended)
- Optional peers (install only what you use): @langchain/core >=0.3, langchain >=1.0, @langchain/anthropic >=1.0, @langchain/openai >=1.0, @langchain/google-genai >=0.2, @langchain/tavily >=0.1, @langchain/community >=0.3, @langchain/langgraph >=0.2, llamaindex >=0.8
Development
npm install
npm test # unit tests
npm run test:integration # live tests (requires COMPRESR_API_KEY)
npm run build
License
Apache 2.0 — see LICENSE.
Support
- Docs: compresr.ai/docs
- Issues: GitHub
- Email: support@compresr.ai