llm-burn
Cost tracking and budget enforcement for LLM calls in NestJS — with a single decorator.
Every OpenAI and Anthropic call burns money. llm-burn tells you exactly how much, per method, per model, in real time — and optionally blocks requests once a spending limit is reached. Zero changes to your business logic required.
Features
- One decorator — `@TrackLLMBurn()` is all you need to start tracking a method
- Auto-detection — parses OpenAI, Anthropic, Gemini, Cohere, Mistral, flat, and LangChain response shapes out of the box
- Budget enforcement — `BudgetGuard` throws HTTP 403 before the LLM call is made when a cap is exceeded
- Per-method and global budgets — granular control over individual methods or the entire application
- Built-in pricing — ships with up-to-date prices for all major GPT and Claude models
- Extensible — override prices, add custom models, or write a custom extractor for any SDK
Installation
```bash
npm install llm-burn
```

Peer dependencies (already present in any NestJS project):

```bash
npm install @nestjs/common @nestjs/core reflect-metadata rxjs
```

Quick Start
1. Register the module in AppModule:
```ts
import { Module } from '@nestjs/common';
import { LLMBurnModule } from 'llm-burn';

@Module({
  imports: [
    LLMBurnModule.forRoot({
      globalBudget: 10.00, // block all LLM calls after $10 spent
      enableLogging: true,
    }),
  ],
})
export class AppModule {}
```

2. Decorate the method that calls your LLM:
```ts
import { Injectable } from '@nestjs/common';
import { TrackLLMBurn } from 'llm-burn';

@Injectable()
export class AiService {
  @TrackLLMBurn({ model: 'gpt-4o', budget: 2.00 })
  async summarize(text: string) {
    return this.openai.chat.completions.create({
      model: 'gpt-4o',
      messages: [{ role: 'user', content: text }],
    });
  }
}
```

3. Query costs anywhere in your application:
```ts
@Injectable()
export class DashboardService {
  constructor(private readonly llmBurn: LLMBurnService) {}

  getReport() {
    return this.llmBurn.getStats();
    // { totalCost, totalCalls, byMethod, byModel, records }
  }
}
```

That's it. Token usage is extracted automatically from the response — no wrappers, no interceptors to wire up manually.
Module Registration
Synchronous
```ts
LLMBurnModule.forRoot({
  globalBudget: 10.00,
  enableLogging: true,
})
```

Async (with ConfigService)
```ts
LLMBurnModule.forRootAsync({
  imports: [ConfigModule],
  inject: [ConfigService],
  useFactory: (cfg: ConfigService) => ({
    globalBudget: cfg.get<number>('LLM_BUDGET'),
    enableLogging: cfg.get<boolean>('LLM_LOGGING'),
  }),
})
```

With Global Interceptor
Registers LLMBurnInterceptor as an APP_INTERCEPTOR so every route in your application is automatically intercepted. Combine with @TrackLLMBurn() on specific methods to control what gets tracked.
```ts
LLMBurnModule.forRootWithGlobalInterceptor({
  globalBudget: 5.00,
  enableLogging: true,
})
```

Decorator: @TrackLLMBurn
Marks a method for LLM cost tracking. Automatically attaches the interceptor to that method — no need to wire up UseInterceptors manually.
```ts
@TrackLLMBurn({ model: 'claude-3-5-sonnet-20241022', budget: 1.50 })
async generateReport(prompt: string) {
  return this.anthropic.messages.create({
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 1024,
    messages: [{ role: 'user', content: prompt }],
  });
}
```

| Option | Type | Description |
|---|---|---|
| `model` | `string` | Model name used as fallback when the response doesn't include one. |
| `provider` | `string` | Provider hint (`"openai"`, `"anthropic"`, or custom). Auto-detected from model name when omitted. |
| `budget` | `number` | Per-method USD cap. `BudgetGuard` blocks calls once this is reached. |
| `extractUsage` | `(result: unknown) => ExtractedUsage \| null` | Custom extractor for non-standard response shapes. |
Budget Guard
BudgetGuard runs before the route handler. It checks two thresholds in order:
1. Per-method budget — reads `budget` from `@TrackLLMBurn({ budget: N })` and compares it against the cumulative cost of all previous calls to that method.
2. Global budget — checks whether `totalCost >= globalBudget` across all tracked calls.

If either threshold is exceeded, the guard throws `ForbiddenException` (HTTP 403) and the LLM call is never made.
The guard checks cost accumulated from previous calls. The current call's cost is recorded after it completes (in the interceptor). This is by design — the guard acts as a spending limiter, not a per-call price check.
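The decision logic described above can be sketched as a pure function. This is an illustrative sketch only: `checkBudget` and its parameter names are hypothetical, not part of the llm-burn API.

```typescript
// Hypothetical sketch of the guard's two-threshold check.
// Returns true when the call may proceed, false when it should be blocked.
function checkBudget(
  methodCost: number,               // cumulative cost of previous calls to this method
  methodBudget: number | undefined, // per-method cap from @TrackLLMBurn, if any
  totalCost: number,                // cumulative cost across all tracked calls
  globalBudget: number | undefined, // global cap from module options, if any
): boolean {
  // 1. Per-method cap: block once previous calls have reached it.
  if (methodBudget !== undefined && methodCost >= methodBudget) return false;
  // 2. Global cap: block once overall spend has reached it.
  if (globalBudget !== undefined && totalCost >= globalBudget) return false;
  return true; // allow the call; its own cost is recorded after it completes
}

console.log(checkBudget(0.4, 0.5, 9.0, 10.0)); // true  (under both caps)
console.log(checkBudget(0.5, 0.5, 9.0, 10.0)); // false (method cap reached)
```

Note that a call that starts just under the cap can still push total spend past it; the guard only blocks subsequent calls.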
Applying the guard
Globally:
```ts
// main.ts
async function bootstrap() {
  const app = await NestFactory.create(AppModule);
  app.useGlobalGuards(app.get(BudgetGuard));
  await app.listen(3000);
}
```

Per controller:
```ts
@UseGuards(BudgetGuard)
@Controller('ai')
export class AiController {}
```

Per route:
```ts
@UseGuards(BudgetGuard)
@Post('summarize')
@TrackLLMBurn({ model: 'gpt-4o', budget: 0.50 })
async summarize(@Body() dto: SummarizeDto) { ... }
```

When `BudgetGuard` is not registered, no request is ever blocked. Cost tracking via the interceptor still works normally.
LLMBurnService
Injectable service available anywhere after importing LLMBurnModule.
| Method | Return | Description |
|---|---|---|
| `getStats()` | `LLMStats` | Full breakdown: totals, per-method, per-model, raw records |
| `getTotalCost()` | `number` | Total USD spent across all calls |
| `getMethodCost(method)` | `number` | Cumulative USD cost for a specific method |
| `getBudgetStatus()` | `BudgetStatus` | Global budget usage (remaining, exceeded, % used) |
| `getGlobalBudget()` | `number \| undefined` | Configured global budget cap |
| `calculateCost(model, in, out, cached?)` | `number` | Calculate USD cost for a given token count |
| `getPricing(model)` | `ModelPricing \| undefined` | Retrieve pricing for a model (supports prefix matching) |
| `listKnownModels()` | `string[]` | All model names from built-in + custom prices |
| `record(method, model, provider, in, out)` | `LLMCallRecord` | Manually record a call |
| `reset()` | `void` | Clear all recorded usage |
Example — cost dashboard endpoint:
```ts
@Get('cost-report')
getCostReport() {
  return {
    stats: this.llmBurn.getStats(),
    budget: this.llmBurn.getBudgetStatus(),
  };
}
```

Supported Response Formats
The interceptor auto-detects seven response shapes out of the box:
| Format | Shape |
|---|---|
| OpenAI SDK | `{ usage: { prompt_tokens, completion_tokens }, model }` |
| Anthropic SDK | `{ usage: { input_tokens, output_tokens }, model }` |
| Google Gemini SDK | `{ usageMetadata: { promptTokenCount, candidatesTokenCount } }` |
| Cohere SDK | `{ meta: { tokens: { input_tokens, output_tokens } } }` |
| Mistral SDK | Same as OpenAI (auto-detected) |
| Flat | `{ inputTokens, outputTokens, model? }` |
| LangChain | `{ llmOutput: { tokenUsage: { promptTokens, completionTokens } } }` |
Groq, Together AI, and Azure OpenAI use the OpenAI response format and are detected automatically.
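Detection of this kind amounts to probing the response for each known shape in turn and normalizing the result. A minimal sketch covering three of the shapes above (the `detectUsage` function and `ExtractedUsage` field layout here are illustrative, not the package's internals):

```typescript
interface ExtractedUsage { inputTokens: number; outputTokens: number; model?: string }

// Hypothetical sketch: try each known response shape, normalize the first match.
function detectUsage(result: any): ExtractedUsage | null {
  if (result?.usage?.prompt_tokens !== undefined) {
    // OpenAI-style (also Groq, Together AI, Azure OpenAI, Mistral)
    return { inputTokens: result.usage.prompt_tokens, outputTokens: result.usage.completion_tokens, model: result.model };
  }
  if (result?.usage?.input_tokens !== undefined) {
    // Anthropic-style
    return { inputTokens: result.usage.input_tokens, outputTokens: result.usage.output_tokens, model: result.model };
  }
  if (result?.inputTokens !== undefined) {
    // Flat shape
    return { inputTokens: result.inputTokens, outputTokens: result.outputTokens, model: result.model };
  }
  return null; // unknown shape: nothing is recorded
}

console.log(detectUsage({ usage: { input_tokens: 12, output_tokens: 34 }, model: 'claude-3' }));
// { inputTokens: 12, outputTokens: 34, model: 'claude-3' }
```

The ordering matters: OpenAI's `prompt_tokens` must be checked before Anthropic's `input_tokens`, since both live under `usage`.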
Custom extractor
For non-standard SDKs or response wrappers, provide an extractUsage function:
```ts
@TrackLLMBurn({
  model: 'my-custom-model',
  extractUsage: (result: unknown) => {
    const r = result as MyCustomResponse;
    if (!r?.meta?.tokens) return null;
    return {
      inputTokens: r.meta.tokens.input,
      outputTokens: r.meta.tokens.output,
      model: r.meta.model,       // optional — overrides decorator model
      provider: 'my-provider',   // optional — overrides auto-detection
    };
  },
})
async callCustomLLM(prompt: string) { ... }
```

Return `null` to skip recording for a specific call — the interceptor will log a warning.
Supported Models & Pricing
Prices are in USD per 1 million tokens (updated March 2026).
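The arithmetic behind these tables is cost = inputTokens / 1,000,000 × inputPrice + outputTokens / 1,000,000 × outputPrice. For example, a `gpt-4o` call with 10,000 input and 2,000 output tokens costs 0.01 × $2.50 + 0.002 × $10.00 = $0.045. A sketch of the formula (the `costUsd` helper is a hypothetical name for illustration; the package exposes this via `calculateCost`):

```typescript
// Illustrative per-call cost formula: prices are quoted per 1M tokens.
// `costUsd` is a hypothetical helper, not part of the llm-burn API.
function costUsd(inputTokens: number, outputTokens: number, inPerM: number, outPerM: number): number {
  return (inputTokens / 1_000_000) * inPerM + (outputTokens / 1_000_000) * outPerM;
}

// gpt-4o pricing from the table below: $2.50 in, $10.00 out per 1M tokens.
console.log(costUsd(10_000, 2_000, 2.50, 10.00)); // ≈ 0.045
```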
OpenAI
| Model | Input / M | Output / M | Cached Input / M |
|---|---|---|---|
| `gpt-4o` | $2.50 | $10.00 | $1.25 |
| `gpt-4o-mini` | $0.15 | $0.60 | $0.075 |
| `gpt-4-turbo` | $10.00 | $30.00 | — |
| `gpt-4` | $30.00 | $60.00 | — |
| `gpt-3.5-turbo` | $0.50 | $1.50 | — |
| `o1` | $15.00 | $60.00 | $7.50 |
| `o1-mini` | $3.00 | $12.00 | $1.50 |
| `o3` | $10.00 | $40.00 | $2.50 |
| `o3-mini` | $1.10 | $4.40 | $0.55 |
| `gpt-4.5-preview` | $75.00 | $150.00 | $37.50 |
Anthropic
| Model | Input / M | Output / M |
|---|---|---|
| `claude-opus-4-6` | $15.00 | $75.00 |
| `claude-sonnet-4-6` | $3.00 | $15.00 |
| `claude-haiku-4-5` | $0.80 | $4.00 |
| `claude-3-5-sonnet-20241022` | $3.00 | $15.00 |
| `claude-3-5-haiku-20241022` | $0.80 | $4.00 |
| `claude-3-opus-20240229` | $15.00 | $75.00 |
| `claude-3-haiku-20240307` | $0.25 | $1.25 |
| `claude-2.1` | $8.00 | $24.00 |
Prefix matching: dated model variants like `gpt-4o-2024-11-20` are matched automatically — if no exact match exists, the interceptor falls back to the nearest prefix entry (`gpt-4o`).
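A longest-prefix lookup of this kind can be sketched as follows. This is illustrative only: `resolvePricing` is a hypothetical name, not the package's actual internal function.

```typescript
interface ModelPricing { inputPricePerMillion: number; outputPricePerMillion: number }

// Hypothetical sketch: exact match first, then fall back to the longest
// known model name that is a prefix of the requested name.
function resolvePricing(model: string, prices: Record<string, ModelPricing>): ModelPricing | undefined {
  if (prices[model]) return prices[model];
  const bestPrefix = Object.keys(prices)
    .filter((name) => model.startsWith(name))
    .sort((a, b) => b.length - a.length)[0]; // longest prefix wins
  return bestPrefix ? prices[bestPrefix] : undefined;
}

const prices = {
  'gpt-4': { inputPricePerMillion: 30, outputPricePerMillion: 60 },
  'gpt-4o': { inputPricePerMillion: 2.5, outputPricePerMillion: 10 },
};
console.log(resolvePricing('gpt-4o-2024-11-20', prices)?.inputPricePerMillion); // 2.5
```

Sorting by length is what keeps `gpt-4o-2024-11-20` from resolving to `gpt-4` even though both entries are prefixes of it.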
Google Gemini
| Model | Input / M | Output / M |
|---|---|---|
| `gemini-2.0-flash` | $0.10 | $0.40 |
| `gemini-2.0-flash-lite` | $0.075 | $0.30 |
| `gemini-1.5-pro` | $1.25 | $5.00 |
| `gemini-1.5-flash` | $0.075 | $0.30 |
| `gemini-1.5-flash-8b` | $0.0375 | $0.15 |
Cohere
| Model | Input / M | Output / M |
|---|---|---|
| `command-r-plus` | $2.50 | $10.00 |
| `command-r` | $0.15 | $0.60 |
| `command` | $1.00 | $2.00 |
| `command-light` | $0.30 | $0.60 |
Mistral
| Model | Input / M | Output / M |
|---|---|---|
| `mistral-large-latest` | $2.00 | $6.00 |
| `mistral-small-latest` | $0.10 | $0.30 |
| `codestral-latest` | $0.20 | $0.60 |
| `open-mistral-nemo` | $0.15 | $0.15 |
| `open-mixtral-8x22b` | $2.00 | $6.00 |
Custom Prices
Inline — add or override any model price at module registration:
```ts
LLMBurnModule.forRoot({
  customPrices: {
    'my-fine-tuned-gpt4': {
      inputPricePerMillion: 5.00,
      outputPricePerMillion: 20.00,
    },
    'local-llama': {
      inputPricePerMillion: 0,
      outputPricePerMillion: 0,
    },
  },
})
```

External file — point to your own JSON file to manage prices independently of the package version:
```ts
LLMBurnModule.forRoot({
  pricesPath: './prices.json',
})
```

The file can be flat or nested (same shape as the built-in prices.json):
Flat:

```json
{
  "gpt-4o": { "inputPricePerMillion": 2.50, "outputPricePerMillion": 10.00 },
  "claude-sonnet-4-6": { "inputPricePerMillion": 3.00, "outputPricePerMillion": 15.00 }
}
```

Nested (grouped by provider):

```json
{
  "openai": {
    "gpt-4o": { "inputPricePerMillion": 2.50, "outputPricePerMillion": 10.00 }
  },
  "anthropic": {
    "claude-sonnet-4-6": { "inputPricePerMillion": 3.00, "outputPricePerMillion": 15.00 }
  }
}
```

Priority order: `customPrices` > `pricesPath` > built-in prices.
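That precedence can be sketched as a spread merge in which later sources overwrite earlier ones. The variable names below are illustrative, not the package's internals:

```typescript
// Hypothetical sketch of price-source precedence:
// built-in prices < pricesPath file < customPrices.
const builtIn  = { 'gpt-4o': { inputPricePerMillion: 2.5, outputPricePerMillion: 10 } };
const fromFile = { 'gpt-4o': { inputPricePerMillion: 2.0, outputPricePerMillion: 8 } };
const custom   = { 'gpt-4o': { inputPricePerMillion: 1.0, outputPricePerMillion: 4 } };

// Later spreads override earlier ones, so customPrices has the final say.
const effective = { ...builtIn, ...fromFile, ...custom };
console.log(effective['gpt-4o'].inputPricePerMillion); // 1
```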
API Reference
LLMBurnModuleOptions
| Option | Type | Default | Description |
|---|---|---|---|
| `globalBudget` | `number` | — | USD cap for total spend. `BudgetGuard` blocks when exceeded. |
| `enableLogging` | `boolean` | `false` | Log each tracked call via NestJS Logger. |
| `customPrices` | `Record<string, ModelPricing>` | — | Inline price overrides. Takes precedence over everything. |
| `pricesPath` | `string` | — | Path to an external JSON prices file. Takes precedence over built-in prices. |
LLMStats
```ts
interface LLMStats {
  totalCost: number;
  totalInputTokens: number;
  totalOutputTokens: number;
  totalCalls: number;
  byMethod: Record<string, MethodStats>; // breakdown per decorated method
  byModel: Record<string, ModelStats>;   // breakdown per model name
  records: LLMCallRecord[];              // all raw records
}
```

BudgetStatus
```ts
interface BudgetStatus {
  globalBudget?: number; // configured cap (undefined if not set)
  totalCost: number;     // total USD spent
  remaining?: number;    // USD left before cap (0 when exceeded)
  isExceeded: boolean;
  percentUsed?: number;  // 0–100+
}
```

License
MIT