NestJS standalone library for tracking LLM call costs per method using interceptors

Package Exports

  • llm-burn
  • llm-burn/dist/index.js

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (llm-burn) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.


llm-burn


Cost tracking and budget enforcement for LLM calls in NestJS — with a single decorator.

Every OpenAI and Anthropic call burns money. llm-burn tells you exactly how much, per method, per model, in real time — and optionally blocks requests once a spending limit is reached. Zero changes to your business logic required.


Features

  • One decorator — @TrackLLMBurn() is all you need to start tracking a method
  • Auto-detection — parses OpenAI, Anthropic, flat, and LangChain response shapes out of the box
  • Budget enforcement — BudgetGuard throws HTTP 403 before the LLM call is made when a cap is exceeded
  • Per-method and global budgets — granular control over individual methods or the entire application
  • Built-in pricing — ships with up-to-date prices for all major GPT and Claude models
  • Extensible — override prices, add custom models, or write a custom extractor for any SDK

Installation

npm install llm-burn

Peer dependencies (already present in any NestJS project):

npm install @nestjs/common @nestjs/core reflect-metadata rxjs

Quick Start

1. Register the module in AppModule:

import { LLMBurnModule } from 'llm-burn';

@Module({
  imports: [
    LLMBurnModule.forRoot({
      globalBudget: 10.00,  // block all LLM calls after $10 spent
      enableLogging: true,
    }),
  ],
})
export class AppModule {}

2. Decorate the method that calls your LLM:

import { TrackLLMBurn } from 'llm-burn';

@Injectable()
export class AiService {
  @TrackLLMBurn({ model: 'gpt-4o', budget: 2.00 })
  async summarize(text: string) {
    return this.openai.chat.completions.create({
      model: 'gpt-4o',
      messages: [{ role: 'user', content: text }],
    });
  }
}

3. Query costs anywhere in your application:

@Injectable()
export class DashboardService {
  constructor(private readonly llmBurn: LLMBurnService) {}

  getReport() {
    return this.llmBurn.getStats();
    // { totalCost, totalCalls, byMethod, byModel, records }
  }
}

That's it. Token usage is extracted automatically from the response — no wrappers, no interceptors to wire up manually.


Module Registration

Synchronous

LLMBurnModule.forRoot({
  globalBudget: 10.00,
  enableLogging: true,
})

Async (with ConfigService)

LLMBurnModule.forRootAsync({
  imports: [ConfigModule],
  inject: [ConfigService],
  useFactory: (cfg: ConfigService) => ({
    globalBudget: cfg.get<number>('LLM_BUDGET'),
    enableLogging: cfg.get<boolean>('LLM_LOGGING'),
  }),
})

With Global Interceptor

Registers LLMBurnInterceptor as an APP_INTERCEPTOR so every route in your application is automatically intercepted. Combine with @TrackLLMBurn() on specific methods to control what gets tracked.

LLMBurnModule.forRootWithGlobalInterceptor({
  globalBudget: 5.00,
  enableLogging: true,
})

Decorator: @TrackLLMBurn

Marks a method for LLM cost tracking. Automatically attaches the interceptor to that method — no need to wire up UseInterceptors manually.

@TrackLLMBurn({ model: 'claude-3-5-sonnet-20241022', budget: 1.50 })
async generateReport(prompt: string) {
  return this.anthropic.messages.create({
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 1024,
    messages: [{ role: 'user', content: prompt }],
  });
}
Option Type Description
model string Model name used as fallback when the response doesn't include one.
provider string Provider hint ("openai", "anthropic", or custom). Auto-detected from model name when omitted.
budget number Per-method USD cap. BudgetGuard blocks calls once this is reached.
extractUsage (result: unknown) => ExtractedUsage | null Custom extractor for non-standard response shapes.
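
When provider is omitted, auto-detection from the model name could work roughly like the sketch below. This is an illustrative reduction, not the library's actual code — the function name and matching rules are assumptions based on the providers listed in this README:

```typescript
// Hypothetical sketch of provider auto-detection from a model name.
// Rules mirror the model families in the pricing tables of this README.
function detectProvider(model: string): string | undefined {
  if (/^(gpt-|o1|o3|chatgpt)/.test(model)) return 'openai';
  if (model.startsWith('claude')) return 'anthropic';
  if (model.startsWith('gemini')) return 'google';
  if (model.startsWith('command')) return 'cohere';
  if (/^(mistral|codestral|open-mi)/.test(model)) return 'mistral';
  return undefined; // unknown — caller falls back to the provider option
}
```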

Budget Guard

BudgetGuard runs before the route handler. It checks two thresholds in order:

  1. Per-method budget — reads budget from @TrackLLMBurn({ budget: N }) and compares it against the cumulative cost of all previous calls to that method.
  2. Global budget — checks if totalCost >= globalBudget across all tracked calls.

If either threshold is exceeded, it throws ForbiddenException (HTTP 403) and the LLM call is never made.

The guard checks cost accumulated from previous calls. The current call's cost is recorded after it completes (in the interceptor). This is by design — the guard acts as a spending limiter, not a per-call price check.
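
The two checks can be sketched as pure logic. This is an illustrative reduction of the behavior described above, not the guard's actual source; the function and parameter names are invented:

```typescript
// Hypothetical reduction of BudgetGuard's decision: returns true when the
// call may proceed. Costs are cumulative from *previous* calls only — the
// current call's cost is recorded afterwards by the interceptor.
function mayProceed(
  methodCost: number,                // cumulative USD spent by this method so far
  methodBudget: number | undefined,  // from @TrackLLMBurn({ budget })
  totalCost: number,                 // cumulative USD across all tracked calls
  globalBudget: number | undefined,  // from module options
): boolean {
  if (methodBudget !== undefined && methodCost >= methodBudget) return false;
  if (globalBudget !== undefined && totalCost >= globalBudget) return false;
  return true;
}
```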

Applying the guard

Globally:

// main.ts
async function bootstrap() {
  const app = await NestFactory.create(AppModule);
  app.useGlobalGuards(app.get(BudgetGuard));
  await app.listen(3000);
}

Per controller:

@UseGuards(BudgetGuard)
@Controller('ai')
export class AiController {}

Per route:

@UseGuards(BudgetGuard)
@Post('summarize')
@TrackLLMBurn({ model: 'gpt-4o', budget: 0.50 })
async summarize(@Body() dto: SummarizeDto) { ... }

When BudgetGuard is not registered, no request is ever blocked. Cost tracking via the interceptor still works normally.


LLMBurnService

Injectable service available anywhere after importing LLMBurnModule.

Method Return Description
getStats() LLMStats Full breakdown: totals, per-method, per-model, raw records
getTotalCost() number Total USD spent across all calls
getMethodCost(method) number Cumulative USD cost for a specific method
getBudgetStatus() BudgetStatus Global budget usage (remaining, exceeded, % used)
getGlobalBudget() number | undefined Configured global budget cap
calculateCost(model, in, out, cached?) number Calculate USD cost for a given token count
getPricing(model) ModelPricing | undefined Retrieve pricing for a model (supports prefix matching)
listKnownModels() string[] All model names from built-in + custom prices
record(method, model, provider, in, out) LLMCallRecord Manually record a call
reset() void Clear all recorded usage

Example — cost dashboard endpoint:

@Get('cost-report')
getCostReport() {
  return {
    stats: this.llmBurn.getStats(),
    budget: this.llmBurn.getBudgetStatus(),
  };
}
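
As a worked example of the arithmetic behind calculateCost, here is a standalone sketch using the gpt-4o prices from the pricing tables later in this README ($2.50 in / $10.00 out per million tokens). The cached-input discount is omitted, and the formula is assumed rather than taken from the library's source:

```typescript
// Standalone sketch of per-million-token pricing arithmetic
// (assumed to match calculateCost's core formula; cached input omitted).
function costUSD(
  inputTokens: number,
  outputTokens: number,
  inputPricePerMillion: number,
  outputPricePerMillion: number,
): number {
  return (
    (inputTokens / 1_000_000) * inputPricePerMillion +
    (outputTokens / 1_000_000) * outputPricePerMillion
  );
}

// gpt-4o: 1,000 input + 500 output tokens
costUSD(1_000, 500, 2.5, 10.0); // 0.0025 + 0.005 ≈ 0.0075 USD
```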

Supported Response Formats

The interceptor auto-detects seven response shapes out of the box:

Format Shape
OpenAI SDK { usage: { prompt_tokens, completion_tokens }, model }
Anthropic SDK { usage: { input_tokens, output_tokens }, model }
Google Gemini SDK { usageMetadata: { promptTokenCount, candidatesTokenCount } }
Cohere SDK { meta: { tokens: { input_tokens, output_tokens } } }
Mistral SDK Same as OpenAI (auto-detected)
Flat { inputTokens, outputTokens, model? }
LangChain { llmOutput: { tokenUsage: { promptTokens, completionTokens } } }

Groq, Together AI, and Azure OpenAI use the OpenAI SDK format, so they are detected automatically.
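
Detection of these shapes amounts to a chain of structural checks. The sketch below covers three of the seven formats as an illustration — it is not the interceptor's actual code, and the names are assumptions:

```typescript
interface ExtractedUsage {
  inputTokens: number;
  outputTokens: number;
  model?: string;
}

// Hypothetical shape detection for the OpenAI, Anthropic, and flat formats.
function extract(result: any): ExtractedUsage | null {
  const u = result?.usage;
  if (u && typeof u.prompt_tokens === 'number') {
    // OpenAI-style: { usage: { prompt_tokens, completion_tokens }, model }
    return { inputTokens: u.prompt_tokens, outputTokens: u.completion_tokens, model: result.model };
  }
  if (u && typeof u.input_tokens === 'number') {
    // Anthropic-style: { usage: { input_tokens, output_tokens }, model }
    return { inputTokens: u.input_tokens, outputTokens: u.output_tokens, model: result.model };
  }
  if (typeof result?.inputTokens === 'number') {
    // Flat: { inputTokens, outputTokens, model? }
    return { inputTokens: result.inputTokens, outputTokens: result.outputTokens, model: result.model };
  }
  return null; // unrecognized shape
}
```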

Custom extractor

For non-standard SDKs or response wrappers, provide an extractUsage function:

@TrackLLMBurn({
  model: 'my-custom-model',
  extractUsage: (result: unknown) => {
    const r = result as MyCustomResponse;
    if (!r?.meta?.tokens) return null;
    return {
      inputTokens: r.meta.tokens.input,
      outputTokens: r.meta.tokens.output,
      model: r.meta.model,       // optional — overrides decorator model
      provider: 'my-provider',   // optional — overrides auto-detection
    };
  },
})
async callCustomLLM(prompt: string) { ... }

Return null to skip recording for a specific call — the interceptor will log a warning.


Supported Models & Pricing

Prices are in USD per 1 million tokens (updated March 2026).

OpenAI

Model Input / M Output / M Cached Input / M
gpt-4o $2.50 $10.00 $1.25
gpt-4o-mini $0.15 $0.60 $0.075
gpt-4-turbo $10.00 $30.00
gpt-4 $30.00 $60.00
gpt-3.5-turbo $0.50 $1.50
o1 $15.00 $60.00 $7.50
o1-mini $3.00 $12.00 $1.50
o3 $10.00 $40.00 $2.50
o3-mini $1.10 $4.40 $0.55
gpt-4.5-preview $75.00 $150.00 $37.50

Anthropic

Model Input / M Output / M
claude-opus-4-6 $15.00 $75.00
claude-sonnet-4-6 $3.00 $15.00
claude-haiku-4-5 $0.80 $4.00
claude-3-5-sonnet-20241022 $3.00 $15.00
claude-3-5-haiku-20241022 $0.80 $4.00
claude-3-opus-20240229 $15.00 $75.00
claude-3-haiku-20240307 $0.25 $1.25
claude-2.1 $8.00 $24.00

Prefix matching: dated model variants like gpt-4o-2024-11-20 are matched automatically — if no exact match exists, the interceptor falls back to the nearest prefix entry (gpt-4o).
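
The prefix fallback described above might be implemented along these lines — a sketch under stated assumptions (resolvePricing and the table shape are invented, not the library's internals):

```typescript
type ModelPricing = { inputPricePerMillion: number; outputPricePerMillion: number };

// Hypothetical prefix-matching lookup: exact hit first, then the longest
// known model name that prefixes the requested one.
function resolvePricing(
  model: string,
  table: Record<string, ModelPricing>,
): ModelPricing | undefined {
  if (table[model]) return table[model];
  const prefix = Object.keys(table)
    .filter((name) => model.startsWith(name))
    .sort((a, b) => b.length - a.length)[0]; // prefer the most specific entry
  return prefix ? table[prefix] : undefined;
}
```

Sorting by length means a dated variant like gpt-4o-mini-2024-07-18 resolves to gpt-4o-mini rather than the shorter gpt-4o.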

Google Gemini

Model Input / M Output / M
gemini-2.0-flash $0.10 $0.40
gemini-2.0-flash-lite $0.075 $0.30
gemini-1.5-pro $1.25 $5.00
gemini-1.5-flash $0.075 $0.30
gemini-1.5-flash-8b $0.0375 $0.15

Cohere

Model Input / M Output / M
command-r-plus $2.50 $10.00
command-r $0.15 $0.60
command $1.00 $2.00
command-light $0.30 $0.60

Mistral

Model Input / M Output / M
mistral-large-latest $2.00 $6.00
mistral-small-latest $0.10 $0.30
codestral-latest $0.20 $0.60
open-mistral-nemo $0.15 $0.15
open-mixtral-8x22b $2.00 $6.00

Custom Prices

Inline — add or override any model price at module registration:

LLMBurnModule.forRoot({
  customPrices: {
    'my-fine-tuned-gpt4': {
      inputPricePerMillion: 5.00,
      outputPricePerMillion: 20.00,
    },
    'local-llama': {
      inputPricePerMillion: 0,
      outputPricePerMillion: 0,
    },
  },
})

External file — point to your own JSON file to manage prices independently of the package version:

LLMBurnModule.forRoot({
  pricesPath: './prices.json',
})

The file can be flat or nested (same shape as the built-in prices.json):

// flat
{
  "gpt-4o": { "inputPricePerMillion": 2.50, "outputPricePerMillion": 10.00 },
  "claude-sonnet-4-6": { "inputPricePerMillion": 3.00, "outputPricePerMillion": 15.00 }
}

// nested (grouped by provider)
{
  "openai": {
    "gpt-4o": { "inputPricePerMillion": 2.50, "outputPricePerMillion": 10.00 }
  },
  "anthropic": {
    "claude-sonnet-4-6": { "inputPricePerMillion": 3.00, "outputPricePerMillion": 15.00 }
  }
}

Priority order: customPrices > pricesPath > built-in prices.
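
That priority order amounts to a simple merge where later sources win. The sketch below is illustrative (the function and parameter names are invented):

```typescript
type ModelPricing = { inputPricePerMillion: number; outputPricePerMillion: number };
type PriceTable = Record<string, ModelPricing>;

// Hypothetical merge reflecting customPrices > pricesPath > built-in.
function effectivePrices(
  builtIn: PriceTable,
  fromFile: PriceTable = {}, // parsed from pricesPath, if configured
  custom: PriceTable = {},   // customPrices from module options
): PriceTable {
  return { ...builtIn, ...fromFile, ...custom }; // later spreads override earlier
}
```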


API Reference

LLMBurnModuleOptions

Option Type Default Description
globalBudget number USD cap for total spend. BudgetGuard blocks when exceeded.
enableLogging boolean false Log each tracked call via NestJS Logger.
customPrices Record<string, ModelPricing> Inline price overrides. Takes precedence over everything.
pricesPath string Path to an external JSON prices file. Takes precedence over built-in prices.

LLMStats

interface LLMStats {
  totalCost: number;
  totalInputTokens: number;
  totalOutputTokens: number;
  totalCalls: number;
  byMethod: Record<string, MethodStats>;  // breakdown per decorated method
  byModel: Record<string, ModelStats>;    // breakdown per model name
  records: LLMCallRecord[];               // all raw records
}

BudgetStatus

interface BudgetStatus {
  globalBudget?: number;   // configured cap (undefined if not set)
  totalCost: number;       // total USD spent
  remaining?: number;      // USD left before cap (0 when exceeded)
  isExceeded: boolean;
  percentUsed?: number;    // 0–100+
}
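
The derived fields of BudgetStatus follow directly from the two underlying numbers. A minimal sketch, assuming the semantics documented above (not the service's actual implementation):

```typescript
interface BudgetStatus {
  globalBudget?: number;
  totalCost: number;
  remaining?: number;
  isExceeded: boolean;
  percentUsed?: number;
}

// Hypothetical derivation of BudgetStatus from total spend and the cap.
function budgetStatus(totalCost: number, globalBudget?: number): BudgetStatus {
  if (globalBudget === undefined) {
    // No cap configured: nothing can be exceeded, derived fields are undefined.
    return { totalCost, isExceeded: false };
  }
  return {
    globalBudget,
    totalCost,
    remaining: Math.max(0, globalBudget - totalCost), // 0 when exceeded
    isExceeded: totalCost >= globalBudget,
    percentUsed: (totalCost / globalBudget) * 100,    // may run past 100
  };
}
```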

License

MIT