NestJS standalone library for tracking LLM call costs per method using interceptors

Package Exports

  • llm-burn
  • llm-burn/dist/index.js

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (llm-burn) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.


llm-burn


Cost tracking and budget enforcement for LLM calls in NestJS — with a single decorator.

Every OpenAI and Anthropic call burns money. llm-burn tells you exactly how much, per method, per model, in real time — and optionally blocks requests once a spending limit is reached. Zero changes to your business logic required.


Features

  • One decorator — @TrackLLMBurn() is all you need to start tracking a method
  • Auto-detection — parses OpenAI, Anthropic, flat, and LangChain response shapes out of the box
  • Budget enforcement — BudgetGuard throws HTTP 403 before the LLM call is made when a cap is exceeded
  • Per-method and global budgets — granular control over individual methods or the entire application
  • Built-in pricing — ships with up-to-date prices for all major GPT and Claude models
  • Extensible — override prices, add custom models, or write a custom extractor for any SDK

Installation

npm install llm-burn

Peer dependencies (already present in any NestJS project):

npm install @nestjs/common @nestjs/core reflect-metadata rxjs

Quick Start

1. Register the module in AppModule:

import { LLMBurnModule } from 'llm-burn';

@Module({
  imports: [
    LLMBurnModule.forRoot({
      globalBudget: 10.00,  // block all LLM calls after $10 spent
      enableLogging: true,
    }),
  ],
})
export class AppModule {}

2. Decorate the method that calls your LLM:

import { TrackLLMBurn } from 'llm-burn';

@Injectable()
export class AiService {
  @TrackLLMBurn({ model: 'gpt-4o', budget: 2.00 })
  async summarize(text: string) {
    return this.openai.chat.completions.create({
      model: 'gpt-4o',
      messages: [{ role: 'user', content: text }],
    });
  }
}

3. Query costs anywhere in your application:

@Injectable()
export class DashboardService {
  constructor(private readonly llmBurn: LLMBurnService) {}

  getReport() {
    return this.llmBurn.getStats();
    // { totalCost, totalCalls, byMethod, byModel, records }
  }
}

That's it. Token usage is extracted automatically from the response — no wrappers, no interceptors to wire up manually.


Module Registration

Synchronous

LLMBurnModule.forRoot({
  globalBudget: 10.00,
  enableLogging: true,
})

Async (with ConfigService)

LLMBurnModule.forRootAsync({
  imports: [ConfigModule],
  inject: [ConfigService],
  useFactory: (cfg: ConfigService) => ({
    globalBudget: cfg.get<number>('LLM_BUDGET'),
    enableLogging: cfg.get<boolean>('LLM_LOGGING'),
  }),
})

With Global Interceptor

Registers LLMBurnInterceptor as an APP_INTERCEPTOR so every route in your application is automatically intercepted. Combine with @TrackLLMBurn() on specific methods to control what gets tracked.

LLMBurnModule.forRootWithGlobalInterceptor({
  globalBudget: 5.00,
  enableLogging: true,
})

Decorator: @TrackLLMBurn

Marks a method for LLM cost tracking. Automatically attaches the interceptor to that method — no need to wire up UseInterceptors manually.

@TrackLLMBurn({ model: 'claude-3-5-sonnet-20241022', budget: 1.50 })
async generateReport(prompt: string) {
  return this.anthropic.messages.create({
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 1024,
    messages: [{ role: 'user', content: prompt }],
  });
}
Option Type Description
model string Model name used as fallback when the response doesn't include one.
provider string Provider hint ("openai", "anthropic", or custom). Auto-detected from model name when omitted.
budget number Per-method USD cap. BudgetGuard blocks calls once this is reached.
extractUsage (result: unknown) => ExtractedUsage | null Custom extractor for non-standard response shapes.
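
When provider is omitted, auto-detection from the model name could work roughly like the sketch below. This is an illustrative reduction, not the library's actual code — the function name and matching rules are assumptions based on the providers listed in this README:

```typescript
// Hypothetical sketch of provider auto-detection from a model name.
// Rules mirror the model families in the pricing tables of this README.
function detectProvider(model: string): string | undefined {
  if (/^(gpt-|o1|o3|chatgpt)/.test(model)) return 'openai';
  if (model.startsWith('claude')) return 'anthropic';
  if (model.startsWith('gemini')) return 'google';
  if (model.startsWith('command')) return 'cohere';
  if (/^(mistral|codestral|open-mi)/.test(model)) return 'mistral';
  return undefined; // unknown — caller falls back to the provider option
}
```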

Budget Guard

BudgetGuard runs before the route handler. It checks two thresholds in order:

  1. Per-method budget — reads budget from @TrackLLMBurn({ budget: N }) and compares it against the cumulative cost of all previous calls to that method.
  2. Global budget — checks if totalCost >= globalBudget across all tracked calls.

If either threshold is exceeded, it throws ForbiddenException (HTTP 403) and the LLM call is never made.

The guard checks cost accumulated from previous calls. The current call's cost is recorded after it completes (in the interceptor). This is by design — the guard acts as a spending limiter, not a per-call price check.
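
The two checks can be sketched as pure logic. This is an illustrative reduction of the behavior described above, not the guard's actual source; the function and parameter names are invented:

```typescript
// Hypothetical reduction of BudgetGuard's decision: returns true when the
// call may proceed. Costs are cumulative from *previous* calls only — the
// current call's cost is recorded afterwards by the interceptor.
function mayProceed(
  methodCost: number,                // cumulative USD spent by this method so far
  methodBudget: number | undefined,  // from @TrackLLMBurn({ budget })
  totalCost: number,                 // cumulative USD across all tracked calls
  globalBudget: number | undefined,  // from module options
): boolean {
  if (methodBudget !== undefined && methodCost >= methodBudget) return false;
  if (globalBudget !== undefined && totalCost >= globalBudget) return false;
  return true;
}
```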

Applying the guard

Globally:

// main.ts
async function bootstrap() {
  const app = await NestFactory.create(AppModule);
  app.useGlobalGuards(app.get(BudgetGuard));
  await app.listen(3000);
}

Per controller:

@UseGuards(BudgetGuard)
@Controller('ai')
export class AiController {}

Per route:

@UseGuards(BudgetGuard)
@Post('summarize')
@TrackLLMBurn({ model: 'gpt-4o', budget: 0.50 })
async summarize(@Body() dto: SummarizeDto) { ... }

When BudgetGuard is not registered, no request is ever blocked. Cost tracking via the interceptor still works normally.


LLMBurnService

Injectable service available anywhere after importing LLMBurnModule.

Method Return Description
getStats() LLMStats Full breakdown: totals, per-method, per-model, raw records
getTotalCost() number Total USD spent across all calls
getMethodCost(method) number Cumulative USD cost for a specific method
getBudgetStatus() BudgetStatus Global budget usage (remaining, exceeded, % used)
getGlobalBudget() number | undefined Configured global budget cap
calculateCost(model, in, out, cached?) number Calculate USD cost for a given token count
getPricing(model) ModelPricing | undefined Retrieve pricing for a model (supports prefix matching)
listKnownModels() string[] All model names from built-in + custom prices
record(method, model, provider, in, out) LLMCallRecord Manually record a call
reset() void Clear all recorded usage

Example — cost dashboard endpoint:

@Get('cost-report')
getCostReport() {
  return {
    stats: this.llmBurn.getStats(),
    budget: this.llmBurn.getBudgetStatus(),
  };
}
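
As a worked example of the arithmetic behind calculateCost, here is a standalone sketch using the gpt-4o prices from the pricing tables later in this README ($2.50 in / $10.00 out per million tokens). The cached-input discount is omitted, and the formula is assumed rather than taken from the library's source:

```typescript
// Standalone sketch of per-million-token pricing arithmetic
// (assumed to match calculateCost's core formula; cached input omitted).
function costUSD(
  inputTokens: number,
  outputTokens: number,
  inputPricePerMillion: number,
  outputPricePerMillion: number,
): number {
  return (
    (inputTokens / 1_000_000) * inputPricePerMillion +
    (outputTokens / 1_000_000) * outputPricePerMillion
  );
}

// gpt-4o: 1,000 input + 500 output tokens
costUSD(1_000, 500, 2.5, 10.0); // 0.0025 + 0.005 ≈ 0.0075 USD
```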

Supported Response Formats

The interceptor auto-detects seven response shapes out of the box:

Format Shape
OpenAI SDK { usage: { prompt_tokens, completion_tokens }, model }
Anthropic SDK { usage: { input_tokens, output_tokens }, model }
Google Gemini SDK { usageMetadata: { promptTokenCount, candidatesTokenCount } }
Cohere SDK { meta: { tokens: { input_tokens, output_tokens } } }
Mistral SDK Same as OpenAI (auto-detected)
Flat { inputTokens, outputTokens, model? }
LangChain { llmOutput: { tokenUsage: { promptTokens, completionTokens } } }

Groq, Together AI, and Azure OpenAI use the OpenAI SDK format, so they are detected automatically.
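
Detection of these shapes amounts to a chain of structural checks. The sketch below covers three of the seven formats as an illustration — it is not the interceptor's actual code, and the names are assumptions:

```typescript
interface ExtractedUsage {
  inputTokens: number;
  outputTokens: number;
  model?: string;
}

// Hypothetical shape detection for the OpenAI, Anthropic, and flat formats.
function extract(result: any): ExtractedUsage | null {
  const u = result?.usage;
  if (u && typeof u.prompt_tokens === 'number') {
    // OpenAI-style: { usage: { prompt_tokens, completion_tokens }, model }
    return { inputTokens: u.prompt_tokens, outputTokens: u.completion_tokens, model: result.model };
  }
  if (u && typeof u.input_tokens === 'number') {
    // Anthropic-style: { usage: { input_tokens, output_tokens }, model }
    return { inputTokens: u.input_tokens, outputTokens: u.output_tokens, model: result.model };
  }
  if (typeof result?.inputTokens === 'number') {
    // Flat: { inputTokens, outputTokens, model? }
    return { inputTokens: result.inputTokens, outputTokens: result.outputTokens, model: result.model };
  }
  return null; // unrecognized shape
}
```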

Custom extractor

For non-standard SDKs or response wrappers, provide an extractUsage function:

@TrackLLMBurn({
  model: 'my-custom-model',
  extractUsage: (result: unknown) => {
    const r = result as MyCustomResponse;
    if (!r?.meta?.tokens) return null;
    return {
      inputTokens: r.meta.tokens.input,
      outputTokens: r.meta.tokens.output,
      model: r.meta.model,       // optional — overrides decorator model
      provider: 'my-provider',   // optional — overrides auto-detection
    };
  },
})
async callCustomLLM(prompt: string) { ... }

Return null to skip recording for a specific call — the interceptor will log a warning.


Supported Models & Pricing

Prices are in USD per 1 million tokens (updated March 2026).

OpenAI

Model Input / M Output / M Cached Input / M
gpt-4o $2.50 $10.00 $1.25
gpt-4o-mini $0.15 $0.60 $0.075
gpt-4-turbo $10.00 $30.00
gpt-4 $30.00 $60.00
gpt-3.5-turbo $0.50 $1.50
o1 $15.00 $60.00 $7.50
o1-mini $3.00 $12.00 $1.50
o3 $10.00 $40.00 $2.50
o3-mini $1.10 $4.40 $0.55
gpt-4.5-preview $75.00 $150.00 $37.50

Anthropic

Model Input / M Output / M
claude-opus-4-6 $15.00 $75.00
claude-sonnet-4-6 $3.00 $15.00
claude-haiku-4-5 $0.80 $4.00
claude-3-5-sonnet-20241022 $3.00 $15.00
claude-3-5-haiku-20241022 $0.80 $4.00
claude-3-opus-20240229 $15.00 $75.00
claude-3-haiku-20240307 $0.25 $1.25
claude-2.1 $8.00 $24.00

Prefix matching: dated model variants like gpt-4o-2024-11-20 are matched automatically — if no exact match exists, the interceptor falls back to the nearest prefix entry (gpt-4o).
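
The prefix fallback described above might be implemented along these lines — a sketch under stated assumptions (resolvePricing and the table shape are invented, not the library's internals):

```typescript
type ModelPricing = { inputPricePerMillion: number; outputPricePerMillion: number };

// Hypothetical prefix-matching lookup: exact hit first, then the longest
// known model name that prefixes the requested one.
function resolvePricing(
  model: string,
  table: Record<string, ModelPricing>,
): ModelPricing | undefined {
  if (table[model]) return table[model];
  const prefix = Object.keys(table)
    .filter((name) => model.startsWith(name))
    .sort((a, b) => b.length - a.length)[0]; // prefer the most specific entry
  return prefix ? table[prefix] : undefined;
}
```

Sorting by length means a dated variant like gpt-4o-mini-2024-07-18 resolves to gpt-4o-mini rather than the shorter gpt-4o.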

Google Gemini

Model Input / M Output / M
gemini-2.0-flash $0.10 $0.40
gemini-2.0-flash-lite $0.075 $0.30
gemini-1.5-pro $1.25 $5.00
gemini-1.5-flash $0.075 $0.30
gemini-1.5-flash-8b $0.0375 $0.15

Cohere

Model Input / M Output / M
command-r-plus $2.50 $10.00
command-r $0.15 $0.60
command $1.00 $2.00
command-light $0.30 $0.60

Mistral

Model Input / M Output / M
mistral-large-latest $2.00 $6.00
mistral-small-latest $0.10 $0.30
codestral-latest $0.20 $0.60
open-mistral-nemo $0.15 $0.15
open-mixtral-8x22b $2.00 $6.00

Custom Prices

Inline — add or override any model price at module registration:

LLMBurnModule.forRoot({
  customPrices: {
    'my-fine-tuned-gpt4': {
      inputPricePerMillion: 5.00,
      outputPricePerMillion: 20.00,
    },
    'local-llama': {
      inputPricePerMillion: 0,
      outputPricePerMillion: 0,
    },
  },
})

External file — point to your own JSON file to manage prices independently of the package version:

LLMBurnModule.forRoot({
  pricesPath: './prices.json',
})

The file can be flat or nested (same shape as the built-in prices.json):

// flat
{
  "gpt-4o": { "inputPricePerMillion": 2.50, "outputPricePerMillion": 10.00 },
  "claude-sonnet-4-6": { "inputPricePerMillion": 3.00, "outputPricePerMillion": 15.00 }
}

// nested (grouped by provider)
{
  "openai": {
    "gpt-4o": { "inputPricePerMillion": 2.50, "outputPricePerMillion": 10.00 }
  },
  "anthropic": {
    "claude-sonnet-4-6": { "inputPricePerMillion": 3.00, "outputPricePerMillion": 15.00 }
  }
}

Priority order: customPrices > pricesPath > built-in prices.
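
That priority order amounts to a simple merge where later sources win. The sketch below is illustrative (the function and parameter names are invented):

```typescript
type ModelPricing = { inputPricePerMillion: number; outputPricePerMillion: number };
type PriceTable = Record<string, ModelPricing>;

// Hypothetical merge reflecting customPrices > pricesPath > built-in.
function effectivePrices(
  builtIn: PriceTable,
  fromFile: PriceTable = {}, // parsed from pricesPath, if configured
  custom: PriceTable = {},   // customPrices from module options
): PriceTable {
  return { ...builtIn, ...fromFile, ...custom }; // later spreads override earlier
}
```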


API Reference

LLMBurnModuleOptions

Option Type Default Description
globalBudget number USD cap for total spend. BudgetGuard blocks when exceeded.
enableLogging boolean false Log each tracked call via NestJS Logger.
customPrices Record<string, ModelPricing> Inline price overrides. Takes precedence over everything.
pricesPath string Path to an external JSON prices file. Takes precedence over built-in prices.

LLMStats

interface LLMStats {
  totalCost: number;
  totalInputTokens: number;
  totalOutputTokens: number;
  totalCalls: number;
  byMethod: Record<string, MethodStats>;  // breakdown per decorated method
  byModel: Record<string, ModelStats>;    // breakdown per model name
  records: LLMCallRecord[];               // all raw records
}

BudgetStatus

interface BudgetStatus {
  globalBudget?: number;   // configured cap (undefined if not set)
  totalCost: number;       // total USD spent
  remaining?: number;      // USD left before cap (0 when exceeded)
  isExceeded: boolean;
  percentUsed?: number;    // 0–100+
}
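
The derived fields of BudgetStatus follow directly from the two underlying numbers. A minimal sketch, assuming the semantics documented above (not the service's actual implementation):

```typescript
interface BudgetStatus {
  globalBudget?: number;
  totalCost: number;
  remaining?: number;
  isExceeded: boolean;
  percentUsed?: number;
}

// Hypothetical derivation of BudgetStatus from total spend and the cap.
function budgetStatus(totalCost: number, globalBudget?: number): BudgetStatus {
  if (globalBudget === undefined) {
    // No cap configured: nothing can be exceeded, derived fields are undefined.
    return { totalCost, isExceeded: false };
  }
  return {
    globalBudget,
    totalCost,
    remaining: Math.max(0, globalBudget - totalCost), // 0 when exceeded
    isExceeded: totalCost >= globalBudget,
    percentUsed: (totalCost / globalBudget) * 100,    // may run past 100
  };
}
```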

License

MIT