TMLPD PI Extension
Parallel Multi-LLM Processing with Streaming, Caching, Cost Tracking, and Reliability
Overview
TMLPD PI is an npm package that brings parallel multi-LLM execution to the PI agent system. It provides:
- Parallel Execution - Execute prompts across multiple providers simultaneously
- Response Caching - SHA-256 hashed cache with TTL and LRU eviction
- Real-time Cost Tracking - Spending by provider, model, daily/monthly breakdowns
- Reliability Hardening - Retry with jitter, circuit breakers, smart cooldown
Installation
npm install tmlpd-pi
Quick Start
import { createTMLPD } from "tmlpd-pi";
const tmlpd = createTMLPD({
  cache: { ttl_seconds: 3600 },
  budget: { daily_limit: 10.0 },
});
// Single execution with smart routing
const result = await tmlpd.execute("Explain quantum entanglement");
// Parallel execution across multiple models
const parallel = await tmlpd.executeParallel("Explain quantum entanglement", [
  "openai/gpt-4o",
  "groq/llama-3.3-70b",
  "cerebras/llama-3.3-70b"
]);
// Check costs
const summary = tmlpd.getCostSummary();
console.log(`Total spent: $${summary.total_cost}`);
Features
Parallel Execution
Execute across multiple LLM providers simultaneously with automatic fallback:
const result = await tmlpd.executeParallel(
  "Compare Python and JavaScript",
  ["openai/gpt-4o", "groq/llama-3.3-70b", "cerebras/llama-3.3-70b"]
);
// Returns: { responses: [...], total_models: 3, successful_models: 3, total_cost: 0.002 }
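The successful_models and total_models counters make partial failures easy to detect. A short sketch (the shape of the individual entries in responses isn't documented in this README, so they are simply logged):
// Detect partial failure using the documented counters.
if (result.successful_models < result.total_models) {
  const failed = result.total_models - result.successful_models;
  console.warn(`${failed} model(s) failed; using the remaining responses`);
}
// Inspect each per-model response (exact fields vary; see package docs).
for (const response of result.responses) {
  console.log(response);
}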
Response Caching
Avoid redundant API calls with intelligent caching:
// Cache hit = instant response, $0 cost
const cached = await tmlpd.execute("Same prompt again");
// Check cache stats
const stats = tmlpd.getCacheStats();
// { hits: 5, misses: 3, size: 8, hit_rate: 0.625 }
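The overview describes the cache as SHA-256 hashed with TTL and LRU eviction. Purely as an illustration of that idea (this is not the package's actual key derivation), a cache key over the model and prompt could be computed like so:
import { createHash } from "node:crypto";
// Illustrative sketch only: identical (model, prompt) pairs hash to the
// same key, so a repeated prompt hits the cache instead of the API.
function cacheKey(model: string, prompt: string): string {
  return createHash("sha256").update(`${model}\n${prompt}`).digest("hex");
}
cacheKey("openai/gpt-4o", "Same prompt again"); // 64-char hex digest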
Cost Tracking
Real-time spending monitoring with budget alerts:
const summary = tmlpd.getCostSummary();
// {
//   total_cost: 0.547,
//   by_provider: { openai: 0.30, groq: 0.247 },
//   by_model: { 'gpt-4o': 0.30, 'llama-3.3-70b': 0.247 },
//   daily_costs: { '2024-01-15': 0.547 },
//   request_count: 42,
//   average_cost_per_request: 0.013
// }
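A simple budget guard can be layered on top of getCostSummary() using the daily_costs map shown above; the 80% threshold below is an illustrative choice, not a built-in alert:
// Hypothetical guard: warn when today's spend nears the configured limit.
const DAILY_LIMIT = 10.0; // mirrors budget.daily_limit from Quick Start
const today = new Date().toISOString().slice(0, 10); // e.g. '2024-01-15'
const spentToday = tmlpd.getCostSummary().daily_costs[today] ?? 0;
if (spentToday >= 0.8 * DAILY_LIMIT) {
  console.warn(`Spent $${spentToday.toFixed(3)} of $${DAILY_LIMIT} today`);
}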
Reliability
Built-in retry logic and circuit breakers:
const tmlpd = createTMLPD({
  retry: {
    max_attempts: 5,
    base_delay_ms: 1000,
    max_delay_ms: 60000,
    jitter: 0.3,
  }
});
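These fields compose in the usual exponential-backoff-with-jitter pattern; the formula below sketches how such settings typically combine, not the package's internal code:
// Sketch: delay grows as base * 2^attempt, capped at max, then randomized
// by +/- jitter so concurrent clients don't retry in lockstep.
function backoffDelay(attempt: number): number {
  const base_delay_ms = 1000, max_delay_ms = 60000, jitter = 0.3;
  const capped = Math.min(base_delay_ms * 2 ** attempt, max_delay_ms);
  return Math.round(capped * (1 + jitter * (Math.random() * 2 - 1)));
}
// attempt 0 -> ~1s, 1 -> ~2s, 2 -> ~4s ... capped near 60s, +/- 30%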
Supported Providers
| Provider | Models | Cost per 1M tokens (USD) |
|---|---|---|
| OpenAI | GPT-4o, GPT-4 Turbo, GPT-3.5 | $2.50-10.00 |
| Groq | Llama 3.3 70B, Llama 3.1 8B | $0.59-0.79 |
| Cerebras | Llama 3.3 70B | $0.10 |
| Mistral | Mistral Large, Mistral Small | $0.20-6.00 |
| xAI | Grok 2, Grok 2 Mini | $0.20-8.00 |
| Anthropic | Claude 3.5, Claude 3 | $3.00-75.00 |
| Google | Gemini 1.5 Pro, Gemini 1.5 Flash | $0.075-5.00 |
| ZAI | GLM models | ~$0.10-0.30 |
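Models are addressed as provider/model strings, combining a provider prefix from the table with a model slug, as in the identifiers used throughout the examples above:
// The three identifiers already used in Quick Start:
const models = ["openai/gpt-4o", "groq/llama-3.3-70b", "cerebras/llama-3.3-70b"];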
PI Integration
For PI agent integration, use the skill definition:
---
name: tmlpd
description: Parallel multi-LLM execution with streaming, caching, cost tracking, and reliability. Use when user wants parallel model execution, cost optimization across providers, or comparing multiple LLM responses simultaneously.
---
API Reference
createTMLPD(config)
Create a configured TMLPD instance.
interface TMLPDConfig {
  cache?: Partial<CacheConfig>;
  budget?: BudgetConfig;
  retry?: Partial<RetryConfig>;
  maxConcurrent?: number;
}
tmlpd.execute(prompt, model?, streaming?)
Execute a prompt against a single model; when model is omitted, smart routing selects one.
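Per the signature, both model and streaming are optional; for example (the streamed return type isn't documented in this README):
// Let smart routing choose the model:
const routed = await tmlpd.execute("Summarize the tradeoffs of LRU caching");
// Pin a specific model:
const pinned = await tmlpd.execute("Summarize the tradeoffs of LRU caching", "groq/llama-3.3-70b");
// Opt into streaming via the third positional argument:
// const streamed = await tmlpd.execute(prompt, "openai/gpt-4o", true);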
tmlpd.executeParallel(prompt, models?, streaming?)
Execute across multiple models in parallel.
tmlpd.getCostSummary()
Get comprehensive cost breakdown.
tmlpd.getCacheStats()
Get cache hit rate and size.
tmlpd.getProviderStatus()
Get all provider statuses.
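The status object's exact shape isn't shown in this README; a minimal inspection of the per-provider cooldown and circuit-breaker state:
const status = tmlpd.getProviderStatus();
console.log(JSON.stringify(status, null, 2)); // per-provider health snapshot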
License
MIT