TMLPD PI Extension
Parallel Multi-LLM Processing with Streaming, Caching, Cost Tracking, and Reliability
Overview
TMLPD PI is an npm package that brings parallel multi-LLM execution to the PI agent system. It provides:
- Parallel Execution - Execute prompts across multiple providers simultaneously
- Response Caching - SHA-256 hashed cache with TTL and LRU eviction
- Real-time Cost Tracking - Spending by provider, model, daily/monthly breakdowns
- Reliability Hardening - Retry with jitter, circuit breakers, smart cooldown
Installation
npm install tmlpd-pi
Quick Start
import { createTMLPD } from "tmlpd-pi";
const tmlpd = createTMLPD({
  cache: { ttl_seconds: 3600 },
  budget: { daily_limit: 10.0 },
});
// Single execution with smart routing
const result = await tmlpd.execute("Explain quantum entanglement");
// Parallel execution across multiple models
const parallel = await tmlpd.executeParallel("Explain quantum entanglement", [
  "openai/gpt-4o",
  "groq/llama-3.3-70b",
  "cerebras/llama-3.3-70b"
]);
// Check costs
const summary = tmlpd.getCostSummary();
console.log(`Total spent: $${summary.total_cost}`);
Features
Parallel Execution
Execute across multiple LLM providers simultaneously with automatic fallback:
const result = await tmlpd.executeParallel(
  "Compare Python and JavaScript",
  ["openai/gpt-4o", "groq/llama-3.3-70b", "cerebras/llama-3.3-70b"]
);
// Returns: { responses: [...], total_models: 3, successful_models: 3, total_cost: 0.002 }
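The successful_models and total_models counters make partial failures easy to detect. A short sketch (the shape of the individual entries in responses isn't documented in this README, so they are simply logged):
// Detect partial failure using the documented counters.
if (result.successful_models < result.total_models) {
  const failed = result.total_models - result.successful_models;
  console.warn(`${failed} model(s) failed; using the remaining responses`);
}
// Inspect each per-model response (exact fields vary; see package docs).
for (const response of result.responses) {
  console.log(response);
}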
Response Caching
Avoid redundant API calls with intelligent caching:
// Cache hit = instant response, $0 cost
const cached = await tmlpd.execute("Same prompt again");
// Check cache stats
const stats = tmlpd.getCacheStats();
// { hits: 5, misses: 3, size: 8, hit_rate: 0.625 }
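The overview describes the cache as SHA-256 hashed with TTL and LRU eviction. Purely as an illustration of that idea (this is not the package's actual key derivation), a cache key over the model and prompt could be computed like so:
import { createHash } from "node:crypto";
// Illustrative sketch only: identical (model, prompt) pairs hash to the
// same key, so a repeated prompt hits the cache instead of the API.
function cacheKey(model: string, prompt: string): string {
  return createHash("sha256").update(`${model}\n${prompt}`).digest("hex");
}
cacheKey("openai/gpt-4o", "Same prompt again"); // 64-char hex digest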
Cost Tracking
Real-time spending monitoring with budget alerts:
const summary = tmlpd.getCostSummary();
// {
//   total_cost: 0.547,
//   by_provider: { openai: 0.30, groq: 0.247 },
//   by_model: { 'gpt-4o': 0.30, 'llama-3.3-70b': 0.247 },
//   daily_costs: { '2024-01-15': 0.547 },
//   request_count: 42,
//   average_cost_per_request: 0.013
// }
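A simple budget guard can be layered on top of getCostSummary() using the daily_costs map shown above; the 80% threshold below is an illustrative choice, not a built-in alert:
// Hypothetical guard: warn when today's spend nears the configured limit.
const DAILY_LIMIT = 10.0; // mirrors budget.daily_limit from Quick Start
const today = new Date().toISOString().slice(0, 10); // e.g. '2024-01-15'
const spentToday = tmlpd.getCostSummary().daily_costs[today] ?? 0;
if (spentToday >= 0.8 * DAILY_LIMIT) {
  console.warn(`Spent $${spentToday.toFixed(3)} of $${DAILY_LIMIT} today`);
}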
Reliability
Built-in retry logic and circuit breakers:
const tmlpd = createTMLPD({
  retry: {
    max_attempts: 5,
    base_delay_ms: 1000,
    max_delay_ms: 60000,
    jitter: 0.3,
  }
});
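These fields compose in the usual exponential-backoff-with-jitter pattern; the formula below sketches how such settings typically combine, not the package's internal code:
// Sketch: delay grows as base * 2^attempt, capped at max, then randomized
// by +/- jitter so concurrent clients don't retry in lockstep.
function backoffDelay(attempt: number): number {
  const base_delay_ms = 1000, max_delay_ms = 60000, jitter = 0.3;
  const capped = Math.min(base_delay_ms * 2 ** attempt, max_delay_ms);
  return Math.round(capped * (1 + jitter * (Math.random() * 2 - 1)));
}
// attempt 0 -> ~1s, 1 -> ~2s, 2 -> ~4s ... capped near 60s, +/- 30%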
Supported Providers
| Provider | Models | Cost per 1M tokens (USD) |
|---|---|---|
| OpenAI | GPT-4o, GPT-4 Turbo, GPT-3.5 | $2.50-10.00 |
| Groq | Llama 3.3 70B, Llama 3.1 8B | $0.59-0.79 |
| Cerebras | Llama 3.3 70B | $0.10 |
| Mistral | Mistral Large, Mistral Small | $0.20-6.00 |
| xAI | Grok 2, Grok 2 Mini | $0.20-8.00 |
| Anthropic | Claude 3.5, Claude 3 | $3.00-75.00 |
| Google | Gemini 1.5 Pro, Gemini 1.5 Flash | $0.075-5.00 |
| ZAI | GLM models | ~$0.10-0.30 |
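Models are addressed as provider/model strings, combining a provider prefix from the table with a model slug, as in the identifiers used throughout the examples above:
// The three identifiers already used in Quick Start:
const models = ["openai/gpt-4o", "groq/llama-3.3-70b", "cerebras/llama-3.3-70b"];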
PI Integration
For PI agent integration, use the skill definition:
---
name: tmlpd
description: Parallel multi-LLM execution with streaming, caching, cost tracking, and reliability. Use when user wants parallel model execution, cost optimization across providers, or comparing multiple LLM responses simultaneously.
---
API Reference
createTMLPD(config)
Create a configured TMLPD instance.
interface TMLPDConfig {
  cache?: Partial<CacheConfig>;
  budget?: BudgetConfig;
  retry?: Partial<RetryConfig>;
  maxConcurrent?: number;
}
tmlpd.execute(prompt, model?, streaming?)
Execute a prompt against a single model; when model is omitted, smart routing selects one.
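Per the signature, both model and streaming are optional; for example (the streamed return type isn't documented in this README):
// Let smart routing choose the model:
const routed = await tmlpd.execute("Summarize the tradeoffs of LRU caching");
// Pin a specific model:
const pinned = await tmlpd.execute("Summarize the tradeoffs of LRU caching", "groq/llama-3.3-70b");
// Opt into streaming via the third positional argument:
// const streamed = await tmlpd.execute(prompt, "openai/gpt-4o", true);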
tmlpd.executeParallel(prompt, models?, streaming?)
Execute across multiple models in parallel.
tmlpd.getCostSummary()
Get comprehensive cost breakdown.
tmlpd.getCacheStats()
Get cache hit rate and size.
tmlpd.getProviderStatus()
Get all provider statuses.
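The status object's exact shape isn't shown in this README; a minimal inspection of the per-provider cooldown and circuit-breaker state:
const status = tmlpd.getProviderStatus();
console.log(JSON.stringify(status, null, 2)); // per-provider health snapshot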
License
MIT