Package Exports
- tmlpd-pi
- tmlpd-pi/dist/index.js
This package does not declare an exports field, so the exports above were automatically detected and optimized by JSPM instead. If a package subpath is missing, it is recommended to file an issue with the original package (tmlpd-pi) requesting support for the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.
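For reference, a minimal "exports" field covering the two detected subpaths could look like this in package.json (a sketch of the standard Node.js field, not this package's actual configuration):

{
  "exports": {
    ".": "./dist/index.js",
    "./dist/index.js": "./dist/index.js"
  }
}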
TMLPD PI - v1.2.0
Memory-based multi-LLM router with Reddit-informed enhancements, based on analysis of r/LocalLLaMA, r/AI_Agents, r/MachineLearning, and r/llm.
Reddit Pain Points → TMLPD Solutions
| Reddit Issue | Engagement | TMLPD v1.2.0 Solution |
|---|---|---|
| Function calling broken | 🔥🔥🔥 (1957 pts) | execute_parallel() with fallback |
| Token costs in freefall | 🔥🔥🔥 (354 comments) | countTokens(), estimateCost() |
| 70% token reduction interest | 🔥🔥 (ISON format) | isonEncode(), compressText() |
| Serving 1B+ tokens/day locally | 🔥🔥 (KV cache) | LocalProvider (Ollama/vLLM/LM Studio) |
| Throughput optimization | 🔥🔥 (GPU clusters) | BatchProcessor with concurrency |
| Intelligent failover | 🔥🔥 (multi-provider) | Auto-fallback chain (sketched below) |
| Claude Code multi-agent | 🔥 (815 comments) | HALOOrchestrator |
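The auto-fallback chain can also be driven by hand. A minimal sketch, assuming tmlpd.execute(prompt, model) rejects when a provider fails (the model list and error handling here are illustrative, not package defaults):

import { createTMLPD } from "tmlpd-pi";

const tmlpd = createTMLPD();

// Try each model in order until one succeeds (manual fallback chain)
async function executeWithFallback(prompt, models) {
  let lastError;
  for (const model of models) {
    try {
      return await tmlpd.execute(prompt, model);
    } catch (err) {
      lastError = err; // provider failed; fall through to the next model
    }
  }
  throw lastError;
}

const result = await executeWithFallback("Summarize this ticket", [
  "gpt-4o",
  "claude-3.5-sonnet",
  "gemini-2.0-flash",
]);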
Installation
npm install tmlpd-pi

# Or use Python (no dependencies)
# Copy python/tmlpd.py to your project

Quick Start
import { createTMLPD, countTokens, estimateCost, isonEncode, compressText, BatchProcessor } from "tmlpd-pi";
// Basic parallel execution
const tmlpd = createTMLPD();
const result = await tmlpd.executeParallel(
"Explain quantum entanglement",
["gpt-4o", "claude-3.5-sonnet", "gemini-2.0-flash"]
);
// Token counting & cost estimation
const tokens = countTokens("Hello world", "gpt-4o");
const cost = estimateCost(100, 50, "gpt-4o");
// ISON compression (20-40% token reduction)
const compressed = isonEncode("The quick brown fox jumps over the lazy dog");
// → "quick brown fox jumps lazy dog"
// Batch processing
const batch = new BatchProcessor({ concurrency: 5 });
batch.add({ prompt: "Task 1", priority: "high" });
batch.add({ prompt: "Task 2", priority: "normal" });
await batch.execute(executor);

from tmlpd import TMLPDLite, quick_process
# One-liner
result = quick_process("What is quantum?")
# Lite client with task routing
lite = TMLPDLite()
result = lite.process("Write Python async function")
# Auto-routes to TaskType.CODING → optimal models

New Features v1.2.0
Token Utilities
import { countTokens, estimateCost, listModelsByCost, findCheapestModels } from "tmlpd-pi";
// Count tokens
const tokens = countTokens("Your prompt here", "claude-3.5-sonnet");
// Estimate cost before execution
const cost = estimateCost(
prompt_tokens, // 500
completion_tokens, // 200
"gpt-4o" // → $0.0095
);
// List all models by cost
const models = listModelsByCost();
// [{ model: "gemini-2.0-flash", input: 0, output: 0 }, ...]
// Find cheapest for task
const cheap = findCheapestModels("fast", 3);
// ["gemini-2.0-flash", "groq/llama-3.1-8b", ...]Context Compression (ISON)
import { isonEncode, compressText, truncateMessages } from "tmlpd-pi";
// ISON encoding - removes articles, normalizes whitespace
const original = "The quick brown fox jumps over the lazy dog";
const encoded = isonEncode(original);
// "quick brown fox jumps lazy dog" - 33% reduction
// Full compression with stats
const result = compressText(original);
// { original_tokens: 12, compressed_tokens: 8, ratio: 0.67, compressed_text: "..." }
// Truncate conversation to fit context window
const messages = [
{ role: "system", content: "You are helpful" },
{ role: "user", content: "Long conversation..." },
{ role: "assistant", content: "Response..." }
];
const truncated = truncateMessages(messages, 4000, "smart");
// Preserves system, compresses middle, keeps recent

Local LLM Support
import { createOllamaProvider, createVLLMProvider, createLMStudioProvider, LocalProviderManager } from "tmlpd-pi";
// Ollama (localhost:11434)
const ollama = createOllamaProvider("llama-3.3-70b");
const models = await ollama.listModels();
const result = await ollama.generate("Your prompt");
// vLLM (localhost:8000)
const vllm = createVLLMProvider("http://localhost:8000", "meta-llama/Llama-3.3-70b");
// LM Studio (localhost:1234)
const lmstudio = createLMStudioProvider("llama-3.3-70b");
// Manager for multi-runtime
const manager = new LocalProviderManager();
manager.addProvider("ollama", { runtime: "ollama", default_model: "llama-3.3-70b" });
manager.addProvider("vllm", { runtime: "vllm", endpoint: "http://localhost:8000" });
// Parallel across local runtimes
const localResults = await manager.executeParallel(
"Your prompt",
{ models: ["ollama/llama-3.3-70b", "vllm/llama-3.3-70b"] }
);

Batch Processing
import { BatchProcessor, executeBatch } from "tmlpd-pi";
// Create batch processor
const batch = new BatchProcessor({
concurrency: 5,
stop_on_error: false,
rate_limit: { requests_per_minute: 60 }
});
// Add items with priority
batch.add({ prompt: "Task 1", priority: "high" });
batch.add({ prompt: "Task 2", priority: "normal" });
batch.add({ prompt: "Task 3", priority: "low" });
// Progress callback
batch.onProgress((progress, result) => {
console.log(`Completed: ${progress.completed}/${progress.total}`);
if (result) console.log(` - ${result.id}: ${result.success ? 'OK' : result.error}`);
});
// Execute
const results = await batch.execute(async (item) => {
return tmlpd.execute(item.prompt);
});
// Simple helper for one-off batches
const simple = await executeBatch(
[
{ prompt: "Task 1", model: "gpt-4o" },
{ prompt: "Task 2", model: "claude" }
],
async (prompt, model) => tmlpd.execute(prompt, model),
{ concurrency: 3 }
);

13 PI Tools
| Tool | Description |
|---|---|
| tmlpd_execute | Parallel multi-model execution |
| tmlpd_execute_single | Smart routing to optimal model |
| tmlpd_cost_summary | Real-time cost tracking |
| tmlpd_cache_stats | Cache hit rate & statistics |
| tmlpd_provider_status | Provider health monitoring |
| tmlpd_invalidate_cache | Clear stale cache |
| tmlpd_get_budget | Budget limits & remaining |
| tmlpd_halo_execute | HALO orchestration |
| tmlpd_episodic_query | Learn from past tasks |
| tmlpd_count_tokens | Token counting for estimation |
| tmlpd_compress_context | ISON compression |
| tmlpd_local_generate | Local LLM (Ollama/vLLM/LM Studio) |
| tmlpd_batch_execute | Batch with priority |
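The README does not state how these tools are transported. If they are exposed over an MCP-style JSON-RPC interface (an assumption), a call to tmlpd_count_tokens might look like the following; the argument names text and model are hypothetical:

{
  "method": "tools/call",
  "params": {
    "name": "tmlpd_count_tokens",
    "arguments": { "text": "Your prompt here", "model": "gpt-4o" }
  }
}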
Framework Integrations
# LangChain
from langchain.llms import BaseLLM

class TMLPDLLM(BaseLLM):
    def _call(self, prompt):
        return lite.process(prompt)["content"]

# LlamaIndex
from llama_index.llms import LLM

class TMLPDLLM(LLM):
    def complete(self, prompt):
        return lite.process(prompt)["content"]

# AutoGen
from autogen import AssistantAgent

class TMLPDAgent(AssistantAgent):
    def generate_reply(self, messages):
        return lite.process(messages[-1]["content"])["content"]

# CrewAI
from crewai import Agent

class TMLPDAgent(Agent):
    def execute_task(self, task, context=None):
        return lite.process(task)["content"]

Q&A for Reddit Issues
See qna/TMLPD_QNA.md for 30 Q&A addressing:
- Parallel processing with rate limits
- Cost tracking and budgets
- Reliability and automatic fallback
- Caching strategies
- Model routing intelligence
- Context and memory management
- Framework integrations
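For example, cost tracking and budgets can be combined with the token utilities above as a pre-flight check. A minimal sketch using the documented countTokens() and estimateCost(); the budget figure and skip logic are illustrative assumptions, not package defaults:

import { countTokens, estimateCost } from "tmlpd-pi";

const BUDGET_USD = 0.5; // illustrative budget, not a package default
let spent = 0;

function canAfford(prompt, model, expectedCompletionTokens = 200) {
  const promptTokens = countTokens(prompt, model);
  const cost = estimateCost(promptTokens, expectedCompletionTokens, model);
  return spent + cost <= BUDGET_USD;
}

if (canAfford("Explain quantum entanglement", "gpt-4o")) {
  // ...execute the prompt, then add the actual cost to `spent`
}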
Keywords (100+ for LLM Discovery)
memory-based-router, multi-llm-router, adaptive-router, intelligent-router,
context-aware-router, task-aware-router, memory-augmented-llm,
episodic-memory-router, semantic-memory-router, task-memory,
token-compression, context-compression, ison-format, message-truncation,
context-management, local-llm, ollama, vllm, lmstudio, local-model,
privacy-llm, batch-processing, batch-execution, priority-queue,
rate-limiting, token-counting, cost-estimation, cost-prediction,
parallel-execution, multi-provider, fallback-chain, intelligent-failover,
kv-cache, langchain, llamaindex, autogen, crewai, huggingface

npm: https://npmjs.com/package/tmlpd-pi
Version: 1.2.0 | License: MIT