JSPM

Memory-based Multi-LLM Router with parallel execution, streaming, caching, cost tracking, token compression (ISON), local provider support (Ollama/vLLM/LM Studio), batch processing, and Python bindings for LangChain/LlamaIndex/AutoGen/CrewAI. Addresses Reddit pain points: function calling readiness, KV cache, 70% token compression, local privacy.

Package Exports

  • tmlpd-pi
  • tmlpd-pi/dist/index.js

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (tmlpd-pi) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.
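For reference, a minimal sketch of an exports field covering the two detected subpaths (assuming ./dist/index.js is the intended entry point) could be added to the package's package.json:

{
  "exports": {
    ".": "./dist/index.js",
    "./dist/index.js": "./dist/index.js"
  }
}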

Readme

TMLPD PI - v1.2.0

Memory-based Multi-LLM Router with Reddit-informed enhancements, based on analysis of r/LocalLLaMA, r/AI_Agents, r/MachineLearning, and r/llm.

Reddit Pain Points → TMLPD Solutions

Reddit Issue | Engagement | TMLPD v1.2.0 Solution
Function calling broken | 🔥🔥🔥 (1957 pts) | execute_parallel() with fallback
Token costs in freefall | 🔥🔥🔥 (354 comments) | countTokens(), estimateCost()
70% token reduction interest | 🔥🔥 (ISON format) | isonEncode(), compressText()
Serving 1B+ tokens/day locally | 🔥🔥 (KV cache) | LocalProvider (Ollama/vLLM/LM Studio)
Throughput optimization | 🔥🔥 (GPU clusters) | BatchProcessor with concurrency
Intelligent failover | 🔥🔥 (multi-provider) | Auto-fallback chain
Claude Code multi-agent | 🔥 (815 comments) | HALOOrchestrator

Installation

npm install tmlpd-pi
# Or use Python (no dependencies)
# Copy python/tmlpd.py to your project

Quick Start

import { createTMLPD, countTokens, estimateCost, isonEncode, compressText, BatchProcessor } from "tmlpd-pi";

// Basic parallel execution
const tmlpd = createTMLPD();
const result = await tmlpd.executeParallel(
  "Explain quantum entanglement",
  ["gpt-4o", "claude-3.5-sonnet", "gemini-2.0-flash"]
);

// Token counting & cost estimation
const tokens = countTokens("Hello world", "gpt-4o");
const cost = estimateCost(100, 50, "gpt-4o");

// ISON compression (20-40% token reduction)
const compressed = isonEncode("The quick brown fox jumps over the lazy dog");
// → "quick brown fox jumps lazy dog"

// Batch processing
const batch = new BatchProcessor({ concurrency: 5 });
batch.add({ prompt: "Task 1", priority: "high" });
batch.add({ prompt: "Task 2", priority: "normal" });
await batch.execute(async (item) => tmlpd.execute(item.prompt));

# Python (python/tmlpd.py, no dependencies)
from tmlpd import TMLPDLite, quick_process

# One-liner
result = quick_process("What is quantum?")

# Lite client with task routing
lite = TMLPDLite()
result = lite.process("Write Python async function")
# Auto-routes to TaskType.CODING → optimal models

New Features v1.2.0

Token Utilities

import { countTokens, estimateCost, listModelsByCost, findCheapestModels } from "tmlpd-pi";

// Count tokens
const tokens = countTokens("Your prompt here", "claude-3.5-sonnet");

// Estimate cost before execution
const cost = estimateCost(
  prompt_tokens,  // 500
  completion_tokens,  // 200
  "gpt-4o"  // → $0.0095
);

// List all models by cost
const models = listModelsByCost();
// [{ model: "gemini-2.0-flash", input: 0, output: 0 }, ...]

// Find cheapest for task
const cheap = findCheapestModels("fast", 3);
// ["gemini-2.0-flash", "groq/llama-3.1-8b", ...]

Context Compression (ISON)

import { isonEncode, compressText, truncateMessages } from "tmlpd-pi";

// ISON encoding - removes articles, normalizes whitespace
const original = "The quick brown fox jumps over the lazy dog";
const encoded = isonEncode(original);
// "quick brown fox jumps lazy dog" - 33% reduction

// Full compression with stats
const result = compressText(original);
// { original_tokens: 12, compressed_tokens: 8, ratio: 0.67, compressed_text: "..." }

// Truncate conversation to fit context window
const messages = [
  { role: "system", content: "You are helpful" },
  { role: "user", content: "Long conversation..." },
  { role: "assistant", content: "Response..." }
];
const truncated = truncateMessages(messages, 4000, "smart");
// Preserves system, compresses middle, keeps recent

Local LLM Support

import { createOllamaProvider, createVLLMProvider, createLMStudioProvider, LocalProviderManager } from "tmlpd-pi";

// Ollama (localhost:11434)
const ollama = createOllamaProvider("llama-3.3-70b");
const models = await ollama.listModels();
const result = await ollama.generate("Your prompt");

// vLLM (localhost:8000)
const vllm = createVLLMProvider("http://localhost:8000", "meta-llama/Llama-3.3-70b");

// LM Studio (localhost:1234)
const lmstudio = createLMStudioProvider("llama-3.3-70b");

// Manager for multi-runtime
const manager = new LocalProviderManager();
manager.addProvider("ollama", { runtime: "ollama", default_model: "llama-3.3-70b" });
manager.addProvider("vllm", { runtime: "vllm", endpoint: "http://localhost:8000" });

// Parallel across local runtimes
const localResults = await manager.executeParallel(
  "Your prompt",
  { models: ["ollama/llama-3.3-70b", "vllm/llama-3.3-70b"] }
);

Batch Processing

import { BatchProcessor, executeBatch } from "tmlpd-pi";

// Create batch processor
const batch = new BatchProcessor({
  concurrency: 5,
  stop_on_error: false,
  rate_limit: { requests_per_minute: 60 }
});

// Add items with priority
batch.add({ prompt: "Task 1", priority: "high" });
batch.add({ prompt: "Task 2", priority: "normal" });
batch.add({ prompt: "Task 3", priority: "low" });

// Progress callback
batch.onProgress((progress, result) => {
  console.log(`Completed: ${progress.completed}/${progress.total}`);
  if (result) console.log(`  - ${result.id}: ${result.success ? 'OK' : result.error}`);
});

// Execute
const results = await batch.execute(async (item) => {
  return tmlpd.execute(item.prompt);
});

// Simple helper for one-off batches
const simple = await executeBatch(
  [
    { prompt: "Task 1", model: "gpt-4o" },
    { prompt: "Task 2", model: "claude" }
  ],
  async (prompt, model) => tmlpd.execute(prompt, model),
  { concurrency: 3 }
);

13 PI Tools

Tool | Description
tmlpd_execute | Parallel multi-model execution
tmlpd_execute_single | Smart routing to optimal model
tmlpd_cost_summary | Real-time cost tracking
tmlpd_cache_stats | Cache hit rate & statistics
tmlpd_provider_status | Provider health monitoring
tmlpd_invalidate_cache | Clear stale cache
tmlpd_get_budget | Budget limits & remaining
tmlpd_halo_execute | HALO orchestration
tmlpd_episodic_query | Learn from past tasks
tmlpd_count_tokens | Token counting for estimation
tmlpd_compress_context | ISON compression
tmlpd_local_generate | Local LLM (Ollama/vLLM/LM Studio)
tmlpd_batch_execute | Batch with priority
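
The tool names map onto the library functions documented above. As a rough illustration only (the package does not document a tool dispatcher; callTool() and its argument shapes are assumptions, not part of the tmlpd-pi API), a host could wire a few of these tools like this:

// Hypothetical sketch: maps PI tool names onto documented tmlpd-pi functions.
// callTool() and the args shapes are illustrative assumptions.
import { createTMLPD, countTokens, compressText } from "tmlpd-pi";

const tmlpd = createTMLPD();

async function callTool(name: string, args: Record<string, any>) {
  switch (name) {
    case "tmlpd_count_tokens":
      return countTokens(args.text, args.model);              // documented helper
    case "tmlpd_compress_context":
      return compressText(args.text);                         // documented helper
    case "tmlpd_execute":
      return tmlpd.executeParallel(args.prompt, args.models); // documented method
    default:
      throw new Error(`Unknown tool: ${name}`);
  }
}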

Framework Integrations

# All snippets assume: from tmlpd import TMLPDLite; lite = TMLPDLite()

# LangChain
from langchain.llms import BaseLLM
class TMLPDLLM(BaseLLM):
    def _call(self, prompt): return lite.process(prompt)["content"]

# LlamaIndex
from llama_index.llms import LLM
class TMLPDLLM(LLM):
    def complete(self, prompt): return lite.process(prompt)["content"]

# AutoGen
from autogen import AssistantAgent
class TMLPDAgent(AssistantAgent):
    def generate_reply(self, messages):
        return lite.process(messages[-1]["content"])["content"]

# CrewAI
from crewai import Agent
class TMLPDAgent(Agent):
    def execute_task(self, task, context=None):
        return lite.process(task)["content"]

Q&A for Reddit Issues

See qna/TMLPD_QNA.md for 30 Q&A addressing:

  • Parallel processing with rate limits
  • Cost tracking and budgets
  • Reliability and automatic fallback
  • Caching strategies
  • Model routing intelligence
  • Context and memory management
  • Framework integrations

Keywords (100+ for LLM Discovery)

memory-based-router, multi-llm-router, adaptive-router, intelligent-router,
context-aware-router, task-aware-router, memory-augmented-llm,
episodic-memory-router, semantic-memory-router, task-memory,
token-compression, context-compression, ison-format, message-truncation,
context-management, local-llm, ollama, vllm, lmstudio, local-model,
privacy-llm, batch-processing, batch-execution, priority-queue,
rate-limiting, token-counting, cost-estimation, cost-prediction,
parallel-execution, multi-provider, fallback-chain, intelligent-failover,
kv-cache, langchain, llamaindex, autogen, crewai, huggingface

npm: https://npmjs.com/package/tmlpd-pi
Version: 1.2.0 | License: MIT