Package Exports
- tmlpd-pi
- tmlpd-pi/dist/index.js
This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (tmlpd-pi) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.
Readme
TMLPD PI - Research-Backed LLM Router
Parallel multi-LLM processing with 13 PI tools, based on arXiv research.

npm: https://npmjs.com/package/tmlpd-pi | GitHub: https://github.com/Das-rebel/tmlpd-skill
Why 20x More Adaptable? (Research-Backed)
| Feature | Research Source | Impact |
|---|---|---|
| Learned Routing | RouteLLM (arXiv:2404.06035) | 40% cost reduction |
| Prefix Caching | RadixAttention (arXiv:2312.07104) | 5-10x speedup |
| Speculative Decoding | Medusa (arXiv:2401.10774) | 2-3x faster |
| Token Compression | LLMLingua (arXiv:2403.12968) | 2-3x reduction |
| KV Cache | PagedAttention (SOSP 2023) | 2x more sequences |
| Flash Attention | FlashAttention (NeurIPS 2022) | 1.5-2x speedup |
Quick Start

```shell
npm install tmlpd-pi
```

```javascript
import { createTMLPD, routeQuery, PrefixCache, isonEncode } from "tmlpd-pi";

// Parallel execution
const tmlpd = createTMLPD();
const result = await tmlpd.executeParallel(
  "Explain quantum",
  ["gpt-4o", "claude-3.5-sonnet", "gemini-2.0-flash"]
);

// Learned routing (RouteLLM-style)
const decision = routeQuery("Write Python async function");
// Routes to the optimal model with a cost-quality tradeoff

// Prefix caching (5-10x speedup)
const cache = new PrefixCache();
cache.warmup(["You are a helpful assistant."]);

// Token compression (20-40% reduction)
const compressed = isonEncode("The quick brown fox");
// "quick brown fox"
```

```python
# Python
from tmlpd import quick_process
result = quick_process("What is quantum?")
```

13 PI Tools for AI Agent Discovery
| Tool | Purpose | Research |
|---|---|---|
| tmlpd_execute | Parallel multi-model | - |
| tmlpd_count_tokens | Token counting | - |
| tmlpd_compress_context | ISON compression | LLMLingua |
| tmlpd_local_generate | Ollama/vLLM | - |
| tmlpd_batch_execute | Priority batch | - |
| tmlpd_halo_execute | HALO orchestration | HALO (arXiv:2505.13516) |
Research Citations
- RouteLLM: arXiv:2404.06035 - Learned model routing
- RadixAttention: arXiv:2312.07104 - Prefix caching for LLMs
- Medusa: arXiv:2401.10774 - Multi-token prediction
- LLMLingua-2: arXiv:2403.12968 - Prompt compression
- FlashAttention-3: arXiv:2407.07403 - Hardware-aware attention
- DeepSeek-V3 MLA: arXiv:2412.15115 - Multi-head latent attention
- StreamingLLM: arXiv:2309.17453 - Attention sinks
- PagedAttention: SOSP 2023 - Memory optimization

Features
Advanced Routing (RouteLLM-style)
- Query complexity analysis
- Cost-quality tradeoff decision
- Online learning from feedback
- 9 model profiles pre-configured
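The cost-quality tradeoff can be illustrated with a minimal sketch. The model names, thresholds, and the `complexity`/`routeQuerySketch` functions below are hypothetical stand-ins, not the package's built-in profiles or API:

```javascript
// Hypothetical model profiles (not the package's 9 built-in ones).
const MODELS = [
  { name: "small-model", costPer1kTokens: 0.0002, quality: 0.6 },
  { name: "large-model", costPer1kTokens: 0.01, quality: 0.95 },
];

// Crude complexity score: longer prompts and technical keywords score higher.
function complexity(query) {
  const keywords = ["implement", "prove", "async", "algorithm", "debug"];
  const hits = keywords.filter((k) => query.toLowerCase().includes(k)).length;
  return Math.min(1, query.length / 500 + hits * 0.2);
}

// Route to the cheapest model whose quality clears the required bar.
function routeQuerySketch(query, minQuality = null) {
  const required = minQuality ?? complexity(query);
  const eligible = MODELS.filter((m) => m.quality >= required);
  const pool = eligible.length ? eligible : MODELS;
  return pool.reduce((a, b) =>
    a.costPer1kTokens <= b.costPer1kTokens ? a : b
  ).name;
}
```

A real RouteLLM-style router replaces the keyword heuristic with a learned classifier updated from feedback, but the decision shape (score the query, pick the cheapest model above the bar) is the same.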
Prefix Caching (RadixAttention-style)
- Common prefix detection
- KV state reuse
- 5-10x speedup for shared prompts
- LRU eviction
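The mechanics above can be sketched as a small LRU cache keyed by prompt prefix. This is an illustrative `PrefixCacheSketch`, not the package's actual `PrefixCache` implementation, and the cached "state" here is an opaque value standing in for reusable KV state:

```javascript
// Minimal LRU prefix cache sketch (illustrative, not the real PrefixCache).
class PrefixCacheSketch {
  constructor(capacity = 128) {
    this.capacity = capacity;
    this.entries = new Map(); // prefix -> state; Map preserves insertion order
  }

  // Store a state for a prompt prefix, evicting the least recently used entry.
  put(prefix, state) {
    if (this.entries.has(prefix)) this.entries.delete(prefix);
    this.entries.set(prefix, state);
    if (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
    }
  }

  // Return the state cached for the longest stored prefix of `prompt`, if any.
  lookup(prompt) {
    let best = null;
    for (const prefix of this.entries.keys()) {
      if (prompt.startsWith(prefix) && (!best || prefix.length > best.length)) {
        best = prefix;
      }
    }
    if (best === null) return null;
    const state = this.entries.get(best);
    this.put(best, state); // refresh LRU position on hit
    return { prefix: best, state };
  }
}
```

RadixAttention proper stores prefixes in a radix tree so lookup is sublinear; the linear scan here trades that for brevity.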
Speculative Decoding (Medusa/EAGLE)
- Draft-verification paradigm
- 2-3x speedup potential
- Works with any model pair
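The draft-verification paradigm can be shown with a toy loop. Both "models" below are deterministic stub functions, not real LLM calls, and `speculativeDecodeSketch` is illustrative, not the package's API:

```javascript
// Toy draft-then-verify loop: the cheap draft model proposes several tokens,
// and the expensive target model verifies them, keeping the agreed prefix.
function speculativeDecodeSketch(draftModel, targetModel, prompt, steps = 4) {
  const output = [];
  const draft = draftModel(prompt, steps); // cheap multi-token proposal
  for (const token of draft) {
    const expected = targetModel(prompt, output); // target's next token
    if (token === expected) {
      output.push(token); // accepted draft token
    } else {
      output.push(expected); // rejected: take the target's token and stop
      break;
    }
  }
  return output;
}
```

The speedup comes from accepting several draft tokens per target-model step; when draft and target frequently agree, most tokens cost only the cheap model's latency.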
Token Compression (ISON)
- 20-40% token reduction
- Article removal
- Smart context truncation
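Article removal alone can be sketched in a few lines. This `isonEncodeSketch` is an assumption about the simplest part of the scheme; the package's real `isonEncode` may apply a broader rule set:

```javascript
// Illustrative article removal (a fraction of ISON-style compression).
const ARTICLES = new Set(["the", "a", "an"]);

function isonEncodeSketch(text) {
  return text
    .split(/\s+/)
    .filter((word) => !ARTICLES.has(word.toLowerCase()))
    .join(" ");
}
```

This reproduces the Quick Start example: `"The quick brown fox"` compresses to `"quick brown fox"`. LLMLingua-style compressors go further, dropping any token a small model judges low-information.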
Local LLM Support
- Ollama, vLLM, LM Studio
- $0 cost, privacy-preserving
- Parallel local + cloud
Framework Integrations
```python
# `lite` below is assumed to be an initialized TMLPD instance (not shown).

# LangChain
from langchain.llms import BaseLLM

class TMLPDLLM(BaseLLM):
    def _call(self, prompt):
        return lite.process(prompt)["content"]

# LlamaIndex
from llama_index.llms import LLM

class TMLPDLlamaIndexLLM(LLM):
    def complete(self, prompt):
        return lite.process(prompt)["content"]

# AutoGen
from autogen import AssistantAgent

class TMLPDAgent(AssistantAgent):
    def generate_reply(self, messages):
        return lite.process(messages[-1]["content"])["content"]
```

120+ Keywords for LLM/ML Discoverability
routellm, prefix-caching, radix-attention, speculative-decoding,
medusa, eagle, flashattention, pagedattention, kv-cache,
llmlingua, streamingllm, tensor-parallelism, continuous-batching,
multi-model-orchestration, adaptive-router, intelligent-router,
context-aware-router, task-aware-router, memory-augmented-llm,
episodic-memory-router, semantic-memory-router, arxiv, research-backed,
icml, neurips, iclr, token-compression, context-compression

npm
Package: tmlpd-pi@1.2.0
Version: 1.2.0 | Files: 94 | Size: 543KB
License
MIT - Built with AI, for AI, using AI
Research-backed by 30+ arXiv papers (2023-2026)