JSPM

Research-backed Multi-LLM Router with parallel execution, learned routing (RouteLLM), prefix caching (RadixAttention), speculative decoding (Medusa/EAGLE), token compression (ISON), local LLM support (Ollama/vLLM/LM Studio), batch processing. Python bindings for LangChain/LlamaIndex/AutoGen/CrewAI. 120+ keywords: routellm, prefix-caching, speculative-decoding, medusa, flashattention, pagedattention, kv-cache, arxiv, research-backed, icml, neurips, iclr.

Package Exports

  • tmlpd-pi
  • tmlpd-pi/dist/index.js

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (tmlpd-pi) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.
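
For reference, an explicit exports field covering the two detected subpaths could look like the sketch below (illustrative only, assuming the package's main entry is ./dist/index.js):

{
  "exports": {
    ".": "./dist/index.js",
    "./dist/index.js": "./dist/index.js"
  }
}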

Readme

TMLPD PI - Research-Backed LLM Router

Parallel Multi-LLM Processing with 13 PI tools, based on arXiv research.
npm: https://npmjs.com/package/tmlpd-pi | GitHub: https://github.com/Das-rebel/tmlpd-skill


Why 20x More Adaptable? (Research-Backed)

Feature               Research Source                      Impact
Learned Routing       RouteLLM (arXiv:2404.06035)          40% cost reduction
Prefix Caching        RadixAttention (arXiv:2312.07104)    5-10x speedup
Speculative Decoding  Medusa (arXiv:2401.10774)            2-3x faster
Token Compression     LLMLingua (arXiv:2403.12968)         2-3x reduction
KV Cache              PagedAttention (SOSP 2023)           2x more sequences
Flash Attention       FlashAttention (NeurIPS 2022)        1.5-2x speedup

Quick Start

npm install tmlpd-pi

import { createTMLPD, routeQuery, PrefixCache, isonEncode } from "tmlpd-pi";

// Parallel execution
const tmlpd = createTMLPD();
const result = await tmlpd.executeParallel(
  "Explain quantum",
  ["gpt-4o", "claude-3.5-sonnet", "gemini-2.0-flash"]
);

// Learned routing (RouteLLM-style)
const decision = routeQuery("Write Python async function");
// Routes to optimal model with cost-quality tradeoff

// Prefix caching (5-10x speedup)
const cache = new PrefixCache();
cache.warmup(["You are a helpful assistant."]);

// Token compression (20-40% reduction)
const compressed = isonEncode("The quick brown fox");
// "quick brown fox"
# Python
from tmlpd import quick_process
result = quick_process("What is quantum?")

13 PI Tools for AI Agent Discovery

Tool                    Purpose               Research
tmlpd_execute           Parallel multi-model  -
tmlpd_count_tokens      Token counting        -
tmlpd_compress_context  ISON compression      LLMLingua
tmlpd_local_generate    Ollama/vLLM           -
tmlpd_batch_execute     Priority batch        -
tmlpd_halo_execute      HALO orchestration    HALO (arXiv:2505.13516)

Research Citations

RouteLLM:          arXiv:2404.06035 - Learned model routing
RadixAttention:    arXiv:2312.07104 - Prefix caching for LLMs
Medusa:            arXiv:2401.10774 - Multi-token prediction
LLMLingua-2:       arXiv:2403.12968 - Prompt compression
FlashAttention-3:  arXiv:2407.07403 - Hardware-aware attention
DeepSeek-V3 MLA:   arXiv:2412.15115 - Multi-head latent attention
StreamingLLM:      arXiv:2309.17453 - Attention sinks
PagedAttention:    SOSP 2023 - Memory optimization

Features

Advanced Routing (RouteLLM-style)

  • Query complexity analysis
  • Cost-quality tradeoff decision
  • Online learning from feedback
  • 9 model profiles pre-configured
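
As a sketch of how the routing API from Quick Start could be exercised, the example below routes a trivial and a demanding prompt; the exact shape of the returned decision object is not documented here, so it is simply logged.

import { routeQuery } from "tmlpd-pi";

// A trivial query should favor a cheap model, while a demanding one
// should favor a higher-quality model (cost-quality tradeoff).
const easy = routeQuery("What is 2 + 2?");
const hard = routeQuery("Design a lock-free MPMC queue and explain its memory ordering");

console.log(easy);
console.log(hard);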

Prefix Caching (RadixAttention-style)

  • Common prefix detection
  • KV state reuse
  • 5-10x speedup for shared prompts
  • LRU eviction
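
A minimal warmup sketch using the PrefixCache API from Quick Start; whether other calls consult this cache automatically is not shown here, so only the documented warmup call is used.

import { PrefixCache } from "tmlpd-pi";

const cache = new PrefixCache();
// Prompts sharing the "You are a helpful assistant." prefix can reuse the
// cached prefix state; rarely used entries age out via LRU eviction.
cache.warmup([
  "You are a helpful assistant.",
  "You are a helpful assistant. Answer in one paragraph.",
]);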

Speculative Decoding (Medusa/EAGLE)

  • Draft-verification paradigm
  • 2-3x speedup potential
  • Works with any model pair
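
The draft-verify loop can be sketched independently of this package's API; draftModel and targetModel below are hypothetical stand-ins for a small drafter and a large verifier, not exports of tmlpd-pi.

// Conceptual sketch of speculative decoding, not the package's API.
async function speculativeStep(prefix, draftModel, targetModel, k = 4) {
  // 1. The small draft model cheaply proposes k candidate tokens.
  const draft = await draftModel.generateTokens(prefix, k);
  // 2. The large target model checks the drafted tokens in a single pass.
  const accepted = await targetModel.verifyTokens(prefix, draft);
  // 3. Accepted tokens are kept; decoding resumes after the first rejection.
  return prefix.concat(accepted);
}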

Token Compression (ISON)

  • 20-40% token reduction
  • Article removal
  • Smart context truncation
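
Using the isonEncode helper from Quick Start, a sketch of compressing a prompt before dispatch; actual savings depend on the input text.

import { isonEncode } from "tmlpd-pi";

const prompt = "Please summarize the following report and list the key risks for the team.";
const compressed = isonEncode(prompt);

// Articles and other low-information words are stripped, so the compressed
// form should be shorter than the original prompt.
console.log(prompt.length, compressed.length);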

Local LLM Support

  • Ollama, vLLM, LM Studio
  • $0 cost, privacy-preserving
  • Parallel local + cloud
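
Since executeParallel takes a prompt and a list of model identifiers (see Quick Start), mixing cloud and local backends might look like the sketch below; the "ollama/llama3" identifier is a hypothetical example, not a documented model name.

import { createTMLPD } from "tmlpd-pi";

const tmlpd = createTMLPD();
// Fan the same prompt out to a cloud model and a local Ollama model in parallel.
const result = await tmlpd.executeParallel(
  "Summarize this changelog",
  ["gpt-4o", "ollama/llama3"]
);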

Framework Integrations

# LangChain
from langchain.llms import BaseLLM
# `lite` below is assumed to be an already-constructed tmlpd Python client instance
class TMLPDLLM(BaseLLM):
    def _call(self, prompt): return lite.process(prompt)["content"]

# LlamaIndex
from llama_index.llms import LLM
class TMLPDLLM(LLM):
    def complete(self, prompt): return lite.process(prompt)["content"]

# AutoGen
from autogen import AssistantAgent
class TMLPDAgent(AssistantAgent):
    def generate_reply(self, messages):
        return lite.process(messages[-1]["content"])["content"]

120+ Keywords for LLM/ML Discoverability

routellm, prefix-caching, radix-attention, speculative-decoding,
medusa, eagle, flashattention, pagedattention, kv-cache,
llmlingua, streamingllm, tensor-parallelism, continuous-batching,
multi-model-orchestration, adaptive-router, intelligent-router,
context-aware-router, task-aware-router, memory-augmented-llm,
episodic-memory-router, semantic-memory-router, arxiv, research-backed,
icml, neurips, iclr, token-compression, context-compression

npm

Package: tmlpd-pi@1.2.0
Version: 1.2.0 | Files: 94 | Size: 543KB


License

MIT - Built with AI, for AI, using AI

Research-backed by 30+ arXiv papers (2023-2026)