# llm-pulse
Zero-config CLI that tells you what LLMs your PC can run. Scans hardware, finds runtimes, recommends models.
```sh
npx llm-pulse
```

## Install

```sh
# Run directly (no install)
npx llm-pulse

# Or install globally
npm install -g llm-pulse
```

Requires Node.js 18+.
## Commands
### `llm-pulse` / `llm-pulse scan`
Hardware scan + model recommendations.
```sh
llm-pulse                            # Full scan (default)
llm-pulse --format json              # JSON output
llm-pulse --category coding --top 3  # Top 3 coding models
```

| Flag | Description | Default |
|---|---|---|
| `-f, --format` | `table` or `json` | `table` |
| `-c, --category` | `general`, `coding`, `reasoning`, `creative`, `multilingual` | all |
| `-t, --top <n>` | Number of recommendations | `5` |
| `-v, --verbose` | Detailed output | `false` |
### `llm-pulse doctor`
System health check — scores your setup and gives suggestions.
```sh
llm-pulse doctor
llm-pulse doctor --format json
```

### `llm-pulse models`
Browse the model database filtered for your hardware.
```sh
llm-pulse models                    # All 45+ models
llm-pulse models --search llama     # Search by name
llm-pulse models --category coding  # Filter by category
llm-pulse models --fits             # Only models that fit your VRAM
```

### `llm-pulse monitor`
Live TUI dashboard — like htop for LLMs. Press Tab to switch views, q to quit.
- Overview — CPU/GPU/RAM/VRAM bars with sparklines + smart alerts
- Inference — Throughput chart + session stats
- GPU — Per-GPU utilization, temperature, VRAM, and power sparklines with peak stats + temperature alerts
- VRAM Map — Visual VRAM breakdown (model weights / KV cache / overhead / free)
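For intuition, the gauges in a view like this can be rendered with very little code. Here is an illustrative sketch of a usage bar and a sparkline (not llm-pulse's actual implementation):

```typescript
// Render a fixed-width usage bar, e.g. "[████████████░░░░░░░░] 62%".
function usageBar(percent: number, width = 20): string {
  const filled = Math.round((percent / 100) * width);
  return "[" + "█".repeat(filled) + "░".repeat(width - filled) + `] ${percent.toFixed(0)}%`;
}

// Map a series of samples onto eight block glyphs, scaled to the max sample.
function sparkline(samples: number[]): string {
  const glyphs = "▁▂▃▄▅▆▇█";
  const max = Math.max(...samples, 1);
  return samples
    .map((s) => glyphs[Math.min(glyphs.length - 1, Math.floor((s / max) * (glyphs.length - 1)))])
    .join("");
}

console.log(usageBar(62));                        // e.g. current GPU utilization
console.log(sparkline([10, 35, 60, 80, 55, 90])); // e.g. recent VRAM samples
```

A real TUI redraws these on an interval and overlays alerts; the string-building idea is the same.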
```sh
llm-pulse monitor
```

### `llm-pulse benchmark`
Quick inference benchmark via Ollama.
```sh
llm-pulse benchmark               # Auto-picks smallest model
llm-pulse benchmark --model phi3  # Specific model
llm-pulse benchmark --rounds 5    # 5 rounds (default: 3)
```

## Programmatic API
```ts
import { detectHardware, getRecommendations } from "llm-pulse";

const hardware = await detectHardware();
const recs = getRecommendations(hardware, { category: "coding", top: 3 });

console.log(recs[0].score.model.name); // "Qwen 2.5 Coder 14B"
console.log(recs[0].score.fitLevel);   // "comfortable"
console.log(recs[0].pullCommand);      // "ollama pull qwen2.5-coder:14b"
```

## MCP Server
Use llm-pulse as an MCP tool from Claude Code, Cursor, or any MCP-compatible AI assistant. The assistant can scan your hardware, check model compatibility, and snapshot live GPU/VRAM state — all without leaving the chat.
Add to your Claude Code config (~/.claude.json or your project's .mcp.json):
```json
{
  "mcpServers": {
    "llm-pulse": {
      "command": "npx",
      "args": ["-y", "llm-pulse-mcp"]
    }
  }
}
```

Exposed tools:
| Tool | What it does |
|---|---|
| `scan` | Full hardware scan + ranked model recommendations |
| `check` | "Can I run this model?" verdict (yes/maybe/no) with best quantization + speed estimate |
| `recommend` | Ranked model list for your hardware, filterable by category |
| `doctor` | System health score with actionable suggestions |
| `models` | Browse / search the model database, optionally filtered to models that fit |
| `monitor` | One-shot live snapshot — CPU/GPU%, VRAM, temp, power, active Ollama model + tok/s |
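Under the hood, the assistant invokes these tools through MCP's standard JSON-RPC `tools/call` method. A `scan` request looks roughly like this (a sketch of the wire format; the exact argument shape accepted by each tool is an assumption):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "scan",
    "arguments": {}
  }
}
```

You normally never write this by hand; the MCP client (Claude Code, Cursor, etc.) handles the protocol once the server is registered in the config above.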
## Supported
- **Hardware:** NVIDIA GPUs (full CUDA/VRAM), AMD, Intel, Apple Silicon, any CPU (AVX2/NEON), DDR4/DDR5, NVMe/SSD/HDD
- **Runtimes:** Ollama, llama.cpp, LM Studio
- **Models:** 45+ models across general, coding, reasoning, creative, multilingual — each with Q4/Q5/Q8/F16 quantization variants
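For a feel of what those quantization variants mean for memory, here is a rough weights-only estimate using common bytes-per-weight figures. This is a heuristic sketch, not llm-pulse's actual sizing logic:

```typescript
// Approximate bytes stored per model weight at each quantization level.
const BYTES_PER_WEIGHT: Record<string, number> = {
  F16: 2.0,   // 16-bit floats
  Q8: 1.0,    // ~8 bits per weight
  Q5: 0.625,  // ~5 bits per weight
  Q4: 0.5,    // ~4 bits per weight
};

// Weights-only VRAM estimate in GiB; KV cache and runtime overhead add more.
function estimateGiB(paramsBillions: number, quant: keyof typeof BYTES_PER_WEIGHT): number {
  const bytes = paramsBillions * 1e9 * BYTES_PER_WEIGHT[quant];
  return bytes / 1024 ** 3;
}

// A 7B model at Q4 needs roughly 3.3 GiB for weights alone.
console.log(estimateGiB(7, "Q4").toFixed(1));
```

This is why a 7B model that won't fit at F16 (~13 GiB) can still run comfortably on an 8 GB GPU at Q4.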
## License
MIT