
llm-pulse


Zero-config CLI that tells you what LLMs your PC can run. Scans hardware, finds runtimes, recommends models.

npx llm-pulse

Install

# Run directly (no install)
npx llm-pulse

# Or install globally
npm install -g llm-pulse

Requires Node.js 18+.

Commands

llm-pulse / llm-pulse scan

Hardware scan + model recommendations.

llm-pulse                            # Full scan (default)
llm-pulse --format json              # JSON output
llm-pulse --category coding --top 3  # Top 3 coding models
| Flag | Description | Default |
| --- | --- | --- |
| `-f, --format` | `table` or `json` | `table` |
| `-c, --category` | `general`, `coding`, `reasoning`, `creative`, `multilingual` | all |
| `-t, --top <n>` | Number of recommendations | `5` |
| `-v, --verbose` | Detailed output | `false` |

llm-pulse doctor

System health check — scores your setup and gives suggestions.

llm-pulse doctor
llm-pulse doctor --format json

llm-pulse models

Browse the model database filtered for your hardware.

llm-pulse models                      # All 45+ models
llm-pulse models --search llama       # Search by name
llm-pulse models --category coding    # Filter by category
llm-pulse models --fits               # Only models that fit your VRAM

llm-pulse monitor

Live TUI dashboard — like htop for LLMs. Press Tab to switch views, q to quit.

  • Overview — CPU/GPU/RAM/VRAM bars with sparklines + smart alerts
  • Inference — Throughput chart + session stats
  • GPU — Per-GPU utilization, temperature, VRAM, and power sparklines with peak stats + temperature alerts
  • VRAM Map — Visual VRAM breakdown (model weights / KV cache / overhead / free)
llm-pulse monitor
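
The VRAM Map view divides total VRAM into model weights, KV cache, runtime overhead, and free space. A minimal sketch of that bookkeeping (the arithmetic and field names here are illustrative, not llm-pulse's actual internals, which read real values from the GPU driver):

```javascript
// Illustrative VRAM breakdown; all sizes in GB.
// In practice these inputs come from the GPU driver and the loaded model.
function vramBreakdown(totalGB, weightsGB, kvCacheGB, overheadGB) {
  const used = weightsGB + kvCacheGB + overheadGB;
  const freeGB = Math.max(0, totalGB - used);
  const pct = (x) => Math.round((x / totalGB) * 100);
  return {
    weights: { gb: weightsGB, pct: pct(weightsGB) },
    kvCache: { gb: kvCacheGB, pct: pct(kvCacheGB) },
    overhead: { gb: overheadGB, pct: pct(overheadGB) },
    free: { gb: freeGB, pct: pct(freeGB) },
  };
}

// e.g. a 16 GB card with a ~8.5 GB model loaded
const map = vramBreakdown(16, 8.5, 2, 1.5);
console.log(map.free); // { gb: 4, pct: 25 }
```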

llm-pulse benchmark

Quick inference benchmark via Ollama.

llm-pulse benchmark                  # Auto-picks smallest model
llm-pulse benchmark --model phi3     # Specific model
llm-pulse benchmark --rounds 5       # 5 rounds (default: 3)

Programmatic API

import { detectHardware, getRecommendations } from "llm-pulse";

const hardware = await detectHardware();
const recs = getRecommendations(hardware, { category: "coding", top: 3 });

console.log(recs[0].score.model.name);  // "Qwen 2.5 Coder 14B"
console.log(recs[0].score.fitLevel);     // "comfortable"
console.log(recs[0].pullCommand);        // "ollama pull qwen2.5-coder:14b"

MCP Server

Use llm-pulse as an MCP tool from Claude Code, Cursor, or any MCP-compatible AI assistant. The assistant can scan your hardware, check model compatibility, and snapshot live GPU/VRAM state — all without leaving the chat.

Add to your Claude Code config (~/.claude.json or your project's .mcp.json):

{
  "mcpServers": {
    "llm-pulse": {
      "command": "npx",
      "args": ["-y", "llm-pulse-mcp"]
    }
  }
}

Exposed tools:

| Tool | What it does |
| --- | --- |
| `scan` | Full hardware scan + ranked model recommendations |
| `check` | "Can I run this model?" verdict (yes/maybe/no) with best quantization + speed estimate |
| `recommend` | Ranked model list for your hardware, filterable by category |
| `doctor` | System health score with actionable suggestions |
| `models` | Browse / search the model database, optionally filtered to models that fit |
| `monitor` | One-shot live snapshot — CPU/GPU%, VRAM, temp, power, active Ollama model + tok/s |
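
At its core, a `check`-style verdict compares a model's estimated VRAM need against what the card has. One plausible threshold scheme (the 80% headroom cutoff is an assumption for illustration, not llm-pulse's actual scoring):

```javascript
// Illustrative yes/maybe/no fit verdict; thresholds are assumed,
// not llm-pulse's real scoring logic.
function checkVerdict(requiredGB, availableGB) {
  if (requiredGB <= availableGB * 0.8) return "yes"; // comfortable headroom
  if (requiredGB <= availableGB) return "maybe";     // tight fit, expect slowdowns
  return "no";                                       // spills to system RAM or fails
}

console.log(checkVerdict(9, 16));  // "yes"
console.log(checkVerdict(15, 16)); // "maybe"
console.log(checkVerdict(24, 16)); // "no"
```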

Supported

Hardware: NVIDIA GPU (full CUDA/VRAM), AMD, Intel, Apple Silicon, any CPU (AVX2/NEON), DDR4/DDR5, NVMe/SSD/HDD

Runtimes: Ollama, llama.cpp, LM Studio

Models: 45+ models across general, coding, reasoning, creative, multilingual — each with Q4/Q5/Q8/F16 quantization variants
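
As a back-of-the-envelope guide to those quantization variants, weight footprint scales with bits per parameter: size ≈ parameters × bits / 8. The bit widths below are simplified (real GGUF quants such as Q4_K_M use slightly more), and this is not llm-pulse's exact math:

```javascript
// Rough weight-only footprint in GB (1 GB = 1e9 bytes);
// KV cache and runtime overhead are extra. Bit widths simplified.
const BITS_PER_WEIGHT = { Q4: 4, Q5: 5, Q8: 8, F16: 16 };

function estimateWeightsGB(paramsBillions, quant) {
  return (paramsBillions * BITS_PER_WEIGHT[quant]) / 8;
}

console.log(estimateWeightsGB(7, "Q4"));  // 3.5
console.log(estimateWeightsGB(14, "Q8")); // 14
```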

License

MIT