noosphere

Unified AI creation engine — text, image, video, and audio generation across all providers through a single interface.

One import. Every model. Every modality.

Features

  • 4 modalities — LLM chat, image generation, video generation, and text-to-speech
  • 246+ LLM models — via Pi-AI gateway (OpenAI, Anthropic, Google, Groq, Mistral, xAI, Cerebras, OpenRouter)
  • 867+ media endpoints — via FAL (Flux, SDXL, Kling, Sora 2, VEO 3, Kokoro, ElevenLabs, and hundreds more)
  • 30+ HuggingFace tasks — LLM, image, TTS, translation, summarization, classification, and more
  • Local-first architecture — Auto-detects ComfyUI, Ollama, Piper, and Kokoro on your machine
  • Agentic capabilities — Tool use, function calling, reasoning/thinking, vision, and agent loops via Pi-AI
  • Failover & retry — Automatic retries with exponential backoff and cross-provider failover
  • Usage tracking — Real-time cost, latency, and token tracking across all providers
  • TypeScript-first — Full type definitions with ESM and CommonJS support

Install

npm install noosphere

Quick Start

import { Noosphere } from 'noosphere';

const ai = new Noosphere();

// Chat with any LLM
const response = await ai.chat({
  messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(response.content);

// Generate an image
const image = await ai.image({
  prompt: 'A sunset over mountains',
  width: 1024,
  height: 1024,
});
console.log(image.url);

// Generate a video
const video = await ai.video({
  prompt: 'Ocean waves crashing on rocks',
  duration: 5,
});
console.log(video.url);

// Text-to-speech
const audio = await ai.speak({
  text: 'Welcome to Noosphere',
  voice: 'alloy',
  format: 'mp3',
});
// audio.buffer contains the audio data

Configuration

API keys are resolved from the constructor config or environment variables (config takes priority):

const ai = new Noosphere({
  keys: {
    openai: 'sk-...',
    anthropic: 'sk-ant-...',
    google: 'AIza...',
    fal: 'fal-...',
    huggingface: 'hf_...',
    groq: 'gsk_...',
    mistral: '...',
    xai: '...',
    openrouter: 'sk-or-...',
  },
});

Or set environment variables:

Variable Provider
OPENAI_API_KEY OpenAI
ANTHROPIC_API_KEY Anthropic
GEMINI_API_KEY Google Gemini
FAL_KEY FAL.ai
HUGGINGFACE_TOKEN Hugging Face
GROQ_API_KEY Groq
MISTRAL_API_KEY Mistral
XAI_API_KEY xAI (Grok)
OPENROUTER_API_KEY OpenRouter

Full Configuration Reference

const ai = new Noosphere({
  // API keys (or use env vars above)
  keys: { /* ... */ },

  // Default models per modality
  defaults: {
    llm: { provider: 'pi-ai', model: 'claude-sonnet-4-20250514' },
    image: { provider: 'fal', model: 'fal-ai/flux/schnell' },
    video: { provider: 'fal', model: 'fal-ai/kling-video/v2/master/text-to-video' },
    tts: { provider: 'fal', model: 'fal-ai/kokoro/american-english' },
  },

  // Local service configuration
  autoDetectLocal: true,  // env: NOOSPHERE_AUTO_DETECT_LOCAL
  local: {
    ollama: { enabled: true, host: 'http://localhost', port: 11434 },
    comfyui: { enabled: true, host: 'http://localhost', port: 8188 },
    piper: { enabled: true, host: 'http://localhost', port: 5500 },
    kokoro: { enabled: true, host: 'http://localhost', port: 5501 },
    custom: [],  // additional LocalServiceConfig[]
  },

  // Retry & failover
  retry: {
    maxRetries: 2,           // default: 2
    backoffMs: 1000,         // default: 1000 (exponential: 1s, 2s, 4s...)
    failover: true,          // default: true — try other providers on failure
    retryableErrors: ['PROVIDER_UNAVAILABLE', 'RATE_LIMITED', 'TIMEOUT'],
  },

  // Timeouts per modality (ms)
  timeout: {
    llm: 30000,    // 30s
    image: 120000, // 2min
    video: 300000, // 5min
    tts: 60000,    // 1min
  },

  // Model discovery cache (minutes)
  discoveryCacheTTL: 60,  // env: NOOSPHERE_DISCOVERY_CACHE_TTL

  // Real-time usage callback
  onUsage: (event) => {
    console.log(`${event.provider}/${event.model}: $${event.cost} (${event.latencyMs}ms)`);
  },
});

Local Service Environment Variables

Variable Default Description
OLLAMA_HOST http://localhost Ollama server host
OLLAMA_PORT 11434 Ollama server port
COMFYUI_HOST http://localhost ComfyUI server host
COMFYUI_PORT 8188 ComfyUI server port
PIPER_HOST http://localhost Piper TTS server host
PIPER_PORT 5500 Piper TTS server port
KOKORO_HOST http://localhost Kokoro TTS server host
KOKORO_PORT 5501 Kokoro TTS server port
NOOSPHERE_AUTO_DETECT_LOCAL true Enable/disable local service auto-detection
NOOSPHERE_DISCOVERY_CACHE_TTL 60 Model cache TTL in minutes

API Reference

new Noosphere(config?)

Creates a new instance. Providers are initialized lazily on first API call. Auto-detects local services via HTTP pings (2s timeout each).

Generation Methods

ai.chat(options): Promise<NoosphereResult>

Generate text with any LLM. Supports 246+ models across 8 providers.

const result = await ai.chat({
  provider: 'anthropic',                // optional — auto-resolved if omitted
  model: 'claude-sonnet-4-20250514',    // optional — uses default or first available
  messages: [
    { role: 'system', content: 'You are helpful.' },
    { role: 'user', content: 'Explain quantum computing' },
  ],
  temperature: 0.7,     // optional (0-2)
  maxTokens: 1024,      // optional
  jsonMode: false,       // optional
});

console.log(result.content);          // response text
console.log(result.thinking);         // reasoning output (Claude, GPT-5, o3, Gemini, Grok-4)
console.log(result.usage.cost);       // cost in USD
console.log(result.usage.input);      // input tokens
console.log(result.usage.output);     // output tokens
console.log(result.latencyMs);        // response time in ms

ai.stream(options): NoosphereStream

Stream LLM responses token-by-token. Same options as chat().

const stream = ai.stream({
  messages: [{ role: 'user', content: 'Write a story' }],
});

for await (const event of stream) {
  switch (event.type) {
    case 'text_delta':
      process.stdout.write(event.delta!);
      break;
    case 'thinking_delta':
      console.log('[thinking]', event.delta);
      break;
    case 'done':
      console.log('\n\nUsage:', event.result!.usage);
      break;
    case 'error':
      console.error(event.error);
      break;
  }
}

// Or consume the full result
const result = await stream.result();

// Abort at any time
stream.abort();

ai.image(options): Promise<NoosphereResult>

Generate images. Supports 200+ image models via FAL, HuggingFace, and ComfyUI.

const result = await ai.image({
  provider: 'fal',                              // optional
  model: 'fal-ai/flux-2-pro',                   // optional
  prompt: 'A futuristic cityscape at sunset',
  negativePrompt: 'blurry, low quality',         // optional
  width: 1024,                                   // optional
  height: 768,                                   // optional
  seed: 42,                                      // optional — reproducible results
  steps: 30,                                     // optional — inference steps (more = higher quality)
  guidanceScale: 7.5,                            // optional — prompt adherence (higher = stricter)
});

console.log(result.url);                // image URL (FAL)
console.log(result.buffer);             // image Buffer (HuggingFace, ComfyUI)
console.log(result.media?.width);       // actual dimensions
console.log(result.media?.height);
console.log(result.media?.format);      // 'png'

ai.video(options): Promise<NoosphereResult>

Generate videos. Supports 150+ video models via FAL (Kling, Sora 2, VEO 3, WAN, Pixverse, and more).

const result = await ai.video({
  provider: 'fal',
  model: 'fal-ai/kling-video/v2/master/text-to-video',
  prompt: 'A bird flying through clouds',
  imageUrl: 'https://...',    // optional — image-to-video
  duration: 5,                // optional — seconds
  fps: 24,                    // optional
  width: 1280,                // optional
  height: 720,                // optional
});

console.log(result.url);                // video URL
console.log(result.media?.duration);    // actual duration
console.log(result.media?.fps);         // frames per second
console.log(result.media?.format);      // 'mp4'

ai.speak(options): Promise<NoosphereResult>

Text-to-speech synthesis. Supports 50+ TTS models via FAL, HuggingFace, Piper, and Kokoro.

const result = await ai.speak({
  provider: 'fal',
  model: 'fal-ai/kokoro/american-english',
  text: 'Hello world',
  voice: 'af_heart',        // optional — voice ID
  language: 'en',            // optional
  speed: 1.0,                // optional
  format: 'mp3',             // optional — 'mp3' | 'wav' | 'ogg'
});

console.log(result.buffer);  // audio Buffer
console.log(result.url);     // audio URL (FAL)

Discovery Methods

ai.getProviders(modality?): Promise<ProviderInfo[]>

List available providers, optionally filtered by modality.

const providers = await ai.getProviders('llm');
// [{ id: 'pi-ai', name: 'Pi-AI', modalities: ['llm'], local: false, status: 'online', modelCount: 246 }]

ai.getModels(modality?): Promise<ModelInfo[]>

List all available models with full metadata.

const models = await ai.getModels('image');
// Returns ModelInfo[] with id, provider, name, modality, local, cost, capabilities

ai.getModel(provider, modelId): Promise<ModelInfo | null>

Get details about a specific model.

ai.syncModels(): Promise<SyncResult>

Refresh model lists from all providers. Returns sync count, per-provider breakdown, and any errors.
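
For example, a short sketch combining the discovery methods above (ModelInfo fields per the getModels() description; the SyncResult shape is only described, so it is logged rather than accessed):

const flux = await ai.getModel('fal', 'fal-ai/flux/schnell');
if (flux) {
  console.log(flux.name, flux.cost, flux.capabilities);
}

const sync = await ai.syncModels();
console.log(sync);  // sync count, per-provider breakdown, errors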

Usage Tracking

ai.getUsage(options?): UsageSummary

Get aggregated usage statistics with optional filtering.

const usage = ai.getUsage({
  since: '2024-01-01',    // optional — ISO date or Date object
  until: '2024-12-31',    // optional
  provider: 'openai',     // optional — filter by provider
  modality: 'llm',        // optional — filter by modality
});

console.log(usage.totalCost);        // total USD spent
console.log(usage.totalRequests);    // number of requests
console.log(usage.byProvider);       // { openai: 2.50, anthropic: 1.20, fal: 0.30 }
console.log(usage.byModality);       // { llm: 3.00, image: 0.70, video: 0.30, tts: 0.00 }

Lifecycle

ai.registerProvider(provider): void

Register a custom provider (see Custom Providers).

ai.dispose(): Promise<void>

Cleanup all provider resources, clear model cache, and reset usage tracker.
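
A typical lifecycle, as a sketch:

const ai = new Noosphere();
try {
  const result = await ai.chat({ messages: [{ role: 'user', content: 'Hi' }] });
  console.log(result.content);
} finally {
  await ai.dispose();  // release providers, clear model cache, reset usage tracker
}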

NoosphereResult

Every generation method returns a NoosphereResult:

interface NoosphereResult {
  content?: string;        // LLM response text
  thinking?: string;       // reasoning/thinking output (supported models)
  url?: string;            // media URL (images, videos, audio from cloud providers)
  buffer?: Buffer;         // media binary data (local providers, HuggingFace)
  provider: string;        // which provider handled the request
  model: string;           // which model was used
  modality: Modality;      // 'llm' | 'image' | 'video' | 'tts'
  latencyMs: number;       // request duration in milliseconds
  usage: {
    cost: number;          // cost in USD
    input?: number;        // input tokens/characters
    output?: number;       // output tokens
    unit?: string;         // 'tokens' | 'characters' | 'per_image' | 'per_second' | 'free'
  };
  media?: {
    width?: number;        // image/video width
    height?: number;       // image/video height
    duration?: number;     // video/audio duration in seconds
    format?: string;       // 'png' | 'mp4' | 'mp3' | 'wav'
    fps?: number;          // video frames per second
  };
}

Providers In Depth

Pi-AI — LLM Gateway (246+ models)

Provider ID: pi-ai | Modalities: LLM (chat + streaming) | Library: @mariozechner/pi-ai

A unified gateway that routes to 8 LLM providers through 4 different API protocols:

API Protocol Providers
anthropic-messages Anthropic
google-generative-ai Google
openai-responses OpenAI (reasoning models)
openai-completions OpenAI, xAI, Groq, Cerebras, Zai, OpenRouter

Anthropic Models (19)

Model Context Reasoning Vision Input Cost Output Cost
claude-opus-4-0 200k Yes Yes $15/M $75/M
claude-opus-4-1 200k Yes Yes $15/M $75/M
claude-sonnet-4-20250514 200k Yes Yes $3/M $15/M
claude-sonnet-4-5-20250929 200k Yes Yes $3/M $15/M
claude-3-7-sonnet-20250219 200k Yes Yes $3/M $15/M
claude-3-5-sonnet-20241022 200k No Yes $3/M $15/M
claude-haiku-4-5-20251001 200k No Yes $0.80/M $4/M
claude-3-5-haiku-20241022 200k No Yes $0.80/M $4/M
claude-3-haiku-20240307 200k No Yes $0.25/M $1.25/M
...and 10 more variants

OpenAI Models (24)

Model Context Reasoning Vision Input Cost Output Cost
gpt-5 200k Yes Yes $10/M $30/M
gpt-5-mini 200k Yes Yes $2.50/M $10/M
gpt-4.1 128k No Yes $2/M $8/M
gpt-4.1-mini 128k No Yes $0.40/M $1.60/M
gpt-4.1-nano 128k No Yes $0.10/M $0.40/M
gpt-4o 128k No Yes $2.50/M $10/M
gpt-4o-mini 128k No Yes $0.15/M $0.60/M
o3-pro 200k Yes Yes $20/M $80/M
o3-mini 200k Yes Yes $1.10/M $4.40/M
o4-mini 200k Yes Yes $1.10/M $4.40/M
codex-mini-latest 200k Yes No $1.50/M $6/M
...and 13 more variants

Google Gemini Models (19)

Model Context Reasoning Vision Cost
gemini-2.5-flash 1M Yes Yes $0.15-0.60/M
gemini-2.5-pro 1M Yes Yes $1.25-10/M
gemini-2.0-flash 1M No Yes $0.10-0.40/M
gemini-2.0-flash-lite 1M No Yes $0.025-0.10/M
gemini-1.5-flash 1M No Yes $0.075-0.30/M
gemini-1.5-pro 2M No Yes $1.25-5/M
...and 13 more variants

xAI Grok Models (20)

Model Context Reasoning Vision Input Cost
grok-4 256k Yes Yes $5/M
grok-4-fast 256k Yes Yes $3/M
grok-3 131k No Yes $3/M
grok-3-fast 131k No Yes $5/M
grok-3-mini-fast-latest 131k Yes No $0.30/M
grok-2-vision 32k No Yes $2/M
...and 14 more variants

Groq Models (15)

Model Context Cost
llama-3.3-70b-versatile 128k $0.59/M
llama-3.1-8b-instant 128k $0.05/M
mistral-saba-24b 32k $0.40/M
qwen-qwq-32b 128k $0.29/M
deepseek-r1-distill-llama-70b 128k $0.75/M
...and 10 more

Cerebras Models (3)

gpt-oss-120b, qwen-3-235b-a22b-instruct-2507, qwen-3-coder-480b

Zai Models (5)

glm-4.6, glm-4.5, glm-4.5-flash, glm-4.5v, glm-4.5-air

OpenRouter (141 models)

Aggregator providing access to hundreds of additional models including Llama, Deepseek, Mistral, Qwen, and many more. Full list available via ai.getModels('llm').

The Pi-AI Engine — Deep Dive

Noosphere's LLM provider is powered by @mariozechner/pi-ai, part of the Pi mono-repo by Mario Zechner (badlogic). Pi is NOT a wrapper like LangChain or Mastra — it's a micro-framework for agentic AI (~15K LOC, 4 npm packages) that was built from scratch as a minimalist alternative to Claude Code.

Pi consists of 4 packages in 3 tiers:

TIER 1 — FOUNDATION
  @mariozechner/pi-ai             LLM API: stream(), complete(), model registry
                                  0 internal deps, talks to 20+ providers

TIER 2 — INFRASTRUCTURE
  @mariozechner/pi-agent-core     Agent loop, tool execution, lifecycle events
                                  Depends on pi-ai

  @mariozechner/pi-tui            Terminal UI with differential rendering
                                  Standalone, 0 internal deps

TIER 3 — APPLICATION
  @mariozechner/pi-coding-agent   CLI + SDK: sessions, compaction, extensions
                                  Depends on all above

Noosphere uses @mariozechner/pi-ai (Tier 1) directly for LLM access. But the full Pi ecosystem provides capabilities that can be layered on top.


How Pi Keeps 200+ Models Updated

Pi does NOT hardcode models. It has an auto-generation pipeline that runs at build time:

STEP 1: FETCH (3 sources in parallel)
┌──────────────────┐  ┌──────────────────┐  ┌───────────────┐
│   models.dev     │  │   OpenRouter     │  │  Vercel AI    │
│   /api.json      │  │   /v1/models     │  │  Gateway      │
│                  │  │                  │  │  /v1/models   │
│ Context windows  │  │ Pricing ($/M)    │  │ Capability    │
│ Capabilities     │  │ Availability     │  │ tags          │
│ Tool support     │  │ Provider routing │  │               │
└────────┬─────────┘  └────────┬─────────┘  └──────┬────────┘
         └─────────┬───────────┴────────────────────┘
                   ▼
STEP 2: MERGE & DEDUPLICATE
         Priority: models.dev > OpenRouter > Vercel
         Key: provider + modelId
                   │
                   ▼
STEP 3: FILTER
         ✅ tool_call === true
         ✅ streaming supported
         ✅ system messages supported
         ✅ not deprecated
                   │
                   ▼
STEP 4: NORMALIZE
         Costs → $/million tokens
         API type → one of 4 protocols
         Input modes → ["text"] or ["text","image"]
                   │
                   ▼
STEP 5: PATCH (manual corrections)
         Claude Opus: cache pricing fix
         GPT-5.4: context window override
         Kimi K2.5: hardcoded pricing
                   │
                   ▼
STEP 6: GENERATE TypeScript
         → models.generated.ts (~330KB)
         → 200+ models with full type safety

Each generated model entry looks like:

{
  id: "claude-opus-4-6",
  name: "Claude Opus 4.6",
  api: "anthropic-messages",
  provider: "anthropic",
  baseUrl: "https://api.anthropic.com",
  reasoning: true,
  input: ["text", "image"],
  cost: {
    input: 15,          // $15/M tokens
    output: 75,         // $75/M tokens
    cacheRead: 1.5,     // prompt cache hit
    cacheWrite: 18.75,  // prompt cache write
  },
  contextWindow: 200_000,
  maxTokens: 32_000,
} satisfies Model<"anthropic-messages">

When a new model is released (e.g., Gemini 3.0), it appears in models.dev/OpenRouter → the script captures it → a new Pi version is published → Noosphere updates its dependency.
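
A condensed sketch of the merge and filter steps (types and field names here are illustrative, not Pi's internals):

// Hypothetical shape for a fetched model entry — for illustration only.
interface RawModel {
  provider: string;
  id: string;
  toolCall?: boolean;
  deprecated?: boolean;
}

function mergeAndFilter(sources: RawModel[][]): RawModel[] {
  // Sources are ordered by priority: models.dev, then OpenRouter, then Vercel.
  const merged = new Map<string, RawModel>();
  for (const source of sources) {
    for (const m of source) {
      const key = `${m.provider}/${m.id}`;       // dedupe key: provider + modelId
      if (!merged.has(key)) merged.set(key, m);  // first (highest-priority) source wins
    }
  }
  // Keep only tool-calling, non-deprecated models.
  return [...merged.values()].filter((m) => m.toolCall && !m.deprecated);
}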


4 API Protocols — How Pi Talks to Every Provider

Pi abstracts all LLM providers into 4 wire protocols. Each protocol handles the differences in request format, streaming format, auth headers, and response parsing:

Protocol Providers Key Differences
anthropic-messages Anthropic, AWS Bedrock system as top-level field, content as [{type:"text", text:"..."}] blocks, x-api-key auth, anthropic-beta headers
openai-completions OpenAI, xAI, Groq, Cerebras, OpenRouter, Ollama, vLLM system as message with role:"system", content as string, Authorization: Bearer auth, tool_calls array
openai-responses OpenAI (reasoning models) New Responses API with server-side context, store: true, reasoning summaries
google-generative-ai Google Gemini, Vertex AI systemInstruction.parts[{text}], role "model" instead of "assistant", functionCall instead of tool_calls, thinkingConfig

The core function streamSimple() detects which protocol to use based on model.api and handles all the formatting/parsing transparently:

// What happens inside Pi when you call Noosphere's chat():
async function* streamSimple(
  model: Model,           // includes model.api to determine protocol
  context: Context,       // { systemPrompt, messages, tools }
  options?: StreamOptions  // { signal, onPayload, thinkingLevel, ... }
): AsyncIterable<AssistantMessageEvent> {
  // 1. Format request according to model.api protocol
  // 2. Open SSE/WebSocket stream
  // 3. Parse provider-specific chunks
  // 4. Emit normalized events:
  //    → text_delta, thinking_delta, tool_call, message_end
}

Agentic Capabilities

These are the capabilities exposed through the Pi-AI engine:

1. Tool Use / Function Calling

Structured tool calling is fully supported across all major providers. Tool definitions use TypeBox schemas with runtime validation via AJV:

import { type Tool, StringEnum } from '@mariozechner/pi-ai';
import { Type } from '@sinclair/typebox';

// Define a tool with typed parameters
const searchTool: Tool = {
  name: 'web_search',
  description: 'Search the web for information',
  parameters: Type.Object({
    query: Type.String({ description: 'Search query' }),
    maxResults: Type.Optional(Type.Number({ default: 5 })),
    type: StringEnum(['web', 'images', 'news'], { description: 'Search type' }),
  }),
};

// Pass tools in context — Pi handles the rest
const context = {
  systemPrompt: 'You are a helpful assistant.',
  messages: [{ role: 'user', content: 'Search for recent AI news' }],
  tools: [searchTool],
};

How tool calling works internally:

User prompt → LLM → "I need to call web_search"
                         │
                         ▼
              Pi validates arguments with AJV
              against the TypeBox schema
                         │
                   ┌─────┴─────┐
                   │ Valid?     │
                   ├─Yes───────┤
                   │ Execute   │
                   │ tool      │
                   ├───────────┤
                   │ No        │
                   │ Return    │
                   │ validation│
                   │ error to  │
                   │ LLM       │
                   └───────────┘
                         │
                         ▼
              Tool result → back into context → LLM continues

Provider-specific tool_choice control:

  • Anthropic: "auto" | "any" | "none" | { type: "tool", name: "specific_tool" }
  • OpenAI: "auto" | "none" | "required" | { type: "function", function: { name: "..." } }
  • Google: "auto" | "none" | "any"

Partial JSON streaming: During streaming, Pi parses tool call arguments incrementally using partial JSON parsing. This means you can see tool arguments being built in real-time, not just after the tool call completes.
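
The idea behind partial parsing, as a generic sketch (not Pi's implementation; it just shows how a prefix of a JSON argument string can be surfaced early):

let argBuffer = '';

// Called for each streamed fragment of the tool-call arguments.
function onArgsDelta(delta: string): unknown | undefined {
  argBuffer += delta;
  // Try progressively larger "repairs" so an incomplete prefix still parses.
  for (const suffix of ['', '"', '"}', '}', ']}', '"]}']) {
    try {
      return JSON.parse(argBuffer + suffix);   // partial arguments, if parseable
    } catch {
      // keep trying
    }
  }
  return undefined;                            // not parseable yet
}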

2. Reasoning / Extended Thinking

Pi provides unified thinking support across every provider that offers it. Thinking blocks are automatically extracted, separated from regular text, and streamed as distinct events:

Provider Models Control Parameters How It Works
Anthropic Claude Opus, Sonnet 4+ thinkingEnabled: boolean, thinkingBudgetTokens: number Extended thinking blocks in response, separate thinking content type
OpenAI o1, o3, o4, GPT-5 reasoningEffort: "minimal" | "low" | "medium" | "high" Reasoning via Responses API, reasoningSummary: "auto" | "detailed" | "concise"
Google Gemini 2.5 Flash/Pro thinking.enabled: boolean, thinking.budgetTokens: number Thinking via thinkingConfig, mapped to effort levels
xAI Grok-4, Grok-3-mini Native reasoning Automatic when model supports it

Cross-provider thinking portability: When switching models mid-conversation, Pi converts thinking blocks between formats. Anthropic thinking blocks become <thinking> tagged text when sent to OpenAI/Google, and vice versa.

// Thinking is automatically extracted in Noosphere responses:
const result = await ai.chat({
  model: 'claude-opus-4-6',
  messages: [{ role: 'user', content: 'Solve this step by step: 15! / 13!' }],
});

console.log(result.thinking);  // "Let me work through this... 15! = 15 × 14 × 13!..."
console.log(result.content);   // "15! / 13! = 15 × 14 = 210"

// During streaming, thinking arrives as separate events:
const stream = ai.stream({ messages: [...] });
for await (const event of stream) {
  if (event.type === 'thinking_delta') console.log('[THINKING]', event.delta);
  if (event.type === 'text_delta') console.log('[RESPONSE]', event.delta);
}

3. Vision / Multimodal Input

Models with input: ["text", "image"] accept images alongside text. Pi handles the encoding and format differences per provider:

// Send images to vision-capable models
const messages = [{
  role: 'user',
  content: [
    { type: 'text', text: 'What is in this image?' },
    { type: 'image', data: base64PngString, mimeType: 'image/png' },
  ],
}];

// Supported MIME types: image/png, image/jpeg, image/gif, image/webp
// Images are silently ignored when sent to non-vision models

Vision-capable models include: All Claude models, all GPT-4o/GPT-5 models, Gemini models, Grok-2-vision, Grok-4, and select Groq models.

4. Agent Loop — Autonomous Tool Execution

The @mariozechner/pi-agent-core package provides a complete agent loop that automatically cycles through prompt → LLM → tool call → result → repeat until the task is done:

import { agentLoop, getModel } from '@mariozechner/pi-ai';

const events = agentLoop(userMessage, agentContext, {
  model: getModel('anthropic', 'claude-opus-4-6'),
  tools: [searchTool, readFileTool, writeFileTool],
  signal: abortController.signal,
});

for await (const event of events) {
  switch (event.type) {
    case 'agent_start':           // Agent begins
    case 'turn_start':            // New LLM turn begins
    case 'message_start':         // LLM starts responding
    case 'message_update':        // Text/thinking delta received
    case 'tool_execution_start':  // About to execute a tool
    case 'tool_execution_end':    // Tool finished, result available
    case 'message_end':           // LLM finished this message
    case 'turn_end':              // Turn complete (may loop if tools were called)
    case 'agent_end':             // All done, final messages available
  }
}

The agent loop state machine:

[User sends prompt]
        │
        ▼
  ┌─[Build Context]──▶ [Check Queues]──▶ [Stream LLM]◄── streamFn()
  │                                           │
  │                                     ┌─────┴──────┐
  │                                     │            │
  │                                   text      tool_call
  │                                     │            │
  │                                     ▼            ▼
  │                                  [Done]    [Execute Tool]
  │                                                  │
  │                                            tool result
  │                                                  │
  └──────────────────────────────────────────────────┘
                                    (loops back to Stream LLM)

Key design decisions:

  • Tools execute sequentially by default (parallelism can be added on top)
  • The streamFn is injectable — you can wrap it with middleware to modify requests per-provider
  • Tool arguments are validated at runtime using TypeBox + AJV before execution
  • Aborted/failed responses preserve partial content and usage data
  • Tool results are automatically added to the conversation context

5. The streamFn Pattern — Injectable Middleware

This is Pi's most powerful architectural feature. The streamFn is the function that actually talks to the LLM, and it can be wrapped with middleware like Express.js request handlers:

import type { StreamFn } from '@mariozechner/pi-agent-core';
import { streamSimple } from '@mariozechner/pi-ai';

// Start with Pi's base streaming function
let fn: StreamFn = streamSimple;

// Wrap it with middleware that modifies requests per-provider
fn = createMyCustomWrapper(fn, {
  // Add custom headers for Anthropic
  onPayload: (payload) => {
    if (model.provider === 'anthropic') {
      payload.headers['anthropic-beta'] = 'fine-grained-tool-streaming-2025-05-14';
    }
  },
});

// Each wrapper calls the previous one, forming a chain:
// request → wrapper3 → wrapper2 → wrapper1 → streamSimple → API

This pattern is what allows projects like OpenClaw to stack 16 provider-specific wrappers on top of Pi's base streaming — adding beta headers for Anthropic, WebSocket transport for OpenAI, thinking sanitization for Google, reasoning effort headers for OpenRouter, and more — without modifying Pi's source code.

6. Session Management (via pi-coding-agent)

The @mariozechner/pi-coding-agent package provides persistent session management with JSONL-based storage:

import { createAgentSession, SessionManager } from '@mariozechner/pi-coding-agent';

// Create a session with full persistence
const session = await createAgentSession({
  model: 'claude-opus-4-6',
  tools: myTools,
  sessionManager,  // a SessionManager instance — handles JSONL persistence
});

const result = await session.run('Build a REST API');
// Session is automatically saved to:
// ~/.pi/agent/sessions/session_abc123.jsonl

Session file format (append-only JSONL):

{"role":"user","content":"Build a REST API","timestamp":1710000000}
{"role":"assistant","content":"I'll create...","model":"claude-opus-4-6","usage":{...}}
{"role":"toolResult","toolCallId":"tc_001","toolName":"bash","content":"OK"}
{"type":"compaction","summary":"The user asked to build...","preservedMessages":[...]}

Session operations:

  • create() — new session
  • open(id) — restore existing session
  • continueRecent() — continue the most recent session
  • forkFrom(id) — create a branch (new JSONL referencing parent)
  • inMemory() — RAM-only session (for SDK/testing)

7. Context Compaction — Automatic Context Window Management

When the conversation approaches the model's context window limit, Pi automatically compacts the history:

1. DETECT: Calculate inputTokens + outputTokens vs model.contextWindow
2. TRIGGER: Proactively before overflow, or as recovery after overflow error
3. SUMMARIZE: Send history to LLM with a compaction prompt
4. WRITE: Append compaction entry to JSONL:
   {"type":"compaction","summary":"...","preservedMessages":[last N messages]}
5. CONTINUE: Context is now summary + recent messages instead of full history

The JSONL file is never rewritten — compaction entries are appended, maintaining a complete audit trail.
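
Sketched in code (the threshold, field names, and preserved-message count are illustrative assumptions, not Pi's actual values):

// 1-2. Detect and trigger: compare running token usage against the context window.
function shouldCompact(inputTokens: number, outputTokens: number, contextWindow: number): boolean {
  return inputTokens + outputTokens > contextWindow * 0.8;  // proactive margin
}

// 3-4. Summarize and append a compaction entry (JSONL line) — history is never rewritten.
const history: Array<{ role: string; content: string }> = [/* full conversation */];
const compactionEntry = {
  type: 'compaction',
  summary: '<LLM-generated summary of earlier turns>',
  preservedMessages: history.slice(-10),  // keep the last N messages verbatim
};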

8. Cost Tracking — Cache-Aware Pricing

Pi tracks costs per-request with cache-aware pricing for providers that support prompt caching:

// Every model has 4 cost dimensions:
{
  input: 15,          // $15 per 1M input tokens
  output: 75,         // $75 per 1M output tokens
  cacheRead: 1.5,     // $1.50 per 1M cached prompt tokens (read)
  cacheWrite: 18.75,  // $18.75 per 1M cached prompt tokens (write)
}

// Usage tracking on every response:
{
  input: 1500,        // tokens consumed as input
  output: 800,        // tokens generated
  cacheRead: 5000,    // prompt cache hits
  cacheWrite: 1500,   // prompt cache writes
  cost: {
    total: 0.082,     // total cost in USD
    input: 0.0225,
    output: 0.06,
    cacheRead: 0.0075,
    cacheWrite: 0.028,
  },
}

Anthropic and OpenAI support prompt caching. For providers without caching, cacheRead and cacheWrite are always 0.
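
The per-dimension arithmetic is straightforward; a sketch reproducing the numbers above (how the library aggregates the total is not shown here):

const rate = { input: 15, output: 75, cacheRead: 1.5, cacheWrite: 18.75 };  // $ per 1M tokens
const used = { input: 1500, output: 800, cacheRead: 5000, cacheWrite: 1500 };

const cost = {
  input: (used.input / 1_000_000) * rate.input,                 // 0.0225
  output: (used.output / 1_000_000) * rate.output,              // 0.06
  cacheRead: (used.cacheRead / 1_000_000) * rate.cacheRead,     // 0.0075
  cacheWrite: (used.cacheWrite / 1_000_000) * rate.cacheWrite,  // ~0.028
};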

9. Extension System (via pi-coding-agent)

Pi supports a plugin system where extensions can register tools, commands, and lifecycle hooks:

// Extensions are TypeScript modules loaded at runtime via jiti
export default function(api: ExtensionAPI) {
  // Register a custom tool
  api.registerTool('my_tool', {
    description: 'Does something useful',
    parameters: { /* TypeBox schema */ },
    execute: async (args) => 'result',
  });

  // Register a slash command
  api.registerCommand('/mycommand', {
    handler: async (args) => { /* ... */ },
    description: 'Custom command',
  });

  // Hook into the agent lifecycle
  api.on('before_agent_start', async (context) => {
    context.systemPrompt += '\nExtra instructions';
  });

  api.on('tool_execution_end', async (event) => {
    // Post-process tool results
  });
}

Resource discovery chain (priority):

  1. Project .pi/ directory (highest)
  2. User ~/.pi/agent/
  3. npm packages with Pi metadata
  4. Built-in defaults

10. The Anti-MCP Philosophy — Why Pi Uses CLI Instead

Pi explicitly rejects MCP (Model Context Protocol). Mario Zechner's argument, backed by benchmarks:

The token cost problem:

Approach Tools Tokens Consumed % of Claude's Context
Playwright MCP 21 tools 13,700 tokens 6.8%
Chrome DevTools MCP 26 tools 18,000 tokens 9.0%
Pi CLI + README N/A 225 tokens ~0.1%

That's a 60-80x reduction in token consumption. With 5 MCP servers, you lose ~55,000 tokens before doing any work.

Benchmark results (120 evaluations):

Approach Avg Cost Success Rate
CLI (tmux) $0.37 100%
CLI (terminalcp) $0.39 100%
MCP (terminalcp) $0.48 100%

Same success rate, MCP costs 30% more.

Pi's alternative: Progressive Disclosure via CLI tools + READMEs

Instead of loading all tool definitions upfront, Pi's agent has bash as a built-in tool and discovers CLI tools only when needed:

MCP approach:                          Pi approach:
─────────────                          ──────────
Session start →                        Session start →
  Load 21 Playwright tools               Load 4 tools: read, write, edit, bash
  Load 26 Chrome DevTools tools           (225 tokens)
  Load N more MCP tools
  (~55,000 tokens wasted)

When browser needed:                   When browser needed:
  Tools already loaded                   Agent reads SKILL.md (225 tokens)
  (but context is polluted)              Runs: browser-start.js
                                         Runs: browser-nav.js https://...
                                         Runs: browser-screenshot.js

When browser NOT needed:               When browser NOT needed:
  Tools still consume context             0 tokens wasted

The 4 built-in tools (what Pi argues is sufficient):

Tool What It Does Why It's Enough
read Read files (text + images) Supports offset/limit for large files
write Create/overwrite files Creates directories automatically
edit Replace text (oldText→newText) Surgical edits, like a diff
bash Execute any shell command bash can do everything else — replaces MCP entirely

The key insight: bash replaces MCP. Any CLI tool, API call, database query, or system operation can be invoked through bash. The agent reads the tool's README only when it needs it, paying tokens on-demand instead of upfront.


FAL — Media Generation (867+ endpoints)

Provider ID: fal | Modalities: Image, Video, TTS | Library: @fal-ai/client

The largest media generation provider with dynamic pricing fetched at runtime from https://api.fal.ai/v1/models/pricing.
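
The pricing feed can be inspected directly; its response shape is FAL-defined and not documented here, so treat it as opaque JSON:

const res = await fetch('https://api.fal.ai/v1/models/pricing');
const pricing = await res.json();  // provider-defined structure — inspect before relying on fields
console.log(pricing);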

Image Models (200+)

FLUX Family (20+ variants):

Model Description
fal-ai/flux/schnell Fast generation (default)
fal-ai/flux/dev Higher quality
fal-ai/flux-2 Next generation
fal-ai/flux-2-pro Professional quality
fal-ai/flux-2-flex Flexible variant
fal-ai/flux-2/edit Image editing
fal-ai/flux-2/lora LoRA fine-tuning
fal-ai/flux-pro/v1.1-ultra Ultra high quality
fal-ai/flux-pro/kontext Context-aware generation
fal-ai/flux-lora Custom style training
fal-ai/flux-vision-upscaler AI upscaling
fal-ai/flux-krea-trainer Model training
fal-ai/flux-lora-fast-training Fast fine-tuning
fal-ai/flux-lora-portrait-trainer Portrait specialist

Stable Diffusion: fal-ai/stable-diffusion-v15, fal-ai/stable-diffusion-v35-large, fal-ai/stable-diffusion-v35-medium, fal-ai/stable-diffusion-v3-medium

Other Image Models:

Model Description
fal-ai/recraft/v3/text-to-image Artistic generation
fal-ai/ideogram/v2, v2a, v3 Ideogram series
fal-ai/imagen3, fal-ai/imagen4/preview Google Imagen
fal-ai/gpt-image-1 GPT image generation
fal-ai/gpt-image-1/edit-image GPT image editing
fal-ai/reve/text-to-image Reve generation
fal-ai/sana, fal-ai/sana/sprint Sana models
fal-ai/pixart-sigma PixArt Sigma
fal-ai/bria/text-to-image/base Bria AI

Pre-trained LoRA Styles: fal-ai/flux-2-lora-gallery/sepia-vintage, virtual-tryon, satellite-view-style, realism, multiple-angles, hdr-style, face-to-full-portrait, digital-comic-art, ballpoint-pen-sketch, apartment-staging, add-background

Image Editing/Enhancement (30+ tools): fal-ai/image-editing/age-progression, baby-version, background-change, hair-change, expression-change, object-removal, photo-restoration, style-transfer, and many more.

Video Models (150+)

Kling Video (20+ variants):

Model Description
fal-ai/kling-video/v2/master/text-to-video Default text-to-video
fal-ai/kling-video/v2/master/image-to-video Image-to-video
fal-ai/kling-video/v2.5-turbo/pro/text-to-video Turbo pro
fal-ai/kling-video/o1/image-to-video O1 quality
fal-ai/kling-video/o1/video-to-video/edit Video editing
fal-ai/kling-video/lipsync/audio-to-video Lip sync
fal-ai/kling-video/video-to-audio Audio extraction

Sora 2 (OpenAI):

Model Description
fal-ai/sora-2/text-to-video Text-to-video
fal-ai/sora-2/text-to-video/pro Pro quality
fal-ai/sora-2/image-to-video Image-to-video
fal-ai/sora-2/video-to-video/remix Video remixing

VEO 3 (Google):

Model Description
fal-ai/veo3 VEO 3 standard
fal-ai/veo3/fast Fast variant
fal-ai/veo3/image-to-video Image-to-video
fal-ai/veo3.1 Latest version
fal-ai/veo3.1/reference-to-video Reference-guided
fal-ai/veo3.1/first-last-frame-to-video Frame interpolation

WAN (15+ variants): fal-ai/wan-pro/text-to-video, fal-ai/wan-pro/image-to-video, fal-ai/wan/v2.2-a14b/text-to-video, fal-ai/wan-vace-14b/depth, fal-ai/wan-vace-14b/inpainting, fal-ai/wan-vace-14b/pose, fal-ai/wan-effects

Pixverse (20+ variants): fal-ai/pixverse/v5.5/text-to-video, fal-ai/pixverse/v5.5/image-to-video, fal-ai/pixverse/v5.5/effects, fal-ai/pixverse/lipsync, fal-ai/pixverse/sound-effects

Minimax / Hailuo: fal-ai/minimax/hailuo-2.3/text-to-video/pro, fal-ai/minimax/hailuo-2.3/image-to-video/pro, fal-ai/minimax/video-01-director, fal-ai/minimax/video-01-live

Other Video Models:

Provider Models
Hunyuan fal-ai/hunyuan-video/text-to-video, image-to-video, video-to-video, foley
Pika fal-ai/pika/v2.2/text-to-video, pikascenes, pikaffects
LTX fal-ai/ltx-2/text-to-video, image-to-video, retake-video
Luma fal-ai/luma-dream-machine/ray-2, ray-2-flash, luma-photon
Vidu fal-ai/vidu/q2/text-to-video, image-to-video/pro
CogVideoX fal-ai/cogvideox-5b/text-to-video, video-to-video
Seedance fal-ai/bytedance/seedance/v1/text-to-video, image-to-video
Magi fal-ai/magi/text-to-video, extend-video

TTS / Speech Models (50+)

Kokoro (9 languages, 20+ voices per language):

Model Language Example Voices
fal-ai/kokoro/american-english English (US) af_heart, af_alloy, af_bella, af_nova, am_adam, am_echo, am_onyx
fal-ai/kokoro/british-english English (UK) British voice set
fal-ai/kokoro/french French French voice set
fal-ai/kokoro/japanese Japanese Japanese voice set
fal-ai/kokoro/spanish Spanish Spanish voice set
fal-ai/kokoro/mandarin-chinese Chinese Mandarin voice set
fal-ai/kokoro/italian Italian Italian voice set
fal-ai/kokoro/hindi Hindi Hindi voice set
fal-ai/kokoro/brazilian-portuguese Portuguese Portuguese voice set

ElevenLabs:

Model Description
fal-ai/elevenlabs/tts/eleven-v3 Professional quality
fal-ai/elevenlabs/tts/turbo-v2.5 Faster inference
fal-ai/elevenlabs/tts/multilingual-v2 Multi-language
fal-ai/elevenlabs/text-to-dialogue/eleven-v3 Dialogue generation
fal-ai/elevenlabs/sound-effects/v2 Sound effects
fal-ai/elevenlabs/speech-to-text Transcription
fal-ai/elevenlabs/audio-isolation Background removal

Other TTS: fal-ai/f5-tts (voice cloning), fal-ai/dia-tts, fal-ai/minimax/speech-2.6-turbo, fal-ai/minimax/speech-2.6-hd, fal-ai/chatterbox/text-to-speech, fal-ai/index-tts-2/text-to-speech

FAL Client Capabilities

The @fal-ai/client provides additional features beyond what Noosphere surfaces:

  • Queue API — Submit jobs, poll status, get results, cancel. Supports webhooks and priority levels
  • Streaming API — Real-time streaming responses via async iterators
  • Realtime API — WebSocket connections for interactive use (e.g., real-time image generation)
  • Storage API — File upload with configurable TTL (1h, 1d, 7d, 30d, 1y, never)
  • Retry logic — Configurable retries with exponential backoff and jitter
  • Request middleware — Custom request interceptors and proxy support
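
As an example of the queue flow through @fal-ai/client itself (outside Noosphere); the method names follow that client's documented surface, but verify against its docs before relying on them:

import { fal } from '@fal-ai/client';

fal.config({ credentials: process.env.FAL_KEY });

// Submit a job, check its status, then fetch the result.
const { request_id } = await fal.queue.submit('fal-ai/flux/dev', {
  input: { prompt: 'a red bicycle on a beach' },
});
const status = await fal.queue.status('fal-ai/flux/dev', { requestId: request_id, logs: true });
const result = await fal.queue.result('fal-ai/flux/dev', { requestId: request_id });
console.log(status.status, result.data);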

Hugging Face — Open Source AI (30+ tasks)

Provider ID: huggingface | Modalities: LLM, Image, TTS | Library: @huggingface/inference

Access to the entire Hugging Face Hub ecosystem. Any model hosted on HuggingFace can be used by passing its ID directly.

Default Models

Modality Default Model Description
LLM meta-llama/Llama-3.1-8B-Instruct Llama 3.1 8B
Image stabilityai/stable-diffusion-xl-base-1.0 SDXL Base
TTS facebook/mms-tts-eng MMS TTS English

Any HuggingFace model ID works — just pass it as the model parameter:

await ai.chat({
  provider: 'huggingface',
  model: 'mistralai/Mixtral-8x7B-v0.1',
  messages: [{ role: 'user', content: 'Hello' }],
});

Full Library Capabilities

The @huggingface/inference library (v3.15.0) provides 30+ AI tasks, including capabilities not yet surfaced by Noosphere:

Natural Language Processing:

Task Method Description
Chat chatCompletion() OpenAI-compatible chat completions
Chat Streaming chatCompletionStream() Token-by-token streaming
Text Generation textGeneration() Raw text completion
Summarization summarization() Text summarization
Translation translation() Language translation
Question Answering questionAnswering() Extract answers from context
Text Classification textClassification() Sentiment, topic classification
Zero-Shot Classification zeroShotClassification() Classify without training
Token Classification tokenClassification() NER, POS tagging
Sentence Similarity sentenceSimilarity() Semantic similarity scores
Feature Extraction featureExtraction() Text embeddings
Fill Mask fillMask() Fill in masked tokens
Table QA tableQuestionAnswering() Answer questions about tables

Computer Vision:

Task Method Description
Text-to-Image textToImage() Generate images from text
Image-to-Image imageToImage() Transform/edit images
Image Captioning imageToText() Describe images
Classification imageClassification() Classify image content
Object Detection objectDetection() Detect and locate objects
Segmentation imageSegmentation() Pixel-level segmentation
Zero-Shot Image zeroShotImageClassification() Classify without training
Text-to-Video textToVideo() Generate videos

Audio:

Task Method Description
Text-to-Speech textToSpeech() Generate speech
Speech-to-Text automaticSpeechRecognition() Transcription
Audio Classification audioClassification() Classify sounds
Audio-to-Audio audioToAudio() Source separation, enhancement

Multimodal:

Task Method Description
Visual QA visualQuestionAnswering() Answer questions about images
Document QA documentQuestionAnswering() Answer questions about documents

Tabular:

Task Method Description
Classification tabularClassification() Classify tabular data
Regression tabularRegression() Predict continuous values

HuggingFace Agentic Features

  • Tool/Function Calling: Full support via tools parameter with tool_choice control (auto/none/required)
  • JSON Schema Responses: response_format: { type: 'json_schema', json_schema: {...} }
  • Reasoning: reasoning_effort parameter (none/minimal/low/medium/high/xhigh)
  • Multimodal Input: Images via image_url content chunks in chat messages
  • 17 Inference Providers: Route through Groq, Together, Fireworks, Replicate, Cerebras, Cohere, and more
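
For instance, using @huggingface/inference directly (a sketch; the json_schema payload follows the OpenAI-style shape noted above, and exact field names should be checked against the library's docs):

import { HfInference } from '@huggingface/inference';

const hf = new HfInference(process.env.HUGGINGFACE_TOKEN);

const out = await hf.chatCompletion({
  model: 'meta-llama/Llama-3.1-8B-Instruct',
  messages: [{ role: 'user', content: 'Return the first three primes as JSON.' }],
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'primes',
      schema: { type: 'object', properties: { primes: { type: 'array', items: { type: 'number' } } } },
    },
  },
});
console.log(out.choices[0].message.content);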

ComfyUI — Local Image Generation

Provider ID: comfyui | Modalities: Image, Video (planned) | Type: Local | Default Port: 8188

Connects to a local ComfyUI instance for Stable Diffusion workflows.

How It Works

  1. Clones a built-in txt2img workflow template (KSampler + SDXL pipeline)
  2. Injects your parameters (prompt, dimensions, seed, steps, guidance)
  3. POSTs the workflow to ComfyUI's /prompt endpoint
  4. Polls /history/{promptId} every second until completion (max 5 minutes)
  5. Fetches the generated image from /view
  6. Returns a PNG buffer
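
A compressed sketch of that flow (endpoint paths as described above; the workflow graph payload is omitted):

const base = 'http://localhost:8188';
const workflowGraph = { /* KSampler + SDXL txt2img nodes with injected parameters */ };

// Step 3: submit the workflow.
const { prompt_id } = await (await fetch(`${base}/prompt`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ prompt: workflowGraph }),
})).json();

// Step 4: poll history once per second until outputs appear.
let outputs;
while (!outputs) {
  await new Promise((r) => setTimeout(r, 1000));
  const history = await (await fetch(`${base}/history/${prompt_id}`)).json();
  outputs = history[prompt_id]?.outputs;
}
// Steps 5-6: the PNG is then fetched from /view using the filename in `outputs`.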

Configuration

const ai = new Noosphere({
  local: {
    comfyui: {
      enabled: true,
      host: 'http://localhost',
      port: 8188,
    },
  },
});

Default Workflow

  • Checkpoint: sd_xl_base_1.0.safetensors
  • Sampler: euler with normal scheduler
  • Default Steps: 20
  • Default CFG/Guidance: 7
  • Default Size: 1024x1024
  • Max Size: 2048x2048
  • Output: PNG

Models Exposed

Model ID Modality Description
comfyui-txt2img Image Text-to-image via workflow
comfyui-txt2vid Video Planned (requires AnimateDiff workflow)

Local TTS — Piper & Kokoro

Provider IDs: piper, kokoro | Modality: TTS | Type: Local

Connects to local OpenAI-compatible TTS servers.

Supported Engines

Engine Default Port Health Check Voice Discovery
Piper 5500 GET /health GET /voices
Kokoro 5501 GET /health GET /v1/models (fallback)

API

Uses the OpenAI-compatible TTS endpoint:

POST /v1/audio/speech
{
  "model": "tts-1",
  "input": "Hello world",
  "voice": "default",
  "speed": 1.0,
  "response_format": "mp3"
}

Supports mp3, wav, and ogg formats. Returns audio as a Buffer.
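
An equivalent direct call against a local Kokoro server (port 5501; Piper uses 5500), using the payload shown above:

const res = await fetch('http://localhost:5501/v1/audio/speech', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'tts-1',
    input: 'Hello world',
    voice: 'default',
    speed: 1.0,
    response_format: 'mp3',
  }),
});
const audio = Buffer.from(await res.arrayBuffer());  // mp3 bytes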


Architecture

Provider Resolution (Local-First)

When you call a generation method without specifying a provider, Noosphere resolves one automatically:

  1. If model is specified without provider → looks up model in registry cache
  2. If a default is configured for the modality → uses that
  3. Otherwise → local providers first, then cloud providers

resolveProvider(modality):
  1. Check user-specified provider ID → return if found
  2. Check configured defaults → return if found
  3. Scan all providers:
     → Return first LOCAL provider supporting this modality
     → Fallback to first CLOUD provider
  4. Throw NO_PROVIDER error

Retry & Failover Logic

executeWithRetry(modality, provider, fn):
  for attempt = 0..maxRetries:
    try: return fn()
    catch:
      if error is retryable AND attempts remain:
        wait backoffMs * 2^attempt (exponential backoff)
        retry same provider
      if error is NOT GENERATION_FAILED AND failover enabled:
        try each alternative provider for this modality
      throw last error

Retryable errors (same provider): PROVIDER_UNAVAILABLE, RATE_LIMITED, TIMEOUT, GENERATION_FAILED

Failover-eligible errors (cross-provider): PROVIDER_UNAVAILABLE, RATE_LIMITED, TIMEOUT (NOT GENERATION_FAILED)
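
The same-provider retry half of this, as a standalone sketch (illustrative, not Noosphere's internal code):

async function withRetry<T>(fn: () => Promise<T>, maxRetries = 2, backoffMs = 1000): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === maxRetries) break;
      await new Promise((r) => setTimeout(r, backoffMs * 2 ** attempt));  // 1s, 2s, 4s...
    }
  }
  throw lastError;
}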

Model Registry & Caching

  • Models are fetched from providers via listModels() and cached in memory
  • Cache TTL is configurable (default: 60 minutes)
  • syncModels() forces a refresh of all provider model lists
  • Registry tracks model → provider mappings for fast resolution

Usage Tracking

Every API call (success or failure) records a UsageEvent:

interface UsageEvent {
  modality: 'llm' | 'image' | 'video' | 'tts';
  provider: string;
  model: string;
  cost: number;           // USD
  latencyMs: number;
  input?: number;         // tokens or characters
  output?: number;        // tokens
  unit?: string;
  timestamp: string;      // ISO 8601
  success: boolean;
  error?: string;         // error message if failed
  metadata?: Record<string, unknown>;
}

Error Handling

All errors are instances of NoosphereError:

import { NoosphereError } from 'noosphere';

try {
  await ai.chat({ messages: [{ role: 'user', content: 'Hello' }] });
} catch (err) {
  if (err instanceof NoosphereError) {
    console.log(err.code);           // error code
    console.log(err.provider);       // which provider failed
    console.log(err.modality);       // which modality
    console.log(err.model);          // which model (if known)
    console.log(err.cause);          // underlying error
    console.log(err.isRetryable());  // whether retry might help
  }
}

Error Codes

Code Description Retryable Failover
PROVIDER_UNAVAILABLE Provider is down or unreachable Yes Yes
RATE_LIMITED API rate limit exceeded Yes Yes
TIMEOUT Request exceeded timeout Yes Yes
GENERATION_FAILED Generation error (bad prompt, model issue) Yes No
AUTH_FAILED Invalid or missing API key No No
MODEL_NOT_FOUND Requested model doesn't exist No No
INVALID_INPUT Bad parameters or unsupported operation No No
NO_PROVIDER No provider available for the requested modality No No

Custom Providers

Extend Noosphere with your own providers:

import type { NoosphereProvider, ModelInfo, ChatOptions, NoosphereResult, Modality } from 'noosphere';

const myProvider: NoosphereProvider = {
  // Required properties
  id: 'my-provider',
  name: 'My Custom Provider',
  modalities: ['llm', 'image'] as Modality[],
  isLocal: false,

  // Required methods
  async ping() { return true; },
  async listModels(modality?: Modality): Promise<ModelInfo[]> {
    return [{
      id: 'my-model',
      provider: 'my-provider',
      name: 'My Model',
      modality: 'llm',
      local: false,
      cost: { price: 1.0, unit: 'per_1m_tokens' },
      capabilities: {
        contextWindow: 128000,
        maxTokens: 4096,
        supportsVision: false,
        supportsStreaming: true,
      },
    }];
  },

  // Optional methods — implement per modality
  async chat(options: ChatOptions): Promise<NoosphereResult> {
    const start = Date.now();
    // ... your implementation
    return {
      content: 'Response text',
      provider: 'my-provider',
      model: 'my-model',
      modality: 'llm',
      latencyMs: Date.now() - start,
      usage: { cost: 0.001, input: 100, output: 50, unit: 'tokens' },
    };
  },

  // stream?(options): NoosphereStream
  // image?(options): Promise<NoosphereResult>
  // video?(options): Promise<NoosphereResult>
  // speak?(options): Promise<NoosphereResult>
  // dispose?(): Promise<void>
};

ai.registerProvider(myProvider);

Provider Summary

Provider ID Modalities Type Models Library
Pi-AI Gateway pi-ai LLM Cloud 246+ @mariozechner/pi-ai
FAL.ai fal Image, Video, TTS Cloud 867+ @fal-ai/client
Hugging Face huggingface LLM, Image, TTS Cloud Unlimited (any HF model) @huggingface/inference
ComfyUI comfyui Image Local SDXL workflows Direct HTTP
Piper TTS piper TTS Local Piper voices Direct HTTP
Kokoro TTS kokoro TTS Local Kokoro voices Direct HTTP

Requirements

  • Node.js >= 18.0.0

License

MIT