Package Exports
- noosphere
Readme
noosphere
Unified AI creation engine — text, image, video, and audio generation across all providers through a single interface.
One import. Every model. Every modality.
Features
- 4 modalities — LLM chat, image generation, video generation, and text-to-speech
- 246+ LLM models — via Pi-AI gateway (OpenAI, Anthropic, Google, Groq, Mistral, xAI, Cerebras, OpenRouter)
- 867+ media endpoints — via FAL (Flux, SDXL, Kling, Sora 2, VEO 3, Kokoro, ElevenLabs, and hundreds more)
- 30+ HuggingFace tasks — LLM, image, TTS, translation, summarization, classification, and more
- Local-first architecture — Auto-detects ComfyUI, Ollama, Piper, and Kokoro on your machine
- Agentic capabilities — Tool use, function calling, reasoning/thinking, vision, and agent loops via Pi-AI
- Failover & retry — Automatic retries with exponential backoff and cross-provider failover
- Usage tracking — Real-time cost, latency, and token tracking across all providers
- TypeScript-first — Full type definitions with ESM and CommonJS support
Install
npm install noosphereQuick Start
import { Noosphere } from 'noosphere';
const ai = new Noosphere();
// Chat with any LLM
const response = await ai.chat({
messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(response.content);
// Generate an image
const image = await ai.image({
prompt: 'A sunset over mountains',
width: 1024,
height: 1024,
});
console.log(image.url);
// Generate a video
const video = await ai.video({
prompt: 'Ocean waves crashing on rocks',
duration: 5,
});
console.log(video.url);
// Text-to-speech
const audio = await ai.speak({
text: 'Welcome to Noosphere',
voice: 'alloy',
format: 'mp3',
});
// audio.buffer contains the audio dataConfiguration
API keys are resolved from the constructor config or environment variables (config takes priority):
const ai = new Noosphere({
keys: {
openai: 'sk-...',
anthropic: 'sk-ant-...',
google: 'AIza...',
fal: 'fal-...',
huggingface: 'hf_...',
groq: 'gsk_...',
mistral: '...',
xai: '...',
openrouter: 'sk-or-...',
},
});Or set environment variables:
| Variable | Provider |
|---|---|
OPENAI_API_KEY |
OpenAI |
ANTHROPIC_API_KEY |
Anthropic |
GEMINI_API_KEY |
Google Gemini |
FAL_KEY |
FAL.ai |
HUGGINGFACE_TOKEN |
Hugging Face |
GROQ_API_KEY |
Groq |
MISTRAL_API_KEY |
Mistral |
XAI_API_KEY |
xAI (Grok) |
OPENROUTER_API_KEY |
OpenRouter |
Full Configuration Reference
const ai = new Noosphere({
// API keys (or use env vars above)
keys: { /* ... */ },
// Default models per modality
defaults: {
llm: { provider: 'pi-ai', model: 'claude-sonnet-4-20250514' },
image: { provider: 'fal', model: 'fal-ai/flux/schnell' },
video: { provider: 'fal', model: 'fal-ai/kling-video/v2/master/text-to-video' },
tts: { provider: 'fal', model: 'fal-ai/kokoro/american-english' },
},
// Local service configuration
autoDetectLocal: true, // env: NOOSPHERE_AUTO_DETECT_LOCAL
local: {
ollama: { enabled: true, host: 'http://localhost', port: 11434 },
comfyui: { enabled: true, host: 'http://localhost', port: 8188 },
piper: { enabled: true, host: 'http://localhost', port: 5500 },
kokoro: { enabled: true, host: 'http://localhost', port: 5501 },
custom: [], // additional LocalServiceConfig[]
},
// Retry & failover
retry: {
maxRetries: 2, // default: 2
backoffMs: 1000, // default: 1000 (exponential: 1s, 2s, 4s...)
failover: true, // default: true — try other providers on failure
retryableErrors: ['PROVIDER_UNAVAILABLE', 'RATE_LIMITED', 'TIMEOUT'],
},
// Timeouts per modality (ms)
timeout: {
llm: 30000, // 30s
image: 120000, // 2min
video: 300000, // 5min
tts: 60000, // 1min
},
// Model discovery cache (minutes)
discoveryCacheTTL: 60, // env: NOOSPHERE_DISCOVERY_CACHE_TTL
// Real-time usage callback
onUsage: (event) => {
console.log(`${event.provider}/${event.model}: $${event.cost} (${event.latencyMs}ms)`);
},
});Local Service Environment Variables
| Variable | Default | Description |
|---|---|---|
OLLAMA_HOST |
http://localhost |
Ollama server host |
OLLAMA_PORT |
11434 |
Ollama server port |
COMFYUI_HOST |
http://localhost |
ComfyUI server host |
COMFYUI_PORT |
8188 |
ComfyUI server port |
PIPER_HOST |
http://localhost |
Piper TTS server host |
PIPER_PORT |
5500 |
Piper TTS server port |
KOKORO_HOST |
http://localhost |
Kokoro TTS server host |
KOKORO_PORT |
5501 |
Kokoro TTS server port |
NOOSPHERE_AUTO_DETECT_LOCAL |
true |
Enable/disable local service auto-detection |
NOOSPHERE_DISCOVERY_CACHE_TTL |
60 |
Model cache TTL in minutes |
API Reference
new Noosphere(config?)
Creates a new instance. Providers are initialized lazily on first API call. Auto-detects local services via HTTP pings (2s timeout each).
Generation Methods
ai.chat(options): Promise<NoosphereResult>
Generate text with any LLM. Supports 246+ models across 8 providers.
const result = await ai.chat({
provider: 'anthropic', // optional — auto-resolved if omitted
model: 'claude-sonnet-4-20250514', // optional — uses default or first available
messages: [
{ role: 'system', content: 'You are helpful.' },
{ role: 'user', content: 'Explain quantum computing' },
],
temperature: 0.7, // optional (0-2)
maxTokens: 1024, // optional
jsonMode: false, // optional
});
console.log(result.content); // response text
console.log(result.thinking); // reasoning output (Claude, GPT-5, o3, Gemini, Grok-4)
console.log(result.usage.cost); // cost in USD
console.log(result.usage.input); // input tokens
console.log(result.usage.output); // output tokens
console.log(result.latencyMs); // response time in msai.stream(options): NoosphereStream
Stream LLM responses token-by-token. Same options as chat().
const stream = ai.stream({
messages: [{ role: 'user', content: 'Write a story' }],
});
for await (const event of stream) {
switch (event.type) {
case 'text_delta':
process.stdout.write(event.delta!);
break;
case 'thinking_delta':
console.log('[thinking]', event.delta);
break;
case 'done':
console.log('\n\nUsage:', event.result!.usage);
break;
case 'error':
console.error(event.error);
break;
}
}
// Or consume the full result
const result = await stream.result();
// Abort at any time
stream.abort();ai.image(options): Promise<NoosphereResult>
Generate images. Supports 200+ image models via FAL, HuggingFace, and ComfyUI.
const result = await ai.image({
provider: 'fal', // optional
model: 'fal-ai/flux-2-pro', // optional
prompt: 'A futuristic cityscape at sunset',
negativePrompt: 'blurry, low quality', // optional
width: 1024, // optional
height: 768, // optional
seed: 42, // optional — reproducible results
steps: 30, // optional — inference steps (more = higher quality)
guidanceScale: 7.5, // optional — prompt adherence (higher = stricter)
});
console.log(result.url); // image URL (FAL)
console.log(result.buffer); // image Buffer (HuggingFace, ComfyUI)
console.log(result.media?.width); // actual dimensions
console.log(result.media?.height);
console.log(result.media?.format); // 'png'ai.video(options): Promise<NoosphereResult>
Generate videos. Supports 150+ video models via FAL (Kling, Sora 2, VEO 3, WAN, Pixverse, and more).
const result = await ai.video({
provider: 'fal',
model: 'fal-ai/kling-video/v2/master/text-to-video',
prompt: 'A bird flying through clouds',
imageUrl: 'https://...', // optional — image-to-video
duration: 5, // optional — seconds
fps: 24, // optional
width: 1280, // optional
height: 720, // optional
});
console.log(result.url); // video URL
console.log(result.media?.duration); // actual duration
console.log(result.media?.fps); // frames per second
console.log(result.media?.format); // 'mp4'ai.speak(options): Promise<NoosphereResult>
Text-to-speech synthesis. Supports 50+ TTS models via FAL, HuggingFace, Piper, and Kokoro.
const result = await ai.speak({
provider: 'fal',
model: 'fal-ai/kokoro/american-english',
text: 'Hello world',
voice: 'af_heart', // optional — voice ID
language: 'en', // optional
speed: 1.0, // optional
format: 'mp3', // optional — 'mp3' | 'wav' | 'ogg'
});
console.log(result.buffer); // audio Buffer
console.log(result.url); // audio URL (FAL)Discovery Methods
ai.getProviders(modality?): Promise<ProviderInfo[]>
List available providers, optionally filtered by modality.
const providers = await ai.getProviders('llm');
// [{ id: 'pi-ai', name: 'Pi-AI', modalities: ['llm'], local: false, status: 'online', modelCount: 246 }]ai.getModels(modality?): Promise<ModelInfo[]>
List all available models with full metadata.
const models = await ai.getModels('image');
// Returns ModelInfo[] with id, provider, name, modality, local, cost, capabilitiesai.getModel(provider, modelId): Promise<ModelInfo | null>
Get details about a specific model.
ai.syncModels(): Promise<SyncResult>
Refresh model lists from all providers. Returns sync count, per-provider breakdown, and any errors.
Usage Tracking
ai.getUsage(options?): UsageSummary
Get aggregated usage statistics with optional filtering.
const usage = ai.getUsage({
since: '2024-01-01', // optional — ISO date or Date object
until: '2024-12-31', // optional
provider: 'openai', // optional — filter by provider
modality: 'llm', // optional — filter by modality
});
console.log(usage.totalCost); // total USD spent
console.log(usage.totalRequests); // number of requests
console.log(usage.byProvider); // { openai: 2.50, anthropic: 1.20, fal: 0.30 }
console.log(usage.byModality); // { llm: 3.00, image: 0.70, video: 0.30, tts: 0.00 }Lifecycle
ai.registerProvider(provider): void
Register a custom provider (see Custom Providers).
ai.dispose(): Promise<void>
Cleanup all provider resources, clear model cache, and reset usage tracker.
NoosphereResult
Every generation method returns a NoosphereResult:
interface NoosphereResult {
content?: string; // LLM response text
thinking?: string; // reasoning/thinking output (supported models)
url?: string; // media URL (images, videos, audio from cloud providers)
buffer?: Buffer; // media binary data (local providers, HuggingFace)
provider: string; // which provider handled the request
model: string; // which model was used
modality: Modality; // 'llm' | 'image' | 'video' | 'tts'
latencyMs: number; // request duration in milliseconds
usage: {
cost: number; // cost in USD
input?: number; // input tokens/characters
output?: number; // output tokens
unit?: string; // 'tokens' | 'characters' | 'per_image' | 'per_second' | 'free'
};
media?: {
width?: number; // image/video width
height?: number; // image/video height
duration?: number; // video/audio duration in seconds
format?: string; // 'png' | 'mp4' | 'mp3' | 'wav'
fps?: number; // video frames per second
};
}Providers In Depth
Pi-AI — LLM Gateway (246+ models)
Provider ID: pi-ai
Modalities: LLM (chat + streaming)
Library: @mariozechner/pi-ai
A unified gateway that routes to 8 LLM providers through 4 different API protocols:
| API Protocol | Providers |
|---|---|
anthropic-messages |
Anthropic |
google-generative-ai |
|
openai-responses |
OpenAI (reasoning models) |
openai-completions |
OpenAI, xAI, Groq, Cerebras, Zai, OpenRouter |
Anthropic Models (19)
| Model | Context | Reasoning | Vision | Input Cost | Output Cost |
|---|---|---|---|---|---|
claude-opus-4-0 |
200k | Yes | Yes | $15/M | $75/M |
claude-opus-4-1 |
200k | Yes | Yes | $15/M | $75/M |
claude-sonnet-4-20250514 |
200k | Yes | Yes | $3/M | $15/M |
claude-sonnet-4-5-20250929 |
200k | Yes | Yes | $3/M | $15/M |
claude-3-7-sonnet-20250219 |
200k | Yes | Yes | $3/M | $15/M |
claude-3-5-sonnet-20241022 |
200k | No | Yes | $3/M | $15/M |
claude-haiku-4-5-20251001 |
200k | No | Yes | $0.80/M | $4/M |
claude-3-5-haiku-20241022 |
200k | No | Yes | $0.80/M | $4/M |
claude-3-haiku-20240307 |
200k | No | Yes | $0.25/M | $1.25/M |
| ...and 10 more variants |
OpenAI Models (24)
| Model | Context | Reasoning | Vision | Input Cost | Output Cost |
|---|---|---|---|---|---|
gpt-5 |
200k | Yes | Yes | $10/M | $30/M |
gpt-5-mini |
200k | Yes | Yes | $2.50/M | $10/M |
gpt-4.1 |
128k | No | Yes | $2/M | $8/M |
gpt-4.1-mini |
128k | No | Yes | $0.40/M | $1.60/M |
gpt-4.1-nano |
128k | No | Yes | $0.10/M | $0.40/M |
gpt-4o |
128k | No | Yes | $2.50/M | $10/M |
gpt-4o-mini |
128k | No | Yes | $0.15/M | $0.60/M |
o3-pro |
200k | Yes | Yes | $20/M | $80/M |
o3-mini |
200k | Yes | Yes | $1.10/M | $4.40/M |
o4-mini |
200k | Yes | Yes | $1.10/M | $4.40/M |
codex-mini-latest |
200k | Yes | No | $1.50/M | $6/M |
| ...and 13 more variants |
Google Gemini Models (19)
| Model | Context | Reasoning | Vision | Cost |
|---|---|---|---|---|
gemini-2.5-flash |
1M | Yes | Yes | $0.15-0.60/M |
gemini-2.5-pro |
1M | Yes | Yes | $1.25-10/M |
gemini-2.0-flash |
1M | No | Yes | $0.10-0.40/M |
gemini-2.0-flash-lite |
1M | No | Yes | $0.025-0.10/M |
gemini-1.5-flash |
1M | No | Yes | $0.075-0.30/M |
gemini-1.5-pro |
2M | No | Yes | $1.25-5/M |
| ...and 13 more variants |
xAI Grok Models (20)
| Model | Context | Reasoning | Vision | Input Cost |
|---|---|---|---|---|
grok-4 |
256k | Yes | Yes | $5/M |
grok-4-fast |
256k | Yes | Yes | $3/M |
grok-3 |
131k | No | Yes | $3/M |
grok-3-fast |
131k | No | Yes | $5/M |
grok-3-mini-fast-latest |
131k | Yes | No | $0.30/M |
grok-2-vision |
32k | No | Yes | $2/M |
| ...and 14 more variants |
Groq Models (15)
| Model | Context | Cost |
|---|---|---|
llama-3.3-70b-versatile |
128k | $0.59/M |
llama-3.1-8b-instant |
128k | $0.05/M |
mistral-saba-24b |
32k | $0.40/M |
qwen-qwq-32b |
128k | $0.29/M |
deepseek-r1-distill-llama-70b |
128k | $0.75/M |
| ...and 10 more |
Cerebras Models (3)
gpt-oss-120b, qwen-3-235b-a22b-instruct-2507, qwen-3-coder-480b
Zai Models (5)
glm-4.6, glm-4.5, glm-4.5-flash, glm-4.5v, glm-4.5-air
OpenRouter (141 models)
Aggregator providing access to hundreds of additional models including Llama, Deepseek, Mistral, Qwen, and many more. Full list available via ai.getModels('llm').
The Pi-AI Engine — Deep Dive
Noosphere's LLM provider is powered by @mariozechner/pi-ai, part of the Pi mono-repo by Mario Zechner (badlogic). Pi is NOT a wrapper like LangChain or Mastra — it's a micro-framework for agentic AI (~15K LOC, 4 npm packages) that was built from scratch as a minimalist alternative to Claude Code.
Pi consists of 4 packages in 3 tiers:
TIER 1 — FOUNDATION
@mariozechner/pi-ai LLM API: stream(), complete(), model registry
0 internal deps, talks to 20+ providers
TIER 2 — INFRASTRUCTURE
@mariozechner/pi-agent-core Agent loop, tool execution, lifecycle events
Depends on pi-ai
@mariozechner/pi-tui Terminal UI with differential rendering
Standalone, 0 internal deps
TIER 3 — APPLICATION
@mariozechner/pi-coding-agent CLI + SDK: sessions, compaction, extensions
Depends on all aboveNoosphere uses @mariozechner/pi-ai (Tier 1) directly for LLM access. But the full Pi ecosystem provides capabilities that can be layered on top.
How Pi Keeps 200+ Models Updated
Pi does NOT hardcode models. It has an auto-generation pipeline that runs at build time:
STEP 1: FETCH (3 sources in parallel)
┌──────────────────┐ ┌──────────────────┐ ┌───────────────┐
│ models.dev │ │ OpenRouter │ │ Vercel AI │
│ /api.json │ │ /v1/models │ │ Gateway │
│ │ │ │ │ /v1/models │
│ Context windows │ │ Pricing ($/M) │ │ Capability │
│ Capabilities │ │ Availability │ │ tags │
│ Tool support │ │ Provider routing │ │ │
└────────┬─────────┘ └────────┬─────────┘ └──────┬────────┘
└─────────┬───────────┴────────────────────┘
▼
STEP 2: MERGE & DEDUPLICATE
Priority: models.dev > OpenRouter > Vercel
Key: provider + modelId
│
▼
STEP 3: FILTER
✅ tool_call === true
✅ streaming supported
✅ system messages supported
✅ not deprecated
│
▼
STEP 4: NORMALIZE
Costs → $/million tokens
API type → one of 4 protocols
Input modes → ["text"] or ["text","image"]
│
▼
STEP 5: PATCH (manual corrections)
Claude Opus: cache pricing fix
GPT-5.4: context window override
Kimi K2.5: hardcoded pricing
│
▼
STEP 6: GENERATE TypeScript
→ models.generated.ts (~330KB)
→ 200+ models with full type safetyEach generated model entry looks like:
{
id: "claude-opus-4-6",
name: "Claude Opus 4.6",
api: "anthropic-messages",
provider: "anthropic",
baseUrl: "https://api.anthropic.com",
reasoning: true,
input: ["text", "image"],
cost: {
input: 15, // $15/M tokens
output: 75, // $75/M tokens
cacheRead: 1.5, // prompt cache hit
cacheWrite: 18.75, // prompt cache write
},
contextWindow: 200_000,
maxTokens: 32_000,
} satisfies Model<"anthropic-messages">When a new model is released (e.g., Gemini 3.0), it appears in models.dev/OpenRouter → the script captures it → a new Pi version is published → Noosphere updates its dependency.
4 API Protocols — How Pi Talks to Every Provider
Pi abstracts all LLM providers into 4 wire protocols. Each protocol handles the differences in request format, streaming format, auth headers, and response parsing:
| Protocol | Providers | Key Differences |
|---|---|---|
anthropic-messages |
Anthropic, AWS Bedrock | system as top-level field, content as [{type:"text", text:"..."}] blocks, x-api-key auth, anthropic-beta headers |
openai-completions |
OpenAI, xAI, Groq, Cerebras, OpenRouter, Ollama, vLLM | system as message with role:"system", content as string, Authorization: Bearer auth, tool_calls array |
openai-responses |
OpenAI (reasoning models) | New Responses API with server-side context, store: true, reasoning summaries |
google-generative-ai |
Google Gemini, Vertex AI | systemInstruction.parts[{text}], role "model" instead of "assistant", functionCall instead of tool_calls, thinkingConfig |
The core function streamSimple() detects which protocol to use based on model.api and handles all the formatting/parsing transparently:
// What happens inside Pi when you call Noosphere's chat():
async function* streamSimple(
model: Model, // includes model.api to determine protocol
context: Context, // { systemPrompt, messages, tools }
options?: StreamOptions // { signal, onPayload, thinkingLevel, ... }
): AsyncIterable<AssistantMessageEvent> {
// 1. Format request according to model.api protocol
// 2. Open SSE/WebSocket stream
// 3. Parse provider-specific chunks
// 4. Emit normalized events:
// → text_delta, thinking_delta, tool_call, message_end
}Agentic Capabilities
These are the capabilities people get access to through the Pi-AI engine:
1. Tool Use / Function Calling
Full structured tool calling supported across all major providers. Tool definitions use TypeBox schemas with runtime validation via AJV:
import { type Tool, StringEnum } from '@mariozechner/pi-ai';
import { Type } from '@sinclair/typebox';
// Define a tool with typed parameters
const searchTool: Tool = {
name: 'web_search',
description: 'Search the web for information',
parameters: Type.Object({
query: Type.String({ description: 'Search query' }),
maxResults: Type.Optional(Type.Number({ default: 5 })),
type: StringEnum(['web', 'images', 'news'], { description: 'Search type' }),
}),
};
// Pass tools in context — Pi handles the rest
const context = {
systemPrompt: 'You are a helpful assistant.',
messages: [{ role: 'user', content: 'Search for recent AI news' }],
tools: [searchTool],
};How tool calling works internally:
User prompt → LLM → "I need to call web_search"
│
▼
Pi validates arguments with AJV
against the TypeBox schema
│
┌─────┴─────┐
│ Valid? │
├─Yes───────┤
│ Execute │
│ tool │
├───────────┤
│ No │
│ Return │
│ validation│
│ error to │
│ LLM │
└───────────┘
│
▼
Tool result → back into context → LLM continuesProvider-specific tool_choice control:
- Anthropic:
"auto" | "any" | "none" | { type: "tool", name: "specific_tool" } - OpenAI:
"auto" | "none" | "required" | { type: "function", function: { name: "..." } } - Google:
"auto" | "none" | "any"
Partial JSON streaming: During streaming, Pi parses tool call arguments incrementally using partial JSON parsing. This means you can see tool arguments being built in real-time, not just after the tool call completes.
2. Reasoning / Extended Thinking
Pi provides unified thinking support across all providers that support it. Thinking blocks are automatically extracted, separated from regular text, and streamed as distinct events:
| Provider | Models | Control Parameters | How It Works |
|---|---|---|---|
| Anthropic | Claude Opus, Sonnet 4+ | thinkingEnabled: boolean, thinkingBudgetTokens: number |
Extended thinking blocks in response, separate thinking content type |
| OpenAI | o1, o3, o4, GPT-5 | reasoningEffort: "minimal" | "low" | "medium" | "high" |
Reasoning via Responses API, reasoningSummary: "auto" | "detailed" | "concise" |
| Gemini 2.5 Flash/Pro | thinking.enabled: boolean, thinking.budgetTokens: number |
Thinking via thinkingConfig, mapped to effort levels |
|
| xAI | Grok-4, Grok-3-mini | Native reasoning | Automatic when model supports it |
Cross-provider thinking portability: When switching models mid-conversation, Pi converts thinking blocks between formats. Anthropic thinking blocks become <thinking> tagged text when sent to OpenAI/Google, and vice versa.
// Thinking is automatically extracted in Noosphere responses:
const result = await ai.chat({
model: 'claude-opus-4-6',
messages: [{ role: 'user', content: 'Solve this step by step: 15! / 13!' }],
});
console.log(result.thinking); // "Let me work through this... 15! = 15 × 14 × 13!..."
console.log(result.content); // "15! / 13! = 15 × 14 = 210"
// During streaming, thinking arrives as separate events:
const stream = ai.stream({ messages: [...] });
for await (const event of stream) {
if (event.type === 'thinking_delta') console.log('[THINKING]', event.delta);
if (event.type === 'text_delta') console.log('[RESPONSE]', event.delta);
}3. Vision / Multimodal Input
Models with input: ["text", "image"] accept images alongside text. Pi handles the encoding and format differences per provider:
// Send images to vision-capable models
const messages = [{
role: 'user',
content: [
{ type: 'text', text: 'What is in this image?' },
{ type: 'image', data: base64PngString, mimeType: 'image/png' },
],
}];
// Supported MIME types: image/png, image/jpeg, image/gif, image/webp
// Images are silently ignored when sent to non-vision modelsVision-capable models include: All Claude models, all GPT-4o/GPT-5 models, Gemini models, Grok-2-vision, Grok-4, and select Groq models.
4. Agent Loop — Autonomous Tool Execution
The @mariozechner/pi-agent-core package provides a complete agent loop that automatically cycles through prompt → LLM → tool call → result → repeat until the task is done:
import { agentLoop } from '@mariozechner/pi-ai';
const events = agentLoop(userMessage, agentContext, {
model: getModel('anthropic', 'claude-opus-4-6'),
tools: [searchTool, readFileTool, writeFileTool],
signal: abortController.signal,
});
for await (const event of events) {
switch (event.type) {
case 'agent_start': // Agent begins
case 'turn_start': // New LLM turn begins
case 'message_start': // LLM starts responding
case 'message_update': // Text/thinking delta received
case 'tool_execution_start': // About to execute a tool
case 'tool_execution_end': // Tool finished, result available
case 'message_end': // LLM finished this message
case 'turn_end': // Turn complete (may loop if tools were called)
case 'agent_end': // All done, final messages available
}
}The agent loop state machine:
[User sends prompt]
│
▼
┌─[Build Context]──▶ [Check Queues]──▶ [Stream LLM]◄── streamFn()
│ │
│ ┌─────┴──────┐
│ │ │
│ text tool_call
│ │ │
│ ▼ ▼
│ [Done] [Execute Tool]
│ │
│ tool result
│ │
└──────────────────────────────────────────────────┘
(loops back to Stream LLM)Key design decisions:
- Tools execute sequentially by default (parallelism can be added on top)
- The
streamFnis injectable — you can wrap it with middleware to modify requests per-provider - Tool arguments are validated at runtime using TypeBox + AJV before execution
- Aborted/failed responses preserve partial content and usage data
- Tool results are automatically added to the conversation context
5. The streamFn Pattern — Injectable Middleware
This is Pi's most powerful architectural feature. The streamFn is the function that actually talks to the LLM, and it can be wrapped with middleware like Express.js request handlers:
import type { StreamFn } from '@mariozechner/pi-agent-core';
import { streamSimple } from '@mariozechner/pi-ai';
// Start with Pi's base streaming function
let fn: StreamFn = streamSimple;
// Wrap it with middleware that modifies requests per-provider
fn = createMyCustomWrapper(fn, {
// Add custom headers for Anthropic
onPayload: (payload) => {
if (model.provider === 'anthropic') {
payload.headers['anthropic-beta'] = 'fine-grained-tool-streaming-2025-05-14';
}
},
});
// Each wrapper calls the previous one, forming a chain:
// request → wrapper3 → wrapper2 → wrapper1 → streamSimple → APIThis pattern is what allows projects like OpenClaw to stack 16 provider-specific wrappers on top of Pi's base streaming — adding beta headers for Anthropic, WebSocket transport for OpenAI, thinking sanitization for Google, reasoning effort headers for OpenRouter, and more — without modifying Pi's source code.
6. Session Management (via pi-coding-agent)
The @mariozechner/pi-coding-agent package provides persistent session management with JSONL-based storage:
import { createAgentSession, SessionManager } from '@mariozechner/pi-coding-agent';
// Create a session with full persistence
const session = await createAgentSession({
model: 'claude-opus-4-6',
tools: myTools,
sessionManager, // handles JSONL persistence
});
const result = await session.run('Build a REST API');
// Session is automatically saved to:
// ~/.pi/agent/sessions/session_abc123.jsonlSession file format (append-only JSONL):
{"role":"user","content":"Build a REST API","timestamp":1710000000}
{"role":"assistant","content":"I'll create...","model":"claude-opus-4-6","usage":{...}}
{"role":"toolResult","toolCallId":"tc_001","toolName":"bash","content":"OK"}
{"type":"compaction","summary":"The user asked to build...","preservedMessages":[...]}Session operations:
create()— new sessionopen(id)— restore existing sessioncontinueRecent()— continue the most recent sessionforkFrom(id)— create a branch (new JSONL referencing parent)inMemory()— RAM-only session (for SDK/testing)
7. Context Compaction — Automatic Context Window Management
When the conversation approaches the model's context window limit, Pi automatically compacts the history:
1. DETECT: Calculate inputTokens + outputTokens vs model.contextWindow
2. TRIGGER: Proactively before overflow, or as recovery after overflow error
3. SUMMARIZE: Send history to LLM with a compaction prompt
4. WRITE: Append compaction entry to JSONL:
{"type":"compaction","summary":"...","preservedMessages":[last N messages]}
5. CONTINUE: Context is now summary + recent messages instead of full historyThe JSONL file is never rewritten — compaction entries are appended, maintaining a complete audit trail.
8. Cost Tracking — Cache-Aware Pricing
Pi tracks costs per-request with cache-aware pricing for providers that support prompt caching:
// Every model has 4 cost dimensions:
{
input: 15, // $15 per 1M input tokens
output: 75, // $75 per 1M output tokens
cacheRead: 1.5, // $1.50 per 1M cached prompt tokens (read)
cacheWrite: 18.75, // $18.75 per 1M cached prompt tokens (write)
}
// Usage tracking on every response:
{
input: 1500, // tokens consumed as input
output: 800, // tokens generated
cacheRead: 5000, // prompt cache hits
cacheWrite: 1500, // prompt cache writes
cost: {
total: 0.082, // total cost in USD
input: 0.0225,
output: 0.06,
cacheRead: 0.0075,
cacheWrite: 0.028,
},
}Anthropic and OpenAI support prompt caching. For providers without caching, cacheRead and cacheWrite are always 0.
9. Extension System (via pi-coding-agent)
Pi supports a plugin system where extensions can register tools, commands, and lifecycle hooks:
// Extensions are TypeScript modules loaded at runtime via jiti
export default function(api: ExtensionAPI) {
// Register a custom tool
api.registerTool('my_tool', {
description: 'Does something useful',
parameters: { /* TypeBox schema */ },
execute: async (args) => 'result',
});
// Register a slash command
api.registerCommand('/mycommand', {
handler: async (args) => { /* ... */ },
description: 'Custom command',
});
// Hook into the agent lifecycle
api.on('before_agent_start', async (context) => {
context.systemPrompt += '\nExtra instructions';
});
api.on('tool_execution_end', async (event) => {
// Post-process tool results
});
}Resource discovery chain (priority):
- Project
.pi/directory (highest) - User
~/.pi/agent/ - npm packages with Pi metadata
- Built-in defaults
10. The Anti-MCP Philosophy — Why Pi Uses CLI Instead
Pi explicitly rejects MCP (Model Context Protocol). Mario Zechner's argument, backed by benchmarks:
The token cost problem:
| Approach | Tools | Tokens Consumed | % of Claude's Context |
|---|---|---|---|
| Playwright MCP | 21 tools | 13,700 tokens | 6.8% |
| Chrome DevTools MCP | 26 tools | 18,000 tokens | 9.0% |
| Pi CLI + README | N/A | 225 tokens | ~0.1% |
That's a 60-80x reduction in token consumption. With 5 MCP servers, you lose ~55,000 tokens before doing any work.
Benchmark results (120 evaluations):
| Approach | Avg Cost | Success Rate |
|---|---|---|
| CLI (tmux) | $0.37 | 100% |
| CLI (terminalcp) | $0.39 | 100% |
| MCP (terminalcp) | $0.48 | 100% |
Same success rate, MCP costs 30% more.
Pi's alternative: Progressive Disclosure via CLI tools + READMEs
Instead of loading all tool definitions upfront, Pi's agent has bash as a built-in tool and discovers CLI tools only when needed:
MCP approach: Pi approach:
───────────── ──────────
Session start → Session start →
Load 21 Playwright tools Load 4 tools: read, write, edit, bash
Load 26 Chrome DevTools tools (225 tokens)
Load N more MCP tools
(~55,000 tokens wasted)
When browser needed: When browser needed:
Tools already loaded Agent reads SKILL.md (225 tokens)
(but context is polluted) Runs: browser-start.js
Runs: browser-nav.js https://...
Runs: browser-screenshot.js
When browser NOT needed: When browser NOT needed:
Tools still consume context 0 tokens wastedThe 4 built-in tools (what Pi argues is sufficient):
| Tool | What It Does | Why It's Enough |
|---|---|---|
read |
Read files (text + images) | Supports offset/limit for large files |
write |
Create/overwrite files | Creates directories automatically |
edit |
Replace text (oldText→newText) | Surgical edits, like a diff |
bash |
Execute any shell command | bash can do everything else — replaces MCP entirely |
The key insight: bash replaces MCP. Any CLI tool, API call, database query, or system operation can be invoked through bash. The agent reads the tool's README only when it needs it, paying tokens on-demand instead of upfront.
FAL — Media Generation (867+ endpoints)
Provider ID: fal
Modalities: Image, Video, TTS
Library: @fal-ai/client
The largest media generation provider with dynamic pricing fetched at runtime from https://api.fal.ai/v1/models/pricing.
Image Models (200+)
FLUX Family (20+ variants):
| Model | Description |
|---|---|
fal-ai/flux/schnell |
Fast generation (default) |
fal-ai/flux/dev |
Higher quality |
fal-ai/flux-2 |
Next generation |
fal-ai/flux-2-pro |
Professional quality |
fal-ai/flux-2-flex |
Flexible variant |
fal-ai/flux-2/edit |
Image editing |
fal-ai/flux-2/lora |
LoRA fine-tuning |
fal-ai/flux-pro/v1.1-ultra |
Ultra high quality |
fal-ai/flux-pro/kontext |
Context-aware generation |
fal-ai/flux-lora |
Custom style training |
fal-ai/flux-vision-upscaler |
AI upscaling |
fal-ai/flux-krea-trainer |
Model training |
fal-ai/flux-lora-fast-training |
Fast fine-tuning |
fal-ai/flux-lora-portrait-trainer |
Portrait specialist |
Stable Diffusion:
fal-ai/stable-diffusion-v15, fal-ai/stable-diffusion-v35-large, fal-ai/stable-diffusion-v35-medium, fal-ai/stable-diffusion-v3-medium
Other Image Models:
| Model | Description |
|---|---|
| `fal-ai/recraft/v3/text-to-image` | Artistic generation |
| `fal-ai/ideogram/v2`, `v2a`, `v3` | Ideogram series |
| `fal-ai/imagen3`, `fal-ai/imagen4/preview` | Google Imagen |
| `fal-ai/gpt-image-1` | GPT image generation |
| `fal-ai/gpt-image-1/edit-image` | GPT image editing |
| `fal-ai/reve/text-to-image` | Reve generation |
| `fal-ai/sana`, `fal-ai/sana/sprint` | Sana models |
| `fal-ai/pixart-sigma` | PixArt Sigma |
| `fal-ai/bria/text-to-image/base` | Bria AI |
Pre-trained LoRA Styles:
`fal-ai/flux-2-lora-gallery/sepia-vintage`, `virtual-tryon`, `satellite-view-style`, `realism`, `multiple-angles`, `hdr-style`, `face-to-full-portrait`, `digital-comic-art`, `ballpoint-pen-sketch`, `apartment-staging`, `add-background`
Image Editing/Enhancement (30+ tools):
`fal-ai/image-editing/age-progression`, `baby-version`, `background-change`, `hair-change`, `expression-change`, `object-removal`, `photo-restoration`, `style-transfer`, and many more.
Video Models (150+)
Kling Video (20+ variants):
| Model | Description |
|---|---|
| `fal-ai/kling-video/v2/master/text-to-video` | Default text-to-video |
| `fal-ai/kling-video/v2/master/image-to-video` | Image-to-video |
| `fal-ai/kling-video/v2.5-turbo/pro/text-to-video` | Turbo pro |
| `fal-ai/kling-video/o1/image-to-video` | O1 quality |
| `fal-ai/kling-video/o1/video-to-video/edit` | Video editing |
| `fal-ai/kling-video/lipsync/audio-to-video` | Lip sync |
| `fal-ai/kling-video/video-to-audio` | Audio extraction |
Sora 2 (OpenAI):
| Model | Description |
|---|---|
| `fal-ai/sora-2/text-to-video` | Text-to-video |
| `fal-ai/sora-2/text-to-video/pro` | Pro quality |
| `fal-ai/sora-2/image-to-video` | Image-to-video |
| `fal-ai/sora-2/video-to-video/remix` | Video remixing |
VEO 3 (Google):
| Model | Description |
|---|---|
| `fal-ai/veo3` | VEO 3 standard |
| `fal-ai/veo3/fast` | Fast variant |
| `fal-ai/veo3/image-to-video` | Image-to-video |
| `fal-ai/veo3.1` | Latest version |
| `fal-ai/veo3.1/reference-to-video` | Reference-guided |
| `fal-ai/veo3.1/first-last-frame-to-video` | Frame interpolation |
WAN (15+ variants):
`fal-ai/wan-pro/text-to-video`, `fal-ai/wan-pro/image-to-video`, `fal-ai/wan/v2.2-a14b/text-to-video`, `fal-ai/wan-vace-14b/depth`, `fal-ai/wan-vace-14b/inpainting`, `fal-ai/wan-vace-14b/pose`, `fal-ai/wan-effects`
Pixverse (20+ variants):
`fal-ai/pixverse/v5.5/text-to-video`, `fal-ai/pixverse/v5.5/image-to-video`, `fal-ai/pixverse/v5.5/effects`, `fal-ai/pixverse/lipsync`, `fal-ai/pixverse/sound-effects`
Minimax / Hailuo:
`fal-ai/minimax/hailuo-2.3/text-to-video/pro`, `fal-ai/minimax/hailuo-2.3/image-to-video/pro`, `fal-ai/minimax/video-01-director`, `fal-ai/minimax/video-01-live`
Other Video Models:
| Provider | Models |
|---|---|
| Hunyuan | fal-ai/hunyuan-video/text-to-video, image-to-video, video-to-video, foley |
| Pika | fal-ai/pika/v2.2/text-to-video, pikascenes, pikaffects |
| LTX | fal-ai/ltx-2/text-to-video, image-to-video, retake-video |
| Luma | fal-ai/luma-dream-machine/ray-2, ray-2-flash, luma-photon |
| Vidu | fal-ai/vidu/q2/text-to-video, image-to-video/pro |
| CogVideoX | fal-ai/cogvideox-5b/text-to-video, video-to-video |
| Seedance | fal-ai/bytedance/seedance/v1/text-to-video, image-to-video |
| Magi | fal-ai/magi/text-to-video, extend-video |
TTS / Speech Models (50+)
Kokoro (9 languages, 20+ voices per language):
| Model | Language | Example Voices |
|---|---|---|
| `fal-ai/kokoro/american-english` | English (US) | af_heart, af_alloy, af_bella, af_nova, am_adam, am_echo, am_onyx |
| `fal-ai/kokoro/british-english` | English (UK) | British voice set |
| `fal-ai/kokoro/french` | French | French voice set |
| `fal-ai/kokoro/japanese` | Japanese | Japanese voice set |
| `fal-ai/kokoro/spanish` | Spanish | Spanish voice set |
| `fal-ai/kokoro/mandarin-chinese` | Chinese | Mandarin voice set |
| `fal-ai/kokoro/italian` | Italian | Italian voice set |
| `fal-ai/kokoro/hindi` | Hindi | Hindi voice set |
| `fal-ai/kokoro/brazilian-portuguese` | Portuguese | Portuguese voice set |
ElevenLabs:
| Model | Description |
|---|---|
| `fal-ai/elevenlabs/tts/eleven-v3` | Professional quality |
| `fal-ai/elevenlabs/tts/turbo-v2.5` | Faster inference |
| `fal-ai/elevenlabs/tts/multilingual-v2` | Multi-language |
| `fal-ai/elevenlabs/text-to-dialogue/eleven-v3` | Dialogue generation |
| `fal-ai/elevenlabs/sound-effects/v2` | Sound effects |
| `fal-ai/elevenlabs/speech-to-text` | Transcription |
| `fal-ai/elevenlabs/audio-isolation` | Background removal |
Other TTS:
`fal-ai/f5-tts` (voice cloning), `fal-ai/dia-tts`, `fal-ai/minimax/speech-2.6-turbo`, `fal-ai/minimax/speech-2.6-hd`, `fal-ai/chatterbox/text-to-speech`, `fal-ai/index-tts-2/text-to-speech`
FAL Client Capabilities
The @fal-ai/client provides additional features beyond what Noosphere surfaces:
- Queue API — Submit jobs, poll status, get results, cancel. Supports webhooks and priority levels
- Streaming API — Real-time streaming responses via async iterators
- Realtime API — WebSocket connections for interactive use (e.g., real-time image generation)
- Storage API — File upload with configurable TTL (1h, 1d, 7d, 30d, 1y, never)
- Retry logic — Configurable retries with exponential backoff and jitter
- Request middleware — Custom request interceptors and proxy support
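For illustration, the submit-then-poll shape of the Queue API can be sketched as a generic helper. `getStatus` is an injected stand-in for a real status call, not the actual `@fal-ai/client` API:

```typescript
// Generic submit-and-poll sketch mirroring a queue API's shape.
// The status states here are illustrative placeholders.
type QueueStatus<T> =
  | { state: "IN_QUEUE" | "IN_PROGRESS" }
  | { state: "COMPLETED"; result: T };

export async function pollUntilDone<T>(
  getStatus: () => Promise<QueueStatus<T>>,
  intervalMs = 1000,
  maxAttempts = 300,
): Promise<T> {
  for (let i = 0; i < maxAttempts; i++) {
    const status = await getStatus();
    if (status.state === "COMPLETED") return status.result;
    // Not done yet: wait before polling again.
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error("queue job timed out");
}
```

In practice the real client handles this loop for you; the sketch just shows why webhooks (push) can be preferable to polling (pull) for long video jobs.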
Hugging Face — Open Source AI (30+ tasks)
Provider ID: huggingface
Modalities: LLM, Image, TTS
Library: @huggingface/inference
Access to the entire Hugging Face Hub ecosystem. Any model hosted on HuggingFace can be used by passing its ID directly.
Default Models
| Modality | Default Model | Description |
|---|---|---|
| LLM | `meta-llama/Llama-3.1-8B-Instruct` | Llama 3.1 8B |
| Image | `stabilityai/stable-diffusion-xl-base-1.0` | SDXL Base |
| TTS | `facebook/mms-tts-eng` | MMS TTS English |
Any HuggingFace model ID works — just pass it as the model parameter:
await ai.chat({
provider: 'huggingface',
model: 'mistralai/Mixtral-8x7B-v0.1',
messages: [{ role: 'user', content: 'Hello' }],
});
Full Library Capabilities
The @huggingface/inference library (v3.15.0) provides 30+ AI tasks, including capabilities not yet surfaced by Noosphere:
Natural Language Processing:
| Task | Method | Description |
|---|---|---|
| Chat | `chatCompletion()` | OpenAI-compatible chat completions |
| Chat Streaming | `chatCompletionStream()` | Token-by-token streaming |
| Text Generation | `textGeneration()` | Raw text completion |
| Summarization | `summarization()` | Text summarization |
| Translation | `translation()` | Language translation |
| Question Answering | `questionAnswering()` | Extract answers from context |
| Text Classification | `textClassification()` | Sentiment, topic classification |
| Zero-Shot Classification | `zeroShotClassification()` | Classify without training |
| Token Classification | `tokenClassification()` | NER, POS tagging |
| Sentence Similarity | `sentenceSimilarity()` | Semantic similarity scores |
| Feature Extraction | `featureExtraction()` | Text embeddings |
| Fill Mask | `fillMask()` | Fill in masked tokens |
| Table QA | `tableQuestionAnswering()` | Answer questions about tables |
Computer Vision:
| Task | Method | Description |
|---|---|---|
| Text-to-Image | `textToImage()` | Generate images from text |
| Image-to-Image | `imageToImage()` | Transform/edit images |
| Image Captioning | `imageToText()` | Describe images |
| Classification | `imageClassification()` | Classify image content |
| Object Detection | `objectDetection()` | Detect and locate objects |
| Segmentation | `imageSegmentation()` | Pixel-level segmentation |
| Zero-Shot Image | `zeroShotImageClassification()` | Classify without training |
| Text-to-Video | `textToVideo()` | Generate videos |
Audio:
| Task | Method | Description |
|---|---|---|
| Text-to-Speech | `textToSpeech()` | Generate speech |
| Speech-to-Text | `automaticSpeechRecognition()` | Transcription |
| Audio Classification | `audioClassification()` | Classify sounds |
| Audio-to-Audio | `audioToAudio()` | Source separation, enhancement |
Multimodal:
| Task | Method | Description |
|---|---|---|
| Visual QA | `visualQuestionAnswering()` | Answer questions about images |
| Document QA | `documentQuestionAnswering()` | Answer questions about documents |
Tabular:
| Task | Method | Description |
|---|---|---|
| Classification | `tabularClassification()` | Classify tabular data |
| Regression | `tabularRegression()` | Predict continuous values |
HuggingFace Agentic Features
- Tool/Function Calling: Full support via the `tools` parameter with `tool_choice` control (auto/none/required)
- JSON Schema Responses: `response_format: { type: 'json_schema', json_schema: {...} }`
- Reasoning: `reasoning_effort` parameter (none/minimal/low/medium/high/xhigh)
- Multimodal Input: Images via `image_url` content chunks in chat messages
- 17 Inference Providers: Route through Groq, Together, Fireworks, Replicate, Cerebras, Cohere, and more
ComfyUI — Local Image Generation
Provider ID: comfyui
Modalities: Image, Video (planned)
Type: Local
Default Port: 8188
Connects to a local ComfyUI instance for Stable Diffusion workflows.
How It Works
- Clones a built-in txt2img workflow template (KSampler + SDXL pipeline)
- Injects your parameters (prompt, dimensions, seed, steps, guidance)
- POSTs the workflow to ComfyUI's
/promptendpoint - Polls
/history/{promptId}every second until completion (max 5 minutes) - Fetches the generated image from
/view - Returns a PNG buffer
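The parameter-injection step can be sketched as a pure function over a cloned workflow graph. The node keys below (`prompt`, `latent`, `sampler`) are hypothetical placeholders; a real ComfyUI workflow addresses nodes by numeric ID and `class_type`:

```typescript
// Sketch: clone a workflow template and overwrite node inputs with the
// caller's parameters. Node names are illustrative, not the real template.
interface Txt2ImgParams {
  prompt: string;
  width: number;
  height: number;
  seed: number;
  steps: number;
  cfg: number;
}

export function buildWorkflow(
  template: Record<string, any>,
  p: Txt2ImgParams,
): Record<string, any> {
  const wf = structuredClone(template); // never mutate the shared template
  wf.prompt.inputs.text = p.prompt;
  wf.latent.inputs.width = p.width;
  wf.latent.inputs.height = p.height;
  wf.sampler.inputs.seed = p.seed;
  wf.sampler.inputs.steps = p.steps;
  wf.sampler.inputs.cfg = p.cfg;
  return wf;
}
```

Cloning before injection is what lets one template safely serve concurrent requests.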
Configuration
const ai = new Noosphere({
local: {
comfyui: {
enabled: true,
host: 'http://localhost',
port: 8188,
},
},
});
Default Workflow
- Checkpoint: `sd_xl_base_1.0.safetensors`
- Sampler: euler with normal scheduler
- Default Steps: 20
- Default CFG/Guidance: 7
- Default Size: 1024x1024
- Max Size: 2048x2048
- Output: PNG
Models Exposed
| Model ID | Modality | Description |
|---|---|---|
| `comfyui-txt2img` | Image | Text-to-image via workflow |
| `comfyui-txt2vid` | Video | Planned (requires AnimateDiff workflow) |
Local TTS — Piper & Kokoro
Provider IDs: piper, kokoro
Modality: TTS
Type: Local
Connects to local OpenAI-compatible TTS servers.
Supported Engines
| Engine | Default Port | Health Check | Voice Discovery |
|---|---|---|---|
| Piper | 5500 | `GET /health` | `GET /voices` |
| Kokoro | 5501 | `GET /health` | `GET /v1/models` (fallback) |
API
Uses the OpenAI-compatible TTS endpoint:
POST /v1/audio/speech
{
"model": "tts-1",
"input": "Hello world",
"voice": "default",
"speed": 1.0,
"response_format": "mp3"
}
Supports `mp3`, `wav`, and `ogg` formats. Returns audio as a `Buffer`.
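A request body for that endpoint can be assembled like this. The field names mirror the example above; the speed validation range is an assumption, not a documented limit:

```typescript
// Sketch: build the body for an OpenAI-compatible /v1/audio/speech call.
// Defaults mirror the example request above.
type AudioFormat = "mp3" | "wav" | "ogg";

export function buildSpeechRequest(
  input: string,
  opts: { voice?: string; speed?: number; format?: AudioFormat } = {},
) {
  const speed = opts.speed ?? 1.0;
  // Assumed sanity range; real servers may accept other values.
  if (speed <= 0 || speed > 4) throw new Error("speed out of range");
  return {
    model: "tts-1",
    input,
    voice: opts.voice ?? "default",
    speed,
    response_format: opts.format ?? "mp3",
  };
}
```

Sending it would then be a single `fetch` POST to, e.g., `http://localhost:5500/v1/audio/speech` with `JSON.stringify(buildSpeechRequest("Hello"))` as the body.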
Architecture
Provider Resolution (Local-First)
When you call a generation method without specifying a provider, Noosphere resolves one automatically:
- If `model` is specified without `provider` → looks up model in registry cache
- If a `default` is configured for the modality → uses that
- Otherwise → local providers first, then cloud providers
resolveProvider(modality):
1. Check user-specified provider ID → return if found
2. Check configured defaults → return if found
3. Scan all providers:
→ Return first LOCAL provider supporting this modality
→ Fallback to first CLOUD provider
4. Throw NO_PROVIDER error
Retry & Failover Logic
executeWithRetry(modality, provider, fn):
for attempt = 0..maxRetries:
try: return fn()
catch:
if error is retryable AND attempts remain:
wait backoffMs * 2^attempt (exponential backoff)
retry same provider
if error is NOT GENERATION_FAILED AND failover enabled:
try each alternative provider for this modality
throw last error
Retryable errors (same provider): PROVIDER_UNAVAILABLE, RATE_LIMITED, TIMEOUT, GENERATION_FAILED
Failover-eligible errors (cross-provider): PROVIDER_UNAVAILABLE, RATE_LIMITED, TIMEOUT (NOT GENERATION_FAILED)
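A minimal sketch of the retry half of this logic (cross-provider failover omitted for brevity; the function names are illustrative, not the noosphere internals):

```typescript
// Retry the same provider with exponential backoff, rethrowing once
// attempts are exhausted or the error is not retryable.
const RETRYABLE = new Set([
  "PROVIDER_UNAVAILABLE",
  "RATE_LIMITED",
  "TIMEOUT",
  "GENERATION_FAILED",
]);

export async function executeWithRetry<T>(
  fn: () => Promise<T>,
  getCode: (err: unknown) => string,
  maxRetries = 3,
  backoffMs = 500,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (!RETRYABLE.has(getCode(err)) || attempt === maxRetries) break;
      await sleep(backoffMs * 2 ** attempt); // 500ms, 1s, 2s, ...
    }
  }
  throw lastErr;
}
```

Injecting `sleep` keeps the backoff testable without real delays; a production version would also add jitter.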
Model Registry & Caching
- Models are fetched from providers via `listModels()` and cached in memory
- Cache TTL is configurable (default: 60 minutes)
- `syncModels()` forces a refresh of all provider model lists
- Registry tracks model → provider mappings for fast resolution
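A TTL cache of this shape can be sketched in a few lines; the class below is illustrative, not the actual registry implementation:

```typescript
// Minimal TTL cache sketch: entries expire after ttlMs, so the next read
// misses and triggers a re-fetch. `now` is injectable for deterministic tests.
export class TtlCache<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>();

  constructor(
    private ttlMs = 60 * 60 * 1000, // default: 60 minutes
    private now: () => number = Date.now,
  ) {}

  get(key: string): T | undefined {
    const e = this.entries.get(key);
    if (!e) return undefined;
    if (this.now() > e.expiresAt) {
      this.entries.delete(key); // expired; caller should re-fetch
      return undefined;
    }
    return e.value;
  }

  set(key: string, value: T): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  clear(): void {
    this.entries.clear(); // what a syncModels()-style forced refresh would do
  }
}
```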
Usage Tracking
Every API call (success or failure) records a UsageEvent:
interface UsageEvent {
modality: 'llm' | 'image' | 'video' | 'tts';
provider: string;
model: string;
cost: number; // USD
latencyMs: number;
input?: number; // tokens or characters
output?: number; // tokens
unit?: string;
timestamp: string; // ISO 8601
success: boolean;
error?: string; // error message if failed
metadata?: Record<string, unknown>;
}
Error Handling
All errors are instances of NoosphereError:
import { NoosphereError } from 'noosphere';
try {
await ai.chat({ messages: [{ role: 'user', content: 'Hello' }] });
} catch (err) {
if (err instanceof NoosphereError) {
console.log(err.code); // error code
console.log(err.provider); // which provider failed
console.log(err.modality); // which modality
console.log(err.model); // which model (if known)
console.log(err.cause); // underlying error
console.log(err.isRetryable()); // whether retry might help
}
}
Error Codes
| Code | Description | Retryable | Failover |
|---|---|---|---|
| `PROVIDER_UNAVAILABLE` | Provider is down or unreachable | Yes | Yes |
| `RATE_LIMITED` | API rate limit exceeded | Yes | Yes |
| `TIMEOUT` | Request exceeded timeout | Yes | Yes |
| `GENERATION_FAILED` | Generation error (bad prompt, model issue) | Yes | No |
| `AUTH_FAILED` | Invalid or missing API key | No | No |
| `MODEL_NOT_FOUND` | Requested model doesn't exist | No | No |
| `INVALID_INPUT` | Bad parameters or unsupported operation | No | No |
| `NO_PROVIDER` | No provider available for the requested modality | No | No |
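The table maps naturally onto a small lookup, e.g. for deciding whether to retry or fail over. This helper is illustrative rather than part of the noosphere API:

```typescript
// Illustrative lookup encoding the error-code table above.
interface ErrorPolicy { retryable: boolean; failover: boolean; }

const POLICIES: Record<string, ErrorPolicy> = {
  PROVIDER_UNAVAILABLE: { retryable: true, failover: true },
  RATE_LIMITED: { retryable: true, failover: true },
  TIMEOUT: { retryable: true, failover: true },
  GENERATION_FAILED: { retryable: true, failover: false },
  AUTH_FAILED: { retryable: false, failover: false },
  MODEL_NOT_FOUND: { retryable: false, failover: false },
  INVALID_INPUT: { retryable: false, failover: false },
  NO_PROVIDER: { retryable: false, failover: false },
};

export function policyFor(code: string): ErrorPolicy {
  // Unknown codes fall back to non-retryable, the conservative default.
  return POLICIES[code] ?? { retryable: false, failover: false };
}
```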
Custom Providers
Extend Noosphere with your own providers:
import type { NoosphereProvider, ModelInfo, ChatOptions, NoosphereResult, Modality } from 'noosphere';
const myProvider: NoosphereProvider = {
// Required properties
id: 'my-provider',
name: 'My Custom Provider',
modalities: ['llm', 'image'] as Modality[],
isLocal: false,
// Required methods
async ping() { return true; },
async listModels(modality?: Modality): Promise<ModelInfo[]> {
return [{
id: 'my-model',
provider: 'my-provider',
name: 'My Model',
modality: 'llm',
local: false,
cost: { price: 1.0, unit: 'per_1m_tokens' },
capabilities: {
contextWindow: 128000,
maxTokens: 4096,
supportsVision: false,
supportsStreaming: true,
},
}];
},
// Optional methods — implement per modality
async chat(options: ChatOptions): Promise<NoosphereResult> {
const start = Date.now();
// ... your implementation
return {
content: 'Response text',
provider: 'my-provider',
model: 'my-model',
modality: 'llm',
latencyMs: Date.now() - start,
usage: { cost: 0.001, input: 100, output: 50, unit: 'tokens' },
};
},
// stream?(options): NoosphereStream
// image?(options): Promise<NoosphereResult>
// video?(options): Promise<NoosphereResult>
// speak?(options): Promise<NoosphereResult>
// dispose?(): Promise<void>
};
ai.registerProvider(myProvider);
Provider Summary
| Provider | ID | Modalities | Type | Models | Library |
|---|---|---|---|---|---|
| Pi-AI Gateway | `pi-ai` | LLM | Cloud | 246+ | `@mariozechner/pi-ai` |
| FAL.ai | `fal` | Image, Video, TTS | Cloud | 867+ | `@fal-ai/client` |
| Hugging Face | `huggingface` | LLM, Image, TTS | Cloud | Unlimited (any HF model) | `@huggingface/inference` |
| ComfyUI | `comfyui` | Image | Local | SDXL workflows | Direct HTTP |
| Piper TTS | `piper` | TTS | Local | Piper voices | Direct HTTP |
| Kokoro TTS | `kokoro` | TTS | Local | Kokoro voices | Direct HTTP |
Requirements
- Node.js >= 18.0.0
License
MIT