Package Exports
- noosphere
Readme
noosphere
Unified AI creation engine — text, image, video, and audio generation across all providers through a single interface.
One import. Every model. Every modality.
Features
- 4 modalities — LLM chat, image generation, video generation, and text-to-speech
- 246+ LLM models — via Pi-AI gateway (OpenAI, Anthropic, Google, Groq, Mistral, xAI, Cerebras, OpenRouter)
- 867+ media endpoints — via FAL (Flux, SDXL, Kling, Sora 2, VEO 3, Kokoro, ElevenLabs, and hundreds more)
- 30+ HuggingFace tasks — LLM, image, TTS, translation, summarization, classification, and more
- Local-first architecture — Auto-detects ComfyUI, Ollama, Piper, and Kokoro on your machine
- Agentic capabilities — Tool use, function calling, reasoning/thinking, vision, and agent loops via Pi-AI
- Failover & retry — Automatic retries with exponential backoff and cross-provider failover
- Usage tracking — Real-time cost, latency, and token tracking across all providers
- TypeScript-first — Full type definitions with ESM and CommonJS support
Install
npm install noosphereQuick Start
import { Noosphere } from 'noosphere';
const ai = new Noosphere();
// Chat with any LLM
const response = await ai.chat({
messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(response.content);
// Generate an image
const image = await ai.image({
prompt: 'A sunset over mountains',
width: 1024,
height: 1024,
});
console.log(image.url);
// Generate a video
const video = await ai.video({
prompt: 'Ocean waves crashing on rocks',
duration: 5,
});
console.log(video.url);
// Text-to-speech
const audio = await ai.speak({
text: 'Welcome to Noosphere',
voice: 'alloy',
format: 'mp3',
});
// audio.buffer contains the audio dataConfiguration
API keys are resolved from the constructor config or environment variables (config takes priority):
const ai = new Noosphere({
keys: {
openai: 'sk-...',
anthropic: 'sk-ant-...',
google: 'AIza...',
fal: 'fal-...',
huggingface: 'hf_...',
groq: 'gsk_...',
mistral: '...',
xai: '...',
openrouter: 'sk-or-...',
},
});Or set environment variables:
| Variable | Provider |
|---|---|
OPENAI_API_KEY |
OpenAI |
ANTHROPIC_API_KEY |
Anthropic |
GEMINI_API_KEY |
Google Gemini |
FAL_KEY |
FAL.ai |
HUGGINGFACE_TOKEN |
Hugging Face |
GROQ_API_KEY |
Groq |
MISTRAL_API_KEY |
Mistral |
XAI_API_KEY |
xAI (Grok) |
OPENROUTER_API_KEY |
OpenRouter |
Full Configuration Reference
const ai = new Noosphere({
// API keys (or use env vars above)
keys: { /* ... */ },
// Default models per modality
defaults: {
llm: { provider: 'pi-ai', model: 'claude-sonnet-4-20250514' },
image: { provider: 'fal', model: 'fal-ai/flux/schnell' },
video: { provider: 'fal', model: 'fal-ai/kling-video/v2/master/text-to-video' },
tts: { provider: 'fal', model: 'fal-ai/kokoro/american-english' },
},
// Local service configuration
autoDetectLocal: true, // env: NOOSPHERE_AUTO_DETECT_LOCAL
local: {
ollama: { enabled: true, host: 'http://localhost', port: 11434 },
comfyui: { enabled: true, host: 'http://localhost', port: 8188 },
piper: { enabled: true, host: 'http://localhost', port: 5500 },
kokoro: { enabled: true, host: 'http://localhost', port: 5501 },
custom: [], // additional LocalServiceConfig[]
},
// Retry & failover
retry: {
maxRetries: 2, // default: 2
backoffMs: 1000, // default: 1000 (exponential: 1s, 2s, 4s...)
failover: true, // default: true — try other providers on failure
retryableErrors: ['PROVIDER_UNAVAILABLE', 'RATE_LIMITED', 'TIMEOUT'],
},
// Timeouts per modality (ms)
timeout: {
llm: 30000, // 30s
image: 120000, // 2min
video: 300000, // 5min
tts: 60000, // 1min
},
// Model discovery cache (minutes)
discoveryCacheTTL: 60, // env: NOOSPHERE_DISCOVERY_CACHE_TTL
// Real-time usage callback
onUsage: (event) => {
console.log(`${event.provider}/${event.model}: $${event.cost} (${event.latencyMs}ms)`);
},
});Local Service Environment Variables
| Variable | Default | Description |
|---|---|---|
OLLAMA_HOST |
http://localhost |
Ollama server host |
OLLAMA_PORT |
11434 |
Ollama server port |
COMFYUI_HOST |
http://localhost |
ComfyUI server host |
COMFYUI_PORT |
8188 |
ComfyUI server port |
PIPER_HOST |
http://localhost |
Piper TTS server host |
PIPER_PORT |
5500 |
Piper TTS server port |
KOKORO_HOST |
http://localhost |
Kokoro TTS server host |
KOKORO_PORT |
5501 |
Kokoro TTS server port |
NOOSPHERE_AUTO_DETECT_LOCAL |
true |
Enable/disable local service auto-detection |
NOOSPHERE_DISCOVERY_CACHE_TTL |
60 |
Model cache TTL in minutes |
API Reference
new Noosphere(config?)
Creates a new instance. Providers are initialized lazily on first API call. Auto-detects local services via HTTP pings (2s timeout each).
Generation Methods
ai.chat(options): Promise<NoosphereResult>
Generate text with any LLM. Supports 246+ models across 8 providers.
const result = await ai.chat({
provider: 'anthropic', // optional — auto-resolved if omitted
model: 'claude-sonnet-4-20250514', // optional — uses default or first available
messages: [
{ role: 'system', content: 'You are helpful.' },
{ role: 'user', content: 'Explain quantum computing' },
],
temperature: 0.7, // optional (0-2)
maxTokens: 1024, // optional
jsonMode: false, // optional
});
console.log(result.content); // response text
console.log(result.thinking); // reasoning output (Claude, GPT-5, o3, Gemini, Grok-4)
console.log(result.usage.cost); // cost in USD
console.log(result.usage.input); // input tokens
console.log(result.usage.output); // output tokens
console.log(result.latencyMs); // response time in msai.stream(options): NoosphereStream
Stream LLM responses token-by-token. Same options as chat().
const stream = ai.stream({
messages: [{ role: 'user', content: 'Write a story' }],
});
for await (const event of stream) {
switch (event.type) {
case 'text_delta':
process.stdout.write(event.delta!);
break;
case 'thinking_delta':
console.log('[thinking]', event.delta);
break;
case 'done':
console.log('\n\nUsage:', event.result!.usage);
break;
case 'error':
console.error(event.error);
break;
}
}
// Or consume the full result
const result = await stream.result();
// Abort at any time
stream.abort();ai.image(options): Promise<NoosphereResult>
Generate images. Supports 200+ image models via FAL, HuggingFace, and ComfyUI.
const result = await ai.image({
provider: 'fal', // optional
model: 'fal-ai/flux-2-pro', // optional
prompt: 'A futuristic cityscape at sunset',
negativePrompt: 'blurry, low quality', // optional
width: 1024, // optional
height: 768, // optional
seed: 42, // optional — reproducible results
steps: 30, // optional — inference steps (more = higher quality)
guidanceScale: 7.5, // optional — prompt adherence (higher = stricter)
});
console.log(result.url); // image URL (FAL)
console.log(result.buffer); // image Buffer (HuggingFace, ComfyUI)
console.log(result.media?.width); // actual dimensions
console.log(result.media?.height);
console.log(result.media?.format); // 'png'ai.video(options): Promise<NoosphereResult>
Generate videos. Supports 150+ video models via FAL (Kling, Sora 2, VEO 3, WAN, Pixverse, and more).
const result = await ai.video({
provider: 'fal',
model: 'fal-ai/kling-video/v2/master/text-to-video',
prompt: 'A bird flying through clouds',
imageUrl: 'https://...', // optional — image-to-video
duration: 5, // optional — seconds
fps: 24, // optional
width: 1280, // optional
height: 720, // optional
});
console.log(result.url); // video URL
console.log(result.media?.duration); // actual duration
console.log(result.media?.fps); // frames per second
console.log(result.media?.format); // 'mp4'ai.speak(options): Promise<NoosphereResult>
Text-to-speech synthesis. Supports 50+ TTS models via FAL, HuggingFace, Piper, and Kokoro.
const result = await ai.speak({
provider: 'fal',
model: 'fal-ai/kokoro/american-english',
text: 'Hello world',
voice: 'af_heart', // optional — voice ID
language: 'en', // optional
speed: 1.0, // optional
format: 'mp3', // optional — 'mp3' | 'wav' | 'ogg'
});
console.log(result.buffer); // audio Buffer
console.log(result.url); // audio URL (FAL)Discovery Methods
ai.getProviders(modality?): Promise<ProviderInfo[]>
List available providers, optionally filtered by modality.
const providers = await ai.getProviders('llm');
// [{ id: 'pi-ai', name: 'Pi-AI', modalities: ['llm'], local: false, status: 'online', modelCount: 246 }]ai.getModels(modality?): Promise<ModelInfo[]>
List all available models with full metadata.
const models = await ai.getModels('image');
// Returns ModelInfo[] with id, provider, name, modality, local, cost, capabilitiesai.getModel(provider, modelId): Promise<ModelInfo | null>
Get details about a specific model.
ai.syncModels(): Promise<SyncResult>
Refresh model lists from all providers. Returns sync count, per-provider breakdown, and any errors.
Usage Tracking
ai.getUsage(options?): UsageSummary
Get aggregated usage statistics with optional filtering.
const usage = ai.getUsage({
since: '2024-01-01', // optional — ISO date or Date object
until: '2024-12-31', // optional
provider: 'openai', // optional — filter by provider
modality: 'llm', // optional — filter by modality
});
console.log(usage.totalCost); // total USD spent
console.log(usage.totalRequests); // number of requests
console.log(usage.byProvider); // { openai: 2.50, anthropic: 1.20, fal: 0.30 }
console.log(usage.byModality); // { llm: 3.00, image: 0.70, video: 0.30, tts: 0.00 }Lifecycle
ai.registerProvider(provider): void
Register a custom provider (see Custom Providers).
ai.dispose(): Promise<void>
Cleanup all provider resources, clear model cache, and reset usage tracker.
NoosphereResult
Every generation method returns a NoosphereResult:
interface NoosphereResult {
content?: string; // LLM response text
thinking?: string; // reasoning/thinking output (supported models)
url?: string; // media URL (images, videos, audio from cloud providers)
buffer?: Buffer; // media binary data (local providers, HuggingFace)
provider: string; // which provider handled the request
model: string; // which model was used
modality: Modality; // 'llm' | 'image' | 'video' | 'tts'
latencyMs: number; // request duration in milliseconds
usage: {
cost: number; // cost in USD
input?: number; // input tokens/characters
output?: number; // output tokens
unit?: string; // 'tokens' | 'characters' | 'per_image' | 'per_second' | 'free'
};
media?: {
width?: number; // image/video width
height?: number; // image/video height
duration?: number; // video/audio duration in seconds
format?: string; // 'png' | 'mp4' | 'mp3' | 'wav'
fps?: number; // video frames per second
};
}Providers In Depth
Pi-AI — LLM Gateway (246+ models)
Provider ID: pi-ai
Modalities: LLM (chat + streaming)
Library: @mariozechner/pi-ai
A unified gateway that routes to 8 LLM providers through 4 different API protocols:
| API Protocol | Providers |
|---|---|
anthropic-messages |
Anthropic |
google-generative-ai |
|
openai-responses |
OpenAI (reasoning models) |
openai-completions |
OpenAI, xAI, Groq, Cerebras, Zai, OpenRouter |
Anthropic Models (19)
| Model | Context | Reasoning | Vision | Input Cost | Output Cost |
|---|---|---|---|---|---|
claude-opus-4-0 |
200k | Yes | Yes | $15/M | $75/M |
claude-opus-4-1 |
200k | Yes | Yes | $15/M | $75/M |
claude-sonnet-4-20250514 |
200k | Yes | Yes | $3/M | $15/M |
claude-sonnet-4-5-20250929 |
200k | Yes | Yes | $3/M | $15/M |
claude-3-7-sonnet-20250219 |
200k | Yes | Yes | $3/M | $15/M |
claude-3-5-sonnet-20241022 |
200k | No | Yes | $3/M | $15/M |
claude-haiku-4-5-20251001 |
200k | No | Yes | $0.80/M | $4/M |
claude-3-5-haiku-20241022 |
200k | No | Yes | $0.80/M | $4/M |
claude-3-haiku-20240307 |
200k | No | Yes | $0.25/M | $1.25/M |
| ...and 10 more variants |
OpenAI Models (24)
| Model | Context | Reasoning | Vision | Input Cost | Output Cost |
|---|---|---|---|---|---|
gpt-5 |
200k | Yes | Yes | $10/M | $30/M |
gpt-5-mini |
200k | Yes | Yes | $2.50/M | $10/M |
gpt-4.1 |
128k | No | Yes | $2/M | $8/M |
gpt-4.1-mini |
128k | No | Yes | $0.40/M | $1.60/M |
gpt-4.1-nano |
128k | No | Yes | $0.10/M | $0.40/M |
gpt-4o |
128k | No | Yes | $2.50/M | $10/M |
gpt-4o-mini |
128k | No | Yes | $0.15/M | $0.60/M |
o3-pro |
200k | Yes | Yes | $20/M | $80/M |
o3-mini |
200k | Yes | Yes | $1.10/M | $4.40/M |
o4-mini |
200k | Yes | Yes | $1.10/M | $4.40/M |
codex-mini-latest |
200k | Yes | No | $1.50/M | $6/M |
| ...and 13 more variants |
Google Gemini Models (19)
| Model | Context | Reasoning | Vision | Cost |
|---|---|---|---|---|
gemini-2.5-flash |
1M | Yes | Yes | $0.15-0.60/M |
gemini-2.5-pro |
1M | Yes | Yes | $1.25-10/M |
gemini-2.0-flash |
1M | No | Yes | $0.10-0.40/M |
gemini-2.0-flash-lite |
1M | No | Yes | $0.025-0.10/M |
gemini-1.5-flash |
1M | No | Yes | $0.075-0.30/M |
gemini-1.5-pro |
2M | No | Yes | $1.25-5/M |
| ...and 13 more variants |
xAI Grok Models (20)
| Model | Context | Reasoning | Vision | Input Cost |
|---|---|---|---|---|
grok-4 |
256k | Yes | Yes | $5/M |
grok-4-fast |
256k | Yes | Yes | $3/M |
grok-3 |
131k | No | Yes | $3/M |
grok-3-fast |
131k | No | Yes | $5/M |
grok-3-mini-fast-latest |
131k | Yes | No | $0.30/M |
grok-2-vision |
32k | No | Yes | $2/M |
| ...and 14 more variants |
Groq Models (15)
| Model | Context | Cost |
|---|---|---|
llama-3.3-70b-versatile |
128k | $0.59/M |
llama-3.1-8b-instant |
128k | $0.05/M |
mistral-saba-24b |
32k | $0.40/M |
qwen-qwq-32b |
128k | $0.29/M |
deepseek-r1-distill-llama-70b |
128k | $0.75/M |
| ...and 10 more |
Cerebras Models (3)
gpt-oss-120b, qwen-3-235b-a22b-instruct-2507, qwen-3-coder-480b
Zai Models (5)
glm-4.6, glm-4.5, glm-4.5-flash, glm-4.5v, glm-4.5-air
OpenRouter (141 models)
Aggregator providing access to hundreds of additional models including Llama, Deepseek, Mistral, Qwen, and many more. Full list available via ai.getModels('llm').
The Pi-AI Engine — Deep Dive
Noosphere's LLM provider is powered by @mariozechner/pi-ai, part of the Pi mono-repo by Mario Zechner (badlogic). Pi is NOT a wrapper like LangChain or Mastra — it's a micro-framework for agentic AI (~15K LOC, 4 npm packages) that was built from scratch as a minimalist alternative to Claude Code.
Pi consists of 4 packages in 3 tiers:
TIER 1 — FOUNDATION
@mariozechner/pi-ai LLM API: stream(), complete(), model registry
0 internal deps, talks to 20+ providers
TIER 2 — INFRASTRUCTURE
@mariozechner/pi-agent-core Agent loop, tool execution, lifecycle events
Depends on pi-ai
@mariozechner/pi-tui Terminal UI with differential rendering
Standalone, 0 internal deps
TIER 3 — APPLICATION
@mariozechner/pi-coding-agent CLI + SDK: sessions, compaction, extensions
Depends on all aboveNoosphere uses @mariozechner/pi-ai (Tier 1) directly for LLM access. But the full Pi ecosystem provides capabilities that can be layered on top.
How Pi Keeps 200+ Models Updated
Pi does NOT hardcode models. It has an auto-generation pipeline that runs at build time:
STEP 1: FETCH (3 sources in parallel)
┌──────────────────┐ ┌──────────────────┐ ┌───────────────┐
│ models.dev │ │ OpenRouter │ │ Vercel AI │
│ /api.json │ │ /v1/models │ │ Gateway │
│ │ │ │ │ /v1/models │
│ Context windows │ │ Pricing ($/M) │ │ Capability │
│ Capabilities │ │ Availability │ │ tags │
│ Tool support │ │ Provider routing │ │ │
└────────┬─────────┘ └────────┬─────────┘ └──────┬────────┘
└─────────┬───────────┴────────────────────┘
▼
STEP 2: MERGE & DEDUPLICATE
Priority: models.dev > OpenRouter > Vercel
Key: provider + modelId
│
▼
STEP 3: FILTER
✅ tool_call === true
✅ streaming supported
✅ system messages supported
✅ not deprecated
│
▼
STEP 4: NORMALIZE
Costs → $/million tokens
API type → one of 4 protocols
Input modes → ["text"] or ["text","image"]
│
▼
STEP 5: PATCH (manual corrections)
Claude Opus: cache pricing fix
GPT-5.4: context window override
Kimi K2.5: hardcoded pricing
│
▼
STEP 6: GENERATE TypeScript
→ models.generated.ts (~330KB)
→ 200+ models with full type safetyEach generated model entry looks like:
{
id: "claude-opus-4-6",
name: "Claude Opus 4.6",
api: "anthropic-messages",
provider: "anthropic",
baseUrl: "https://api.anthropic.com",
reasoning: true,
input: ["text", "image"],
cost: {
input: 15, // $15/M tokens
output: 75, // $75/M tokens
cacheRead: 1.5, // prompt cache hit
cacheWrite: 18.75, // prompt cache write
},
contextWindow: 200_000,
maxTokens: 32_000,
} satisfies Model<"anthropic-messages">When a new model is released (e.g., Gemini 3.0), it appears in models.dev/OpenRouter → the script captures it → a new Pi version is published → Noosphere updates its dependency.
4 API Protocols — How Pi Talks to Every Provider
Pi abstracts all LLM providers into 4 wire protocols. Each protocol handles the differences in request format, streaming format, auth headers, and response parsing:
| Protocol | Providers | Key Differences |
|---|---|---|
anthropic-messages |
Anthropic, AWS Bedrock | system as top-level field, content as [{type:"text", text:"..."}] blocks, x-api-key auth, anthropic-beta headers |
openai-completions |
OpenAI, xAI, Groq, Cerebras, OpenRouter, Ollama, vLLM | system as message with role:"system", content as string, Authorization: Bearer auth, tool_calls array |
openai-responses |
OpenAI (reasoning models) | New Responses API with server-side context, store: true, reasoning summaries |
google-generative-ai |
Google Gemini, Vertex AI | systemInstruction.parts[{text}], role "model" instead of "assistant", functionCall instead of tool_calls, thinkingConfig |
The core function streamSimple() detects which protocol to use based on model.api and handles all the formatting/parsing transparently:
// What happens inside Pi when you call Noosphere's chat():
async function* streamSimple(
model: Model, // includes model.api to determine protocol
context: Context, // { systemPrompt, messages, tools }
options?: StreamOptions // { signal, onPayload, thinkingLevel, ... }
): AsyncIterable<AssistantMessageEvent> {
// 1. Format request according to model.api protocol
// 2. Open SSE/WebSocket stream
// 3. Parse provider-specific chunks
// 4. Emit normalized events:
// → text_delta, thinking_delta, tool_call, message_end
}Agentic Capabilities
These are the capabilities people get access to through the Pi-AI engine:
1. Tool Use / Function Calling
Full structured tool calling supported across all major providers. Tool definitions use TypeBox schemas with runtime validation via AJV:
import { type Tool, StringEnum } from '@mariozechner/pi-ai';
import { Type } from '@sinclair/typebox';
// Define a tool with typed parameters
const searchTool: Tool = {
name: 'web_search',
description: 'Search the web for information',
parameters: Type.Object({
query: Type.String({ description: 'Search query' }),
maxResults: Type.Optional(Type.Number({ default: 5 })),
type: StringEnum(['web', 'images', 'news'], { description: 'Search type' }),
}),
};
// Pass tools in context — Pi handles the rest
const context = {
systemPrompt: 'You are a helpful assistant.',
messages: [{ role: 'user', content: 'Search for recent AI news' }],
tools: [searchTool],
};How tool calling works internally:
User prompt → LLM → "I need to call web_search"
│
▼
Pi validates arguments with AJV
against the TypeBox schema
│
┌─────┴─────┐
│ Valid? │
├─Yes───────┤
│ Execute │
│ tool │
├───────────┤
│ No │
│ Return │
│ validation│
│ error to │
│ LLM │
└───────────┘
│
▼
Tool result → back into context → LLM continuesProvider-specific tool_choice control:
- Anthropic:
"auto" | "any" | "none" | { type: "tool", name: "specific_tool" } - OpenAI:
"auto" | "none" | "required" | { type: "function", function: { name: "..." } } - Google:
"auto" | "none" | "any"
Partial JSON streaming: During streaming, Pi parses tool call arguments incrementally using partial JSON parsing. This means you can see tool arguments being built in real-time, not just after the tool call completes.
2. Reasoning / Extended Thinking
Pi provides unified thinking support across all providers that support it. Thinking blocks are automatically extracted, separated from regular text, and streamed as distinct events:
| Provider | Models | Control Parameters | How It Works |
|---|---|---|---|
| Anthropic | Claude Opus, Sonnet 4+ | thinkingEnabled: boolean, thinkingBudgetTokens: number |
Extended thinking blocks in response, separate thinking content type |
| OpenAI | o1, o3, o4, GPT-5 | reasoningEffort: "minimal" | "low" | "medium" | "high" |
Reasoning via Responses API, reasoningSummary: "auto" | "detailed" | "concise" |
| Gemini 2.5 Flash/Pro | thinking.enabled: boolean, thinking.budgetTokens: number |
Thinking via thinkingConfig, mapped to effort levels |
|
| xAI | Grok-4, Grok-3-mini | Native reasoning | Automatic when model supports it |
Cross-provider thinking portability: When switching models mid-conversation, Pi converts thinking blocks between formats. Anthropic thinking blocks become <thinking> tagged text when sent to OpenAI/Google, and vice versa.
// Thinking is automatically extracted in Noosphere responses:
const result = await ai.chat({
model: 'claude-opus-4-6',
messages: [{ role: 'user', content: 'Solve this step by step: 15! / 13!' }],
});
console.log(result.thinking); // "Let me work through this... 15! = 15 × 14 × 13!..."
console.log(result.content); // "15! / 13! = 15 × 14 = 210"
// During streaming, thinking arrives as separate events:
const stream = ai.stream({ messages: [...] });
for await (const event of stream) {
if (event.type === 'thinking_delta') console.log('[THINKING]', event.delta);
if (event.type === 'text_delta') console.log('[RESPONSE]', event.delta);
}3. Vision / Multimodal Input
Models with input: ["text", "image"] accept images alongside text. Pi handles the encoding and format differences per provider:
// Send images to vision-capable models
const messages = [{
role: 'user',
content: [
{ type: 'text', text: 'What is in this image?' },
{ type: 'image', data: base64PngString, mimeType: 'image/png' },
],
}];
// Supported MIME types: image/png, image/jpeg, image/gif, image/webp
// Images are silently ignored when sent to non-vision modelsVision-capable models include: All Claude models, all GPT-4o/GPT-5 models, Gemini models, Grok-2-vision, Grok-4, and select Groq models.
4. Agent Loop — Autonomous Tool Execution
The @mariozechner/pi-agent-core package provides a complete agent loop that automatically cycles through prompt → LLM → tool call → result → repeat until the task is done:
import { agentLoop } from '@mariozechner/pi-ai';
const events = agentLoop(userMessage, agentContext, {
model: getModel('anthropic', 'claude-opus-4-6'),
tools: [searchTool, readFileTool, writeFileTool],
signal: abortController.signal,
});
for await (const event of events) {
switch (event.type) {
case 'agent_start': // Agent begins
case 'turn_start': // New LLM turn begins
case 'message_start': // LLM starts responding
case 'message_update': // Text/thinking delta received
case 'tool_execution_start': // About to execute a tool
case 'tool_execution_end': // Tool finished, result available
case 'message_end': // LLM finished this message
case 'turn_end': // Turn complete (may loop if tools were called)
case 'agent_end': // All done, final messages available
}
}The agent loop state machine:
[User sends prompt]
│
▼
┌─[Build Context]──▶ [Check Queues]──▶ [Stream LLM]◄── streamFn()
│ │
│ ┌─────┴──────┐
│ │ │
│ text tool_call
│ │ │
│ ▼ ▼
│ [Done] [Execute Tool]
│ │
│ tool result
│ │
└──────────────────────────────────────────────────┘
(loops back to Stream LLM)Key design decisions:
- Tools execute sequentially by default (parallelism can be added on top)
- The
streamFnis injectable — you can wrap it with middleware to modify requests per-provider - Tool arguments are validated at runtime using TypeBox + AJV before execution
- Aborted/failed responses preserve partial content and usage data
- Tool results are automatically added to the conversation context
5. The streamFn Pattern — Injectable Middleware
This is Pi's most powerful architectural feature. The streamFn is the function that actually talks to the LLM, and it can be wrapped with middleware like Express.js request handlers:
import type { StreamFn } from '@mariozechner/pi-agent-core';
import { streamSimple } from '@mariozechner/pi-ai';
// Start with Pi's base streaming function
let fn: StreamFn = streamSimple;
// Wrap it with middleware that modifies requests per-provider
fn = createMyCustomWrapper(fn, {
// Add custom headers for Anthropic
onPayload: (payload) => {
if (model.provider === 'anthropic') {
payload.headers['anthropic-beta'] = 'fine-grained-tool-streaming-2025-05-14';
}
},
});
// Each wrapper calls the previous one, forming a chain:
// request → wrapper3 → wrapper2 → wrapper1 → streamSimple → APIThis pattern is what allows projects like OpenClaw to stack 16 provider-specific wrappers on top of Pi's base streaming — adding beta headers for Anthropic, WebSocket transport for OpenAI, thinking sanitization for Google, reasoning effort headers for OpenRouter, and more — without modifying Pi's source code.
6. Session Management (via pi-coding-agent)
The @mariozechner/pi-coding-agent package provides persistent session management with JSONL-based storage:
import { createAgentSession, SessionManager } from '@mariozechner/pi-coding-agent';
// Create a session with full persistence
const session = await createAgentSession({
model: 'claude-opus-4-6',
tools: myTools,
sessionManager, // handles JSONL persistence
});
const result = await session.run('Build a REST API');
// Session is automatically saved to:
// ~/.pi/agent/sessions/session_abc123.jsonlSession file format (append-only JSONL):
{"role":"user","content":"Build a REST API","timestamp":1710000000}
{"role":"assistant","content":"I'll create...","model":"claude-opus-4-6","usage":{...}}
{"role":"toolResult","toolCallId":"tc_001","toolName":"bash","content":"OK"}
{"type":"compaction","summary":"The user asked to build...","preservedMessages":[...]}Session operations:
create()— new sessionopen(id)— restore existing sessioncontinueRecent()— continue the most recent sessionforkFrom(id)— create a branch (new JSONL referencing parent)inMemory()— RAM-only session (for SDK/testing)
7. Context Compaction — Automatic Context Window Management
When the conversation approaches the model's context window limit, Pi automatically compacts the history:
1. DETECT: Calculate inputTokens + outputTokens vs model.contextWindow
2. TRIGGER: Proactively before overflow, or as recovery after overflow error
3. SUMMARIZE: Send history to LLM with a compaction prompt
4. WRITE: Append compaction entry to JSONL:
{"type":"compaction","summary":"...","preservedMessages":[last N messages]}
5. CONTINUE: Context is now summary + recent messages instead of full historyThe JSONL file is never rewritten — compaction entries are appended, maintaining a complete audit trail.
8. Cost Tracking — Cache-Aware Pricing
Pi tracks costs per-request with cache-aware pricing for providers that support prompt caching:
// Every model has 4 cost dimensions:
{
input: 15, // $15 per 1M input tokens
output: 75, // $75 per 1M output tokens
cacheRead: 1.5, // $1.50 per 1M cached prompt tokens (read)
cacheWrite: 18.75, // $18.75 per 1M cached prompt tokens (write)
}
// Usage tracking on every response:
{
input: 1500, // tokens consumed as input
output: 800, // tokens generated
cacheRead: 5000, // prompt cache hits
cacheWrite: 1500, // prompt cache writes
cost: {
total: 0.082, // total cost in USD
input: 0.0225,
output: 0.06,
cacheRead: 0.0075,
cacheWrite: 0.028,
},
}Anthropic and OpenAI support prompt caching. For providers without caching, cacheRead and cacheWrite are always 0.
9. Extension System (via pi-coding-agent)
Pi supports a plugin system where extensions can register tools, commands, and lifecycle hooks:
// Extensions are TypeScript modules loaded at runtime via jiti
export default function(api: ExtensionAPI) {
// Register a custom tool
api.registerTool('my_tool', {
description: 'Does something useful',
parameters: { /* TypeBox schema */ },
execute: async (args) => 'result',
});
// Register a slash command
api.registerCommand('/mycommand', {
handler: async (args) => { /* ... */ },
description: 'Custom command',
});
// Hook into the agent lifecycle
api.on('before_agent_start', async (context) => {
context.systemPrompt += '\nExtra instructions';
});
api.on('tool_execution_end', async (event) => {
// Post-process tool results
});
}Resource discovery chain (priority):
- Project
.pi/directory (highest) - User
~/.pi/agent/ - npm packages with Pi metadata
- Built-in defaults
10. The Anti-MCP Philosophy — Why Pi Uses CLI Instead
Pi explicitly rejects MCP (Model Context Protocol). Mario Zechner's argument, backed by benchmarks:
The token cost problem:
| Approach | Tools | Tokens Consumed | % of Claude's Context |
|---|---|---|---|
| Playwright MCP | 21 tools | 13,700 tokens | 6.8% |
| Chrome DevTools MCP | 26 tools | 18,000 tokens | 9.0% |
| Pi CLI + README | N/A | 225 tokens | ~0.1% |
That's a 60-80x reduction in token consumption. With 5 MCP servers, you lose ~55,000 tokens before doing any work.
Benchmark results (120 evaluations):
| Approach | Avg Cost | Success Rate |
|---|---|---|
| CLI (tmux) | $0.37 | 100% |
| CLI (terminalcp) | $0.39 | 100% |
| MCP (terminalcp) | $0.48 | 100% |
Same success rate, MCP costs 30% more.
Pi's alternative: Progressive Disclosure via CLI tools + READMEs
Instead of loading all tool definitions upfront, Pi's agent has bash as a built-in tool and discovers CLI tools only when needed:
MCP approach: Pi approach:
───────────── ──────────
Session start → Session start →
Load 21 Playwright tools Load 4 tools: read, write, edit, bash
Load 26 Chrome DevTools tools (225 tokens)
Load N more MCP tools
(~55,000 tokens wasted)
When browser needed: When browser needed:
Tools already loaded Agent reads SKILL.md (225 tokens)
(but context is polluted) Runs: browser-start.js
Runs: browser-nav.js https://...
Runs: browser-screenshot.js
When browser NOT needed: When browser NOT needed:
Tools still consume context 0 tokens wastedThe 4 built-in tools (what Pi argues is sufficient):
| Tool | What It Does | Why It's Enough |
|---|---|---|
read |
Read files (text + images) | Supports offset/limit for large files |
write |
Create/overwrite files | Creates directories automatically |
edit |
Replace text (oldText→newText) | Surgical edits, like a diff |
bash |
Execute any shell command | bash can do everything else — replaces MCP entirely |
The key insight: bash replaces MCP. Any CLI tool, API call, database query, or system operation can be invoked through bash. The agent reads the tool's README only when it needs it, paying tokens on-demand instead of upfront.
FAL — Media Generation (867+ endpoints)
Provider ID: fal
Modalities: Image, Video, TTS
Library: @fal-ai/client
The largest media generation provider with dynamic pricing fetched at runtime from https://api.fal.ai/v1/models/pricing.
Image Models (200+)
FLUX Family (20+ variants):
| Model | Description |
|---|---|
fal-ai/flux/schnell |
Fast generation (default) |
fal-ai/flux/dev |
Higher quality |
fal-ai/flux-2 |
Next generation |
fal-ai/flux-2-pro |
Professional quality |
fal-ai/flux-2-flex |
Flexible variant |
fal-ai/flux-2/edit |
Image editing |
fal-ai/flux-2/lora |
LoRA fine-tuning |
fal-ai/flux-pro/v1.1-ultra |
Ultra high quality |
fal-ai/flux-pro/kontext |
Context-aware generation |
fal-ai/flux-lora |
Custom style training |
fal-ai/flux-vision-upscaler |
AI upscaling |
fal-ai/flux-krea-trainer |
Model training |
fal-ai/flux-lora-fast-training |
Fast fine-tuning |
fal-ai/flux-lora-portrait-trainer |
Portrait specialist |
Stable Diffusion:
fal-ai/stable-diffusion-v15, fal-ai/stable-diffusion-v35-large, fal-ai/stable-diffusion-v35-medium, fal-ai/stable-diffusion-v3-medium
Other Image Models:
| Model | Description |
|---|---|
| `fal-ai/recraft/v3/text-to-image` | Artistic generation |
| `fal-ai/ideogram/v2`, `v2a`, `v3` | Ideogram series |
| `fal-ai/imagen3`, `fal-ai/imagen4/preview` | Google Imagen |
| `fal-ai/gpt-image-1` | GPT image generation |
| `fal-ai/gpt-image-1/edit-image` | GPT image editing |
| `fal-ai/reve/text-to-image` | Reve generation |
| `fal-ai/sana`, `fal-ai/sana/sprint` | Sana models |
| `fal-ai/pixart-sigma` | PixArt Sigma |
| `fal-ai/bria/text-to-image/base` | Bria AI |
Pre-trained LoRA Styles:
`fal-ai/flux-2-lora-gallery/sepia-vintage`, `virtual-tryon`, `satellite-view-style`, `realism`, `multiple-angles`, `hdr-style`, `face-to-full-portrait`, `digital-comic-art`, `ballpoint-pen-sketch`, `apartment-staging`, `add-background`
Image Editing/Enhancement (30+ tools):
`fal-ai/image-editing/age-progression`, `baby-version`, `background-change`, `hair-change`, `expression-change`, `object-removal`, `photo-restoration`, `style-transfer`, and many more.
Video Models (150+)
Kling Video (20+ variants):
| Model | Description |
|---|---|
| `fal-ai/kling-video/v2/master/text-to-video` | Default text-to-video |
| `fal-ai/kling-video/v2/master/image-to-video` | Image-to-video |
| `fal-ai/kling-video/v2.5-turbo/pro/text-to-video` | Turbo pro |
| `fal-ai/kling-video/o1/image-to-video` | O1 quality |
| `fal-ai/kling-video/o1/video-to-video/edit` | Video editing |
| `fal-ai/kling-video/lipsync/audio-to-video` | Lip sync |
| `fal-ai/kling-video/video-to-audio` | Audio extraction |
Sora 2 (OpenAI):
| Model | Description |
|---|---|
| `fal-ai/sora-2/text-to-video` | Text-to-video |
| `fal-ai/sora-2/text-to-video/pro` | Pro quality |
| `fal-ai/sora-2/image-to-video` | Image-to-video |
| `fal-ai/sora-2/video-to-video/remix` | Video remixing |
VEO 3 (Google):
| Model | Description |
|---|---|
| `fal-ai/veo3` | VEO 3 standard |
| `fal-ai/veo3/fast` | Fast variant |
| `fal-ai/veo3/image-to-video` | Image-to-video |
| `fal-ai/veo3.1` | Latest version |
| `fal-ai/veo3.1/reference-to-video` | Reference-guided |
| `fal-ai/veo3.1/first-last-frame-to-video` | Frame interpolation |
WAN (15+ variants):
`fal-ai/wan-pro/text-to-video`, `fal-ai/wan-pro/image-to-video`, `fal-ai/wan/v2.2-a14b/text-to-video`, `fal-ai/wan-vace-14b/depth`, `fal-ai/wan-vace-14b/inpainting`, `fal-ai/wan-vace-14b/pose`, `fal-ai/wan-effects`
Pixverse (20+ variants):
`fal-ai/pixverse/v5.5/text-to-video`, `fal-ai/pixverse/v5.5/image-to-video`, `fal-ai/pixverse/v5.5/effects`, `fal-ai/pixverse/lipsync`, `fal-ai/pixverse/sound-effects`
Minimax / Hailuo:
`fal-ai/minimax/hailuo-2.3/text-to-video/pro`, `fal-ai/minimax/hailuo-2.3/image-to-video/pro`, `fal-ai/minimax/video-01-director`, `fal-ai/minimax/video-01-live`
Other Video Models:
| Provider | Models |
|---|---|
| Hunyuan | fal-ai/hunyuan-video/text-to-video, image-to-video, video-to-video, foley |
| Pika | fal-ai/pika/v2.2/text-to-video, pikascenes, pikaffects |
| LTX | fal-ai/ltx-2/text-to-video, image-to-video, retake-video |
| Luma | fal-ai/luma-dream-machine/ray-2, ray-2-flash, luma-photon |
| Vidu | fal-ai/vidu/q2/text-to-video, image-to-video/pro |
| CogVideoX | fal-ai/cogvideox-5b/text-to-video, video-to-video |
| Seedance | fal-ai/bytedance/seedance/v1/text-to-video, image-to-video |
| Magi | fal-ai/magi/text-to-video, extend-video |
TTS / Speech Models (50+)
Kokoro (9 languages, 20+ voices per language):
| Model | Language | Example Voices |
|---|---|---|
| `fal-ai/kokoro/american-english` | English (US) | af_heart, af_alloy, af_bella, af_nova, am_adam, am_echo, am_onyx |
| `fal-ai/kokoro/british-english` | English (UK) | British voice set |
| `fal-ai/kokoro/french` | French | French voice set |
| `fal-ai/kokoro/japanese` | Japanese | Japanese voice set |
| `fal-ai/kokoro/spanish` | Spanish | Spanish voice set |
| `fal-ai/kokoro/mandarin-chinese` | Chinese | Mandarin voice set |
| `fal-ai/kokoro/italian` | Italian | Italian voice set |
| `fal-ai/kokoro/hindi` | Hindi | Hindi voice set |
| `fal-ai/kokoro/brazilian-portuguese` | Portuguese | Portuguese voice set |
ElevenLabs:
| Model | Description |
|---|---|
| `fal-ai/elevenlabs/tts/eleven-v3` | Professional quality |
| `fal-ai/elevenlabs/tts/turbo-v2.5` | Faster inference |
| `fal-ai/elevenlabs/tts/multilingual-v2` | Multi-language |
| `fal-ai/elevenlabs/text-to-dialogue/eleven-v3` | Dialogue generation |
| `fal-ai/elevenlabs/sound-effects/v2` | Sound effects |
| `fal-ai/elevenlabs/speech-to-text` | Transcription |
| `fal-ai/elevenlabs/audio-isolation` | Background removal |
Other TTS:
`fal-ai/f5-tts` (voice cloning), `fal-ai/dia-tts`, `fal-ai/minimax/speech-2.6-turbo`, `fal-ai/minimax/speech-2.6-hd`, `fal-ai/chatterbox/text-to-speech`, `fal-ai/index-tts-2/text-to-speech`
FAL Client Capabilities
The @fal-ai/client provides additional features beyond what Noosphere surfaces:
- Queue API — Submit jobs, poll status, get results, cancel. Supports webhooks and priority levels
- Streaming API — Real-time streaming responses via async iterators
- Realtime API — WebSocket connections for interactive use (e.g., real-time image generation)
- Storage API — File upload with configurable TTL (1h, 1d, 7d, 30d, 1y, never)
- Retry logic — Configurable retries with exponential backoff and jitter
- Request middleware — Custom request interceptors and proxy support
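For illustration, the submit-then-poll shape of the Queue API can be sketched as a generic helper. `getStatus` is an injected stand-in for a real status call, not the actual `@fal-ai/client` API:

```typescript
// Generic submit-and-poll sketch mirroring a queue API's shape.
// The status states here are illustrative placeholders.
type QueueStatus<T> =
  | { state: "IN_QUEUE" | "IN_PROGRESS" }
  | { state: "COMPLETED"; result: T };

export async function pollUntilDone<T>(
  getStatus: () => Promise<QueueStatus<T>>,
  intervalMs = 1000,
  maxAttempts = 300,
): Promise<T> {
  for (let i = 0; i < maxAttempts; i++) {
    const status = await getStatus();
    if (status.state === "COMPLETED") return status.result;
    // Not done yet: wait before polling again.
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error("queue job timed out");
}
```

In practice the real client handles this loop for you; the sketch just shows why webhooks (push) can be preferable to polling (pull) for long video jobs.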
Hugging Face — Open Source AI (30+ tasks)
Provider ID: huggingface
Modalities: LLM, Image, TTS
Library: @huggingface/inference
Access to the entire Hugging Face Hub ecosystem. Any model hosted on HuggingFace can be used by passing its ID directly.
Default Models
| Modality | Default Model | Description |
|---|---|---|
| LLM | `meta-llama/Llama-3.1-8B-Instruct` | Llama 3.1 8B |
| Image | `stabilityai/stable-diffusion-xl-base-1.0` | SDXL Base |
| TTS | `facebook/mms-tts-eng` | MMS TTS English |
Any HuggingFace model ID works — just pass it as the model parameter:
await ai.chat({
provider: 'huggingface',
model: 'mistralai/Mixtral-8x7B-v0.1',
messages: [{ role: 'user', content: 'Hello' }],
});
Full Library Capabilities
The @huggingface/inference library (v3.15.0) provides 30+ AI tasks, including capabilities not yet surfaced by Noosphere:
Natural Language Processing:
| Task | Method | Description |
|---|---|---|
| Chat | `chatCompletion()` | OpenAI-compatible chat completions |
| Chat Streaming | `chatCompletionStream()` | Token-by-token streaming |
| Text Generation | `textGeneration()` | Raw text completion |
| Summarization | `summarization()` | Text summarization |
| Translation | `translation()` | Language translation |
| Question Answering | `questionAnswering()` | Extract answers from context |
| Text Classification | `textClassification()` | Sentiment, topic classification |
| Zero-Shot Classification | `zeroShotClassification()` | Classify without training |
| Token Classification | `tokenClassification()` | NER, POS tagging |
| Sentence Similarity | `sentenceSimilarity()` | Semantic similarity scores |
| Feature Extraction | `featureExtraction()` | Text embeddings |
| Fill Mask | `fillMask()` | Fill in masked tokens |
| Table QA | `tableQuestionAnswering()` | Answer questions about tables |
Computer Vision:
| Task | Method | Description |
|---|---|---|
| Text-to-Image | `textToImage()` | Generate images from text |
| Image-to-Image | `imageToImage()` | Transform/edit images |
| Image Captioning | `imageToText()` | Describe images |
| Classification | `imageClassification()` | Classify image content |
| Object Detection | `objectDetection()` | Detect and locate objects |
| Segmentation | `imageSegmentation()` | Pixel-level segmentation |
| Zero-Shot Image | `zeroShotImageClassification()` | Classify without training |
| Text-to-Video | `textToVideo()` | Generate videos |
Audio:
| Task | Method | Description |
|---|---|---|
| Text-to-Speech | `textToSpeech()` | Generate speech |
| Speech-to-Text | `automaticSpeechRecognition()` | Transcription |
| Audio Classification | `audioClassification()` | Classify sounds |
| Audio-to-Audio | `audioToAudio()` | Source separation, enhancement |
Multimodal:
| Task | Method | Description |
|---|---|---|
| Visual QA | `visualQuestionAnswering()` | Answer questions about images |
| Document QA | `documentQuestionAnswering()` | Answer questions about documents |
Tabular:
| Task | Method | Description |
|---|---|---|
| Classification | `tabularClassification()` | Classify tabular data |
| Regression | `tabularRegression()` | Predict continuous values |
HuggingFace Agentic Features
- Tool/Function Calling: Full support via the `tools` parameter with `tool_choice` control (auto/none/required)
- JSON Schema Responses: `response_format: { type: 'json_schema', json_schema: {...} }`
- Reasoning: `reasoning_effort` parameter (none/minimal/low/medium/high/xhigh)
- Multimodal Input: Images via `image_url` content chunks in chat messages
- 17 Inference Providers: Route through Groq, Together, Fireworks, Replicate, Cerebras, Cohere, and more
ComfyUI — Local Image Generation
Provider ID: comfyui
Modalities: Image, Video (planned)
Type: Local
Default Port: 8188
Connects to a local ComfyUI instance for Stable Diffusion workflows.
How It Works
- Clones a built-in txt2img workflow template (KSampler + SDXL pipeline)
- Injects your parameters (prompt, dimensions, seed, steps, guidance)
- POSTs the workflow to ComfyUI's
/promptendpoint - Polls
/history/{promptId}every second until completion (max 5 minutes) - Fetches the generated image from
/view - Returns a PNG buffer
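The parameter-injection step can be sketched as a pure function over a cloned workflow graph. The node keys below (`prompt`, `latent`, `sampler`) are hypothetical placeholders; a real ComfyUI workflow addresses nodes by numeric ID and `class_type`:

```typescript
// Sketch: clone a workflow template and overwrite node inputs with the
// caller's parameters. Node names are illustrative, not the real template.
interface Txt2ImgParams {
  prompt: string;
  width: number;
  height: number;
  seed: number;
  steps: number;
  cfg: number;
}

export function buildWorkflow(
  template: Record<string, any>,
  p: Txt2ImgParams,
): Record<string, any> {
  const wf = structuredClone(template); // never mutate the shared template
  wf.prompt.inputs.text = p.prompt;
  wf.latent.inputs.width = p.width;
  wf.latent.inputs.height = p.height;
  wf.sampler.inputs.seed = p.seed;
  wf.sampler.inputs.steps = p.steps;
  wf.sampler.inputs.cfg = p.cfg;
  return wf;
}
```

Cloning before injection is what lets one template safely serve concurrent requests.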
Configuration
const ai = new Noosphere({
local: {
comfyui: {
enabled: true,
host: 'http://localhost',
port: 8188,
},
},
});
Default Workflow
- Checkpoint: `sd_xl_base_1.0.safetensors`
- Sampler: euler with normal scheduler
- Default Steps: 20
- Default CFG/Guidance: 7
- Default Size: 1024x1024
- Max Size: 2048x2048
- Output: PNG
Models Exposed
| Model ID | Modality | Description |
|---|---|---|
| `comfyui-txt2img` | Image | Text-to-image via workflow |
| `comfyui-txt2vid` | Video | Planned (requires AnimateDiff workflow) |
Local TTS — Piper & Kokoro
Provider IDs: piper, kokoro
Modality: TTS
Type: Local
Connects to local OpenAI-compatible TTS servers.
Supported Engines
| Engine | Default Port | Health Check | Voice Discovery |
|---|---|---|---|
| Piper | 5500 | `GET /health` | `GET /voices` |
| Kokoro | 5501 | `GET /health` | `GET /v1/models` (fallback) |
API
Uses the OpenAI-compatible TTS endpoint:
POST /v1/audio/speech
{
"model": "tts-1",
"input": "Hello world",
"voice": "default",
"speed": 1.0,
"response_format": "mp3"
}
Supports `mp3`, `wav`, and `ogg` formats. Returns audio as a `Buffer`.
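A request body for that endpoint can be assembled like this. The field names mirror the example above; the speed validation range is an assumption, not a documented limit:

```typescript
// Sketch: build the body for an OpenAI-compatible /v1/audio/speech call.
// Defaults mirror the example request above.
type AudioFormat = "mp3" | "wav" | "ogg";

export function buildSpeechRequest(
  input: string,
  opts: { voice?: string; speed?: number; format?: AudioFormat } = {},
) {
  const speed = opts.speed ?? 1.0;
  // Assumed sanity range; real servers may accept other values.
  if (speed <= 0 || speed > 4) throw new Error("speed out of range");
  return {
    model: "tts-1",
    input,
    voice: opts.voice ?? "default",
    speed,
    response_format: opts.format ?? "mp3",
  };
}
```

Sending it would then be a single `fetch` POST to, e.g., `http://localhost:5500/v1/audio/speech` with `JSON.stringify(buildSpeechRequest("Hello"))` as the body.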
Architecture
Provider Resolution (Local-First)
When you call a generation method without specifying a provider, Noosphere resolves one automatically:
- If `model` is specified without `provider` → looks up model in registry cache
- If a `default` is configured for the modality → uses that
- Otherwise → local providers first, then cloud providers
resolveProvider(modality):
1. Check user-specified provider ID → return if found
2. Check configured defaults → return if found
3. Scan all providers:
→ Return first LOCAL provider supporting this modality
→ Fallback to first CLOUD provider
4. Throw NO_PROVIDER error
Retry & Failover Logic
executeWithRetry(modality, provider, fn):
for attempt = 0..maxRetries:
try: return fn()
catch:
if error is retryable AND attempts remain:
wait backoffMs * 2^attempt (exponential backoff)
retry same provider
if error is NOT GENERATION_FAILED AND failover enabled:
try each alternative provider for this modality
throw last error
Retryable errors (same provider): PROVIDER_UNAVAILABLE, RATE_LIMITED, TIMEOUT, GENERATION_FAILED
Failover-eligible errors (cross-provider): PROVIDER_UNAVAILABLE, RATE_LIMITED, TIMEOUT (NOT GENERATION_FAILED)
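A minimal sketch of the retry half of this logic (cross-provider failover omitted for brevity; the function names are illustrative, not the noosphere internals):

```typescript
// Retry the same provider with exponential backoff, rethrowing once
// attempts are exhausted or the error is not retryable.
const RETRYABLE = new Set([
  "PROVIDER_UNAVAILABLE",
  "RATE_LIMITED",
  "TIMEOUT",
  "GENERATION_FAILED",
]);

export async function executeWithRetry<T>(
  fn: () => Promise<T>,
  getCode: (err: unknown) => string,
  maxRetries = 3,
  backoffMs = 500,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (!RETRYABLE.has(getCode(err)) || attempt === maxRetries) break;
      await sleep(backoffMs * 2 ** attempt); // 500ms, 1s, 2s, ...
    }
  }
  throw lastErr;
}
```

Injecting `sleep` keeps the backoff testable without real delays; a production version would also add jitter.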
Model Registry & Caching
- Models are fetched from providers via `listModels()` and cached in memory
- Cache TTL is configurable (default: 60 minutes)
- `syncModels()` forces a refresh of all provider model lists
- Registry tracks model → provider mappings for fast resolution
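A TTL cache of this shape can be sketched in a few lines; the class below is illustrative, not the actual registry implementation:

```typescript
// Minimal TTL cache sketch: entries expire after ttlMs, so the next read
// misses and triggers a re-fetch. `now` is injectable for deterministic tests.
export class TtlCache<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>();

  constructor(
    private ttlMs = 60 * 60 * 1000, // default: 60 minutes
    private now: () => number = Date.now,
  ) {}

  get(key: string): T | undefined {
    const e = this.entries.get(key);
    if (!e) return undefined;
    if (this.now() > e.expiresAt) {
      this.entries.delete(key); // expired; caller should re-fetch
      return undefined;
    }
    return e.value;
  }

  set(key: string, value: T): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  clear(): void {
    this.entries.clear(); // what a syncModels()-style forced refresh would do
  }
}
```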
Usage Tracking
Every API call (success or failure) records a UsageEvent:
interface UsageEvent {
modality: 'llm' | 'image' | 'video' | 'tts';
provider: string;
model: string;
cost: number; // USD
latencyMs: number;
input?: number; // tokens or characters
output?: number; // tokens
unit?: string;
timestamp: string; // ISO 8601
success: boolean;
error?: string; // error message if failed
metadata?: Record<string, unknown>;
}
Error Handling
All errors are instances of NoosphereError:
import { NoosphereError } from 'noosphere';
try {
await ai.chat({ messages: [{ role: 'user', content: 'Hello' }] });
} catch (err) {
if (err instanceof NoosphereError) {
console.log(err.code); // error code
console.log(err.provider); // which provider failed
console.log(err.modality); // which modality
console.log(err.model); // which model (if known)
console.log(err.cause); // underlying error
console.log(err.isRetryable()); // whether retry might help
}
}
Error Codes
| Code | Description | Retryable | Failover |
|---|---|---|---|
| `PROVIDER_UNAVAILABLE` | Provider is down or unreachable | Yes | Yes |
| `RATE_LIMITED` | API rate limit exceeded | Yes | Yes |
| `TIMEOUT` | Request exceeded timeout | Yes | Yes |
| `GENERATION_FAILED` | Generation error (bad prompt, model issue) | Yes | No |
| `AUTH_FAILED` | Invalid or missing API key | No | No |
| `MODEL_NOT_FOUND` | Requested model doesn't exist | No | No |
| `INVALID_INPUT` | Bad parameters or unsupported operation | No | No |
| `NO_PROVIDER` | No provider available for the requested modality | No | No |
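The table maps naturally onto a small lookup, e.g. for deciding whether to retry or fail over. This helper is illustrative rather than part of the noosphere API:

```typescript
// Illustrative lookup encoding the error-code table above.
interface ErrorPolicy { retryable: boolean; failover: boolean; }

const POLICIES: Record<string, ErrorPolicy> = {
  PROVIDER_UNAVAILABLE: { retryable: true, failover: true },
  RATE_LIMITED: { retryable: true, failover: true },
  TIMEOUT: { retryable: true, failover: true },
  GENERATION_FAILED: { retryable: true, failover: false },
  AUTH_FAILED: { retryable: false, failover: false },
  MODEL_NOT_FOUND: { retryable: false, failover: false },
  INVALID_INPUT: { retryable: false, failover: false },
  NO_PROVIDER: { retryable: false, failover: false },
};

export function policyFor(code: string): ErrorPolicy {
  // Unknown codes fall back to non-retryable, the conservative default.
  return POLICIES[code] ?? { retryable: false, failover: false };
}
```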
Custom Providers
Extend Noosphere with your own providers:
import type { NoosphereProvider, ModelInfo, ChatOptions, NoosphereResult, Modality } from 'noosphere';
const myProvider: NoosphereProvider = {
// Required properties
id: 'my-provider',
name: 'My Custom Provider',
modalities: ['llm', 'image'] as Modality[],
isLocal: false,
// Required methods
async ping() { return true; },
async listModels(modality?: Modality): Promise<ModelInfo[]> {
return [{
id: 'my-model',
provider: 'my-provider',
name: 'My Model',
modality: 'llm',
local: false,
cost: { price: 1.0, unit: 'per_1m_tokens' },
capabilities: {
contextWindow: 128000,
maxTokens: 4096,
supportsVision: false,
supportsStreaming: true,
},
}];
},
// Optional methods — implement per modality
async chat(options: ChatOptions): Promise<NoosphereResult> {
const start = Date.now();
// ... your implementation
return {
content: 'Response text',
provider: 'my-provider',
model: 'my-model',
modality: 'llm',
latencyMs: Date.now() - start,
usage: { cost: 0.001, input: 100, output: 50, unit: 'tokens' },
};
},
// stream?(options): NoosphereStream
// image?(options): Promise<NoosphereResult>
// video?(options): Promise<NoosphereResult>
// speak?(options): Promise<NoosphereResult>
// dispose?(): Promise<void>
};
ai.registerProvider(myProvider);
Provider Summary
| Provider | ID | Modalities | Type | Models | Library |
|---|---|---|---|---|---|
| Pi-AI Gateway | `pi-ai` | LLM | Cloud | 246+ | `@mariozechner/pi-ai` |
| FAL.ai | `fal` | Image, Video, TTS | Cloud | 867+ | `@fal-ai/client` |
| Hugging Face | `huggingface` | LLM, Image, TTS | Cloud | Unlimited (any HF model) | `@huggingface/inference` |
| ComfyUI | `comfyui` | Image | Local | SDXL workflows | Direct HTTP |
| Piper TTS | `piper` | TTS | Local | Piper voices | Direct HTTP |
| Kokoro TTS | `kokoro` | TTS | Local | Kokoro voices | Direct HTTP |
Requirements
- Node.js >= 18.0.0
License
MIT