noosphere
Unified AI creation engine — text, image, video, and audio generation across all providers through a single interface.
One import. Every model. Every modality.
Features
- 4 modalities — LLM chat, image generation, video generation, and text-to-speech
- 246+ LLM models — via Pi-AI gateway (OpenAI, Anthropic, Google, Groq, Mistral, xAI, Cerebras, OpenRouter)
- 867+ media endpoints — via FAL (Flux, SDXL, Kling, Sora 2, VEO 3, Kokoro, ElevenLabs, and hundreds more)
- 30+ HuggingFace tasks — LLM, image, TTS, translation, summarization, classification, and more
- Local-first architecture — Auto-detects ComfyUI, Ollama, Piper, and Kokoro on your machine
- Agentic capabilities — Tool use, function calling, reasoning/thinking, vision, and agent loops via Pi-AI
- Failover & retry — Automatic retries with exponential backoff and cross-provider failover
- Usage tracking — Real-time cost, latency, and token tracking across all providers
- TypeScript-first — Full type definitions with ESM and CommonJS support
Install
npm install noosphere
Quick Start
import { Noosphere } from 'noosphere';
const ai = new Noosphere();
// Chat with any LLM
const response = await ai.chat({
messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(response.content);
// Generate an image
const image = await ai.image({
prompt: 'A sunset over mountains',
width: 1024,
height: 1024,
});
console.log(image.url);
// Generate a video
const video = await ai.video({
prompt: 'Ocean waves crashing on rocks',
duration: 5,
});
console.log(video.url);
// Text-to-speech
const audio = await ai.speak({
text: 'Welcome to Noosphere',
voice: 'alloy',
format: 'mp3',
});
// audio.buffer contains the audio data
Configuration
API keys are resolved from the constructor config or environment variables (config takes priority):
const ai = new Noosphere({
keys: {
openai: 'sk-...',
anthropic: 'sk-ant-...',
google: 'AIza...',
fal: 'fal-...',
huggingface: 'hf_...',
groq: 'gsk_...',
mistral: '...',
xai: '...',
openrouter: 'sk-or-...',
},
});
Or set environment variables:
| Variable | Provider |
|---|---|
| OPENAI_API_KEY | OpenAI |
| ANTHROPIC_API_KEY | Anthropic |
| GEMINI_API_KEY | Google Gemini |
| FAL_KEY | FAL.ai |
| HUGGINGFACE_TOKEN | Hugging Face |
| GROQ_API_KEY | Groq |
| MISTRAL_API_KEY | Mistral |
| XAI_API_KEY | xAI (Grok) |
| OPENROUTER_API_KEY | OpenRouter |
Full Configuration Reference
const ai = new Noosphere({
// API keys (or use env vars above)
keys: { /* ... */ },
// Default models per modality
defaults: {
llm: { provider: 'pi-ai', model: 'claude-sonnet-4-20250514' },
image: { provider: 'fal', model: 'fal-ai/flux/schnell' },
video: { provider: 'fal', model: 'fal-ai/kling-video/v2/master/text-to-video' },
tts: { provider: 'fal', model: 'fal-ai/kokoro/american-english' },
},
// Local service configuration
autoDetectLocal: true, // env: NOOSPHERE_AUTO_DETECT_LOCAL
local: {
ollama: { enabled: true, host: 'http://localhost', port: 11434 },
comfyui: { enabled: true, host: 'http://localhost', port: 8188 },
piper: { enabled: true, host: 'http://localhost', port: 5500 },
kokoro: { enabled: true, host: 'http://localhost', port: 5501 },
custom: [], // additional LocalServiceConfig[]
},
// Retry & failover
retry: {
maxRetries: 2, // default: 2
backoffMs: 1000, // default: 1000 (exponential: 1s, 2s, 4s...)
failover: true, // default: true — try other providers on failure
retryableErrors: ['PROVIDER_UNAVAILABLE', 'RATE_LIMITED', 'TIMEOUT'],
},
// Timeouts per modality (ms)
timeout: {
llm: 30000, // 30s
image: 120000, // 2min
video: 300000, // 5min
tts: 60000, // 1min
},
// Model discovery cache (minutes)
discoveryCacheTTL: 60, // env: NOOSPHERE_DISCOVERY_CACHE_TTL
// Real-time usage callback
onUsage: (event) => {
console.log(`${event.provider}/${event.model}: $${event.cost} (${event.latencyMs}ms)`);
},
});
Local Service Environment Variables
| Variable | Default | Description |
|---|---|---|
| OLLAMA_HOST | http://localhost | Ollama server host |
| OLLAMA_PORT | 11434 | Ollama server port |
| COMFYUI_HOST | http://localhost | ComfyUI server host |
| COMFYUI_PORT | 8188 | ComfyUI server port |
| PIPER_HOST | http://localhost | Piper TTS server host |
| PIPER_PORT | 5500 | Piper TTS server port |
| KOKORO_HOST | http://localhost | Kokoro TTS server host |
| KOKORO_PORT | 5501 | Kokoro TTS server port |
| NOOSPHERE_AUTO_DETECT_LOCAL | true | Enable/disable local service auto-detection |
| NOOSPHERE_DISCOVERY_CACHE_TTL | 60 | Model cache TTL in minutes |
API Reference
new Noosphere(config?)
Creates a new instance. Providers are initialized lazily on first API call. Auto-detects local services via HTTP pings (2s timeout each).
Generation Methods
ai.chat(options): Promise<NoosphereResult>
Generate text with any LLM. Supports 246+ models across 8 providers.
const result = await ai.chat({
provider: 'anthropic', // optional — auto-resolved if omitted
model: 'claude-sonnet-4-20250514', // optional — uses default or first available
messages: [
{ role: 'system', content: 'You are helpful.' },
{ role: 'user', content: 'Explain quantum computing' },
],
temperature: 0.7, // optional (0-2)
maxTokens: 1024, // optional
jsonMode: false, // optional
});
console.log(result.content); // response text
console.log(result.thinking); // reasoning output (Claude, GPT-5, o3, Gemini, Grok-4)
console.log(result.usage.cost); // cost in USD
console.log(result.usage.input); // input tokens
console.log(result.usage.output); // output tokens
console.log(result.latencyMs); // response time in ms
ai.stream(options): NoosphereStream
Stream LLM responses token-by-token. Same options as chat().
const stream = ai.stream({
messages: [{ role: 'user', content: 'Write a story' }],
});
for await (const event of stream) {
switch (event.type) {
case 'text_delta':
process.stdout.write(event.delta!);
break;
case 'thinking_delta':
console.log('[thinking]', event.delta);
break;
case 'done':
console.log('\n\nUsage:', event.result!.usage);
break;
case 'error':
console.error(event.error);
break;
}
}
// Or consume the full result
const result = await stream.result();
// Abort at any time
stream.abort();
ai.image(options): Promise<NoosphereResult>
Generate images. Supports 200+ image models via FAL, HuggingFace, and ComfyUI.
const result = await ai.image({
provider: 'fal', // optional
model: 'fal-ai/flux-2-pro', // optional
prompt: 'A futuristic cityscape at sunset',
negativePrompt: 'blurry, low quality', // optional
width: 1024, // optional
height: 768, // optional
seed: 42, // optional — reproducible results
steps: 30, // optional — inference steps (more = higher quality)
guidanceScale: 7.5, // optional — prompt adherence (higher = stricter)
});
console.log(result.url); // image URL (FAL)
console.log(result.buffer); // image Buffer (HuggingFace, ComfyUI)
console.log(result.media?.width); // actual dimensions
console.log(result.media?.height);
console.log(result.media?.format); // 'png'
ai.video(options): Promise<NoosphereResult>
Generate videos. Supports 150+ video models via FAL (Kling, Sora 2, VEO 3, WAN, Pixverse, and more).
const result = await ai.video({
provider: 'fal',
model: 'fal-ai/kling-video/v2/master/text-to-video',
prompt: 'A bird flying through clouds',
imageUrl: 'https://...', // optional — image-to-video
duration: 5, // optional — seconds
fps: 24, // optional
width: 1280, // optional
height: 720, // optional
});
console.log(result.url); // video URL
console.log(result.media?.duration); // actual duration
console.log(result.media?.fps); // frames per second
console.log(result.media?.format); // 'mp4'
ai.speak(options): Promise<NoosphereResult>
Text-to-speech synthesis. Supports 50+ TTS models via FAL, HuggingFace, Piper, and Kokoro.
const result = await ai.speak({
provider: 'fal',
model: 'fal-ai/kokoro/american-english',
text: 'Hello world',
voice: 'af_heart', // optional — voice ID
language: 'en', // optional
speed: 1.0, // optional
format: 'mp3', // optional — 'mp3' | 'wav' | 'ogg'
});
console.log(result.buffer); // audio Buffer
console.log(result.url); // audio URL (FAL)
Discovery Methods
ai.getProviders(modality?): Promise<ProviderInfo[]>
List available providers, optionally filtered by modality.
const providers = await ai.getProviders('llm');
// [{ id: 'pi-ai', name: 'Pi-AI', modalities: ['llm'], local: false, status: 'online', modelCount: 246 }]
ai.getModels(modality?): Promise<ModelInfo[]>
List all available models with full metadata.
const models = await ai.getModels('image');
// Returns ModelInfo[] with id, provider, name, modality, local, cost, capabilities
ai.getModel(provider, modelId): Promise<ModelInfo | null>
Get details about a specific model.
ai.syncModels(): Promise<SyncResult>
Refresh model lists from all providers. Returns sync count, per-provider breakdown, and any errors.
Usage Tracking
ai.getUsage(options?): UsageSummary
Get aggregated usage statistics with optional filtering.
const usage = ai.getUsage({
since: '2024-01-01', // optional — ISO date or Date object
until: '2024-12-31', // optional
provider: 'openai', // optional — filter by provider
modality: 'llm', // optional — filter by modality
});
console.log(usage.totalCost); // total USD spent
console.log(usage.totalRequests); // number of requests
console.log(usage.byProvider); // { openai: 2.50, anthropic: 1.20, fal: 0.30 }
console.log(usage.byModality); // { llm: 3.00, image: 0.70, video: 0.30, tts: 0.00 }
Lifecycle
ai.registerProvider(provider): void
Register a custom provider (see Custom Providers).
ai.dispose(): Promise<void>
Cleanup all provider resources, clear model cache, and reset usage tracker.
NoosphereResult
Every generation method returns a NoosphereResult:
interface NoosphereResult {
content?: string; // LLM response text
thinking?: string; // reasoning/thinking output (supported models)
url?: string; // media URL (images, videos, audio from cloud providers)
buffer?: Buffer; // media binary data (local providers, HuggingFace)
provider: string; // which provider handled the request
model: string; // which model was used
modality: Modality; // 'llm' | 'image' | 'video' | 'tts'
latencyMs: number; // request duration in milliseconds
usage: {
cost: number; // cost in USD
input?: number; // input tokens/characters
output?: number; // output tokens
unit?: string; // 'tokens' | 'characters' | 'per_image' | 'per_second' | 'free'
};
media?: {
width?: number; // image/video width
height?: number; // image/video height
duration?: number; // video/audio duration in seconds
format?: string; // 'png' | 'mp4' | 'mp3' | 'wav'
fps?: number; // video frames per second
};
}
Providers In Depth
Pi-AI — LLM Gateway (246+ models)
Provider ID: pi-ai
Modalities: LLM (chat + streaming)
Library: @mariozechner/pi-ai
A unified gateway that routes to 8 LLM providers through 4 different API protocols:
| API Protocol | Providers |
|---|---|
| anthropic-messages | Anthropic |
| google-generative-ai | Google |
| openai-responses | OpenAI (reasoning models) |
| openai-completions | OpenAI, xAI, Groq, Cerebras, Zai, OpenRouter |
Anthropic Models (19)
| Model | Context | Reasoning | Vision | Input Cost | Output Cost |
|---|---|---|---|---|---|
| claude-opus-4-0 | 200k | Yes | Yes | $15/M | $75/M |
| claude-opus-4-1 | 200k | Yes | Yes | $15/M | $75/M |
| claude-sonnet-4-20250514 | 200k | Yes | Yes | $3/M | $15/M |
| claude-sonnet-4-5-20250929 | 200k | Yes | Yes | $3/M | $15/M |
| claude-3-7-sonnet-20250219 | 200k | Yes | Yes | $3/M | $15/M |
| claude-3-5-sonnet-20241022 | 200k | No | Yes | $3/M | $15/M |
| claude-haiku-4-5-20251001 | 200k | No | Yes | $0.80/M | $4/M |
| claude-3-5-haiku-20241022 | 200k | No | Yes | $0.80/M | $4/M |
| claude-3-haiku-20240307 | 200k | No | Yes | $0.25/M | $1.25/M |
| ...and 10 more variants | | | | | |
OpenAI Models (24)
| Model | Context | Reasoning | Vision | Input Cost | Output Cost |
|---|---|---|---|---|---|
| gpt-5 | 200k | Yes | Yes | $10/M | $30/M |
| gpt-5-mini | 200k | Yes | Yes | $2.50/M | $10/M |
| gpt-4.1 | 128k | No | Yes | $2/M | $8/M |
| gpt-4.1-mini | 128k | No | Yes | $0.40/M | $1.60/M |
| gpt-4.1-nano | 128k | No | Yes | $0.10/M | $0.40/M |
| gpt-4o | 128k | No | Yes | $2.50/M | $10/M |
| gpt-4o-mini | 128k | No | Yes | $0.15/M | $0.60/M |
| o3-pro | 200k | Yes | Yes | $20/M | $80/M |
| o3-mini | 200k | Yes | Yes | $1.10/M | $4.40/M |
| o4-mini | 200k | Yes | Yes | $1.10/M | $4.40/M |
| codex-mini-latest | 200k | Yes | No | $1.50/M | $6/M |
| ...and 13 more variants | | | | | |
Google Gemini Models (19)
| Model | Context | Reasoning | Vision | Cost |
|---|---|---|---|---|
| gemini-2.5-flash | 1M | Yes | Yes | $0.15-0.60/M |
| gemini-2.5-pro | 1M | Yes | Yes | $1.25-10/M |
| gemini-2.0-flash | 1M | No | Yes | $0.10-0.40/M |
| gemini-2.0-flash-lite | 1M | No | Yes | $0.025-0.10/M |
| gemini-1.5-flash | 1M | No | Yes | $0.075-0.30/M |
| gemini-1.5-pro | 2M | No | Yes | $1.25-5/M |
| ...and 13 more variants | | | | |
xAI Grok Models (20)
| Model | Context | Reasoning | Vision | Input Cost |
|---|---|---|---|---|
| grok-4 | 256k | Yes | Yes | $5/M |
| grok-4-fast | 256k | Yes | Yes | $3/M |
| grok-3 | 131k | No | Yes | $3/M |
| grok-3-fast | 131k | No | Yes | $5/M |
| grok-3-mini-fast-latest | 131k | Yes | No | $0.30/M |
| grok-2-vision | 32k | No | Yes | $2/M |
| ...and 14 more variants | | | | |
Groq Models (15)
| Model | Context | Cost |
|---|---|---|
| llama-3.3-70b-versatile | 128k | $0.59/M |
| llama-3.1-8b-instant | 128k | $0.05/M |
| mistral-saba-24b | 32k | $0.40/M |
| qwen-qwq-32b | 128k | $0.29/M |
| deepseek-r1-distill-llama-70b | 128k | $0.75/M |
| ...and 10 more | | |
Cerebras Models (3)
gpt-oss-120b, qwen-3-235b-a22b-instruct-2507, qwen-3-coder-480b
Zai Models (5)
glm-4.6, glm-4.5, glm-4.5-flash, glm-4.5v, glm-4.5-air
OpenRouter (141 models)
Aggregator providing access to hundreds of additional models including Llama, Deepseek, Mistral, Qwen, and many more. Full list available via ai.getModels('llm').
Agentic Capabilities (via Pi-AI library)
The underlying @mariozechner/pi-ai library exposes powerful agentic features. While Noosphere currently surfaces chat and streaming, the library provides:
Tool Use / Function Calling:
// Supported across Anthropic, OpenAI, Google, xAI, Groq
// Tool definitions use TypeBox schemas for runtime validation
interface Tool<TParameters extends TSchema = TSchema> {
name: string;
description: string;
parameters: TParameters; // TypeBox schema — validated at runtime with AJV
}
Reasoning / Thinking:
- Anthropic: thinkingEnabled, thinkingBudgetTokens — Claude Opus/Sonnet extended thinking
- OpenAI: reasoningEffort (minimal/low/medium/high) — o1/o3/o4/GPT-5 reasoning
- Google: thinking.enabled, thinking.budgetTokens — Gemini 2.5 thinking
- xAI: Grok-4 native reasoning
- Thinking blocks are automatically extracted and streamed as separate thinking_delta events
Vision / Multimodal Input:
// Send images alongside text to vision-capable models
{
role: "user",
content: [
{ type: "text", text: "What's in this image?" },
{ type: "image", data: base64String, mimeType: "image/png" }
]
}
Agent Loop:
// Built-in agentic execution loop with automatic tool calling
import { agentLoop } from '@mariozechner/pi-ai';
const events = agentLoop(prompt, context, {
tools: [myTool],
model: getModel('anthropic', 'claude-sonnet-4-20250514'),
});
for await (const event of events) {
// event.type: agent_start → turn_start → message_start →
// message_update → tool_execution_start → tool_execution_end →
// message_end → turn_end → agent_end
}
Cost Tracking per Model:
// Costs tracked per 1M tokens with cache-aware pricing
{
input: number, // cost per 1M input tokens
output: number, // cost per 1M output tokens
cacheRead: number, // prompt cache hit cost
cacheWrite: number, // prompt cache write cost
}
FAL — Media Generation (867+ endpoints)
Provider ID: fal
Modalities: Image, Video, TTS
Library: @fal-ai/client
The largest media generation provider with dynamic pricing fetched at runtime from https://api.fal.ai/v1/models/pricing.
Image Models (200+)
FLUX Family (20+ variants):
| Model | Description |
|---|---|
| fal-ai/flux/schnell | Fast generation (default) |
| fal-ai/flux/dev | Higher quality |
| fal-ai/flux-2 | Next generation |
| fal-ai/flux-2-pro | Professional quality |
| fal-ai/flux-2-flex | Flexible variant |
| fal-ai/flux-2/edit | Image editing |
| fal-ai/flux-2/lora | LoRA fine-tuning |
| fal-ai/flux-pro/v1.1-ultra | Ultra high quality |
| fal-ai/flux-pro/kontext | Context-aware generation |
| fal-ai/flux-lora | Custom style training |
| fal-ai/flux-vision-upscaler | AI upscaling |
| fal-ai/flux-krea-trainer | Model training |
| fal-ai/flux-lora-fast-training | Fast fine-tuning |
| fal-ai/flux-lora-portrait-trainer | Portrait specialist |
Stable Diffusion:
fal-ai/stable-diffusion-v15, fal-ai/stable-diffusion-v35-large, fal-ai/stable-diffusion-v35-medium, fal-ai/stable-diffusion-v3-medium
Other Image Models:
| Model | Description |
|---|---|
| fal-ai/recraft/v3/text-to-image | Artistic generation |
| fal-ai/ideogram/v2, v2a, v3 | Ideogram series |
| fal-ai/imagen3, fal-ai/imagen4/preview | Google Imagen |
| fal-ai/gpt-image-1 | GPT image generation |
| fal-ai/gpt-image-1/edit-image | GPT image editing |
| fal-ai/reve/text-to-image | Reve generation |
| fal-ai/sana, fal-ai/sana/sprint | Sana models |
| fal-ai/pixart-sigma | PixArt Sigma |
| fal-ai/bria/text-to-image/base | Bria AI |
Pre-trained LoRA Styles:
fal-ai/flux-2-lora-gallery/sepia-vintage, virtual-tryon, satellite-view-style, realism, multiple-angles, hdr-style, face-to-full-portrait, digital-comic-art, ballpoint-pen-sketch, apartment-staging, add-background
Image Editing/Enhancement (30+ tools):
fal-ai/image-editing/age-progression, baby-version, background-change, hair-change, expression-change, object-removal, photo-restoration, style-transfer, and many more.
Video Models (150+)
Kling Video (20+ variants):
| Model | Description |
|---|---|
| fal-ai/kling-video/v2/master/text-to-video | Default text-to-video |
| fal-ai/kling-video/v2/master/image-to-video | Image-to-video |
| fal-ai/kling-video/v2.5-turbo/pro/text-to-video | Turbo pro |
| fal-ai/kling-video/o1/image-to-video | O1 quality |
| fal-ai/kling-video/o1/video-to-video/edit | Video editing |
| fal-ai/kling-video/lipsync/audio-to-video | Lip sync |
| fal-ai/kling-video/video-to-audio | Audio extraction |
Sora 2 (OpenAI):
| Model | Description |
|---|---|
| fal-ai/sora-2/text-to-video | Text-to-video |
| fal-ai/sora-2/text-to-video/pro | Pro quality |
| fal-ai/sora-2/image-to-video | Image-to-video |
| fal-ai/sora-2/video-to-video/remix | Video remixing |
VEO 3 (Google):
| Model | Description |
|---|---|
| fal-ai/veo3 | VEO 3 standard |
| fal-ai/veo3/fast | Fast variant |
| fal-ai/veo3/image-to-video | Image-to-video |
| fal-ai/veo3.1 | Latest version |
| fal-ai/veo3.1/reference-to-video | Reference-guided |
| fal-ai/veo3.1/first-last-frame-to-video | Frame interpolation |
WAN (15+ variants):
fal-ai/wan-pro/text-to-video, fal-ai/wan-pro/image-to-video, fal-ai/wan/v2.2-a14b/text-to-video, fal-ai/wan-vace-14b/depth, fal-ai/wan-vace-14b/inpainting, fal-ai/wan-vace-14b/pose, fal-ai/wan-effects
Pixverse (20+ variants):
fal-ai/pixverse/v5.5/text-to-video, fal-ai/pixverse/v5.5/image-to-video, fal-ai/pixverse/v5.5/effects, fal-ai/pixverse/lipsync, fal-ai/pixverse/sound-effects
Minimax / Hailuo:
fal-ai/minimax/hailuo-2.3/text-to-video/pro, fal-ai/minimax/hailuo-2.3/image-to-video/pro, fal-ai/minimax/video-01-director, fal-ai/minimax/video-01-live
Other Video Models:
| Provider | Models |
|---|---|
| Hunyuan | fal-ai/hunyuan-video/text-to-video, image-to-video, video-to-video, foley |
| Pika | fal-ai/pika/v2.2/text-to-video, pikascenes, pikaffects |
| LTX | fal-ai/ltx-2/text-to-video, image-to-video, retake-video |
| Luma | fal-ai/luma-dream-machine/ray-2, ray-2-flash, luma-photon |
| Vidu | fal-ai/vidu/q2/text-to-video, image-to-video/pro |
| CogVideoX | fal-ai/cogvideox-5b/text-to-video, video-to-video |
| Seedance | fal-ai/bytedance/seedance/v1/text-to-video, image-to-video |
| Magi | fal-ai/magi/text-to-video, extend-video |
TTS / Speech Models (50+)
Kokoro (9 languages, 20+ voices per language):
| Model | Language | Example Voices |
|---|---|---|
| fal-ai/kokoro/american-english | English (US) | af_heart, af_alloy, af_bella, af_nova, am_adam, am_echo, am_onyx |
| fal-ai/kokoro/british-english | English (UK) | British voice set |
| fal-ai/kokoro/french | French | French voice set |
| fal-ai/kokoro/japanese | Japanese | Japanese voice set |
| fal-ai/kokoro/spanish | Spanish | Spanish voice set |
| fal-ai/kokoro/mandarin-chinese | Chinese | Mandarin voice set |
| fal-ai/kokoro/italian | Italian | Italian voice set |
| fal-ai/kokoro/hindi | Hindi | Hindi voice set |
| fal-ai/kokoro/brazilian-portuguese | Portuguese | Portuguese voice set |
ElevenLabs:
| Model | Description |
|---|---|
| fal-ai/elevenlabs/tts/eleven-v3 | Professional quality |
| fal-ai/elevenlabs/tts/turbo-v2.5 | Faster inference |
| fal-ai/elevenlabs/tts/multilingual-v2 | Multi-language |
| fal-ai/elevenlabs/text-to-dialogue/eleven-v3 | Dialogue generation |
| fal-ai/elevenlabs/sound-effects/v2 | Sound effects |
| fal-ai/elevenlabs/speech-to-text | Transcription |
| fal-ai/elevenlabs/audio-isolation | Background removal |
Other TTS:
fal-ai/f5-tts (voice cloning), fal-ai/dia-tts, fal-ai/minimax/speech-2.6-turbo, fal-ai/minimax/speech-2.6-hd, fal-ai/chatterbox/text-to-speech, fal-ai/index-tts-2/text-to-speech
FAL Client Capabilities
The @fal-ai/client provides additional features beyond what Noosphere surfaces:
- Queue API — Submit jobs, poll status, get results, cancel. Supports webhooks and priority levels
- Streaming API — Real-time streaming responses via async iterators
- Realtime API — WebSocket connections for interactive use (e.g., real-time image generation)
- Storage API — File upload with configurable TTL (1h, 1d, 7d, 30d, 1y, never)
- Retry logic — Configurable retries with exponential backoff and jitter
- Request middleware — Custom request interceptors and proxy support
Hugging Face — Open Source AI (30+ tasks)
Provider ID: huggingface
Modalities: LLM, Image, TTS
Library: @huggingface/inference
Access to the entire Hugging Face Hub ecosystem. Any model hosted on HuggingFace can be used by passing its ID directly.
Default Models
| Modality | Default Model | Description |
|---|---|---|
| LLM | meta-llama/Llama-3.1-8B-Instruct | Llama 3.1 8B |
| Image | stabilityai/stable-diffusion-xl-base-1.0 | SDXL Base |
| TTS | facebook/mms-tts-eng | MMS TTS English |
Any HuggingFace model ID works — just pass it as the model parameter:
await ai.chat({
provider: 'huggingface',
model: 'mistralai/Mixtral-8x7B-v0.1',
messages: [{ role: 'user', content: 'Hello' }],
});
Full Library Capabilities
The @huggingface/inference library (v3.15.0) provides 30+ AI tasks, including capabilities not yet surfaced by Noosphere:
Natural Language Processing:
| Task | Method | Description |
|---|---|---|
| Chat | chatCompletion() | OpenAI-compatible chat completions |
| Chat Streaming | chatCompletionStream() | Token-by-token streaming |
| Text Generation | textGeneration() | Raw text completion |
| Summarization | summarization() | Text summarization |
| Translation | translation() | Language translation |
| Question Answering | questionAnswering() | Extract answers from context |
| Text Classification | textClassification() | Sentiment, topic classification |
| Zero-Shot Classification | zeroShotClassification() | Classify without training |
| Token Classification | tokenClassification() | NER, POS tagging |
| Sentence Similarity | sentenceSimilarity() | Semantic similarity scores |
| Feature Extraction | featureExtraction() | Text embeddings |
| Fill Mask | fillMask() | Fill in masked tokens |
| Table QA | tableQuestionAnswering() | Answer questions about tables |
Computer Vision:
| Task | Method | Description |
|---|---|---|
| Text-to-Image | textToImage() | Generate images from text |
| Image-to-Image | imageToImage() | Transform/edit images |
| Image Captioning | imageToText() | Describe images |
| Classification | imageClassification() | Classify image content |
| Object Detection | objectDetection() | Detect and locate objects |
| Segmentation | imageSegmentation() | Pixel-level segmentation |
| Zero-Shot Image | zeroShotImageClassification() | Classify without training |
| Text-to-Video | textToVideo() | Generate videos |
Audio:
| Task | Method | Description |
|---|---|---|
| Text-to-Speech | textToSpeech() | Generate speech |
| Speech-to-Text | automaticSpeechRecognition() | Transcription |
| Audio Classification | audioClassification() | Classify sounds |
| Audio-to-Audio | audioToAudio() | Source separation, enhancement |
Multimodal:
| Task | Method | Description |
|---|---|---|
| Visual QA | visualQuestionAnswering() | Answer questions about images |
| Document QA | documentQuestionAnswering() | Answer questions about documents |
Tabular:
| Task | Method | Description |
|---|---|---|
| Classification | tabularClassification() | Classify tabular data |
| Regression | tabularRegression() | Predict continuous values |
HuggingFace Agentic Features
- Tool/Function Calling: Full support via tools parameter with tool_choice control (auto/none/required)
- JSON Schema Responses: response_format: { type: 'json_schema', json_schema: {...} }
- Reasoning: reasoning_effort parameter (none/minimal/low/medium/high/xhigh)
- Multimodal Input: Images via image_url content chunks in chat messages
- 17 Inference Providers: Route through Groq, Together, Fireworks, Replicate, Cerebras, Cohere, and more
ComfyUI — Local Image Generation
Provider ID: comfyui
Modalities: Image, Video (planned)
Type: Local
Default Port: 8188
Connects to a local ComfyUI instance for Stable Diffusion workflows.
How It Works
- Clones a built-in txt2img workflow template (KSampler + SDXL pipeline)
- Injects your parameters (prompt, dimensions, seed, steps, guidance)
- POSTs the workflow to ComfyUI's
/promptendpoint - Polls
/history/{promptId}every second until completion (max 5 minutes) - Fetches the generated image from
/view - Returns a PNG buffer
Configuration
const ai = new Noosphere({
local: {
comfyui: {
enabled: true,
host: 'http://localhost',
port: 8188,
},
},
});
Default Workflow
- Checkpoint: sd_xl_base_1.0.safetensors
- Sampler: euler with normal scheduler
- Default Steps: 20
- Default CFG/Guidance: 7
- Default Size: 1024x1024
- Max Size: 2048x2048
- Output: PNG
Models Exposed
| Model ID | Modality | Description |
|---|---|---|
| comfyui-txt2img | Image | Text-to-image via workflow |
| comfyui-txt2vid | Video | Planned (requires AnimateDiff workflow) |
Local TTS — Piper & Kokoro
Provider IDs: piper, kokoro
Modality: TTS
Type: Local
Connects to local OpenAI-compatible TTS servers.
Supported Engines
| Engine | Default Port | Health Check | Voice Discovery |
|---|---|---|---|
| Piper | 5500 | GET /health | GET /voices |
| Kokoro | 5501 | GET /health | GET /v1/models (fallback) |
API
Uses the OpenAI-compatible TTS endpoint:
POST /v1/audio/speech
{
"model": "tts-1",
"input": "Hello world",
"voice": "default",
"speed": 1.0,
"response_format": "mp3"
}
Supports mp3, wav, and ogg formats. Returns audio as a Buffer.
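Calling one of these local servers directly with fetch might look like the sketch below. The request shape is taken from the JSON above; the buildSpeechRequest helper is illustrative, not a Noosphere API.

```typescript
// Build the OpenAI-compatible TTS request shown above (illustrative helper).
function buildSpeechRequest(base: string, input: string, voice = 'default') {
  return {
    url: `${base}/v1/audio/speech`,
    body: {
      model: 'tts-1',
      input,
      voice,
      speed: 1.0,
      response_format: 'mp3',
    },
  };
}

// Usage against a local Piper server (assumed to be running on port 5500):
async function speakLocally(text: string): Promise<Buffer> {
  const req = buildSpeechRequest('http://localhost:5500', text);
  const res = await fetch(req.url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(req.body),
  });
  return Buffer.from(await res.arrayBuffer()); // raw audio bytes
}
```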
Architecture
Provider Resolution (Local-First)
When you call a generation method without specifying a provider, Noosphere resolves one automatically:
- If model is specified without provider → looks up the model in the registry cache
- If a default is configured for the modality → uses that
- Otherwise → local providers first, then cloud providers
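The local-first ordering can be expressed as a pure function. This is a simplified sketch of the rules above, not Noosphere's actual internals; the reduced Provider shape is an assumption for illustration.

```typescript
// Simplified sketch of local-first provider resolution (illustrative only).
interface Provider {
  id: string;
  isLocal: boolean;
  modalities: string[];
}

function resolveProvider(
  providers: Provider[],
  modality: string,
  preferred?: string,
  defaultId?: string
): Provider {
  const supports = (p: Provider) => p.modalities.includes(modality);
  const pick = (id?: string) =>
    providers.find((p) => p.id === id && supports(p));

  const chosen =
    pick(preferred) ??                                  // user-specified provider
    pick(defaultId) ??                                  // configured default
    providers.find((p) => p.isLocal && supports(p)) ??  // first local provider
    providers.find(supports);                           // fallback to first cloud
  if (!chosen) throw new Error('NO_PROVIDER');          // nothing supports modality
  return chosen;
}
```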
resolveProvider(modality):
1. Check user-specified provider ID → return if found
2. Check configured defaults → return if found
3. Scan all providers:
→ Return first LOCAL provider supporting this modality
→ Fallback to first CLOUD provider
4. Throw NO_PROVIDER error
Retry & Failover Logic
executeWithRetry(modality, provider, fn):
for attempt = 0..maxRetries:
try: return fn()
catch:
if error is retryable AND attempts remain:
wait backoffMs * 2^attempt (exponential backoff)
retry same provider
if error is NOT GENERATION_FAILED AND failover enabled:
try each alternative provider for this modality
throw last error
Retryable errors (same provider): PROVIDER_UNAVAILABLE, RATE_LIMITED, TIMEOUT, GENERATION_FAILED
Failover-eligible errors (cross-provider): PROVIDER_UNAVAILABLE, RATE_LIMITED, TIMEOUT (NOT GENERATION_FAILED)
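The backoff schedule itself is simple to compute: each wait is backoffMs * 2^attempt, as the pseudocode above shows.

```typescript
// Exponential backoff delays, as described above: backoffMs * 2^attempt.
function backoffSchedule(maxRetries: number, backoffMs: number): number[] {
  return Array.from({ length: maxRetries }, (_, attempt) => backoffMs * 2 ** attempt);
}

// With the defaults (maxRetries: 2, backoffMs: 1000) the waits between
// attempts are 1s then 2s:
// backoffSchedule(2, 1000) → [1000, 2000]
```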
Model Registry & Caching
- Models are fetched from providers via listModels() and cached in memory
- Cache TTL is configurable (default: 60 minutes)
- syncModels() forces a refresh of all provider model lists
- Registry tracks model → provider mappings for fast resolution
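The TTL check behind the cache reduces to a small predicate (an illustrative helper, not Noosphere's internal code):

```typescript
// Is a cached model list stale, given a TTL in minutes (default: 60)?
// Illustrative helper mirroring the caching rules above.
function isCacheStale(
  fetchedAt: Date,
  ttlMinutes: number,
  now: Date = new Date()
): boolean {
  return now.getTime() - fetchedAt.getTime() > ttlMinutes * 60_000;
}
```

When the predicate is true, the registry refetches via listModels(); calling syncModels() skips the check entirely.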
Usage Tracking
Every API call (success or failure) records a UsageEvent:
interface UsageEvent {
modality: 'llm' | 'image' | 'video' | 'tts';
provider: string;
model: string;
cost: number; // USD
latencyMs: number;
input?: number; // tokens or characters
output?: number; // tokens
unit?: string;
timestamp: string; // ISO 8601
success: boolean;
error?: string; // error message if failed
metadata?: Record<string, unknown>;
}
Error Handling
All errors are instances of NoosphereError:
import { NoosphereError } from 'noosphere';
try {
await ai.chat({ messages: [{ role: 'user', content: 'Hello' }] });
} catch (err) {
if (err instanceof NoosphereError) {
console.log(err.code); // error code
console.log(err.provider); // which provider failed
console.log(err.modality); // which modality
console.log(err.model); // which model (if known)
console.log(err.cause); // underlying error
console.log(err.isRetryable()); // whether retry might help
}
}
Error Codes
| Code | Description | Retryable | Failover |
|---|---|---|---|
| PROVIDER_UNAVAILABLE | Provider is down or unreachable | Yes | Yes |
| RATE_LIMITED | API rate limit exceeded | Yes | Yes |
| TIMEOUT | Request exceeded timeout | Yes | Yes |
| GENERATION_FAILED | Generation error (bad prompt, model issue) | Yes | No |
| AUTH_FAILED | Invalid or missing API key | No | No |
| MODEL_NOT_FOUND | Requested model doesn't exist | No | No |
| INVALID_INPUT | Bad parameters or unsupported operation | No | No |
| NO_PROVIDER | No provider available for the requested modality | No | No |
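If you branch on failures yourself, the table can be mirrored in a small lookup. This is a sketch: Noosphere already exposes err.isRetryable() for the retry half, and the ERROR_POLICY map and shouldFailover helper are hypothetical names.

```typescript
// Retry/failover semantics per error code, mirroring the table above.
const ERROR_POLICY: Record<string, { retryable: boolean; failover: boolean }> = {
  PROVIDER_UNAVAILABLE: { retryable: true, failover: true },
  RATE_LIMITED: { retryable: true, failover: true },
  TIMEOUT: { retryable: true, failover: true },
  GENERATION_FAILED: { retryable: true, failover: false },
  AUTH_FAILED: { retryable: false, failover: false },
  MODEL_NOT_FOUND: { retryable: false, failover: false },
  INVALID_INPUT: { retryable: false, failover: false },
  NO_PROVIDER: { retryable: false, failover: false },
};

function shouldFailover(code: string): boolean {
  return ERROR_POLICY[code]?.failover ?? false; // unknown codes: no failover
}
```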
Custom Providers
Extend Noosphere with your own providers:
import type { NoosphereProvider, ModelInfo, ChatOptions, NoosphereResult, Modality } from 'noosphere';
const myProvider: NoosphereProvider = {
// Required properties
id: 'my-provider',
name: 'My Custom Provider',
modalities: ['llm', 'image'] as Modality[],
isLocal: false,
// Required methods
async ping() { return true; },
async listModels(modality?: Modality): Promise<ModelInfo[]> {
return [{
id: 'my-model',
provider: 'my-provider',
name: 'My Model',
modality: 'llm',
local: false,
cost: { price: 1.0, unit: 'per_1m_tokens' },
capabilities: {
contextWindow: 128000,
maxTokens: 4096,
supportsVision: false,
supportsStreaming: true,
},
}];
},
// Optional methods — implement per modality
async chat(options: ChatOptions): Promise<NoosphereResult> {
const start = Date.now();
// ... your implementation
return {
content: 'Response text',
provider: 'my-provider',
model: 'my-model',
modality: 'llm',
latencyMs: Date.now() - start,
usage: { cost: 0.001, input: 100, output: 50, unit: 'tokens' },
};
},
// stream?(options): NoosphereStream
// image?(options): Promise<NoosphereResult>
// video?(options): Promise<NoosphereResult>
// speak?(options): Promise<NoosphereResult>
// dispose?(): Promise<void>
};
ai.registerProvider(myProvider);
Provider Summary
| Provider | ID | Modalities | Type | Models | Library |
|---|---|---|---|---|---|
| Pi-AI Gateway | pi-ai | LLM | Cloud | 246+ | @mariozechner/pi-ai |
| FAL.ai | fal | Image, Video, TTS | Cloud | 867+ | @fal-ai/client |
| Hugging Face | huggingface | LLM, Image, TTS | Cloud | Unlimited (any HF model) | @huggingface/inference |
| ComfyUI | comfyui | Image | Local | SDXL workflows | Direct HTTP |
| Piper TTS | piper | TTS | Local | Piper voices | Direct HTTP |
| Kokoro TTS | kokoro | TTS | Local | Kokoro voices | Direct HTTP |
Requirements
- Node.js >= 18.0.0
License
MIT