noosphere
Unified AI creation engine — text, image, video, and audio generation across all providers through a single interface.
One import. Every model. Every modality.
Features
- 4 modalities — LLM chat, image generation, video generation, and text-to-speech
- 246+ LLM models — via Pi-AI gateway (OpenAI, Anthropic, Google, Groq, Mistral, xAI, Cerebras, OpenRouter)
- 867+ media endpoints — via FAL (Flux, SDXL, Kling, Sora 2, VEO 3, Kokoro, ElevenLabs, and hundreds more)
- 30+ HuggingFace tasks — LLM, image, TTS, translation, summarization, classification, and more
- Local-first architecture — Auto-detects ComfyUI, Ollama, Piper, and Kokoro on your machine
- Agentic capabilities — Tool use, function calling, reasoning/thinking, vision, and agent loops via Pi-AI
- Failover & retry — Automatic retries with exponential backoff and cross-provider failover
- Usage tracking — Real-time cost, latency, and token tracking across all providers
- TypeScript-first — Full type definitions with ESM and CommonJS support
Install
npm install noosphere
Quick Start
import { Noosphere } from 'noosphere';
const ai = new Noosphere();
// Chat with any LLM
const response = await ai.chat({
messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(response.content);
// Generate an image
const image = await ai.image({
prompt: 'A sunset over mountains',
width: 1024,
height: 1024,
});
console.log(image.url);
// Generate a video
const video = await ai.video({
prompt: 'Ocean waves crashing on rocks',
duration: 5,
});
console.log(video.url);
// Text-to-speech
const audio = await ai.speak({
text: 'Welcome to Noosphere',
voice: 'alloy',
format: 'mp3',
});
// audio.buffer contains the audio data
Configuration
API keys are resolved from the constructor config or environment variables (config takes priority):
const ai = new Noosphere({
keys: {
openai: 'sk-...',
anthropic: 'sk-ant-...',
google: 'AIza...',
fal: 'fal-...',
huggingface: 'hf_...',
groq: 'gsk_...',
mistral: '...',
xai: '...',
openrouter: 'sk-or-...',
},
});
Or set environment variables:
| Variable | Provider |
|---|---|
| OPENAI_API_KEY | OpenAI |
| ANTHROPIC_API_KEY | Anthropic |
| GEMINI_API_KEY | Google Gemini |
| FAL_KEY | FAL.ai |
| HUGGINGFACE_TOKEN | Hugging Face |
| GROQ_API_KEY | Groq |
| MISTRAL_API_KEY | Mistral |
| XAI_API_KEY | xAI (Grok) |
| OPENROUTER_API_KEY | OpenRouter |
Full Configuration Reference
const ai = new Noosphere({
// API keys (or use env vars above)
keys: { /* ... */ },
// Default models per modality
defaults: {
llm: { provider: 'pi-ai', model: 'claude-sonnet-4-20250514' },
image: { provider: 'fal', model: 'fal-ai/flux/schnell' },
video: { provider: 'fal', model: 'fal-ai/kling-video/v2/master/text-to-video' },
tts: { provider: 'fal', model: 'fal-ai/kokoro/american-english' },
},
// Local service configuration
autoDetectLocal: true, // env: NOOSPHERE_AUTO_DETECT_LOCAL
local: {
ollama: { enabled: true, host: 'http://localhost', port: 11434 },
comfyui: { enabled: true, host: 'http://localhost', port: 8188 },
piper: { enabled: true, host: 'http://localhost', port: 5500 },
kokoro: { enabled: true, host: 'http://localhost', port: 5501 },
custom: [], // additional LocalServiceConfig[]
},
// Retry & failover
retry: {
maxRetries: 2, // default: 2
backoffMs: 1000, // default: 1000 (exponential: 1s, 2s, 4s...)
failover: true, // default: true — try other providers on failure
retryableErrors: ['PROVIDER_UNAVAILABLE', 'RATE_LIMITED', 'TIMEOUT'],
},
// Timeouts per modality (ms)
timeout: {
llm: 30000, // 30s
image: 120000, // 2min
video: 300000, // 5min
tts: 60000, // 1min
},
// Model discovery cache (minutes)
discoveryCacheTTL: 60, // env: NOOSPHERE_DISCOVERY_CACHE_TTL
// Real-time usage callback
onUsage: (event) => {
console.log(`${event.provider}/${event.model}: $${event.cost} (${event.latencyMs}ms)`);
},
});
Local Service Environment Variables
| Variable | Default | Description |
|---|---|---|
| OLLAMA_HOST | http://localhost | Ollama server host |
| OLLAMA_PORT | 11434 | Ollama server port |
| COMFYUI_HOST | http://localhost | ComfyUI server host |
| COMFYUI_PORT | 8188 | ComfyUI server port |
| PIPER_HOST | http://localhost | Piper TTS server host |
| PIPER_PORT | 5500 | Piper TTS server port |
| KOKORO_HOST | http://localhost | Kokoro TTS server host |
| KOKORO_PORT | 5501 | Kokoro TTS server port |
| NOOSPHERE_AUTO_DETECT_LOCAL | true | Enable/disable local service auto-detection |
| NOOSPHERE_DISCOVERY_CACHE_TTL | 60 | Model cache TTL in minutes |
API Reference
new Noosphere(config?)
Creates a new instance. Providers are initialized lazily on first API call. Auto-detects local services via HTTP pings (2s timeout each).
Generation Methods
ai.chat(options): Promise<NoosphereResult>
Generate text with any LLM. Supports 246+ models across 8 providers.
const result = await ai.chat({
provider: 'anthropic', // optional — auto-resolved if omitted
model: 'claude-sonnet-4-20250514', // optional — uses default or first available
messages: [
{ role: 'system', content: 'You are helpful.' },
{ role: 'user', content: 'Explain quantum computing' },
],
temperature: 0.7, // optional (0-2)
maxTokens: 1024, // optional
jsonMode: false, // optional
});
console.log(result.content); // response text
console.log(result.thinking); // reasoning output (Claude, GPT-5, o3, Gemini, Grok-4)
console.log(result.usage.cost); // cost in USD
console.log(result.usage.input); // input tokens
console.log(result.usage.output); // output tokens
console.log(result.latencyMs); // response time in ms
ai.stream(options): NoosphereStream
Stream LLM responses token-by-token. Same options as chat().
const stream = ai.stream({
messages: [{ role: 'user', content: 'Write a story' }],
});
for await (const event of stream) {
switch (event.type) {
case 'text_delta':
process.stdout.write(event.delta!);
break;
case 'thinking_delta':
console.log('[thinking]', event.delta);
break;
case 'done':
console.log('\n\nUsage:', event.result!.usage);
break;
case 'error':
console.error(event.error);
break;
}
}
// Or consume the full result
const result = await stream.result();
// Abort at any time
stream.abort();
ai.image(options): Promise<NoosphereResult>
Generate images. Supports 200+ image models via FAL, HuggingFace, and ComfyUI.
const result = await ai.image({
provider: 'fal', // optional
model: 'fal-ai/flux-2-pro', // optional
prompt: 'A futuristic cityscape at sunset',
negativePrompt: 'blurry, low quality', // optional
width: 1024, // optional
height: 768, // optional
seed: 42, // optional — reproducible results
steps: 30, // optional — inference steps (more = higher quality)
guidanceScale: 7.5, // optional — prompt adherence (higher = stricter)
});
console.log(result.url); // image URL (FAL)
console.log(result.buffer); // image Buffer (HuggingFace, ComfyUI)
console.log(result.media?.width); // actual dimensions
console.log(result.media?.height);
console.log(result.media?.format); // 'png'
ai.video(options): Promise<NoosphereResult>
Generate videos. Supports 150+ video models via FAL (Kling, Sora 2, VEO 3, WAN, Pixverse, and more).
const result = await ai.video({
provider: 'fal',
model: 'fal-ai/kling-video/v2/master/text-to-video',
prompt: 'A bird flying through clouds',
imageUrl: 'https://...', // optional — image-to-video
duration: 5, // optional — seconds
fps: 24, // optional
width: 1280, // optional
height: 720, // optional
});
console.log(result.url); // video URL
console.log(result.media?.duration); // actual duration
console.log(result.media?.fps); // frames per second
console.log(result.media?.format); // 'mp4'
ai.speak(options): Promise<NoosphereResult>
Text-to-speech synthesis. Supports 50+ TTS models via FAL, HuggingFace, Piper, and Kokoro.
const result = await ai.speak({
provider: 'fal',
model: 'fal-ai/kokoro/american-english',
text: 'Hello world',
voice: 'af_heart', // optional — voice ID
language: 'en', // optional
speed: 1.0, // optional
format: 'mp3', // optional — 'mp3' | 'wav' | 'ogg'
});
console.log(result.buffer); // audio Buffer
console.log(result.url); // audio URL (FAL)
Discovery Methods
ai.getProviders(modality?): Promise<ProviderInfo[]>
List available providers, optionally filtered by modality.
const providers = await ai.getProviders('llm');
// [{ id: 'pi-ai', name: 'Pi-AI', modalities: ['llm'], local: false, status: 'online', modelCount: 246 }]
ai.getModels(modality?): Promise<ModelInfo[]>
List all available models with full metadata.
const models = await ai.getModels('image');
// Returns ModelInfo[] with id, provider, name, modality, local, cost, capabilities
ai.getModel(provider, modelId): Promise<ModelInfo | null>
Get details about a specific model.
ai.syncModels(): Promise<SyncResult>
Refresh model lists from all providers. Returns sync count, per-provider breakdown, and any errors.
Usage Tracking
ai.getUsage(options?): UsageSummary
Get aggregated usage statistics with optional filtering.
const usage = ai.getUsage({
since: '2024-01-01', // optional — ISO date or Date object
until: '2024-12-31', // optional
provider: 'openai', // optional — filter by provider
modality: 'llm', // optional — filter by modality
});
console.log(usage.totalCost); // total USD spent
console.log(usage.totalRequests); // number of requests
console.log(usage.byProvider); // { openai: 2.50, anthropic: 1.20, fal: 0.30 }
console.log(usage.byModality); // { llm: 3.00, image: 0.70, video: 0.30, tts: 0.00 }
Lifecycle
ai.registerProvider(provider): void
Register a custom provider (see Custom Providers).
ai.dispose(): Promise<void>
Cleanup all provider resources, clear model cache, and reset usage tracker.
NoosphereResult
Every generation method returns a NoosphereResult:
interface NoosphereResult {
content?: string; // LLM response text
thinking?: string; // reasoning/thinking output (supported models)
url?: string; // media URL (images, videos, audio from cloud providers)
buffer?: Buffer; // media binary data (local providers, HuggingFace)
provider: string; // which provider handled the request
model: string; // which model was used
modality: Modality; // 'llm' | 'image' | 'video' | 'tts'
latencyMs: number; // request duration in milliseconds
usage: {
cost: number; // cost in USD
input?: number; // input tokens/characters
output?: number; // output tokens
unit?: string; // 'tokens' | 'characters' | 'per_image' | 'per_second' | 'free'
};
media?: {
width?: number; // image/video width
height?: number; // image/video height
duration?: number; // video/audio duration in seconds
format?: string; // 'png' | 'mp4' | 'mp3' | 'wav'
fps?: number; // video frames per second
};
}
Providers In Depth
Pi-AI — LLM Gateway (246+ models)
Provider ID: pi-ai
Modalities: LLM (chat + streaming)
Library: @mariozechner/pi-ai
A unified gateway that routes to 8 LLM providers through 4 different API protocols:
| API Protocol | Providers |
|---|---|
| anthropic-messages | Anthropic |
| google-generative-ai | Google |
| openai-responses | OpenAI (reasoning models) |
| openai-completions | OpenAI, xAI, Groq, Cerebras, Zai, OpenRouter |
Anthropic Models (19)
| Model | Context | Reasoning | Vision | Input Cost | Output Cost |
|---|---|---|---|---|---|
| claude-opus-4-0 | 200k | Yes | Yes | $15/M | $75/M |
| claude-opus-4-1 | 200k | Yes | Yes | $15/M | $75/M |
| claude-sonnet-4-20250514 | 200k | Yes | Yes | $3/M | $15/M |
| claude-sonnet-4-5-20250929 | 200k | Yes | Yes | $3/M | $15/M |
| claude-3-7-sonnet-20250219 | 200k | Yes | Yes | $3/M | $15/M |
| claude-3-5-sonnet-20241022 | 200k | No | Yes | $3/M | $15/M |
| claude-haiku-4-5-20251001 | 200k | No | Yes | $0.80/M | $4/M |
| claude-3-5-haiku-20241022 | 200k | No | Yes | $0.80/M | $4/M |
| claude-3-haiku-20240307 | 200k | No | Yes | $0.25/M | $1.25/M |
| ...and 10 more variants | | | | | |
OpenAI Models (24)
| Model | Context | Reasoning | Vision | Input Cost | Output Cost |
|---|---|---|---|---|---|
| gpt-5 | 200k | Yes | Yes | $10/M | $30/M |
| gpt-5-mini | 200k | Yes | Yes | $2.50/M | $10/M |
| gpt-4.1 | 128k | No | Yes | $2/M | $8/M |
| gpt-4.1-mini | 128k | No | Yes | $0.40/M | $1.60/M |
| gpt-4.1-nano | 128k | No | Yes | $0.10/M | $0.40/M |
| gpt-4o | 128k | No | Yes | $2.50/M | $10/M |
| gpt-4o-mini | 128k | No | Yes | $0.15/M | $0.60/M |
| o3-pro | 200k | Yes | Yes | $20/M | $80/M |
| o3-mini | 200k | Yes | Yes | $1.10/M | $4.40/M |
| o4-mini | 200k | Yes | Yes | $1.10/M | $4.40/M |
| codex-mini-latest | 200k | Yes | No | $1.50/M | $6/M |
| ...and 13 more variants | | | | | |
Google Gemini Models (19)
| Model | Context | Reasoning | Vision | Cost |
|---|---|---|---|---|
| gemini-2.5-flash | 1M | Yes | Yes | $0.15-0.60/M |
| gemini-2.5-pro | 1M | Yes | Yes | $1.25-10/M |
| gemini-2.0-flash | 1M | No | Yes | $0.10-0.40/M |
| gemini-2.0-flash-lite | 1M | No | Yes | $0.025-0.10/M |
| gemini-1.5-flash | 1M | No | Yes | $0.075-0.30/M |
| gemini-1.5-pro | 2M | No | Yes | $1.25-5/M |
| ...and 13 more variants | | | | |
xAI Grok Models (20)
| Model | Context | Reasoning | Vision | Input Cost |
|---|---|---|---|---|
| grok-4 | 256k | Yes | Yes | $5/M |
| grok-4-fast | 256k | Yes | Yes | $3/M |
| grok-3 | 131k | No | Yes | $3/M |
| grok-3-fast | 131k | No | Yes | $5/M |
| grok-3-mini-fast-latest | 131k | Yes | No | $0.30/M |
| grok-2-vision | 32k | No | Yes | $2/M |
| ...and 14 more variants | | | | |
Groq Models (15)
| Model | Context | Cost |
|---|---|---|
| llama-3.3-70b-versatile | 128k | $0.59/M |
| llama-3.1-8b-instant | 128k | $0.05/M |
| mistral-saba-24b | 32k | $0.40/M |
| qwen-qwq-32b | 128k | $0.29/M |
| deepseek-r1-distill-llama-70b | 128k | $0.75/M |
| ...and 10 more | | |
Cerebras Models (3)
gpt-oss-120b, qwen-3-235b-a22b-instruct-2507, qwen-3-coder-480b
Zai Models (5)
glm-4.6, glm-4.5, glm-4.5-flash, glm-4.5v, glm-4.5-air
OpenRouter (141 models)
Aggregator providing access to hundreds of additional models including Llama, Deepseek, Mistral, Qwen, and many more. Full list available via ai.getModels('llm').
Agentic Capabilities (via Pi-AI library)
The underlying @mariozechner/pi-ai library exposes powerful agentic features. While Noosphere currently surfaces chat and streaming, the library provides:
Tool Use / Function Calling:
// Supported across Anthropic, OpenAI, Google, xAI, Groq
// Tool definitions use TypeBox schemas for runtime validation
interface Tool<TParameters extends TSchema = TSchema> {
name: string;
description: string;
parameters: TParameters; // TypeBox schema — validated at runtime with AJV
}
Reasoning / Thinking:
- Anthropic: thinkingEnabled, thinkingBudgetTokens — Claude Opus/Sonnet extended thinking
- OpenAI: reasoningEffort (minimal/low/medium/high) — o1/o3/o4/GPT-5 reasoning
- Google: thinking.enabled, thinking.budgetTokens — Gemini 2.5 thinking
- xAI: Grok-4 native reasoning
- Thinking blocks are automatically extracted and streamed as separate thinking_delta events
Vision / Multimodal Input:
// Send images alongside text to vision-capable models
{
role: "user",
content: [
{ type: "text", text: "What's in this image?" },
{ type: "image", data: base64String, mimeType: "image/png" }
]
}
Agent Loop:
// Built-in agentic execution loop with automatic tool calling
import { agentLoop } from '@mariozechner/pi-ai';
const events = agentLoop(prompt, context, {
tools: [myTool],
model: getModel('anthropic', 'claude-sonnet-4-20250514'),
});
for await (const event of events) {
// event.type: agent_start → turn_start → message_start →
// message_update → tool_execution_start → tool_execution_end →
// message_end → turn_end → agent_end
}
Cost Tracking per Model:
// Costs tracked per 1M tokens with cache-aware pricing
{
input: number, // cost per 1M input tokens
output: number, // cost per 1M output tokens
cacheRead: number, // prompt cache hit cost
cacheWrite: number, // prompt cache write cost
}
FAL — Media Generation (867+ endpoints)
Provider ID: fal
Modalities: Image, Video, TTS
Library: @fal-ai/client
The largest media generation provider with dynamic pricing fetched at runtime from https://api.fal.ai/v1/models/pricing.
Image Models (200+)
FLUX Family (20+ variants):
| Model | Description |
|---|---|
| fal-ai/flux/schnell | Fast generation (default) |
| fal-ai/flux/dev | Higher quality |
| fal-ai/flux-2 | Next generation |
| fal-ai/flux-2-pro | Professional quality |
| fal-ai/flux-2-flex | Flexible variant |
| fal-ai/flux-2/edit | Image editing |
| fal-ai/flux-2/lora | LoRA fine-tuning |
| fal-ai/flux-pro/v1.1-ultra | Ultra high quality |
| fal-ai/flux-pro/kontext | Context-aware generation |
| fal-ai/flux-lora | Custom style training |
| fal-ai/flux-vision-upscaler | AI upscaling |
| fal-ai/flux-krea-trainer | Model training |
| fal-ai/flux-lora-fast-training | Fast fine-tuning |
| fal-ai/flux-lora-portrait-trainer | Portrait specialist |
Stable Diffusion:
fal-ai/stable-diffusion-v15, fal-ai/stable-diffusion-v35-large, fal-ai/stable-diffusion-v35-medium, fal-ai/stable-diffusion-v3-medium
Other Image Models:
| Model | Description |
|---|---|
| fal-ai/recraft/v3/text-to-image | Artistic generation |
| fal-ai/ideogram/v2, v2a, v3 | Ideogram series |
| fal-ai/imagen3, fal-ai/imagen4/preview | Google Imagen |
| fal-ai/gpt-image-1 | GPT image generation |
| fal-ai/gpt-image-1/edit-image | GPT image editing |
| fal-ai/reve/text-to-image | Reve generation |
| fal-ai/sana, fal-ai/sana/sprint | Sana models |
| fal-ai/pixart-sigma | PixArt Sigma |
| fal-ai/bria/text-to-image/base | Bria AI |
Pre-trained LoRA Styles:
fal-ai/flux-2-lora-gallery/sepia-vintage, virtual-tryon, satellite-view-style, realism, multiple-angles, hdr-style, face-to-full-portrait, digital-comic-art, ballpoint-pen-sketch, apartment-staging, add-background
Image Editing/Enhancement (30+ tools):
fal-ai/image-editing/age-progression, baby-version, background-change, hair-change, expression-change, object-removal, photo-restoration, style-transfer, and many more.
Video Models (150+)
Kling Video (20+ variants):
| Model | Description |
|---|---|
| fal-ai/kling-video/v2/master/text-to-video | Default text-to-video |
| fal-ai/kling-video/v2/master/image-to-video | Image-to-video |
| fal-ai/kling-video/v2.5-turbo/pro/text-to-video | Turbo pro |
| fal-ai/kling-video/o1/image-to-video | O1 quality |
| fal-ai/kling-video/o1/video-to-video/edit | Video editing |
| fal-ai/kling-video/lipsync/audio-to-video | Lip sync |
| fal-ai/kling-video/video-to-audio | Audio extraction |
Sora 2 (OpenAI):
| Model | Description |
|---|---|
| fal-ai/sora-2/text-to-video | Text-to-video |
| fal-ai/sora-2/text-to-video/pro | Pro quality |
| fal-ai/sora-2/image-to-video | Image-to-video |
| fal-ai/sora-2/video-to-video/remix | Video remixing |
VEO 3 (Google):
| Model | Description |
|---|---|
| fal-ai/veo3 | VEO 3 standard |
| fal-ai/veo3/fast | Fast variant |
| fal-ai/veo3/image-to-video | Image-to-video |
| fal-ai/veo3.1 | Latest version |
| fal-ai/veo3.1/reference-to-video | Reference-guided |
| fal-ai/veo3.1/first-last-frame-to-video | Frame interpolation |
WAN (15+ variants):
fal-ai/wan-pro/text-to-video, fal-ai/wan-pro/image-to-video, fal-ai/wan/v2.2-a14b/text-to-video, fal-ai/wan-vace-14b/depth, fal-ai/wan-vace-14b/inpainting, fal-ai/wan-vace-14b/pose, fal-ai/wan-effects
Pixverse (20+ variants):
fal-ai/pixverse/v5.5/text-to-video, fal-ai/pixverse/v5.5/image-to-video, fal-ai/pixverse/v5.5/effects, fal-ai/pixverse/lipsync, fal-ai/pixverse/sound-effects
Minimax / Hailuo:
fal-ai/minimax/hailuo-2.3/text-to-video/pro, fal-ai/minimax/hailuo-2.3/image-to-video/pro, fal-ai/minimax/video-01-director, fal-ai/minimax/video-01-live
Other Video Models:
| Provider | Models |
|---|---|
| Hunyuan | fal-ai/hunyuan-video/text-to-video, image-to-video, video-to-video, foley |
| Pika | fal-ai/pika/v2.2/text-to-video, pikascenes, pikaffects |
| LTX | fal-ai/ltx-2/text-to-video, image-to-video, retake-video |
| Luma | fal-ai/luma-dream-machine/ray-2, ray-2-flash, luma-photon |
| Vidu | fal-ai/vidu/q2/text-to-video, image-to-video/pro |
| CogVideoX | fal-ai/cogvideox-5b/text-to-video, video-to-video |
| Seedance | fal-ai/bytedance/seedance/v1/text-to-video, image-to-video |
| Magi | fal-ai/magi/text-to-video, extend-video |
TTS / Speech Models (50+)
Kokoro (9 languages, 20+ voices per language):
| Model | Language | Example Voices |
|---|---|---|
| fal-ai/kokoro/american-english | English (US) | af_heart, af_alloy, af_bella, af_nova, am_adam, am_echo, am_onyx |
| fal-ai/kokoro/british-english | English (UK) | British voice set |
| fal-ai/kokoro/french | French | French voice set |
| fal-ai/kokoro/japanese | Japanese | Japanese voice set |
| fal-ai/kokoro/spanish | Spanish | Spanish voice set |
| fal-ai/kokoro/mandarin-chinese | Chinese | Mandarin voice set |
| fal-ai/kokoro/italian | Italian | Italian voice set |
| fal-ai/kokoro/hindi | Hindi | Hindi voice set |
| fal-ai/kokoro/brazilian-portuguese | Portuguese | Portuguese voice set |
ElevenLabs:
| Model | Description |
|---|---|
| fal-ai/elevenlabs/tts/eleven-v3 | Professional quality |
| fal-ai/elevenlabs/tts/turbo-v2.5 | Faster inference |
| fal-ai/elevenlabs/tts/multilingual-v2 | Multi-language |
| fal-ai/elevenlabs/text-to-dialogue/eleven-v3 | Dialogue generation |
| fal-ai/elevenlabs/sound-effects/v2 | Sound effects |
| fal-ai/elevenlabs/speech-to-text | Transcription |
| fal-ai/elevenlabs/audio-isolation | Background removal |
Other TTS:
fal-ai/f5-tts (voice cloning), fal-ai/dia-tts, fal-ai/minimax/speech-2.6-turbo, fal-ai/minimax/speech-2.6-hd, fal-ai/chatterbox/text-to-speech, fal-ai/index-tts-2/text-to-speech
FAL Client Capabilities
The @fal-ai/client provides additional features beyond what Noosphere surfaces:
- Queue API — Submit jobs, poll status, get results, cancel. Supports webhooks and priority levels
- Streaming API — Real-time streaming responses via async iterators
- Realtime API — WebSocket connections for interactive use (e.g., real-time image generation)
- Storage API — File upload with configurable TTL (1h, 1d, 7d, 30d, 1y, never)
- Retry logic — Configurable retries with exponential backoff and jitter
- Request middleware — Custom request interceptors and proxy support
Hugging Face — Open Source AI (30+ tasks)
Provider ID: huggingface
Modalities: LLM, Image, TTS
Library: @huggingface/inference
Access to the entire Hugging Face Hub ecosystem. Any model hosted on HuggingFace can be used by passing its ID directly.
Default Models
| Modality | Default Model | Description |
|---|---|---|
| LLM | meta-llama/Llama-3.1-8B-Instruct | Llama 3.1 8B |
| Image | stabilityai/stable-diffusion-xl-base-1.0 | SDXL Base |
| TTS | facebook/mms-tts-eng | MMS TTS English |
Any HuggingFace model ID works — just pass it as the model parameter:
await ai.chat({
provider: 'huggingface',
model: 'mistralai/Mixtral-8x7B-v0.1',
messages: [{ role: 'user', content: 'Hello' }],
});
Full Library Capabilities
The @huggingface/inference library (v3.15.0) provides 30+ AI tasks, including capabilities not yet surfaced by Noosphere:
Natural Language Processing:
| Task | Method | Description |
|---|---|---|
| Chat | chatCompletion() | OpenAI-compatible chat completions |
| Chat Streaming | chatCompletionStream() | Token-by-token streaming |
| Text Generation | textGeneration() | Raw text completion |
| Summarization | summarization() | Text summarization |
| Translation | translation() | Language translation |
| Question Answering | questionAnswering() | Extract answers from context |
| Text Classification | textClassification() | Sentiment, topic classification |
| Zero-Shot Classification | zeroShotClassification() | Classify without training |
| Token Classification | tokenClassification() | NER, POS tagging |
| Sentence Similarity | sentenceSimilarity() | Semantic similarity scores |
| Feature Extraction | featureExtraction() | Text embeddings |
| Fill Mask | fillMask() | Fill in masked tokens |
| Table QA | tableQuestionAnswering() | Answer questions about tables |
Computer Vision:
| Task | Method | Description |
|---|---|---|
| Text-to-Image | textToImage() | Generate images from text |
| Image-to-Image | imageToImage() | Transform/edit images |
| Image Captioning | imageToText() | Describe images |
| Classification | imageClassification() | Classify image content |
| Object Detection | objectDetection() | Detect and locate objects |
| Segmentation | imageSegmentation() | Pixel-level segmentation |
| Zero-Shot Image | zeroShotImageClassification() | Classify without training |
| Text-to-Video | textToVideo() | Generate videos |
Audio:
| Task | Method | Description |
|---|---|---|
| Text-to-Speech | textToSpeech() | Generate speech |
| Speech-to-Text | automaticSpeechRecognition() | Transcription |
| Audio Classification | audioClassification() | Classify sounds |
| Audio-to-Audio | audioToAudio() | Source separation, enhancement |
Multimodal:
| Task | Method | Description |
|---|---|---|
| Visual QA | visualQuestionAnswering() | Answer questions about images |
| Document QA | documentQuestionAnswering() | Answer questions about documents |
Tabular:
| Task | Method | Description |
|---|---|---|
| Classification | tabularClassification() | Classify tabular data |
| Regression | tabularRegression() | Predict continuous values |
HuggingFace Agentic Features
- Tool/Function Calling: Full support via tools parameter with tool_choice control (auto/none/required)
- JSON Schema Responses: response_format: { type: 'json_schema', json_schema: {...} }
- Reasoning: reasoning_effort parameter (none/minimal/low/medium/high/xhigh)
- Multimodal Input: Images via image_url content chunks in chat messages
- 17 Inference Providers: Route through Groq, Together, Fireworks, Replicate, Cerebras, Cohere, and more
ComfyUI — Local Image Generation
Provider ID: comfyui
Modalities: Image, Video (planned)
Type: Local
Default Port: 8188
Connects to a local ComfyUI instance for Stable Diffusion workflows.
How It Works
- Clones a built-in txt2img workflow template (KSampler + SDXL pipeline)
- Injects your parameters (prompt, dimensions, seed, steps, guidance)
- POSTs the workflow to ComfyUI's
/promptendpoint - Polls
/history/{promptId}every second until completion (max 5 minutes) - Fetches the generated image from
/view - Returns a PNG buffer
Configuration
const ai = new Noosphere({
local: {
comfyui: {
enabled: true,
host: 'http://localhost',
port: 8188,
},
},
});
Default Workflow
- Checkpoint: sd_xl_base_1.0.safetensors
- Sampler: euler with normal scheduler
- Default Steps: 20
- Default CFG/Guidance: 7
- Default Size: 1024x1024
- Max Size: 2048x2048
- Output: PNG
Models Exposed
| Model ID | Modality | Description |
|---|---|---|
| comfyui-txt2img | Image | Text-to-image via workflow |
| comfyui-txt2vid | Video | Planned (requires AnimateDiff workflow) |
Local TTS — Piper & Kokoro
Provider IDs: piper, kokoro
Modality: TTS
Type: Local
Connects to local OpenAI-compatible TTS servers.
Supported Engines
| Engine | Default Port | Health Check | Voice Discovery |
|---|---|---|---|
| Piper | 5500 | GET /health | GET /voices |
| Kokoro | 5501 | GET /health | GET /v1/models (fallback) |
API
Uses the OpenAI-compatible TTS endpoint:
POST /v1/audio/speech
{
"model": "tts-1",
"input": "Hello world",
"voice": "default",
"speed": 1.0,
"response_format": "mp3"
}
Supports mp3, wav, and ogg formats. Returns audio as a Buffer.
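Calling one of these local servers directly with fetch might look like the sketch below. The request shape is taken from the JSON above; the buildSpeechRequest helper is illustrative, not a Noosphere API.

```typescript
// Build the OpenAI-compatible TTS request shown above (illustrative helper).
function buildSpeechRequest(base: string, input: string, voice = 'default') {
  return {
    url: `${base}/v1/audio/speech`,
    body: {
      model: 'tts-1',
      input,
      voice,
      speed: 1.0,
      response_format: 'mp3',
    },
  };
}

// Usage against a local Piper server (assumed to be running on port 5500):
async function speakLocally(text: string): Promise<Buffer> {
  const req = buildSpeechRequest('http://localhost:5500', text);
  const res = await fetch(req.url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(req.body),
  });
  return Buffer.from(await res.arrayBuffer()); // raw audio bytes
}
```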
Architecture
Provider Resolution (Local-First)
When you call a generation method without specifying a provider, Noosphere resolves one automatically:
- If model is specified without provider → looks up the model in the registry cache
- If a default is configured for the modality → uses that
- Otherwise → local providers first, then cloud providers
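The local-first ordering can be expressed as a pure function. This is a simplified sketch of the rules above, not Noosphere's actual internals; the reduced Provider shape is an assumption for illustration.

```typescript
// Simplified sketch of local-first provider resolution (illustrative only).
interface Provider {
  id: string;
  isLocal: boolean;
  modalities: string[];
}

function resolveProvider(
  providers: Provider[],
  modality: string,
  preferred?: string,
  defaultId?: string
): Provider {
  const supports = (p: Provider) => p.modalities.includes(modality);
  const pick = (id?: string) =>
    providers.find((p) => p.id === id && supports(p));

  const chosen =
    pick(preferred) ??                                  // user-specified provider
    pick(defaultId) ??                                  // configured default
    providers.find((p) => p.isLocal && supports(p)) ??  // first local provider
    providers.find(supports);                           // fallback to first cloud
  if (!chosen) throw new Error('NO_PROVIDER');          // nothing supports modality
  return chosen;
}
```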
resolveProvider(modality):
1. Check user-specified provider ID → return if found
2. Check configured defaults → return if found
3. Scan all providers:
→ Return first LOCAL provider supporting this modality
→ Fallback to first CLOUD provider
4. Throw NO_PROVIDER error
Retry & Failover Logic
executeWithRetry(modality, provider, fn):
for attempt = 0..maxRetries:
try: return fn()
catch:
if error is retryable AND attempts remain:
wait backoffMs * 2^attempt (exponential backoff)
retry same provider
if error is NOT GENERATION_FAILED AND failover enabled:
try each alternative provider for this modality
throw last error
Retryable errors (same provider): PROVIDER_UNAVAILABLE, RATE_LIMITED, TIMEOUT, GENERATION_FAILED
Failover-eligible errors (cross-provider): PROVIDER_UNAVAILABLE, RATE_LIMITED, TIMEOUT (NOT GENERATION_FAILED)
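The backoff schedule itself is simple to compute: each wait is backoffMs * 2^attempt, as the pseudocode above shows.

```typescript
// Exponential backoff delays, as described above: backoffMs * 2^attempt.
function backoffSchedule(maxRetries: number, backoffMs: number): number[] {
  return Array.from({ length: maxRetries }, (_, attempt) => backoffMs * 2 ** attempt);
}

// With the defaults (maxRetries: 2, backoffMs: 1000) the waits between
// attempts are 1s then 2s:
// backoffSchedule(2, 1000) → [1000, 2000]
```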
Model Registry & Caching
- Models are fetched from providers via listModels() and cached in memory
- Cache TTL is configurable (default: 60 minutes)
- syncModels() forces a refresh of all provider model lists
- Registry tracks model → provider mappings for fast resolution
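The TTL check behind the cache reduces to a small predicate (an illustrative helper, not Noosphere's internal code):

```typescript
// Is a cached model list stale, given a TTL in minutes (default: 60)?
// Illustrative helper mirroring the caching rules above.
function isCacheStale(
  fetchedAt: Date,
  ttlMinutes: number,
  now: Date = new Date()
): boolean {
  return now.getTime() - fetchedAt.getTime() > ttlMinutes * 60_000;
}
```

When the predicate is true, the registry refetches via listModels(); calling syncModels() skips the check entirely.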
Usage Tracking
Every API call (success or failure) records a UsageEvent:
interface UsageEvent {
modality: 'llm' | 'image' | 'video' | 'tts';
provider: string;
model: string;
cost: number; // USD
latencyMs: number;
input?: number; // tokens or characters
output?: number; // tokens
unit?: string;
timestamp: string; // ISO 8601
success: boolean;
error?: string; // error message if failed
metadata?: Record<string, unknown>;
}
Error Handling
All errors are instances of NoosphereError:
import { NoosphereError } from 'noosphere';
try {
await ai.chat({ messages: [{ role: 'user', content: 'Hello' }] });
} catch (err) {
if (err instanceof NoosphereError) {
console.log(err.code); // error code
console.log(err.provider); // which provider failed
console.log(err.modality); // which modality
console.log(err.model); // which model (if known)
console.log(err.cause); // underlying error
console.log(err.isRetryable()); // whether retry might help
}
}
Error Codes
| Code | Description | Retryable | Failover |
|---|---|---|---|
| PROVIDER_UNAVAILABLE | Provider is down or unreachable | Yes | Yes |
| RATE_LIMITED | API rate limit exceeded | Yes | Yes |
| TIMEOUT | Request exceeded timeout | Yes | Yes |
| GENERATION_FAILED | Generation error (bad prompt, model issue) | Yes | No |
| AUTH_FAILED | Invalid or missing API key | No | No |
| MODEL_NOT_FOUND | Requested model doesn't exist | No | No |
| INVALID_INPUT | Bad parameters or unsupported operation | No | No |
| NO_PROVIDER | No provider available for the requested modality | No | No |
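If you branch on failures yourself, the table can be mirrored in a small lookup. This is a sketch: Noosphere already exposes err.isRetryable() for the retry half, and the ERROR_POLICY map and shouldFailover helper are hypothetical names.

```typescript
// Retry/failover semantics per error code, mirroring the table above.
const ERROR_POLICY: Record<string, { retryable: boolean; failover: boolean }> = {
  PROVIDER_UNAVAILABLE: { retryable: true, failover: true },
  RATE_LIMITED: { retryable: true, failover: true },
  TIMEOUT: { retryable: true, failover: true },
  GENERATION_FAILED: { retryable: true, failover: false },
  AUTH_FAILED: { retryable: false, failover: false },
  MODEL_NOT_FOUND: { retryable: false, failover: false },
  INVALID_INPUT: { retryable: false, failover: false },
  NO_PROVIDER: { retryable: false, failover: false },
};

function shouldFailover(code: string): boolean {
  return ERROR_POLICY[code]?.failover ?? false; // unknown codes: no failover
}
```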
Custom Providers
Extend Noosphere with your own providers:
import type { NoosphereProvider, ModelInfo, ChatOptions, NoosphereResult, Modality } from 'noosphere';
const myProvider: NoosphereProvider = {
// Required properties
id: 'my-provider',
name: 'My Custom Provider',
modalities: ['llm', 'image'] as Modality[],
isLocal: false,
// Required methods
async ping() { return true; },
async listModels(modality?: Modality): Promise<ModelInfo[]> {
return [{
id: 'my-model',
provider: 'my-provider',
name: 'My Model',
modality: 'llm',
local: false,
cost: { price: 1.0, unit: 'per_1m_tokens' },
capabilities: {
contextWindow: 128000,
maxTokens: 4096,
supportsVision: false,
supportsStreaming: true,
},
}];
},
// Optional methods — implement per modality
async chat(options: ChatOptions): Promise<NoosphereResult> {
const start = Date.now();
// ... your implementation
return {
content: 'Response text',
provider: 'my-provider',
model: 'my-model',
modality: 'llm',
latencyMs: Date.now() - start,
usage: { cost: 0.001, input: 100, output: 50, unit: 'tokens' },
};
},
// stream?(options): NoosphereStream
// image?(options): Promise<NoosphereResult>
// video?(options): Promise<NoosphereResult>
// speak?(options): Promise<NoosphereResult>
// dispose?(): Promise<void>
};
ai.registerProvider(myProvider);
Provider Summary
| Provider | ID | Modalities | Type | Models | Library |
|---|---|---|---|---|---|
| Pi-AI Gateway | pi-ai | LLM | Cloud | 246+ | @mariozechner/pi-ai |
| FAL.ai | fal | Image, Video, TTS | Cloud | 867+ | @fal-ai/client |
| Hugging Face | huggingface | LLM, Image, TTS | Cloud | Unlimited (any HF model) | @huggingface/inference |
| ComfyUI | comfyui | Image | Local | SDXL workflows | Direct HTTP |
| Piper TTS | piper | TTS | Local | Piper voices | Direct HTTP |
| Kokoro TTS | kokoro | TTS | Local | Kokoro voices | Direct HTTP |
Requirements
- Node.js >= 18.0.0
License
MIT