Universal, cross-platform text-to-speech SDK with multi-provider support.

Package Exports

  • @speech-sdk/core
  • @speech-sdk/core/cartesia
  • @speech-sdk/core/deepgram
  • @speech-sdk/core/elevenlabs
  • @speech-sdk/core/fal-ai
  • @speech-sdk/core/fish-audio
  • @speech-sdk/core/google
  • @speech-sdk/core/hume
  • @speech-sdk/core/mistral
  • @speech-sdk/core/murf
  • @speech-sdk/core/openai
  • @speech-sdk/core/resemble
  • @speech-sdk/core/unreal-speech


Speech SDK

Universal text-to-speech TypeScript SDK with multi-provider support. Cross-platform (Node, Edge, browser), open source, with minimal dependencies.

Install

npm install @speech-sdk/core

Using an AI Coding Assistant?

Add the speech-sdk skill to give your AI assistant full knowledge of this library:

npx skills add Jellypod-Inc/speech-sdk --skill speech-sdk

Quick Start

import { generateSpeech } from '@speech-sdk/core';

const result = await generateSpeech({
  model: 'openai/gpt-4o-mini-tts',
  text: 'Hello from speech-sdk!',
  voice: 'alloy',
});

// Access the audio
result.audio.uint8Array;  // Uint8Array
result.audio.base64;      // string (lazy-computed)
result.audio.mediaType;   // "audio/mpeg"

Supported Providers

Use provider/model strings. Passing just the provider name uses its default model.

| Provider | String Prefix | Default Model | Env Var |
| --- | --- | --- | --- |
| OpenAI | openai | gpt-4o-mini-tts | OPENAI_API_KEY |
| ElevenLabs | elevenlabs | eleven_multilingual_v2 | ELEVENLABS_API_KEY |
| Deepgram | deepgram | aura-2 | DEEPGRAM_API_KEY |
| Cartesia | cartesia | sonic-3 | CARTESIA_API_KEY |
| Hume | hume | octave-2 | HUME_API_KEY |
| Google (Gemini TTS) | google | gemini-2.5-flash-preview-tts | GOOGLE_API_KEY |
| Fish Audio | fish-audio | s2-pro | FISH_AUDIO_API_KEY |
| Unreal Speech | unreal-speech | default | UNREAL_SPEECH_API_KEY |
| Murf | murf | GEN2 | MURF_API_KEY |
| Resemble | resemble | default | RESEMBLE_API_KEY |
| fal | fal-ai | (user-specified) | FAL_API_KEY |
| Mistral | mistral | voxtral-mini-tts-2603 | MISTRAL_API_KEY |

generateSpeech({ model: 'openai/tts-1', text: '...', voice: 'alloy' });
generateSpeech({ model: 'elevenlabs/eleven_v3', text: '...', voice: 'voice-id' });
generateSpeech({ model: 'deepgram/aura-2', text: '...', voice: 'thalia-en' });
generateSpeech({ model: 'openai', text: '...', voice: 'alloy' });  // uses default model

Provider-specific API parameters can be passed via providerOptions — these are sent directly to the provider's API using the API's own field names.
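As a sketch of what such a request looks like (the `speed` and `response_format` fields follow OpenAI's speech endpoint and are illustrative assumptions here, not SDK-defined names):

```typescript
// providerOptions is forwarded to the provider's API verbatim, so field
// names match the provider's own API, not a normalized SDK schema.
// `speed` and `response_format` below are OpenAI-style examples.
const request = {
  model: 'openai/gpt-4o-mini-tts',
  text: 'Hello with options!',
  voice: 'alloy',
  providerOptions: {
    speed: 1.1,             // OpenAI: playback speed multiplier
    response_format: 'wav', // OpenAI: audio container to return
  },
};

// const result = await generateSpeech(request);
```

Because the object is passed through untouched, check each provider's own API reference for its valid field names.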

Custom Configuration

Use factory functions when you need custom API keys, base URLs, or fetch implementations:

import { generateSpeech } from '@speech-sdk/core';
import { createOpenAI } from '@speech-sdk/core/openai';
import { createElevenLabs } from '@speech-sdk/core/elevenlabs';

const myOpenAI = createOpenAI({
  apiKey: 'sk-...',
  baseURL: 'https://my-proxy.com/v1',
});

const result = await generateSpeech({
  model: myOpenAI('gpt-4o-mini-tts'),
  text: 'Hello!',
  voice: 'alloy',
});

API Key Resolution

When using string models (e.g., 'openai/tts-1'), API keys are resolved from environment variables (see table above). Factory functions accept an explicit apiKey option which takes precedence.
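That precedence can be sketched as a small resolver. This is an illustration of the rule, not the SDK's actual internals; the function name is hypothetical:

```typescript
// Sketch of the key-resolution order described above: an explicit apiKey
// wins; otherwise the provider's environment variable is used; otherwise
// fail loudly.
function resolveApiKey(
  explicit: string | undefined,
  envVar: string,
  env: Record<string, string | undefined>, // e.g. process.env
): string {
  const key = explicit ?? env[envVar];
  if (!key) {
    throw new Error(`Missing API key: pass apiKey or set ${envVar}`);
  }
  return key;
}
```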

Voice Cloning

Some providers support voice cloning via reference audio. Pass a voice object instead of a string:

import { createMistral } from '@speech-sdk/core/mistral';

const mistral = createMistral();

// Clone from base64 audio
const result = await generateSpeech({
  model: mistral(),
  text: 'Hello!',
  voice: { audio: 'base64-encoded-audio...' },
});

Clone from a URL (fal):

import { createFal } from '@speech-sdk/core/fal-ai';

const fal = createFal();
const result = await generateSpeech({
  model: fal('fal-ai/chatterbox'),
  text: 'Hello!',
  voice: { url: 'https://example.com/reference.wav' },
});

Options

generateSpeech({
  model: string | ResolvedModel,  // required
  text: string,                   // required
  voice: Voice,                   // required
  providerOptions?: object,       // provider-specific API params
  maxRetries?: number,            // default: 2 (retries on 5xx/network errors)
  abortSignal?: AbortSignal,      // cancel the request
  headers?: Record<string, string>, // additional HTTP headers
});
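One way to wire abortSignal is a deadline via AbortController (a sketch; the 10-second timeout is an arbitrary choice):

```typescript
// Cancel a request that runs past a deadline. Passing controller.signal as
// abortSignal rejects the in-flight generateSpeech call when abort() fires.
const controller = new AbortController();
const timer = setTimeout(() => controller.abort(), 10_000);

// await generateSpeech({ model: 'openai', text: '...', voice: 'alloy',
//                        abortSignal: controller.signal });

clearTimeout(timer); // clear the deadline once the request settles
```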

Result

interface SpeechResult {
  audio: {
    uint8Array: Uint8Array;   // raw audio bytes
    base64: string;           // base64 encoded (lazy)
    mediaType: string;        // e.g. "audio/mpeg"
  };
  providerMetadata?: Record<string, unknown>;
}
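The base64 view is just a lazily computed encoding of the same bytes as uint8Array. A quick demo with stand-in bytes (not real SDK output):

```typescript
// Stand-in audio bytes: "ID3", the tag header that opens many MP3 files.
const bytes = new Uint8Array([0x49, 0x44, 0x33]);
const base64 = Buffer.from(bytes).toString('base64');

// Decoding round-trips back to the original bytes.
const decoded = new Uint8Array(Buffer.from(base64, 'base64'));

// In Node, the raw bytes can be written straight to disk, e.g.:
// writeFileSync('out.mp3', result.audio.uint8Array);
```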

Error Handling

import { generateSpeech, ApiError, SpeechSDKError } from '@speech-sdk/core';

try {
  const result = await generateSpeech({ ... });
} catch (error) {
  if (error instanceof ApiError) {
    console.log(error.statusCode);  // e.g. 401
    console.log(error.model);       // "openai/gpt-4o-mini-tts"
    console.log(error.responseBody);
  }
}

| Error | When |
| --- | --- |
| ApiError | Provider API returned a non-2xx response |
| NoSpeechGeneratedError | Provider returned empty audio |
| SpeechSDKError | Base class for all errors |

Retry

Built-in retry with exponential backoff via p-retry. Retries on 5xx and network errors. Does not retry 4xx errors. Default: 2 retries.
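The retry decision and backoff schedule can be sketched as follows. This is an illustration, not the SDK's internals; the 1s base delay and doubling factor mirror p-retry's documented defaults and are assumptions here:

```typescript
// Retry only on 5xx responses and network failures (modeled here as an
// undefined status code, i.e. no HTTP response was received at all).
function shouldRetry(statusCode: number | undefined): boolean {
  return statusCode === undefined || statusCode >= 500;
}

// Exponential backoff schedule: baseMs, baseMs*factor, baseMs*factor^2, ...
function backoffDelays(retries: number, baseMs = 1000, factor = 2): number[] {
  return Array.from({ length: retries }, (_, i) => baseMs * factor ** i);
}
```

With the default of 2 retries, a failing request is attempted three times in total.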

Development

pnpm install
pnpm test                       # unit tests
pnpm run test:e2e               # e2e tests (requires API keys)
pnpm run typecheck              # type-check without emitting

E2E tests hit real provider APIs. Set the relevant API key environment variables in a .env file or export them in your shell.

License

MIT