Universal, cross-platform text-to-speech SDK with multi-provider support.

Package Exports

  • @speech-sdk/core
  • @speech-sdk/core/cartesia
  • @speech-sdk/core/deepgram
  • @speech-sdk/core/elevenlabs
  • @speech-sdk/core/fal-ai
  • @speech-sdk/core/fish-audio
  • @speech-sdk/core/google
  • @speech-sdk/core/hume
  • @speech-sdk/core/mistral
  • @speech-sdk/core/murf
  • @speech-sdk/core/openai
  • @speech-sdk/core/resemble
  • @speech-sdk/core/unreal-speech

Speech SDK

The Speech SDK is a lightweight, provider-agnostic TypeScript toolkit for building text-to-speech applications with popular providers such as OpenAI, ElevenLabs, Deepgram, Cartesia, Google, and more. It is cross-platform (Node.js, Edge, Browser) with minimal dependencies.

To learn more about the Speech SDK, check out https://speechsdk.dev/.

Install

npm install @speech-sdk/core

Using an AI Coding Assistant?

Add the speech-sdk skill to give your AI assistant full knowledge of this library:

npx skills add Jellypod-Inc/speech-sdk --skill speech-sdk

Quick Start

import { generateSpeech } from '@speech-sdk/core';

const result = await generateSpeech({
  model: 'openai/gpt-4o-mini-tts',
  text: 'Hello from speech-sdk!',
  voice: 'alloy',
});

// Access the audio
result.audio.uint8Array;  // Uint8Array
result.audio.base64;      // string (lazy-computed)
result.audio.mediaType;   // "audio/mpeg"
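
In Node.js, the raw bytes can be written straight to disk. A minimal sketch, assuming the result shape above; the saveAudio helper and its extension mapping are illustrative, not part of the SDK:

```typescript
import { writeFile } from 'node:fs/promises';

// Shape of result.audio as documented by this SDK.
interface SpeechAudio {
  uint8Array: Uint8Array;
  mediaType: string;
}

// Illustrative helper: derive a file extension from the media type
// and write the raw audio bytes to disk, returning the final path.
async function saveAudio(audio: SpeechAudio, basePath: string): Promise<string> {
  const ext = audio.mediaType === 'audio/wav' ? 'wav' : 'mp3';
  const path = `${basePath}.${ext}`;
  await writeFile(path, audio.uint8Array);
  return path;
}
```

For the quick-start example this would be await saveAudio(result.audio, 'speech'), producing speech.mp3.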

Supported Providers

Use provider/model strings. Passing just the provider name uses its default model.

| Provider | String Prefix | Default Model | Env Var |
| --- | --- | --- | --- |
| OpenAI | openai | gpt-4o-mini-tts | OPENAI_API_KEY |
| ElevenLabs | elevenlabs | eleven_multilingual_v2 | ELEVENLABS_API_KEY |
| Deepgram | deepgram | aura-2 | DEEPGRAM_API_KEY |
| Cartesia | cartesia | sonic-3 | CARTESIA_API_KEY |
| Hume | hume | octave-2 | HUME_API_KEY |
| Google (Gemini TTS) | google | gemini-2.5-flash-preview-tts | GOOGLE_API_KEY |
| Fish Audio | fish-audio | s2-pro | FISH_AUDIO_API_KEY |
| Unreal Speech | unreal-speech | default | UNREAL_SPEECH_API_KEY |
| Murf | murf | GEN2 | MURF_API_KEY |
| Resemble | resemble | default | RESEMBLE_API_KEY |
| fal | fal-ai | (user-specified) | FAL_API_KEY |
| Mistral | mistral | voxtral-mini-tts-2603 | MISTRAL_API_KEY |

generateSpeech({ model: 'openai/tts-1', text: '...', voice: 'alloy' });
generateSpeech({ model: 'elevenlabs/eleven_v3', text: '...', voice: 'voice-id' });
generateSpeech({ model: 'deepgram/aura-2', text: '...', voice: 'thalia-en' });
generateSpeech({ model: 'openai', text: '...', voice: 'alloy' });  // uses default model

Provider-specific API parameters can be passed via providerOptions — these are sent directly to the provider's API using the API's own field names.

Custom Configuration

Use factory functions when you need custom API keys, base URLs, or fetch implementations:

import { generateSpeech } from '@speech-sdk/core';
import { createOpenAI } from '@speech-sdk/core/openai';
import { createElevenLabs } from '@speech-sdk/core/elevenlabs';

const myOpenAI = createOpenAI({
  apiKey: 'sk-...',
  baseURL: 'https://my-proxy.com/v1',
});

const result = await generateSpeech({
  model: myOpenAI('gpt-4o-mini-tts'),
  text: 'Hello!',
  voice: 'alloy',
});

API Key Resolution

When using string models (e.g., 'openai/tts-1'), API keys are resolved from environment variables (see table above). Factory functions accept an explicit apiKey option which takes precedence.
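
The resolution order can be pictured with a small standalone sketch (illustrative only; the env-var names come from the table above, and resolveApiKey is not an SDK export):

```typescript
// Map of provider prefixes to their conventional env vars (subset).
const ENV_VARS: Record<string, string> = {
  openai: 'OPENAI_API_KEY',
  elevenlabs: 'ELEVENLABS_API_KEY',
  deepgram: 'DEEPGRAM_API_KEY',
};

// Illustrative: given a "provider/model" string and an optional explicit
// key, prefer the explicit key, then fall back to the environment.
function resolveApiKey(
  model: string,
  explicitKey?: string,
  env: Record<string, string | undefined> = process.env,
): string {
  if (explicitKey) return explicitKey;
  const provider = model.split('/')[0];
  const varName = ENV_VARS[provider];
  const key = varName ? env[varName] : undefined;
  if (!key) {
    throw new Error(`No API key found for "${provider}" (expected ${varName ?? 'an explicit apiKey'})`);
  }
  return key;
}
```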

Audio Tags

Use bracket syntax [tag] to add expressive audio cues like laughter, sighs, or emotions. Provider support varies — unsupported tags are automatically stripped with warnings returned in result.warnings.

const result = await generateSpeech({
  model: 'elevenlabs/eleven_v3',
  text: '[laugh] Oh that is so funny! [sigh] But seriously though.',
  voice: 'voice-id',
});

console.log(result.warnings); // undefined — eleven_v3 supports all tags

Provider behavior

| Provider | Behavior |
| --- | --- |
| ElevenLabs (eleven_v3) | All [tag] passed through natively |
| Cartesia (sonic-3) | Emotion tags ([happy], [sad], [angry], etc.) converted to SSML; [laughter] passed through; unknown tags stripped |
| All others | Tags stripped and warnings returned |

// Unsupported provider — tags are stripped with warnings
const result = await generateSpeech({
  model: 'openai/gpt-4o-mini-tts',
  text: '[laugh] Hello world',
  voice: 'alloy',
});

console.log(result.warnings);
// ["Audio tag [laugh] is not supported by openai/gpt-4o-mini-tts and was removed."]

Voice Cloning

Some providers support voice cloning via reference audio. Pass a voice object instead of a string:

import { createMistral } from '@speech-sdk/core/mistral';

const mistral = createMistral();

// Clone from base64 audio
const result = await generateSpeech({
  model: mistral(),
  text: 'Hello!',
  voice: { audio: 'base64-encoded-audio...' },
});

Clone from a URL (fal):

import { createFal } from '@speech-sdk/core/fal-ai';

const fal = createFal();
const result = await generateSpeech({
  model: fal('fal-ai/chatterbox'),
  text: 'Hello!',
  voice: { url: 'https://example.com/reference.wav' },
});
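
When the reference clip lives on disk, it can be base64-encoded first and passed as the audio option shown above. A Node sketch (the loadReferenceAudio helper and the file path are illustrative):

```typescript
import { readFile } from 'node:fs/promises';

// Illustrative: read a local reference clip and base64-encode it
// for providers that accept { audio: '<base64>' } voices.
async function loadReferenceAudio(path: string): Promise<{ audio: string }> {
  const bytes = await readFile(path);
  return { audio: bytes.toString('base64') };
}
```

Usage: voice: await loadReferenceAudio('./reference.wav') in place of the inline base64 string.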

Options

generateSpeech({
  model: string | ResolvedModel,  // required
  text: string,                   // required
  voice: Voice,                   // required
  providerOptions?: object,       // provider-specific API params
  maxRetries?: number,            // default: 2 (retries on 5xx/network errors)
  abortSignal?: AbortSignal,      // cancel the request
  headers?: Record<string, string>, // additional HTTP headers
});

Result

interface SpeechResult {
  audio: {
    uint8Array: Uint8Array;   // raw audio bytes
    base64: string;           // base64 encoded (lazy)
    mediaType: string;        // e.g. "audio/mpeg"
  };
  providerMetadata?: Record<string, unknown>;
}
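
The base64 field is derived lazily from the same bytes; in Node the conversion amounts to something like this (illustrative, not the SDK's actual getter):

```typescript
// Illustrative: base64-encode raw audio bytes the way the lazy
// result.audio.base64 accessor conceptually does in Node.
function toBase64(bytes: Uint8Array): string {
  return Buffer.from(bytes).toString('base64');
}
```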

Error Handling

import { generateSpeech, ApiError, SpeechSDKError } from '@speech-sdk/core';

try {
  const result = await generateSpeech({ ... });
} catch (error) {
  if (error instanceof ApiError) {
    console.log(error.statusCode);  // 401
    console.log(error.model);       // "openai/gpt-4o-mini-tts"
    console.log(error.responseBody);
  }
}

| Error | When |
| --- | --- |
| ApiError | Provider API returns a non-2xx response |
| NoSpeechGeneratedError | Provider returned empty audio |
| SpeechSDKError | Base class for all errors |

Retry

Built-in retry with exponential backoff via p-retry. Retries on 5xx and network errors. Does not retry 4xx errors. Default: 2 retries.
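
The documented semantics (retry 5xx and network failures, never 4xx, with exponential backoff) can be sketched as follows; this is illustrative, not the SDK's p-retry-based implementation:

```typescript
// Illustrative retry loop: re-attempt on 5xx/network errors with
// exponentially growing delays, and give up immediately on 4xx.
async function withRetries<T>(
  fn: () => Promise<T>,
  maxRetries = 2,
  baseDelayMs = 100,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = (err as { statusCode?: number }).statusCode;
      const retryable = status === undefined || status >= 500; // network error or 5xx
      if (!retryable || attempt >= maxRetries) throw err;
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
}
```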

Development

pnpm install
pnpm test                       # unit tests
pnpm run test:e2e               # e2e tests (requires API keys)
pnpm run typecheck              # type-check without emitting

E2E tests hit real provider APIs. Set the relevant API key environment variables in a .env file or export them in your shell.

License

MIT