Package Exports

@speech-sdk/core
@speech-sdk/core/cartesia
@speech-sdk/core/deepgram
@speech-sdk/core/elevenlabs
@speech-sdk/core/fal-ai
@speech-sdk/core/fish-audio
@speech-sdk/core/google
@speech-sdk/core/hume
@speech-sdk/core/mistral
@speech-sdk/core/murf
@speech-sdk/core/openai
@speech-sdk/core/resemble
@speech-sdk/core/unreal-speech

Speech SDK

A universal text-to-speech SDK for TypeScript with multi-provider support. It is cross-platform (Node, Edge, browser), open source, and has minimal dependencies.

Install

npm install @speech-sdk/core

Using an AI Coding Assistant?

Add the speech-sdk skill to give your AI assistant full knowledge of this library:

npx skills add Jellypod-Inc/speech-sdk --skill speech-sdk

Quick Start

import { generateSpeech } from '@speech-sdk/core';

const result = await generateSpeech({
  model: 'openai/gpt-4o-mini-tts',
  text: 'Hello from speech-sdk!',
  voice: 'alloy',
});

// Access the audio
result.audio.uint8Array;  // Uint8Array
result.audio.base64;      // string (lazy-computed)
result.audio.mediaType;   // "audio/mpeg"
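
The `base64` and `mediaType` fields are enough to build a `data:` URL for browser playback or to pick a file extension when saving. The following is a minimal sketch that assumes only the `audio` shape shown above. `toDataUrl` and `extensionFor` are hypothetical helpers, not part of the SDK, and the extension map is illustrative rather than exhaustive.

```typescript
// Structural stand-in for the `audio` object shown above (not an SDK type).
interface AudioLike {
  base64: string;
  mediaType: string;
}

// Hypothetical helper: build a data: URL that an <audio> element can play.
function toDataUrl(audio: AudioLike): string {
  return `data:${audio.mediaType};base64,${audio.base64}`;
}

// Hypothetical helper: pick a file extension from the media type.
// Only a few common types are listed; anything else falls back to "bin".
function extensionFor(mediaType: string): string {
  const known: Record<string, string> = {
    'audio/mpeg': 'mp3',
    'audio/wav': 'wav',
    'audio/ogg': 'ogg',
    'audio/flac': 'flac',
  };
  return known[mediaType.toLowerCase()] ?? 'bin';
}
```

With the Quick Start result, `toDataUrl(result.audio)` can be assigned to an audio element's `src`. In Node, `result.audio.uint8Array` can be written to a file whose extension comes from `extensionFor(result.audio.mediaType)`.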

Supported Providers

Use provider/model strings. Passing just the provider name uses its default model.

Provider	String Prefix	Default Model	Env Var	Docs
OpenAI	`openai`	`gpt-4o-mini-tts`	`OPENAI_API_KEY`	API Reference
ElevenLabs	`elevenlabs`	`eleven_multilingual_v2`	`ELEVENLABS_API_KEY`	API Reference
Deepgram	`deepgram`	`aura-2`	`DEEPGRAM_API_KEY`	API Reference
Cartesia	`cartesia`	`sonic-3`	`CARTESIA_API_KEY`	API Reference
Hume	`hume`	`octave-2`	`HUME_API_KEY`	API Reference
Google (Gemini TTS)	`google`	`gemini-2.5-flash-preview-tts`	`GOOGLE_API_KEY`	API Reference
Fish Audio	`fish-audio`	`s2-pro`	`FISH_AUDIO_API_KEY`	API Reference
Unreal Speech	`unreal-speech`	`default`	`UNREAL_SPEECH_API_KEY`	API Reference
Murf	`murf`	`GEN2`	`MURF_API_KEY`	API Reference
Resemble	`resemble`	`default`	`RESEMBLE_API_KEY`	API Reference
fal	`fal-ai`	(user-specified)	`FAL_API_KEY`	API Reference
Mistral	`mistral`	`voxtral-mini-tts-2603`	`MISTRAL_API_KEY`	API Reference

generateSpeech({ model: 'openai/tts-1', text: '...', voice: 'alloy' });
generateSpeech({ model: 'elevenlabs/eleven_v3', text: '...', voice: 'voice-id' });
generateSpeech({ model: 'deepgram/aura-2', text: '...', voice: 'thalia-en' });
generateSpeech({ model: 'openai', text: '...', voice: 'alloy' });  // uses default model
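
To make the string format concrete, here is a rough sketch of how a `provider/model` string could be resolved against the default models in the table. `parseModelString` is a hypothetical helper written for illustration. The SDK's own parsing is not documented here, and splitting on the first slash only is an assumption.

```typescript
// Default models copied from the provider table above; fal has no default.
const DEFAULT_MODELS: Record<string, string | undefined> = {
  openai: 'gpt-4o-mini-tts',
  elevenlabs: 'eleven_multilingual_v2',
  deepgram: 'aura-2',
  cartesia: 'sonic-3',
  hume: 'octave-2',
  google: 'gemini-2.5-flash-preview-tts',
  'fish-audio': 's2-pro',
  'unreal-speech': 'default',
  murf: 'GEN2',
  resemble: 'default',
  'fal-ai': undefined,
  mistral: 'voxtral-mini-tts-2603',
};

interface ParsedModel {
  provider: string;
  model: string;
}

// Hypothetical helper: split on the FIRST slash only (an assumption),
// so model ids that themselves contain slashes stay intact.
function parseModelString(input: string): ParsedModel {
  const slash = input.indexOf('/');
  const provider = slash === -1 ? input : input.slice(0, slash);
  const explicit = slash === -1 ? '' : input.slice(slash + 1);
  if (!Object.prototype.hasOwnProperty.call(DEFAULT_MODELS, provider)) {
    throw new Error(`Unknown provider: ${provider}`);
  }
  // Fall back to the provider's default when no model is given.
  const model = explicit || DEFAULT_MODELS[provider];
  if (!model) {
    throw new Error(`Provider "${provider}" requires an explicit model`);
  }
  return { provider, model };
}
```

Under these assumptions, `parseModelString('openai')` resolves to the `gpt-4o-mini-tts` default, while `parseModelString('fal-ai')` throws because fal models are user-specified.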

Provider-specific API parameters can be passed via providerOptions. They are sent directly to the provider's API under the API's own field names.
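
As an illustration of "the API's own field names", the sketch below builds two request objects. `SpeechRequest` is a local mirror of the documented option names, not the SDK's exported type. `speed` comes from OpenAI's speech endpoint and `voice_settings` from ElevenLabs' text-to-speech endpoint. Whether the SDK validates or reshapes these fields is not covered here, so treat the objects as an assumption-laden example.

```typescript
// Local mirror of the documented options; not the SDK's exported type.
interface SpeechRequest {
  model: string;
  text: string;
  voice: string;
  providerOptions?: Record<string, unknown>;
}

// OpenAI names its playback-rate field `speed`.
const openaiRequest: SpeechRequest = {
  model: 'openai/tts-1',
  text: 'Hello from speech-sdk!',
  voice: 'alloy',
  providerOptions: { speed: 1.25 },
};

// ElevenLabs nests its tuning knobs under snake_case `voice_settings`.
const elevenLabsRequest: SpeechRequest = {
  model: 'elevenlabs/eleven_multilingual_v2',
  text: 'Hello from speech-sdk!',
  voice: 'voice-id', // placeholder voice id, as in the examples above
  providerOptions: {
    voice_settings: { stability: 0.5, similarity_boost: 0.75 },
  },
};
```

Either object can be passed to `generateSpeech` as is. The keys are not translated between providers, so options written for one provider should not be reused for another.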

Custom Configuration

Use factory functions when you need custom API keys, base URLs, or fetch implementations:

import { generateSpeech } from '@speech-sdk/core';
import { createOpenAI } from '@speech-sdk/core/openai';
import { createElevenLabs } from '@speech-sdk/core/elevenlabs';

const myOpenAI = createOpenAI({
  apiKey: 'sk-...',
  baseURL: 'https://my-proxy.com/v1',
});

const result = await generateSpeech({
  model: myOpenAI('gpt-4o-mini-tts'),
  text: 'Hello!',
  voice: 'alloy',
});

API Key Resolution

When a model is given as a string (e.g., 'openai/tts-1'), the API key is read from the provider's environment variable (see the table above). Factory functions accept an explicit apiKey option, which takes precedence over the environment variable.
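
The precedence described above can be pictured with a small sketch. `resolveApiKey` is a hypothetical function, not the SDK's implementation, and the environment is passed in as a plain object so the logic can be run anywhere. The variable names come from the provider table, and only a subset is listed.

```typescript
// Subset of the provider table above: provider prefix -> environment variable.
const ENV_VARS: Record<string, string> = {
  openai: 'OPENAI_API_KEY',
  elevenlabs: 'ELEVENLABS_API_KEY',
  deepgram: 'DEEPGRAM_API_KEY',
};

// Hypothetical helper: an explicit key wins, otherwise fall back to the env var.
function resolveApiKey(
  provider: string,
  explicit: string | undefined,
  env: Record<string, string | undefined>,
): string {
  if (explicit) return explicit;
  const name = ENV_VARS[provider];
  const fromEnv = name ? env[name] : undefined;
  if (!fromEnv) {
    throw new Error(`Missing API key for "${provider}" (${name ?? 'no known env var'})`);
  }
  return fromEnv;
}
```

In Node, `process.env` would be passed as the third argument. In the browser or on an edge runtime there may be no such object, which is one reason to prefer the factory functions with an explicit `apiKey` there.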

Voice Cloning

Some providers support voice cloning via reference audio. Pass a voice object instead of a string:

import { generateSpeech } from '@speech-sdk/core';
import { createMistral } from '@speech-sdk/core/mistral';

const mistral = createMistral();

// Clone from base64 audio
const result = await generateSpeech({
  model: mistral(),
  text: 'Hello!',
  voice: { audio: 'base64-encoded-audio...' },
});

Clone from a URL (fal):

import { generateSpeech } from '@speech-sdk/core';
import { createFal } from '@speech-sdk/core/fal-ai';

const fal = createFal();
const result = await generateSpeech({
  model: fal('fal-ai/chatterbox'),
  text: 'Hello!',
  voice: { url: 'https://example.com/reference.wav' },
});
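
The three voice forms used in this README (a string, `{ audio }`, and `{ url }`) can be modeled as a union. The sketch below is an assumption inferred from the examples, not the SDK's actual `Voice` type, and `voiceFromBytes` and `describeVoice` are hypothetical helpers. `voiceFromBytes` shows one way to turn raw reference audio into the base64 form.

```typescript
// Assumed shape, inferred from the examples above; the SDK's Voice type may differ.
type VoiceLike = string | { audio: string } | { url: string };

// Hypothetical helper: base64-encode reference audio bytes for the { audio } form.
// Uses btoa, which is available in browsers and recent Node versions.
function voiceFromBytes(bytes: Uint8Array): { audio: string } {
  let binary = '';
  bytes.forEach((byte) => {
    binary += String.fromCharCode(byte);
  });
  return { audio: btoa(binary) };
}

// Hypothetical helper: narrow the union to see which form was supplied.
function describeVoice(voice: VoiceLike): string {
  if (typeof voice === 'string') return `preset voice "${voice}"`;
  if ('url' in voice) return `clone from URL ${voice.url}`;
  return `clone from ${voice.audio.length} base64 characters`;
}
```

For example, bytes read from a local WAV file can be passed through `voiceFromBytes` and the result used as the `voice` option for a provider that supports cloning.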

Options

generateSpeech({
  model: string | ResolvedModel,  // required
  text: string,                   // required
  voice: Voice,                   // required
  providerOptions?: object,       // provider-specific API params
  maxRetries?: number,            // default: 2 (retries on 5xx/network errors)
  abortSignal?: AbortSignal,      // cancel the request
  headers?: Record<string, string>, // additional HTTP headers
});
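
One common use of `abortSignal` is a request timeout. The sketch below attaches a signal from the standard `AbortSignal.timeout()` to any options object. `withTimeout` is a hypothetical helper, it replaces any signal already present, and how the SDK surfaces an aborted request is not specified in this README.

```typescript
// Minimal structural view of the options; only the field this helper touches.
interface CancellableOptions {
  abortSignal?: AbortSignal;
}

// Hypothetical helper: copy the options and attach a signal that aborts after `ms`.
// Note: any abortSignal already on `options` is replaced.
function withTimeout<T extends CancellableOptions>(
  options: T,
  ms: number,
): T & { abortSignal: AbortSignal } {
  return { ...options, abortSignal: AbortSignal.timeout(ms) };
}
```

A call might then look like `generateSpeech(withTimeout({ model: 'openai', text: '...', voice: 'alloy' }, 10000))`, giving the whole request, retries included, roughly ten seconds before the signal fires.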

Result

interface SpeechResult {
  audio: {
    uint8Array: Uint8Array;   // raw audio bytes
    base64: string;           // base64 encoded (lazy)
    mediaType: string;        // e.g. "audio/mpeg"
  };
  providerMetadata?: Record<string, unknown>;
}

Error Handling

import { generateSpeech, ApiError, SpeechSDKError } from '@speech-sdk/core';

try {
  const result = await generateSpeech({ ... });
} catch (error) {
  if (error instanceof ApiError) {
    console.log(error.statusCode);  // 401
    console.log(error.model);       // "openai/gpt-4o-mini-tts"
    console.log(error.responseBody);
  }
}

Error	When
`ApiError`	Provider API returns a non-2xx response
`NoSpeechGeneratedError`	Provider returned empty audio
`SpeechSDKError`	Base class for all errors

Retry

Retries are built in and use exponential backoff via p-retry. Requests are retried on 5xx responses and network errors, never on 4xx responses. The default is 2 retries; change it with maxRetries.
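
The retry rule can be written down as two small functions. This is an illustrative sketch of the policy described above, not the SDK's or p-retry's code. `shouldRetry` and `backoffDelays` are hypothetical names, and the 1000 ms base delay and factor of 2 are assumed values that may not match the library's real settings.

```typescript
// Retry decision as described above: network errors (no HTTP status) and
// 5xx responses are retried; 4xx and everything else is not.
function shouldRetry(statusCode: number | undefined): boolean {
  if (statusCode === undefined) return true; // network error, no response received
  return statusCode >= 500 && statusCode <= 599;
}

// Exponential backoff schedule: baseMs, baseMs * factor, baseMs * factor^2, ...
// baseMs and factor are assumptions chosen for illustration.
function backoffDelays(maxRetries: number, baseMs = 1000, factor = 2): number[] {
  return Array.from({ length: maxRetries }, (_, attempt) => baseMs * Math.pow(factor, attempt));
}
```

With the default of 2 retries, a request is attempted at most three times. Since 4xx responses such as 401 or 429 are never retried, callers who want to wait out rate limits need to handle that themselves.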

Development

pnpm install
pnpm test                       # unit tests
pnpm run test:e2e               # e2e tests (requires API keys)
pnpm run typecheck              # type-check without emitting

E2E tests hit real provider APIs. Set the relevant API key environment variables in a .env file or export them in your shell.

License

MIT