Package Exports
- @speech-sdk/core
- @speech-sdk/core/cartesia
- @speech-sdk/core/deepgram
- @speech-sdk/core/elevenlabs
- @speech-sdk/core/fal-ai
- @speech-sdk/core/fish-audio
- @speech-sdk/core/google
- @speech-sdk/core/hume
- @speech-sdk/core/mistral
- @speech-sdk/core/murf
- @speech-sdk/core/openai
- @speech-sdk/core/resemble
- @speech-sdk/core/unreal-speech
Speech SDK
A universal text-to-speech TypeScript SDK with multi-provider support. Cross-platform (Node, Edge, browser), open source, and with minimal dependencies.
Install
npm install @speech-sdk/core
Using an AI Coding Assistant?
Add the speech-sdk skill to give your AI assistant full knowledge of this library:
npx skills add Jellypod-Inc/speech-sdk --skill speech-sdk
Quick Start
import { generateSpeech } from '@speech-sdk/core';
const result = await generateSpeech({
  model: 'openai/gpt-4o-mini-tts',
  text: 'Hello from speech-sdk!',
  voice: 'alloy',
});
// Access the audio
result.audio.uint8Array; // Uint8Array
result.audio.base64;     // string (lazy-computed)
result.audio.mediaType;  // "audio/mpeg"
Supported Providers
Use provider/model strings. Passing just the provider name uses its default model.
| Provider | String Prefix | Default Model | Env Var | Docs |
|---|---|---|---|---|
| OpenAI | openai | gpt-4o-mini-tts | OPENAI_API_KEY | API Reference |
| ElevenLabs | elevenlabs | eleven_multilingual_v2 | ELEVENLABS_API_KEY | API Reference |
| Deepgram | deepgram | aura-2 | DEEPGRAM_API_KEY | API Reference |
| Cartesia | cartesia | sonic-3 | CARTESIA_API_KEY | API Reference |
| Hume | hume | octave-2 | HUME_API_KEY | API Reference |
| Google (Gemini TTS) | google | gemini-2.5-flash-preview-tts | GOOGLE_API_KEY | API Reference |
| Fish Audio | fish-audio | s2-pro | FISH_AUDIO_API_KEY | API Reference |
| Unreal Speech | unreal-speech | default | UNREAL_SPEECH_API_KEY | API Reference |
| Murf | murf | GEN2 | MURF_API_KEY | API Reference |
| Resemble | resemble | default | RESEMBLE_API_KEY | API Reference |
| fal | fal-ai | (user-specified) | FAL_API_KEY | API Reference |
| Mistral | mistral | voxtral-mini-tts-2603 | MISTRAL_API_KEY | API Reference |
generateSpeech({ model: 'openai/tts-1', text: '...', voice: 'alloy' });
generateSpeech({ model: 'elevenlabs/eleven_v3', text: '...', voice: 'voice-id' });
generateSpeech({ model: 'deepgram/aura-2', text: '...', voice: 'thalia-en' });
generateSpeech({ model: 'openai', text: '...', voice: 'alloy' }); // uses default model
Provider-specific API parameters can be passed via providerOptions. They are sent directly to the provider's API, so use that API's own field names.
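Resolving a string model amounts to splitting it into a provider prefix and a model ID, falling back to the provider's default model when only the prefix is given. The following is a minimal sketch of that idea, not the SDK's implementation: parseModelString and DEFAULT_MODELS are hypothetical names, the defaults are copied from the table above, and splitting on the first slash (so a model ID may itself contain slashes) is an assumption.

```typescript
// Hypothetical sketch: resolving a "provider/model" string.
// Default model IDs are copied from the Supported Providers table.
const DEFAULT_MODELS: Record<string, string> = {
  openai: "gpt-4o-mini-tts",
  elevenlabs: "eleven_multilingual_v2",
  deepgram: "aura-2",
  cartesia: "sonic-3",
  hume: "octave-2",
  google: "gemini-2.5-flash-preview-tts",
  "fish-audio": "s2-pro",
  "unreal-speech": "default",
  murf: "GEN2",
  resemble: "default",
  mistral: "voxtral-mini-tts-2603",
  // fal-ai is omitted: the table lists its model as user-specified.
};

interface ParsedModel {
  provider: string;
  modelId: string;
}

function parseModelString(model: string): ParsedModel {
  const slash = model.indexOf("/");
  if (slash === -1) {
    // Bare provider name: fall back to that provider's default model.
    const modelId: string | undefined = DEFAULT_MODELS[model];
    if (modelId === undefined) {
      throw new Error(`No default model known for provider "${model}"`);
    }
    return { provider: model, modelId };
  }
  // Split on the first slash only, so the model ID may contain "/" itself.
  return { provider: model.slice(0, slash), modelId: model.slice(slash + 1) };
}
```

For example, parseModelString('openai') yields provider 'openai' with model ID 'gpt-4o-mini-tts', while parseModelString('openai/tts-1') keeps the explicit 'tts-1'.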
Custom Configuration
Use factory functions when you need custom API keys, base URLs, or fetch implementations:
import { generateSpeech } from '@speech-sdk/core';
import { createOpenAI } from '@speech-sdk/core/openai';
import { createElevenLabs } from '@speech-sdk/core/elevenlabs';
const myOpenAI = createOpenAI({
  apiKey: 'sk-...',
  baseURL: 'https://my-proxy.com/v1',
});
// createElevenLabs is configured the same way, e.g. createElevenLabs({ apiKey: '...' })
const result = await generateSpeech({
  model: myOpenAI('gpt-4o-mini-tts'),
  text: 'Hello!',
  voice: 'alloy',
});
API Key Resolution
When using string models (e.g., 'openai/tts-1'), API keys are resolved from environment variables (see the table above). Factory functions accept an explicit apiKey option, which takes precedence over the environment variable.
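The precedence rule can be sketched as a small lookup. resolveApiKey and API_KEY_ENV_VARS are hypothetical helpers, not SDK exports. The environment is passed in as a plain object (in Node you would pass process.env). Treating an empty string as missing, and throwing when no key is found, are assumptions about behavior the README does not specify.

```typescript
// Hypothetical: provider prefix -> env var, copied from the Supported Providers table.
const API_KEY_ENV_VARS: Record<string, string> = {
  openai: "OPENAI_API_KEY",
  elevenlabs: "ELEVENLABS_API_KEY",
  deepgram: "DEEPGRAM_API_KEY",
  cartesia: "CARTESIA_API_KEY",
  hume: "HUME_API_KEY",
  google: "GOOGLE_API_KEY",
  "fish-audio": "FISH_AUDIO_API_KEY",
  "unreal-speech": "UNREAL_SPEECH_API_KEY",
  murf: "MURF_API_KEY",
  resemble: "RESEMBLE_API_KEY",
  "fal-ai": "FAL_API_KEY",
  mistral: "MISTRAL_API_KEY",
};

// An explicit key wins; otherwise fall back to the provider's environment variable.
function resolveApiKey(
  provider: string,
  explicitKey: string | undefined,
  env: Record<string, string | undefined>,
): string {
  if (explicitKey) return explicitKey;
  const envVar: string | undefined = API_KEY_ENV_VARS[provider];
  const fromEnv = envVar === undefined ? undefined : env[envVar];
  if (!fromEnv) {
    // Assumption: a missing or empty key is reported as an error.
    throw new Error(
      `No API key for "${provider}": pass apiKey or set ${envVar ?? "its environment variable"}`,
    );
  }
  return fromEnv;
}
```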
Voice Cloning
Some providers support voice cloning via reference audio. Pass a voice object instead of a string:
import { generateSpeech } from '@speech-sdk/core';
import { createMistral } from '@speech-sdk/core/mistral';
const mistral = createMistral();
// Clone from base64 audio
const result = await generateSpeech({
  model: mistral(),
  text: 'Hello!',
  voice: { audio: 'base64-encoded-audio...' },
});
Clone from a URL (fal):
import { generateSpeech } from '@speech-sdk/core';
import { createFal } from '@speech-sdk/core/fal-ai';
const fal = createFal();
const result = await generateSpeech({
  model: fal('fal-ai/chatterbox'),
  text: 'Hello!',
  voice: { url: 'https://example.com/reference.wav' },
});
Options
generateSpeech({
  model: string | ResolvedModel,     // required
  text: string,                      // required
  voice: Voice,                      // required: a voice string, or { audio } / { url } for cloning
  providerOptions?: object,          // provider-specific API params
  maxRetries?: number,               // default: 2 (retries on 5xx/network errors)
  abortSignal?: AbortSignal,         // cancel the request
  headers?: Record<string, string>,  // additional HTTP headers
});
Result
interface SpeechResult {
  audio: {
    uint8Array: Uint8Array; // raw audio bytes
    base64: string;         // base64 encoded (lazy)
    mediaType: string;      // e.g. "audio/mpeg"
  };
  providerMetadata?: Record<string, unknown>;
}
Error Handling
import { generateSpeech, ApiError, SpeechSDKError } from '@speech-sdk/core';
try {
  const result = await generateSpeech({ ... });
} catch (error) {
  if (error instanceof ApiError) {
    console.log(error.statusCode); // e.g. 401
    console.log(error.model); // "openai/gpt-4o-mini-tts"
    console.log(error.responseBody);
  } else if (error instanceof SpeechSDKError) {
    console.log(error.message); // any other SDK error
  } else {
    throw error; // not raised by the SDK
  }
}
| Error | When |
|---|---|
| ApiError | The provider API returns a non-2xx response |
| NoSpeechGeneratedError | The provider returned empty audio |
| SpeechSDKError | Base class for all errors |
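Since SpeechSDKError is the base class for all errors, an instanceof SpeechSDKError check also matches ApiError and NoSpeechGeneratedError, so the more specific checks must come first. The sketch below demonstrates that ordering with locally defined stand-in classes. They reproduce only the hierarchy stated in the table; their constructors and the statusCode field layout are assumptions, not the SDK's actual definitions. In application code, use the SDK's exported classes instead.

```typescript
// Stand-ins mirroring the documented hierarchy (hypothetical; not the SDK's source).
class SpeechSDKError extends Error {
  constructor(message?: string) {
    super(message);
    // Keep instanceof reliable even when compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class ApiError extends SpeechSDKError {
  readonly statusCode: number;
  constructor(message: string, statusCode: number) {
    super(message);
    this.statusCode = statusCode;
  }
}

class NoSpeechGeneratedError extends SpeechSDKError {}

// Order matters: test the subclasses before the SpeechSDKError base class,
// otherwise the base-class branch would match every SDK error.
function describeError(error: unknown): string {
  if (error instanceof ApiError) return `api error (HTTP ${error.statusCode})`;
  if (error instanceof NoSpeechGeneratedError) return "provider returned empty audio";
  if (error instanceof SpeechSDKError) return "other SDK error";
  return "not an SDK error";
}
```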
Retry
Failed requests are retried with exponential backoff (via p-retry) on 5xx responses and network errors; 4xx responses are never retried. The default is 2 retries, configurable with maxRetries.
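The retry policy above can be sketched without p-retry as a plain loop. This illustrates the stated rules (retry on 5xx and network errors, never on 4xx, 2 retries by default) and is not the SDK's code. withRetry, isRetryable, and backoffDelayMs are hypothetical helpers. Treating any error without a numeric statusCode as a network error is an assumption, and the 1000 ms base delay with a factor of 2 is illustrative rather than taken from the SDK's configuration.

```typescript
// Retry network errors and 5xx responses; never retry 4xx.
function isRetryable(error: unknown): boolean {
  const status =
    typeof error === "object" && error !== null && "statusCode" in error
      ? (error as { statusCode?: unknown }).statusCode
      : undefined;
  // Assumption: anything without a numeric status is treated as a network error.
  if (typeof status !== "number") return true;
  return status >= 500;
}

// Exponential backoff: retry 1 waits baseMs, retry 2 waits baseMs * factor, ...
function backoffDelayMs(retry: number, baseMs = 1000, factor = 2): number {
  return baseMs * factor ** (retry - 1);
}

async function withRetry<T>(attempt: () => Promise<T>, maxRetries = 2): Promise<T> {
  for (let retry = 0; ; retry++) {
    try {
      return await attempt();
    } catch (error) {
      // Give up when retries are exhausted or the error is not retryable (e.g. 4xx).
      if (retry >= maxRetries || !isRetryable(error)) throw error;
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(retry + 1)));
    }
  }
}
```

With the default maxRetries of 2, the wrapped call runs at most three times, pausing about 1 s before the first retry and 2 s before the second.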
Development
pnpm install
pnpm test            # unit tests
pnpm run test:e2e    # e2e tests (requires API keys)
pnpm run typecheck   # type-check without emitting
E2E tests hit real provider APIs. Set the relevant API key environment variables in a .env file or export them in your shell.
License
MIT