@pocketpalai/react-native-speech
On-device, multi-engine text-to-speech for React Native. Wraps the OS-native TTS (iOS AVSpeechSynthesizer / Android TextToSpeech) and three neural engines — Kokoro, Supertonic, Kitten — behind a single API, with native audio playback, progress events, and audio-focus handling.
New Architecture only. Requires React Native's New Architecture. RN 0.76+ enables it by default. For 0.68–0.75 see the enable-apps guide.
Preview
Streaming (LLM token stream → TTS)
One-shot speak
Features
- Four engines behind one API: `OS_NATIVE` (platform TTS), `KOKORO` (high quality, multi-language), `SUPERTONIC` (fast, lightweight), `KITTEN` (compact, IPA-driven).
- License-neutral runner: the library is MIT and ships no model or dictionary data. Consumer apps supply both at runtime. See LICENSES.md.
- On-device synthesis: neural TTS runs entirely on-device. The library performs no network I/O during synthesis. Any initial model or dictionary download is performed by the consumer app using its own network stack.
- Interruption-aware audio: iOS `AVAudioSession` and Android `AudioFocus` are wired through a JS `onAudioInterruption` event so apps can react to phone calls and other interruptions.
- Turbo-module native layer: native audio playback, progress events, and chunk progress for neural engines.
- Permissive phonemization: default is `phonemize` (MIT). Optionally supply a mmap'd EPD1 dict via the `NativeDict` API for higher accuracy — see PHONEMIZATION.md.
- `HighlightedText` component: highlight spoken text as it synthesizes.
- TypeScript: full type definitions; per-engine config is a discriminated union on the `engine` field.
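The discriminated union means TypeScript narrows the config type once you switch on `engine`. A minimal sketch of the idea — the enum values and field names are taken from the quickstarts below, but this union and the `requiredFiles` helper are illustrative, not the package's actual exported types:

```typescript
// Illustrative sketch of a per-engine config union discriminated on `engine`.
enum TTSEngine {
  OS_NATIVE = 'OS_NATIVE',
  KOKORO = 'KOKORO',
  KITTEN = 'KITTEN',
}

type SpeechConfig =
  | {engine: TTSEngine.OS_NATIVE}
  | {engine: TTSEngine.KOKORO; modelPath: string; voicesPath: string; tokenizerPath: string}
  | {engine: TTSEngine.KITTEN; modelPath: string; voicesPath: string; dictPath?: string};

// Switching on `engine` narrows `config`, so each branch can safely read
// only the fields that exist for that engine.
function requiredFiles(config: SpeechConfig): string[] {
  switch (config.engine) {
    case TTSEngine.OS_NATIVE:
      return []; // platform TTS needs no model files
    case TTSEngine.KOKORO:
      return [config.modelPath, config.voicesPath, config.tokenizerPath];
    case TTSEngine.KITTEN:
      return config.dictPath
        ? [config.modelPath, config.voicesPath, config.dictPath]
        : [config.modelPath, config.voicesPath];
  }
}
```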
Installation
```sh
npm install @pocketpalai/react-native-speech
# or
yarn add @pocketpalai/react-native-speech
```

iOS:

```sh
cd ios && pod install
```

Expo (bare only — not supported in Expo Go):

```sh
npx expo install @pocketpalai/react-native-speech
npx expo prebuild
```

Neural engines (optional)

The neural engines need onnxruntime-react-native (optional peer):

```sh
npm install onnxruntime-react-native
```

OS-native TTS works without it.
Quickstart
```ts
import Speech, {TTSEngine} from '@pocketpalai/react-native-speech';

await Speech.initialize({engine: TTSEngine.OS_NATIVE});
// voiceId is optional for OS_NATIVE — omitting it uses the platform default voice.
await Speech.speak('Hello world');
```

Neural engine quickstarts
The consumer app is responsible for downloading models and passing file paths. See example/src/utils/ for reference model managers.
```ts
// Kokoro
await Speech.initialize({
  engine: TTSEngine.KOKORO,
  modelPath: 'file:///.../kokoro.onnx',
  voicesPath: 'file:///.../voices.bin',
  tokenizerPath: 'file:///.../tokenizer.json',
});
await Speech.speak('Hello from Kokoro.', 'af_bella');

// Supertonic (4 ONNX files)
await Speech.initialize({
  engine: TTSEngine.SUPERTONIC,
  durationPredictorPath: 'file:///.../duration_predictor.onnx',
  textEncoderPath: 'file:///.../text_encoder.onnx',
  vectorEstimatorPath: 'file:///.../vector_estimator.onnx',
  vocoderPath: 'file:///.../vocoder.onnx',
  unicodeIndexerPath: 'file:///.../unicode_indexer.json',
  voicesPath: 'file:///.../voices/',
});
await Speech.speak('Hello from Supertonic.', 'F1');

// Kitten
await Speech.initialize({
  engine: TTSEngine.KITTEN,
  modelPath: 'file:///.../kitten.onnx',
  voicesPath: 'file:///.../voices.json',
  dictPath: 'file:///.../en-us.bin', // optional EPD1 dict
});
await Speech.speak('Hello from Kitten.', 'expr-voice-2-f');
```

Full options (execution providers, chunking, phonemizer selection) are documented in USAGE.md.
Streaming input (LLM token streams)
If your app plays a token-by-token LLM response through TTS, use createSpeechStream() instead of calling speak() per sentence. It buffers incoming text and adaptively flushes batches through the underlying engine so playback sounds continuous — the first sentence flushes as soon as it completes (low latency) and subsequent batches are packed up to targetChars characters.
```ts
const stream = Speech.createSpeechStream('af_bella', {
  targetChars: 300, // default
  onError: err => console.warn(err),
});

for await (const token of llmTokenStream) {
  stream.append(token); // non-blocking
}

await stream.finalize(); // flushes the tail and resolves when playback ends
// or: await stream.cancel(); // stops and discards
```

Per-sentence speak() chains produce audible gaps: each call resets the engine's internal synth pipeline, starting a fresh F0 contour and a cold first-chunk inference. The stream avoids this by keeping one continuous synth+play loop alive for the stream's entire lifetime — the next chunk is synthesized while the current one plays, so the only gap is genuine token-rate underrun (the LLM emitting tokens slower than playback consumes them).
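The flush-first-sentence-then-pack-batches policy can be sketched in isolation. This is an illustration of the behavior described above, not the library's internal implementation; `TokenBatcher` and its fields are invented names:

```typescript
// Sketch of the adaptive flushing policy: emit the first completed sentence
// immediately (low latency), then pack subsequent text into batches of up
// to `targetChars` characters.
class TokenBatcher {
  private buffer = '';
  private firstFlushDone = false;
  readonly batches: string[] = [];

  constructor(private targetChars = 300) {}

  append(token: string): void {
    this.buffer += token;
    if (!this.firstFlushDone) {
      // Flush as soon as the first sentence completes.
      const end = this.buffer.search(/[.!?](\s|$)/);
      if (end !== -1) {
        this.batches.push(this.buffer.slice(0, end + 1).trim());
        this.buffer = this.buffer.slice(end + 1);
        this.firstFlushDone = true;
      }
    } else if (this.buffer.length >= this.targetChars) {
      // Pack up to targetChars, preferring a sentence boundary if one exists.
      const cut = this.buffer.lastIndexOf('.', this.targetChars);
      const end = cut === -1 ? this.targetChars : cut + 1;
      this.batches.push(this.buffer.slice(0, end).trim());
      this.buffer = this.buffer.slice(end);
    }
  }

  finalize(): void {
    if (this.buffer.trim()) this.batches.push(this.buffer.trim());
    this.buffer = '';
  }
}
```

In the real stream each flushed batch would be handed to the continuous synth+play loop; here the batches are just collected for inspection.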
You can also track playback position with stream-absolute offsets:
```ts
stream.onProgress(event => {
  // event.streamRange is relative to the total text appended so far
  highlightText(event.streamRange.start, event.streamRange.end);
});
```

Works with all neural engines (Kokoro, Supertonic, Kitten) as well as the OS engine. See the Streaming tab in example/ for a live demo that simulates variable token rates.
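Because the offsets are stream-absolute, the app only needs to keep its own copy of everything it has appended; the spoken substring is then a plain slice. A small hypothetical helper (`AppendedText` is an invented name, not part of the API):

```typescript
// streamRange offsets index into the concatenation of all text passed to
// append() so far, so recovering the spoken substring is a plain slice.
class AppendedText {
  private text = '';

  append(token: string): void {
    this.text += token;
  }

  slice(range: {start: number; end: number}): string {
    return this.text.slice(range.start, range.end);
  }
}
```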
Architecture (short)
- `Speech` is the public facade. `Speech.initialize(config)` dispatches on `config.engine` and constructs the matching engine.
- Each engine implements `TTSEngineInterface<TConfig>`. Neural engines run ONNX sessions under `onnxruntime-react-native` and stream PCM to the native audio player.
- Native code handles playback, progress events, and OS-level audio focus / session interruptions.
See ARCHITECTURE.md for the full picture, including memory and device requirements.
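The dispatch described above can be pictured as a switch on the discriminated `engine` field that returns an object satisfying a shared engine interface. A sketch with assumed names (`TTSEngineLike`, `createEngine`), not the library's actual types:

```typescript
// Illustrative only: the real interface is TTSEngineInterface<TConfig>;
// the names below are invented for this sketch.
interface TTSEngineLike {
  synthesize(text: string, voiceId?: string): Promise<void>;
}

type EngineConfig =
  | {engine: 'OS_NATIVE'}
  | {engine: 'KOKORO'; modelPath: string; voicesPath: string; tokenizerPath: string};

function createEngine(config: EngineConfig): TTSEngineLike {
  switch (config.engine) {
    case 'OS_NATIVE':
      // Would delegate to AVSpeechSynthesizer / Android TextToSpeech.
      return {synthesize: async () => {}};
    case 'KOKORO':
      // Would load ONNX sessions from config.modelPath etc. and stream
      // PCM to the native audio player.
      return {synthesize: async () => {}};
  }
}
```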
Model & dictionary downloads
The library ships no model or dictionary assets. Consumer apps fetch them from their own origin (typically Hugging Face) and pass local paths into initialize(). See LICENSES.md for upstream sources and license notes per engine.
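One recurring consumer-side step is deriving a stable local destination for each downloaded file and the `file://` URI to hand to initialize(). A hypothetical helper showing that shape — the actual download (fetch, `@dr.pogodin/react-native-fs`, etc.) and directory layout are up to the app, and `localModelUri` is an invented name:

```typescript
// Derive a local destination path and file:// URI for a remote model file.
// The app downloads the file to `dest`, then passes `uri` to initialize().
function localModelUri(
  documentsDir: string,
  remoteUrl: string,
): {dest: string; uri: string} {
  const filename = remoteUrl.split('/').pop() ?? 'model.bin';
  const dest = `${documentsDir.replace(/\/$/, '')}/models/${filename}`;
  return {dest, uri: `file://${dest}`};
}
```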
Known limitations
- First run per engine has a 200–2000 ms cold-start (model load + compilation).
- Neural engines recommend a 3 GB+ RAM device. Low-memory devices should prefer the Kitten nano/micro variants or fall back to `OS_NATIVE`.
- OS TTS interruption handling is limited to what the platform provides — no library-level custom ducking beyond what iOS/Android expose.
- Hermes is supported, but has no `TextDecoder` or WASM — relevant only if you extend the library's text pipeline.
- Android 16 KB page sizes (Android 15+): the library's own `native_dict.so` is 16 KB-aligned, but `onnxruntime-react-native` (≤ 1.24.3 at time of writing) is not — apps that load a neural engine on a 16 KB-page device will fail with `dlopen` errors. Workaround: a one-line linker flag added to its `CMakeLists.txt` via `patch-package`. See `example/patches/onnxruntime-react-native+1.24.3.patch` and the `postinstall` wiring in `example/package.json` for the full setup. Drop the patch once upstream ships the fix.
Testing
Mock the module in tests by creating __mocks__/@pocketpalai/react-native-speech.ts:
```ts
module.exports = require('@pocketpalai/react-native-speech/jest');
```

Contributing
See CONTRIBUTING.md.
Credits
Forked from @mhpdev/react-native-speech by Mhpdev. The 1.x line provided the OS-native TTS foundation and the HighlightedText component; 2.0 extended the library into a multi-engine neural platform under a new package name.
Built on top of:
- `phonemize` by hans00 — the MIT G2P library that powers the default phonemizer.
- `onnxruntime-react-native` — Microsoft's ONNX Runtime bindings for RN, which every neural engine uses for inference.
- `@dr.pogodin/react-native-fs` — file I/O for model and dict loading.
Neural model credits (weights are not bundled):
- Kokoro-82M by hexgrad (Apache-2.0).
- Supertonic by Supertone (code MIT, weights OpenRAIL).
- KittenML kitten-tts (Apache-2.0).
Full license details in LICENSES.md.
License
MIT. See LICENSE. For model and third-party data licenses, see LICENSES.md.