Echogarden
A fully open-source speech system designed with end-users in mind.
- Written in TypeScript, for the Node.js runtime
- Easy to install, run and update
- Runs on Windows (x64), macOS (x64, ARM64) and Linux (x64)
- Doesn't require Python, Docker, or similar system-level dependencies
- Doesn't rely on any essential platform-specific binaries. Engines are either ported via WebAssembly, imported using the ONNX runtime, or written in pure JavaScript
Feature highlights
- Fast, high-quality offline text-to-speech voices based on the VITS neural architecture
- Accurate offline speech recognition using OpenAI Whisper models
- Supports synthesis and recognition via major cloud providers, including Google, Microsoft and Amazon
- Word-level timestamps for all synthesis and recognition outputs
- Speech-to-transcript alignment using dynamic time warping (DTW) and dynamic time warping with recognition assist (DTW-RA)
- Advanced subtitle generation, accounting for sentence and phrase boundaries
- Can transcribe speech in any of 98 languages, translate it directly to English, and produce near word-level synchronized subtitles for the translated transcript
- Attempts to improve TTS pronunciation accuracy for a few engines and languages (currently only implemented for English dialects): adds text normalization (e.g. idiomatic date and currency pronunciation), heteronym disambiguation (based on a custom rule-based model) and user-customizable pronunciation lexicons
- Internal package system that automatically downloads and installs voices, models and other resources as needed
- Other features include: language detection (both for audio and text), voice activity detection, and speech denoising
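To make the DTW alignment method mentioned above more concrete, here is a minimal textbook sketch of dynamic time warping over two sequences of feature frames. It is illustrative only and does not claim to mirror Echogarden's internal implementation. The names `Frame`, `dtw` and `DtwResult` are hypothetical, and the sketch assumes the feature frames (for example, MFCC vectors) have already been extracted.

```typescript
// Textbook dynamic time warping (DTW) between two feature-frame sequences.
// Illustrative sketch only; not Echogarden's actual implementation.

type Frame = number[];

// Euclidean distance between two frames of equal dimension
function euclidean(a: Frame, b: Frame): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

interface DtwResult {
  cost: number;                  // total accumulated cost of the best path
  path: Array<[number, number]>; // [indexInA, indexInB] pairs, from start to end
}

function dtw(a: Frame[], b: Frame[]): DtwResult {
  const n = a.length;
  const m = b.length;

  // acc[i][j] = minimal accumulated cost of aligning a[0..i-1] with b[0..j-1]
  const acc: number[][] = Array.from({ length: n + 1 }, () =>
    new Array<number>(m + 1).fill(Infinity));
  acc[0][0] = 0;

  for (let i = 1; i <= n; i++) {
    for (let j = 1; j <= m; j++) {
      const cost = euclidean(a[i - 1], b[j - 1]);
      // Allowed steps: advance in a only, in b only, or in both (diagonal)
      acc[i][j] = cost + Math.min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1]);
    }
  }

  // Backtrack from the bottom-right cell to recover the optimal warping path
  const path: Array<[number, number]> = [];
  let i = n;
  let j = m;
  while (i > 0 && j > 0) {
    path.push([i - 1, j - 1]);
    const diag = acc[i - 1][j - 1];
    const up = acc[i - 1][j];
    const left = acc[i][j - 1];
    if (diag <= up && diag <= left) {
      i--;
      j--;
    } else if (up <= left) {
      i--;
    } else {
      j--;
    }
  }
  path.reverse();

  return { cost: acc[n][m], path };
}
```

A common way to apply this to speech-to-transcript alignment is to synthesize the transcript, extract per-frame features from both the synthesized and the recorded audio, and align the two frame sequences; the resulting path then maps time ranges of the synthesized words onto the recording. Note that this plain version takes O(n·m) time and memory, so practical systems typically restrict the search to a window around the diagonal.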
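Subtitle generation builds on the word-level timestamps listed above. The following is a minimal sketch of the general idea, under stated assumptions: words are accumulated into a cue, the cue is closed at sentence-ending punctuation or when adding a word would exceed a character limit, and the cues are rendered as SRT. The `WordTimestamp` shape, `groupIntoCues`, `toSrt` and the 42-character default are hypothetical and do not reflect Echogarden's actual options or output format; Echogarden's generator also accounts for phrase boundaries, which this sketch ignores.

```typescript
// Hypothetical sketch: grouping word-level timestamps into SRT subtitle cues.
// Names and limits are illustrative assumptions, not Echogarden's API.

interface WordTimestamp {
  text: string;
  startTime: number; // seconds
  endTime: number;   // seconds
}

interface Cue {
  text: string;
  startTime: number;
  endTime: number;
}

// A word ending in . ! or ? (optionally followed by closing quotes/brackets)
const SENTENCE_END = /[.!?]["')\]]*$/;

function groupIntoCues(words: WordTimestamp[], maxCharsPerCue = 42): Cue[] {
  const cues: Cue[] = [];
  let current: WordTimestamp[] = [];

  // Emit the accumulated words as a single cue
  const flush = () => {
    if (current.length === 0) return;
    cues.push({
      text: current.map(w => w.text).join(' '),
      startTime: current[0].startTime,
      endTime: current[current.length - 1].endTime,
    });
    current = [];
  };

  for (const word of words) {
    const currentLength = current.map(w => w.text).join(' ').length;
    // Start a new cue if this word would push the cue past the character limit
    if (current.length > 0 && currentLength + 1 + word.text.length > maxCharsPerCue) {
      flush();
    }
    current.push(word);
    // Always close the cue after a sentence-ending word
    if (SENTENCE_END.test(word.text)) {
      flush();
    }
  }
  flush();
  return cues;
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function formatSrtTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n: number, width = 2) => String(n).padStart(width, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

function toSrt(cues: Cue[]): string {
  return cues
    .map((c, i) =>
      `${i + 1}\n${formatSrtTime(c.startTime)} --> ${formatSrtTime(c.endTime)}\n${c.text}\n`)
    .join('\n');
}
```

Closing cues at sentence boundaries first, and only then falling back to a length limit, keeps each subtitle readable as a unit rather than cutting sentences at arbitrary points.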
In development
- Background worker
- WebSocket-based server and API
- Browser extension (for TTS only), including integration with the Web Speech API, and an advanced page reader enabling real-time narration of any page content, with live word highlighting
- New, high-accuracy text language identification model (own work)
Planned, but not yet
- Text enhancement, adding breaks to improve phrasing of synthesized text, as well as adding missing punctuation to recognized transcripts, if needed
- Web-based UI
- Real-time, streaming speech recognition
Maybe
- Browser port for a subset of the API (in particular for the offline TTS models and their dependencies)
Installation
Ensure you have Node.js v16.0.0 or later installed, then run:
npm install echogarden -g
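If you want to confirm the Node.js requirement programmatically, for example in a setup script, a small check like the following is enough. `meetsMinimumNodeVersion` is a hypothetical helper, not part of Echogarden; it only reads `process.versions.node` and compares the major version against 16.

```typescript
// Hypothetical helper: checks the running Node.js version against the
// stated minimum (v16.0.0). Not part of Echogarden.

function meetsMinimumNodeVersion(version: string, minMajor = 16): boolean {
  // Accept both "16.0.0" and "v16.0.0" forms; compare the major version only
  const major = Number.parseInt(version.replace(/^v/, '').split('.')[0], 10);
  return Number.isFinite(major) && major >= minMajor;
}

if (!meetsMinimumNodeVersion(process.versions.node)) {
  console.error(`Echogarden requires Node.js v16.0.0 or later; found ${process.versions.node}`);
  process.exitCode = 1;
}
```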
Additional tools:
- sox: used for the CLI's audio playback and recording (only). Auto-installed via a package on Windows and Intel macOS. On Linux and ARM64 macOS, it is recommended to install it via a platform package manager like apt (Linux) or brew (macOS).
- ffmpeg: used for codec conversions. Auto-installed via an expansion package on Windows, Intel macOS and x64 Linux. On ARM64 macOS, it is recommended to install it via a platform package manager like brew; otherwise, the much slower ffmpeg-wasm will be used.
(Hopefully, in the future all platforms will be covered by expansion packages.)
Updating to the latest version
npm update echogarden -g
Next steps
- Using the command line interface
- Options reference
- Full list of supported engines
- Technical overview and Q&A
- Roadmap
- How to help
Credits
This project consolidates and builds upon the efforts of many different individuals and companies, and also contributes a number of original works.
Designed and developed by Rotem Dan.
License
GNU General Public License v3
Licenses for components, models and other dependencies are detailed on this page.