JSPM

  • License GPL-3.0-only

    Echogarden

    An integrated speech toolbox designed with end-users in mind.

    • Written in TypeScript, for the Node.js platform
    • Easy to install, run and update
    • Runs on Windows (x64), macOS (x64, ARM64) and Linux (x64)
    • Does not require Python, Docker, or any other system-level dependencies
    • No essential platform-specific binary executables. Engines are ported via WebAssembly or the ONNX runtime

    Feature highlights

    • Fast, high-quality offline text-to-speech voices based on the VITS neural architecture
    • Accurate offline speech recognition using OpenAI Whisper models
    • Supports synthesis and recognition via major cloud providers, including Google, Microsoft and Amazon
    • Word-level timestamps for all synthesis and recognition outputs
    • Speech-to-transcript alignment using the dynamic time warping (DTW) and dynamic time warping with recognition assist (DTW-RA) methods
    • Advanced subtitle generation, accounting for sentence and phrase boundaries
    • Can transcribe speech in any of 98 languages, translate it directly to English, and produce near word-level synchronized subtitles
    • Uses NLP to improve TTS pronunciation accuracy on a few engines and languages: text normalization (e.g. idiomatic date and currency pronunciation), heteronym disambiguation (based on POS tagging), and user-customizable phonetic lexicons
    • Internal package system to auto-download and install voices, models and other resources, as needed
    • Other features include: language detection (both for audio and text), voice activity detection and speech denoising
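    To give a rough idea of what dynamic time warping does, here is a minimal, generic sketch over two plain numeric sequences. It is not Echogarden's implementation: the function name `dtw`, the `AlignmentStep` type, and the absolute-difference cost are assumptions chosen for illustration. A speech aligner would compare audio feature frames rather than single numbers.

```typescript
// Generic dynamic time warping (DTW) between two numeric sequences.
// Illustrative sketch only; names and cost function are assumptions,
// not taken from Echogarden.

type AlignmentStep = { a: number; b: number };

function dtw(a: number[], b: number[]): { cost: number; path: AlignmentStep[] } {
  const n = a.length;
  const m = b.length;
  if (n === 0 || m === 0) {
    throw new Error("Both sequences must be non-empty");
  }

  // acc[i][j] = minimal accumulated cost of aligning a[0..i] with b[0..j]
  const acc: number[][] = Array.from({ length: n }, () =>
    new Array<number>(m).fill(Infinity)
  );

  for (let i = 0; i < n; i++) {
    for (let j = 0; j < m; j++) {
      const local = Math.abs(a[i] - b[j]); // local distance between elements
      if (i === 0 && j === 0) {
        acc[i][j] = local;
        continue;
      }
      const up = i > 0 ? acc[i - 1][j] : Infinity;
      const left = j > 0 ? acc[i][j - 1] : Infinity;
      const diag = i > 0 && j > 0 ? acc[i - 1][j - 1] : Infinity;
      acc[i][j] = local + Math.min(up, left, diag);
    }
  }

  // Backtrack from the last cell to the first to recover the warping path
  const path: AlignmentStep[] = [];
  let i = n - 1;
  let j = m - 1;
  while (true) {
    path.push({ a: i, b: j });
    if (i === 0 && j === 0) break;
    const up = i > 0 ? acc[i - 1][j] : Infinity;
    const left = j > 0 ? acc[i][j - 1] : Infinity;
    const diag = i > 0 && j > 0 ? acc[i - 1][j - 1] : Infinity;
    const best = Math.min(up, left, diag);
    if (diag === best) {
      i--;
      j--;
    } else if (up === best) {
      i--;
    } else {
      j--;
    }
  }
  path.reverse();

  return { cost: acc[n - 1][m - 1], path };
}
```

    For example, `dtw([1, 2, 3], [1, 2, 2, 3])` yields a total cost of 0, with the middle element of the first sequence matched to both 2s of the second. This is the same stretching that lets a transcript line up with speech spoken at a varying pace.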

    Planned, but not yet implemented

    • Real-time, streaming speech recognition
    • WebSocket server API
    • Web-based GUI frontend
    • Browser port for a subset of the API (in particular for the offline TTS models and their dependencies)

    Installation

    Ensure you have Node.js v16.0.0 or later installed, then run:

    npm install echogarden -g
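    If you want to verify the Node.js requirement from a script, a small check along these lines may help. It is only a sketch: the helper names `parseVersion` and `isAtLeast` are made up for this example, and the only fact taken from this document is the v16.0.0 minimum.

```typescript
// Minimum Node.js version stated in the installation instructions.
const MIN_NODE_VERSION = "16.0.0";

// Parse "16.0.0" or "v16.0.0" into [major, minor, patch].
// Anything after the patch number (e.g. a pre-release tag) is ignored.
function parseVersion(version: string): [number, number, number] {
  const match = /^v?(\d+)\.(\d+)\.(\d+)/.exec(version);
  if (!match) {
    throw new Error(`Unrecognized version string: ${version}`);
  }
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// True when `current` is the same as or newer than `minimum`.
function isAtLeast(current: string, minimum: string): boolean {
  const c = parseVersion(current);
  const m = parseVersion(minimum);
  for (let i = 0; i < 3; i++) {
    if (c[i] !== m[i]) return c[i] > m[i];
  }
  return true;
}

// Under Node.js, process.versions.node holds the runtime version.
// It is read through globalThis so the sketch compiles without Node typings.
const runtimeVersion: string | undefined =
  (globalThis as any).process?.versions?.node;

if (runtimeVersion !== undefined && !isAtLeast(runtimeVersion, MIN_NODE_VERSION)) {
  console.error(
    `Node.js ${runtimeVersion} is older than the required v${MIN_NODE_VERSION}`
  );
}
```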

    Additional tools:

    • sox: used for audio playback and recording only. Auto-installed via an expansion package on Windows and Intel macOS. On Linux and ARM64 macOS, it is recommended to install it via platform package managers like apt and brew.
    • ffmpeg: used for codec conversions. Auto-installed via an expansion package on Windows, Intel macOS, and x64 Linux. On ARM64 macOS, it is recommended to install it via a platform package manager like brew; otherwise, the much slower ffmpeg-wasm is used.

    (Hopefully, all platforms will be covered by expansion packages in the future.)
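    The auto-install coverage described above can be summarized as a small lookup. This is a sketch for readers scripting their own setup, not code from Echogarden: the `autoInstalled` table and `isAutoInstalled` function are hypothetical, and the platform and architecture strings follow the values Node.js reports in `process.platform` and `process.arch`.

```typescript
type Tool = "sox" | "ffmpeg";
type Platform = "win32" | "darwin" | "linux";
type Arch = "x64" | "arm64";

// Platform/architecture pairs where each tool is auto-installed via an
// expansion package, as listed in the installation notes.
const autoInstalled: Record<Tool, string[]> = {
  sox: ["win32-x64", "darwin-x64"],
  ffmpeg: ["win32-x64", "darwin-x64", "linux-x64"],
};

// Returns true when no manual installation should be needed for `tool`.
function isAutoInstalled(tool: Tool, platform: Platform, arch: Arch): boolean {
  return autoInstalled[tool].indexOf(`${platform}-${arch}`) !== -1;
}
```

    When `isAutoInstalled` returns false, install the tool with a platform package manager such as apt or brew, as recommended above.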

    Updating to the latest version

    npm update echogarden -g

    Next steps

    Credits

    This project consolidates and builds upon the work of many different individuals and companies.

    Designed and developed by Rotem Dan.

    License

    GNU General Public License v3