SpeechFlow
Speech Processing Flow Graph
About
SpeechFlow is a command-line tool for establishing a directed data-flow graph of audio and text processing nodes. This allows various speech processing tasks to be performed in a flexible way.
SpeechFlow comes with built-in graph nodes for local file I/O, local audio device I/O, local/remote WebSocket network I/O, cloud-based Deepgram speech-to-text conversion, cloud-based DeepL text-to-text translation, local Gemma/Ollama text-to-text translation, cloud-based ElevenLabs text-to-speech conversion, and local FFmpeg speech-to-speech encoding.
Additional SpeechFlow graph nodes can be provided externally by NPM packages named speechflow-node-xxx which expose a class derived from the exported SpeechFlowNode class of the speechflow package.
SpeechFlow is written in TypeScript and ships as an installable package for the Node Package Manager (NPM).
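As a purely illustrative sketch of such an external node package: the only facts assumed from the description above are the package naming scheme and that the speechflow package exports the SpeechFlowNode base class; the concrete methods a node has to override are not documented here and are therefore omitted.
/*  hypothetical NPM package "speechflow-node-example" (illustrative sketch only)  */
import { SpeechFlowNode } from "speechflow"
/*  assumption: the node class is exposed as the package's default export;
    the processing hooks of the real SpeechFlowNode base class are not
    documented here and hence not shown  */
export default class SpeechFlowNodeExample extends SpeechFlowNode {
    /*  ...override the SpeechFlowNode processing hooks here...  */
}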
Installation
$ npm install -g speechflow
Usage
$ speechflow
[-h|--help]
[-V|--version]
[-v|--verbose <level>]
[-e|--expression <expression>]
[-f|--expression-file <expression-file>]
[-c|--config <key>@<yaml-config-file>]
[<argument> [...]]
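For instance, one of the graph expressions from the examples below can be passed directly via the documented -e option (quoted for the shell):
$ speechflow -e 'file(path: "-", mode: "r", type: "text") | deepl(src: "de", dst: "en-US") | file(path: "-", mode: "w", type: "text")'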
Processing Graph Examples
Capture audio from microphone to file:
device(device: "wasapi:VoiceMeeter Out B1", mode: "r") | file(path: "capture.pcm", mode: "w", type: "audio")
Generate text file with transcription of audio file:
file(path: argv.0, mode: "r", type: "audio") | deepgram(language: "en") | file(path: argv.1, mode: "w", type: "text")
Translate stdin to stdout:
file(path: "-", mode: "r", type: "text") | deepl(src: "de", dst: "en-US") | file(path: "-", mode: "w", type: "text")
Pass-through audio from microphone to speaker and in parallel record it to file:
device(device: "wasapi:VoiceMeeter Out B1", mode: "r") | { file(path: "capture.pcm", mode: "w", type: "audio"), device(device: "wasapi:VoiceMeeter VAIO3 Input", mode: "w") }
Real-time translation from German to English, including capturing all inputs and outputs:
device(device: "wasapi:VoiceMeeter Out B1", mode: "r") | { file(path: "translation-audio-de.pcm", mode: "w", type: "audio"), deepgram(language: "de") | file(path: "translation-text-de.txt", mode: "w", type: "text") } | { deepl(src: "de", dst: "en-US") | file(path: "translation-text-en.txt", mode: "w", type: "text") } | { elevenlabs(language: "en") | { file(path: "translation-audio-en.pcm", mode: "w", type: "audio"), device(device: "wasapi:VoiceMeeter VAIO3 Input", mode: "w") } }
Processing Node Types
Currently SpeechFlow provides the following processing nodes:
Node: file
Purpose: File and StdIO source/sink
Example: file(path: "capture.pcm", mode: "w", type: "audio")
Ports: input: text, audio; output: text, audio
Parameters:
    path (position 0, default: none, requirement: none)
    mode (position 1, default: "r", requirement: /^(?:r|w|rw)$/)
    type (position 2, default: "audio", requirement: /^(?:audio|text)$/)
Node: websocket
Purpose: WebSocket source/sink
Example: websocket(connect: "ws://127.0.0.1:12345", type: "text")
Ports: input: text, audio; output: text, audio
Parameters:
    listen (position none, default: none, requirement: /^(?:|ws:\/\/(.+?):(\d+))$/)
    connect (position none, default: none, requirement: /^(?:|ws:\/\/(.+?):(\d+)(?:\/.*)?)$/)
    type (position none, default: "audio", requirement: /^(?:audio|text)$/)
Node: device
Purpose: Microphone/speaker device source/sink
Example: device(device: "wasapi:VoiceMeeter Out B1", mode: "r")
Ports: input: audio; output: audio
Parameters:
    device (position 0, default: none, requirement: /^(.+?):(.+)$/)
    mode (position 1, default: "rw", requirement: /^(?:r|w|rw)$/)
Node: ffmpeg
Purpose: FFmpeg audio format conversion
Example: ffmpeg(src: "pcm", dst: "mp3")
Ports: input: audio; output: audio
Parameters:
    src (position 0, default: "pcm", requirement: /^(?:pcm|wav|mp3|opus)$/)
    dst (position 1, default: "wav", requirement: /^(?:pcm|wav|mp3|opus)$/)
Node: deepgram
Purpose: Deepgram Speech-to-Text conversion
Example: deepgram(language: "de")
Notice: this node requires an API key!
Ports: input: audio; output: text
Parameters:
    key (position none, default: env.SPEECHFLOW_KEY_DEEPGRAM, requirement: none)
    model (position 0, default: "nova-2", requirement: none)
    version (position 1, default: "latest", requirement: none)
    language (position 2, default: "de", requirement: none)
Node: deepl
Purpose: DeepL Text-to-Text translation
Example: deepl(src: "de", dst: "en-US")
Notice: this node requires an API key!
Ports: input: text; output: text
Parameters:
    key (position none, default: env.SPEECHFLOW_KEY_DEEPL, requirement: none)
    src (position 0, default: "de", requirement: /^(?:de|en-US)$/)
    dst (position 1, default: "en-US", requirement: /^(?:de|en-US)$/)
Node: gemma
Purpose: Google Gemma Text-to-Text translation
Example: gemma(src: "de", dst: "en")
Notice: this node requires the Ollama API!
Ports: input: text; output: text
Parameters:
    url (position none, default: "http://127.0.0.1:11434", requirement: /^https?:\/\/.+?:\d+$/)
    src (position 0, default: "de", requirement: /^(?:de|en)$/)
    dst (position 1, default: "en", requirement: /^(?:de|en)$/)
Node: elevenlabs
Purpose: ElevenLabs Text-to-Speech conversion
Example: elevenlabs(language: "en")
Notice: this node requires an API key!
Ports: input: text; output: audio
Parameters:
    key (position none, default: env.SPEECHFLOW_KEY_ELEVENLABS, requirement: none)
    voice (position 0, default: "Brian", requirement: none)
    language (position 1, default: "de", requirement: none)
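As an additional illustrative sketch, the ffmpeg node can be chained between two file nodes to convert a previously captured PCM recording into an MP3 file, using only the parameters documented above:
file(path: "capture.pcm", mode: "r", type: "audio") | ffmpeg(src: "pcm", dst: "mp3") | file(path: "capture.mp3", mode: "w", type: "audio")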
Graph Expression Language
The SpeechFlow graph expression language is based on FlowLink, whose language follows this BNF-style grammar:
expr ::= parallel
| sequential
| node
| group
parallel ::= sequential ("," sequential)+
sequential ::= node ("|" node)+
node ::= id ("(" (param ("," param)*)? ")")?
param ::= array | object | variable | template | string | number | value
group ::= "{" expr "}"
id ::= /[a-zA-Z_][a-zA-Z0-9_-]*/
variable ::= id
array ::= "[" (param ("," param)*)? "]"
object ::= "{" (id ":" param ("," id ":" param)*)? "}"
template ::= "`" ("${" variable "}" / ("\\`"|.))* "`"
string ::= /"(\\"|.)*"/
| /'(\\'|.)*'/
number ::= /[+-]?/ number-value
number-value ::= "0b" /[01]+/
| "0o" /[0-7]+/
| "0x" /[0-9a-fA-F]+/
| /[0-9]*\.[0-9]+([eE][+-]?[0-9]+)?/
| /[0-9]+/
value ::= "true" | "false" | "null" | "NaN" | "undefined"
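To relate this grammar to the examples above: in the following expression (a sketch using only the nodes documented above), each deepgram(...), deepl(...) and file(...) term is a node, "|" chains nodes into a sequential flow, "{...}" forms a group, and "," inside the group runs its two branches in parallel:
deepgram(language: "de") | { file(path: "text-de.txt", mode: "w", type: "text"), deepl(src: "de", dst: "en-US") | file(path: "text-en.txt", mode: "w", type: "text") }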
History
SpeechFlow was initially created as a technical cut-through in March 2024 for use in the msg Filmstudio context. In April 2025 it was refined into a more complete toolkit and could then be used in production for the first time.
Copyright & License
Copyright © 2024-2025 Dr. Ralf S. Engelschall
Licensed under GPL 3.0