Package Exports
- @restnpeacepk/worker-vad
- @restnpeacepk/worker-vad/engines/fvad
# worker-vad

Universal Voice Activity Detection SDK - Multiple WASM engines, one simple API

Detect speech in audio streams with WebAssembly-powered engines. Perfect for Cloudflare Workers, browsers, and Node.js.

## Features

- Unified API - One interface for all VAD engines
- Multiple Engines - fvad, libfvad, and rnnoise support

## Quick Start

```javascript
import { VAD } from 'worker-vad';

// Create VAD instance
const vad = await VAD.create({ sampleRate: 16000, mode: 'aggressive' });

// Process audio (audioData is an Int16Array of PCM samples)
const result = vad.process(audioData);
if (result.isSpeech) {
  console.log('Speech detected!');
}

// Cleanup
vad.destroy();
```

## Usage
### Basic Example
```javascript
import { VAD } from 'worker-vad';

const vad = await VAD.create({ sampleRate: 16000 });
const audioData = new Int16Array(480); // 30 ms at 16 kHz
const result = vad.process(audioData);

console.log(result.isSpeech);    // true/false
console.log(result.probability); // 0.0 - 1.0
```

### Web Audio API

```javascript
import { VAD } from 'worker-vad';

// Get microphone
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const audioContext = new AudioContext({ sampleRate: 16000 });
const source = audioContext.createMediaStreamSource(stream);

// Create VAD
const vad = await VAD.create({ sampleRate: 16000 });

// Process audio
// Note: ScriptProcessorNode is deprecated in favor of AudioWorklet,
// but it is still widely available.
const processor = audioContext.createScriptProcessor(4096, 1, 1);
processor.onaudioprocess = (e) => {
  const float32 = e.inputBuffer.getChannelData(0);
  const pcm = VAD.floatTo16BitPCM(float32);
  const result = vad.process(pcm);
  if (result.isSpeech) {
    console.log('Speaking!');
  }
};

source.connect(processor);
processor.connect(audioContext.destination);
```

### Cloudflare Workers

```javascript
import { VAD } from 'worker-vad';

export default {
  async fetch(request) {
    const vad = await VAD.create({
      engine: 'fvad',
      sampleRate: 16000
    });

    // The request body is treated as raw 16-bit PCM samples
    const audioBuffer = await request.arrayBuffer();
    const result = vad.process(new Int16Array(audioBuffer));

    vad.destroy();
    return Response.json(result);
  }
};
```

## API Reference

### VAD.create(options)

Create a new VAD instance.

Options:

- `engine` - Engine to use (`'auto'`, `'fvad'`, `'libfvad'`, `'rnnoise'`)
- `sampleRate` - Audio sample rate in Hz (8000, 16000, 32000, 48000)
- `mode` - VAD sensitivity (`'quality'`, `'low'`, `'aggressive'`, `'very-aggressive'`)
- `frameDuration` - Frame duration in ms (10, 20, 30)

Returns: `Promise<VAD>`
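
The `sampleRate` and `frameDuration` options together fix how many samples one frame holds (16000 Hz × 30 ms = 480 samples, as in the Basic Example). The README does not say whether `process` accepts longer buffers, so the sketch below shows one way to cut a recording into frame-sized pieces first. `samplesPerFrame` and `splitIntoFrames` are hypothetical helper names and are not part of worker-vad.

```javascript
// Number of samples in one frame, e.g. 16000 Hz * 30 ms / 1000 = 480.
function samplesPerFrame(sampleRate, frameDurationMs) {
  return (sampleRate * frameDurationMs) / 1000;
}

// Split a PCM buffer into fixed-size frames.
// subarray() returns views on the original buffer, so nothing is copied.
// A trailing partial frame is dropped.
function splitIntoFrames(pcm, sampleRate, frameDurationMs) {
  const size = samplesPerFrame(sampleRate, frameDurationMs);
  const frames = [];
  for (let start = 0; start + size <= pcm.length; start += size) {
    frames.push(pcm.subarray(start, start + size));
  }
  return frames;
}

// Example: one second of silence at 16 kHz, cut into 30 ms frames.
const oneSecond = new Int16Array(16000);
const frames = splitIntoFrames(oneSecond, 16000, 30);
console.log(frames.length, frames[0].length); // 33 480
```

Each frame could then be passed to `vad.process` in turn. Dropping the final partial frame is a design choice of this sketch; zero-padding it is an equally reasonable alternative.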

### vad.process(audioData)

Process audio data.

Parameters:

- `audioData` - Int16Array of PCM audio data

Returns:

```javascript
{
  isSpeech: boolean,
  probability: number,
  timestamp: number,
  processingTime: number,
  engine: string,
  metadata: object
}
```

### Utility Methods

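
The conversion helpers in this section map between Web Audio's floating-point samples and the 16-bit PCM that `process` expects. As a rough illustration of what such a conversion usually involves, here is a minimal standalone sketch. It is not the library's code, and the function names `floatToInt16` and `int16ToFloat32` are made up for this example; in real code the packaged `VAD.floatTo16BitPCM` and `VAD.int16ToFloat` are the ones to use.

```javascript
// Convert Float32 samples in [-1, 1] to signed 16-bit PCM.
function floatToInt16(float32) {
  const out = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    // Clamp first so out-of-range input does not wrap around.
    const s = Math.max(-1, Math.min(1, float32[i]));
    // The negative range reaches -32768, the positive range only 32767.
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

// Inverse direction: 16-bit PCM back to floats in [-1, 1].
function int16ToFloat32(int16) {
  const out = new Float32Array(int16.length);
  for (let i = 0; i < int16.length; i++) {
    out[i] = int16[i] < 0 ? int16[i] / 0x8000 : int16[i] / 0x7fff;
  }
  return out;
}

const pcm = floatToInt16(new Float32Array([0, 0.5, 1, -1, 2]));
console.log(Array.from(pcm)); // [ 0, 16383, 32767, -32768, 32767 ]
```

Clamping before scaling matters because microphone input can briefly exceed the nominal range, and an unclamped value would wrap to the opposite sign when stored in an `Int16Array`.
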
```javascript
VAD.floatTo16BitPCM(buffer)      // Float32Array → Int16Array
VAD.int16ToFloat(buffer)         // Int16Array → Float32Array
VAD.base64ToInt16(base64)        // Base64 → Int16Array
VAD.int16ToBase64(buffer)        // Int16Array → Base64
VAD.getAvailableEngines()        // List engines
VAD.getEngineCapabilities(name)  // Get engine info
```

## Supported Engines

| Engine | Size | Speed | Accuracy | Best For |
|---|---|---|---|---|
| fvad | 20 KB | ⚡⚡⚡ | ⭐⭐⭐ | Workers, Browser, Node |
| libfvad | 20 KB | ⚡⚡⚡ | ⭐⭐⭐ | Browser, Node |
| rnnoise | 100 KB | ⚡⚡ | ⭐⭐⭐⭐ | Browser, Node |

## Performance

- Processing Speed: < 0.1 ms per 30 ms frame
- Bundle Size: 20 KB (fvad engine)
- Memory Usage: < 1 MB per instance
- Latency: < 50 ms for real-time
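
A processing time under 0.1 ms per 30 ms frame corresponds to a real-time factor below roughly 0.0033 (processing time divided by audio duration). To check figures like this on your own hardware, a small timing loop is usually enough. The sketch below is an assumption-laden illustration: `benchmark` and `fakeProcess` are hypothetical names, and `fakeProcess` is only a stand-in so the code runs without the library. In practice you would pass something like `(frame) => vad.process(frame)` instead.

```javascript
// Fall back to Date.now() where performance.now() is not a global.
const now = () =>
  typeof performance !== 'undefined' ? performance.now() : Date.now();

// Time `processFrame` over `iterations` calls and report the mean cost
// per frame plus the real-time factor (processing time / audio time).
function benchmark(processFrame, frame, frameMs, iterations = 1000) {
  const t0 = now();
  for (let i = 0; i < iterations; i++) processFrame(frame);
  const meanMs = (now() - t0) / iterations;
  return { meanMs, realTimeFactor: meanMs / frameMs };
}

// Stand-in for vad.process: a trivial mean-energy check, used only so
// this sketch is runnable on its own.
function fakeProcess(frame) {
  let energy = 0;
  for (let i = 0; i < frame.length; i++) energy += frame[i] * frame[i];
  return { isSpeech: energy / frame.length > 1e6 };
}

const stats = benchmark(fakeProcess, new Int16Array(480), 30);
console.log(stats.meanMs.toFixed(4), stats.realTimeFactor.toFixed(6));
```

Results vary with the runtime and with timer resolution, so averaging over many iterations gives a steadier number than timing a single call.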

## Browser Support

- Chrome/Edge (latest)
- Firefox (latest)
- Safari (latest)
- Node.js 14+
- Cloudflare Workers

## Examples

See the examples directory for:
- Real-time microphone detection
- WebSocket streaming
- Batch processing
- Engine comparison
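
For the batch-processing case in particular, per-frame decisions usually need to be merged into speech segments with start and end times. The following is a minimal sketch of that step under stated assumptions: `toSegments` is a hypothetical helper (not part of worker-vad), and its input is simply an array of booleans such as the `isSpeech` values collected from successive `vad.process` calls.

```javascript
// Group per-frame speech decisions into segments.
// `flags` holds one boolean per frame, `frameMs` is the frame duration,
// and `hangoverFrames` is how many consecutive non-speech frames are
// tolerated inside a segment before it is closed.
function toSegments(flags, frameMs, hangoverFrames = 0) {
  const segments = [];
  let start = -1;      // index of the first frame of the open segment
  let lastSpeech = -1; // index of the most recent speech frame
  for (let i = 0; i < flags.length; i++) {
    if (flags[i]) {
      if (start === -1) start = i;
      lastSpeech = i;
    } else if (start !== -1 && i - lastSpeech > hangoverFrames) {
      segments.push({ startMs: start * frameMs, endMs: (lastSpeech + 1) * frameMs });
      start = -1;
    }
  }
  // Close a segment that is still open at the end of the input.
  if (start !== -1) {
    segments.push({ startMs: start * frameMs, endMs: (lastSpeech + 1) * frameMs });
  }
  return segments;
}

const flags = [false, true, true, false, true, false, false, false, true];
console.log(toSegments(flags, 30, 1));
// [ { startMs: 30, endMs: 150 }, { startMs: 240, endMs: 270 } ]
```

The hangover parameter keeps short pauses from splitting one utterance into many tiny segments; with `hangoverFrames = 0` every gap closes the current segment.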

## Contributing

Contributions welcome! Please read CONTRIBUTING.md first.

## License

MIT © Your Name

## Acknowledgments

- fvad-wasm - WebRTC VAD
- Cloudflare Workers - Serverless platform

## Support

- Documentation
- Issues
- Discussions