# worker-vad

Universal Voice Activity Detection SDK - Multiple WASM engines, one simple API

Detect speech in audio streams with WebAssembly-powered engines. Perfect for Cloudflare Workers, browsers, and Node.js.

## ✨ Features

- 🎯 Unified API - One interface for all VAD engines
- 🔄 Multiple Engines - fvad, libfvad, and rnnoise support

```javascript
import { VAD } from 'worker-vad';

// Create VAD instance
const vad = await VAD.create({ sampleRate: 16000, mode: 'aggressive' });

// Process audio (audioData is an Int16Array of PCM samples)
const result = vad.process(audioData);

if (result.isSpeech) {
  console.log('Speech detected!');
}

// Cleanup
vad.destroy();
```


## 📖 Usage

### Basic Example

```javascript
import { VAD } from 'worker-vad';

const vad = await VAD.create({ sampleRate: 16000 });
const audioData = new Int16Array(480); // 30ms at 16kHz

const result = vad.process(audioData);
console.log(result.isSpeech);      // true/false
console.log(result.probability);   // 0.0 - 1.0

```

### Web Audio API

```javascript
import { VAD } from 'worker-vad';

// Get microphone
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const audioContext = new AudioContext({ sampleRate: 16000 });
const source = audioContext.createMediaStreamSource(stream);

// Create VAD
const vad = await VAD.create({ sampleRate: 16000 });

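// Process audio
// Note: ScriptProcessorNode is deprecated in favor of AudioWorklet,
// but it keeps this example short.
const processor = audioContext.createScriptProcessor(4096, 1, 1);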
processor.onaudioprocess = (e) => {
  const float32 = e.inputBuffer.getChannelData(0);
  const pcm = VAD.floatTo16BitPCM(float32);
  
  const result = vad.process(pcm);
  if (result.isSpeech) {
    console.log('Speaking!');
  }
};

source.connect(processor);
processor.connect(audioContext.destination);

```

### Cloudflare Workers

```javascript
import { VAD } from 'worker-vad';

export default {
  async fetch(request) {
    const vad = await VAD.create({
      engine: 'fvad',
      sampleRate: 16000
    });
    
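    try {
      // The request body is expected to be raw 16-bit PCM
      const audioBuffer = await request.arrayBuffer();
      const result = vad.process(new Int16Array(audioBuffer));

      return Response.json(result);
    } finally {
      // Release the instance even if processing throws
      vad.destroy();
    }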
  }
};

```

## 🎛️ API Reference

### `VAD.create(options)`

Create a new VAD instance.

Options:

- `engine` - Engine to use (`'auto'`, `'fvad'`, `'libfvad'`, `'rnnoise'`)
- `sampleRate` - Audio sample rate in Hz (8000, 16000, 32000, 48000)
- `mode` - VAD sensitivity (`'quality'`, `'low'`, `'aggressive'`, `'very-aggressive'`)
- `frameDuration` - Frame duration in ms (10, 20, 30)

Returns: `Promise<VAD>`
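
The `sampleRate` and `frameDuration` options together determine how many samples make up one frame, which is where the 480 in the basic example (30 ms at 16 kHz) comes from. A minimal sketch of that arithmetic follows; `frameSize` is a hypothetical helper written for illustration and is not part of the worker-vad API.

```javascript
// Hypothetical helper, not part of worker-vad: computes how many samples
// one frame holds for a given sample rate and frame duration.
const SAMPLE_RATES = [8000, 16000, 32000, 48000]; // values listed in the options
const FRAME_DURATIONS_MS = [10, 20, 30];           // values listed in the options

function frameSize(sampleRate, frameDurationMs) {
  if (!SAMPLE_RATES.includes(sampleRate)) {
    throw new RangeError(`Unsupported sample rate: ${sampleRate}`);
  }
  if (!FRAME_DURATIONS_MS.includes(frameDurationMs)) {
    throw new RangeError(`Unsupported frame duration: ${frameDurationMs}`);
  }
  // samples per frame = samples per second * seconds per frame
  return (sampleRate * frameDurationMs) / 1000;
}

console.log(frameSize(16000, 30)); // 480
console.log(frameSize(8000, 10));  // 80
```

Validating up front like this may give clearer errors than passing an unexpected buffer length to an engine, though how the library itself reacts to other values is not documented here.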

### `vad.process(audioData)`

Process audio data.

Parameters:

- `audioData` - `Int16Array` of PCM audio data

Returns:

```javascript
{
  isSpeech: boolean,
  probability: number,
  timestamp: number,
  processingTime: number,
  engine: string,
  metadata: object
}
```
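
Longer recordings usually need to be cut into frames and the per-frame `isSpeech` flags merged into speech segments. The sketch below shows one way to do that, assuming the engine expects one fixed-size frame per call (this README does not say whether `process` chunks longer buffers itself). `frames`, `speechSegments`, and `fakeProcess` are hypothetical names; `fakeProcess` is a stand-in detector so the code runs without the library.

```javascript
// Hypothetical helpers for illustration; not part of worker-vad.

// Split PCM samples into fixed-size frames, dropping a trailing partial frame.
function* frames(samples, frameSize) {
  for (let start = 0; start + frameSize <= samples.length; start += frameSize) {
    yield samples.subarray(start, start + frameSize);
  }
}

// Merge consecutive speech frames into { startMs, endMs } segments.
// `processFrame` is any function returning an object with `isSpeech`,
// for example (frame) => vad.process(frame).
function speechSegments(samples, frameSize, frameDurationMs, processFrame) {
  const segments = [];
  let current = null;
  let index = 0;
  for (const frame of frames(samples, frameSize)) {
    const startMs = index * frameDurationMs;
    if (processFrame(frame).isSpeech) {
      if (current === null) {
        current = { startMs, endMs: startMs + frameDurationMs };
        segments.push(current);
      } else {
        current.endMs = startMs + frameDurationMs;
      }
    } else {
      current = null;
    }
    index++;
  }
  return segments;
}

// Stand-in detector so the sketch runs without the library:
// a frame counts as speech when any sample is non-zero.
const fakeProcess = (frame) => ({ isSpeech: frame.some((s) => s !== 0) });

const pcm = new Int16Array(480 * 4); // four 30 ms frames at 16 kHz
pcm.fill(1000, 480, 480 * 3);        // frames 1 and 2 carry signal
console.log(speechSegments(pcm, 480, 30, fakeProcess));
// [ { startMs: 30, endMs: 90 } ]
```

With the real library, `fakeProcess` would be replaced by a call such as `(frame) => vad.process(frame)`.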

### Utility Methods

```javascript
VAD.floatTo16BitPCM(buffer)      // Float32Array → Int16Array
VAD.int16ToFloat(buffer)         // Int16Array → Float32Array
VAD.base64ToInt16(base64)        // Base64 → Int16Array
VAD.int16ToBase64(buffer)        // Int16Array → Base64
VAD.getAvailableEngines()        // List engines
VAD.getEngineCapabilities(name)  // Get engine info
```
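
For readers unfamiliar with the float/PCM conversions, the sketch below shows what such helpers conventionally do: clamp each Web Audio sample to [-1, 1] and scale it to the signed 16-bit range, and divide by 32768 for the reverse direction. This is a generic illustration with hypothetical function names; the library's own `floatTo16BitPCM` and `int16ToFloat` may round or scale differently.

```javascript
// Illustrative conversions; the library's own implementations may differ.
function floatToInt16(float32) {
  const out = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    // Clamp to [-1, 1], then scale to the asymmetric int16 range
    const s = Math.max(-1, Math.min(1, float32[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

function int16ToFloat(int16) {
  const out = new Float32Array(int16.length);
  for (let i = 0; i < int16.length; i++) {
    // Divide by 32768 so the result stays within [-1, 1)
    out[i] = int16[i] / 0x8000;
  }
  return out;
}

const pcm = floatToInt16(new Float32Array([0, 1, -1, 2, -0.5]));
console.log(Array.from(pcm)); // [ 0, 32767, -32768, 32767, -16384 ]
```

Clamping matters because microphone input can briefly exceed the nominal range, and an unclamped value would wrap around when stored in an `Int16Array`.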

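## 🔧 Supported Engines

| Engine  | Size  | Speed | Accuracy | Best For               |
|---------|-------|-------|----------|------------------------|
| fvad    | 20KB  | ⚡⚡⚡   | ⭐⭐⭐      | Workers, Browser, Node |
| libfvad | 20KB  | ⚡⚡⚡   | ⭐⭐⭐      | Browser, Node          |
| rnnoise | 100KB | ⚡⚡    | ⭐⭐⭐⭐     | Browser, Node          |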

## 📊 Performance

- Processing Speed: < 0.1ms per 30ms frame
- Bundle Size: 20KB (fvad engine)
- Memory Usage: < 1MB per instance
- Latency: < 50ms for real-time
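
Figures like these depend on hardware and runtime, so it can be worth measuring them locally. The sketch below times any per-frame function and reports the real-time factor (processing time divided by frame duration); at the quoted 0.1 ms per 30 ms frame that factor would be roughly 0.003. `benchmark` is a hypothetical helper, the workload is a stand-in rather than a real engine, and a global `performance.now()` is assumed (browsers and Node.js 16+).

```javascript
// Hypothetical benchmark helper; not part of worker-vad.
// `processFrame` is any per-frame function, e.g. (f) => vad.process(f).
function benchmark(processFrame, frame, frameDurationMs, iterations = 1000) {
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    processFrame(frame);
  }
  const avgMs = (performance.now() - start) / iterations;
  return {
    avgMs,
    // Fraction of real time spent processing; below 1 keeps up with live audio
    realTimeFactor: avgMs / frameDurationMs,
  };
}

// Stand-in workload so the sketch runs without the library
const frame = new Int16Array(480); // 30 ms at 16 kHz
const sumAbs = (f) => f.reduce((sum, s) => sum + Math.abs(s), 0);
const stats = benchmark(sumAbs, frame, 30);
console.log(stats.avgMs, stats.realTimeFactor);
```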

## 🌐 Browser Support

- ✅ Chrome/Edge (latest)
- ✅ Firefox (latest)
- ✅ Safari (latest)
- ✅ Node.js 14+
- ✅ Cloudflare Workers

## 📝 Examples

See the examples directory for:

- Real-time microphone detection
- WebSocket streaming
- Batch processing
- Engine comparison

## 🤝 Contributing

Contributions are welcome! Please read CONTRIBUTING.md first.

## 📄 License

## 🙏 Acknowledgments

- fvad-wasm - WebRTC VAD
- Cloudflare Workers - Serverless platform
