Whisper Web Transcriber
Real-time audio transcription in the browser using OpenAI's Whisper model via WebAssembly. This package provides an easy-to-use API for integrating speech-to-text capabilities into web applications without any server-side processing.
Live Demo 🎙️
Features
- 🎙️ Real-time audio transcription from microphone
- 🌐 Runs entirely in the browser (no server required)
- 📦 Multiple Whisper model options (tiny, base, quantized versions)
- 💾 Automatic model caching in IndexedDB
- 🔧 Simple, promise-based API
- 📱 Works on all modern browsers with WebAssembly support
- 🌍 Platform-independent (same WASM works on all OS)
Installation
npm install whisper-web-transcriber
Or using yarn:
yarn add whisper-web-transcriber
Quick Start
import { WhisperTranscriber } from 'whisper-web-transcriber';
// Create a new transcriber instance
const transcriber = new WhisperTranscriber({
  modelSize: 'base-en-q5_1', // or 'tiny.en', 'base.en', 'tiny-en-q5_1'
  onTranscription: (text) => {
    console.log('Transcribed:', text);
    document.getElementById('transcription').textContent += text + ' ';
  },
  onProgress: (progress) => {
    console.log('Loading progress:', progress + '%');
  },
  onStatus: (status) => {
    console.log('Status:', status);
  }
});
// Load the model (downloaded on first use, then cached in the browser)
await transcriber.loadModel();
// Start recording
await transcriber.startRecording();
// Stop recording when you are done (for example, from a button click handler)
transcriber.stopRecording();
API Reference
Constructor Options
interface WhisperConfig {
  modelUrl?: string;        // Custom model URL (optional)
  modelSize?: 'tiny.en' | 'base.en' | 'tiny-en-q5_1' | 'base-en-q5_1';
  sampleRate?: number;      // Audio sample rate in Hz (default: 16000)
  audioIntervalMs?: number; // Audio processing interval in ms (default: 5000)
  onTranscription?: (text: string) => void;
  onProgress?: (progress: number) => void;
  onStatus?: (status: string) => void;
  debug?: boolean;          // Enable debug logging (default: false)
}
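The two timing defaults together determine how much audio each processing pass covers. The sketch below works that out; `resolveAudioDefaults` and `samplesPerInterval` are hypothetical helpers written for illustration (they are not exported by the package), and the calculation assumes each interval maps directly to one contiguous chunk of mono audio.

```typescript
// Hypothetical helpers for illustration; not part of whisper-web-transcriber.
interface AudioTiming {
  sampleRate?: number;      // Hz, documented default: 16000
  audioIntervalMs?: number; // ms, documented default: 5000
}

// Fill in the defaults listed in the WhisperConfig comments.
function resolveAudioDefaults(config: AudioTiming = {}): Required<AudioTiming> {
  return {
    sampleRate: config.sampleRate ?? 16000,
    audioIntervalMs: config.audioIntervalMs ?? 5000,
  };
}

// Number of mono samples covered by one processing interval
// (assumption: one interval equals one contiguous chunk of audio).
function samplesPerInterval(config: AudioTiming = {}): number {
  const { sampleRate, audioIntervalMs } = resolveAudioDefaults(config);
  return Math.round((sampleRate * audioIntervalMs) / 1000);
}

console.log(samplesPerInterval());                          // 80000
console.log(samplesPerInterval({ audioIntervalMs: 2000 })); // 32000
```

With the defaults, each pass would cover 5 seconds of audio, or 80,000 samples at 16 kHz. A shorter interval should lower latency, at the cost of running inference more often.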
Methods
- loadModel(): Promise<void> - Downloads and initializes the Whisper model
- startRecording(): Promise<void> - Starts microphone recording and transcription
- stopRecording(): void - Stops recording
- destroy(): void - Cleans up resources
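The four methods suggest a load, start, stop, destroy lifecycle. The following is a minimal sketch of that lifecycle with error handling. `TranscriberLike` is a structural type that mirrors the documented signatures, and `recordFor` is a hypothetical helper; neither is exported by the package. The sketch also assumes `destroy()` is safe to call at any point, which the source does not confirm.

```typescript
// Structural type mirroring the documented method signatures.
interface TranscriberLike {
  loadModel(): Promise<void>;
  startRecording(): Promise<void>;
  stopRecording(): void;
  destroy(): void;
}

// Hypothetical helper: record for `durationMs`, then stop and clean up,
// even if loading or starting throws (e.g. microphone permission denied).
async function recordFor(transcriber: TranscriberLike, durationMs: number): Promise<void> {
  let recording = false;
  try {
    await transcriber.loadModel();
    await transcriber.startRecording();
    recording = true;
    await new Promise<void>((resolve) => setTimeout(resolve, durationMs));
  } finally {
    // Only stop if recording actually started.
    if (recording) transcriber.stopRecording();
    // Assumption: destroy() may be called regardless of how far we got.
    transcriber.destroy();
  }
}
```

A `WhisperTranscriber` instance should satisfy `TranscriberLike`, so `await recordFor(transcriber, 10000)` would record for roughly ten seconds and then release resources.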
Model Options
Model | Size | Description |
---|---|---|
tiny.en | 75 MB | Fastest, lower accuracy |
base.en | 142 MB | Better accuracy, slower |
tiny-en-q5_1 | 31 MB | Quantized tiny model, smaller size |
base-en-q5_1 | 57 MB | Quantized base model, good balance |
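One way to use the table is to pick the most capable model that fits a download budget. The sketch below does that with the sizes from the table. The `pickModel` helper and the preference order (base before tiny, full precision before quantized) are assumptions made for illustration, not package behavior.

```typescript
// Sizes copied from the table above; the selection helper is hypothetical.
type ModelSize = 'tiny.en' | 'base.en' | 'tiny-en-q5_1' | 'base-en-q5_1';

const MODEL_SIZES_MB: Record<ModelSize, number> = {
  'tiny.en': 75,
  'base.en': 142,
  'tiny-en-q5_1': 31,
  'base-en-q5_1': 57,
};

// Assumed preference order: base models before tiny ones,
// full precision before quantized.
const PREFERENCE: ModelSize[] = ['base.en', 'base-en-q5_1', 'tiny.en', 'tiny-en-q5_1'];

// Return the first preferred model whose download fits the budget,
// or undefined when even the smallest model is too large.
function pickModel(maxDownloadMB: number): ModelSize | undefined {
  for (const model of PREFERENCE) {
    if (MODEL_SIZES_MB[model] <= maxDownloadMB) return model;
  }
  return undefined;
}

console.log(pickModel(200)); // base.en
console.log(pickModel(60));  // base-en-q5_1
console.log(pickModel(40));  // tiny-en-q5_1
console.log(pickModel(10));  // undefined
```

The result can be passed as `modelSize` in the constructor options.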
Browser Requirements
- WebAssembly support
- SharedArrayBuffer support
- Microphone access permission
- Modern browser (Chrome 90+, Firefox 89+, Safari 15+, Edge 90+)
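These requirements can be checked before the model is downloaded. The following is a rough sketch of such a check. `missingRequirements` and the `BrowserEnv` shape are hypothetical, introduced so the function can be exercised outside a browser. Microphone permission itself can only be confirmed by actually requesting it, so the sketch only checks that `getUserMedia` exists.

```typescript
// Hypothetical capability check; `BrowserEnv` describes only the globals
// this sketch inspects and is not a type from the package.
interface BrowserEnv {
  WebAssembly?: unknown;
  SharedArrayBuffer?: unknown;
  navigator?: { mediaDevices?: { getUserMedia?: unknown } };
}

// Returns a list of human-readable names for whatever is missing.
function missingRequirements(env: BrowserEnv): string[] {
  const missing: string[] = [];
  // WebAssembly is exposed as a namespace object.
  if (typeof env.WebAssembly !== 'object' || env.WebAssembly === null) {
    missing.push('WebAssembly');
  }
  // SharedArrayBuffer is a constructor; browsers typically hide it on pages
  // that are not cross-origin isolated.
  if (typeof env.SharedArrayBuffer !== 'function') {
    missing.push('SharedArrayBuffer');
  }
  if (typeof env.navigator?.mediaDevices?.getUserMedia !== 'function') {
    missing.push('microphone access (getUserMedia)');
  }
  return missing;
}

// In a browser: missingRequirements(globalThis as unknown as BrowserEnv)
console.log(missingRequirements({})); // all three requirements reported missing
```

An empty result means the page can go on to call `loadModel()`. Otherwise the list can be shown to the user.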
CORS and Security Headers
For SharedArrayBuffer support, your site needs specific headers:
Cross-Origin-Embedder-Policy: require-corp
Cross-Origin-Opener-Policy: same-origin
If you're using the included demo server, start it with:
npm run demo
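On your own server, the two headers listed above need to be sent with the page. Here is a minimal sketch of one way to do that. `applyIsolationHeaders` and `ResponseLike` are hypothetical names; the only assumption about the server is that its response object has a `setHeader(name, value)` method, as Node's `http.ServerResponse` does.

```typescript
// Header names and values are the ones listed above; the helper and the
// ResponseLike shape are hypothetical.
interface ResponseLike {
  setHeader(name: string, value: string): unknown;
}

const ISOLATION_HEADERS: Record<string, string> = {
  'Cross-Origin-Embedder-Policy': 'require-corp',
  'Cross-Origin-Opener-Policy': 'same-origin',
};

// Set both cross-origin isolation headers on an outgoing response.
function applyIsolationHeaders(res: ResponseLike): void {
  for (const name of Object.keys(ISOLATION_HEADERS)) {
    res.setHeader(name, ISOLATION_HEADERS[name]);
  }
}
```

In a Node `http.createServer((req, res) => { ... })` handler, calling `applyIsolationHeaders(res)` before writing the body should be enough. Most static hosts offer an equivalent header configuration.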
Example HTML
<!DOCTYPE html>
<html>
<head>
  <title>Whisper Transcriber Demo</title>
</head>
<body>
  <button id="load">Load Model</button>
  <button id="start" disabled>Start</button>
  <button id="stop" disabled>Stop</button>
  <div id="status"></div>
  <div id="progress"></div>
  <div id="transcription"></div>
  <script type="module">
    import { WhisperTranscriber } from 'whisper-web-transcriber';
    const transcriber = new WhisperTranscriber({
      onTranscription: (text) => {
        document.getElementById('transcription').textContent += text + ' ';
      },
      onProgress: (progress) => {
        document.getElementById('progress').textContent = progress + '%';
      },
      onStatus: (status) => {
        document.getElementById('status').textContent = status;
      }
    });
    document.getElementById('load').onclick = async () => {
      await transcriber.loadModel();
      document.getElementById('start').disabled = false;
    };
    document.getElementById('start').onclick = async () => {
      await transcriber.startRecording();
      document.getElementById('start').disabled = true;
      document.getElementById('stop').disabled = false;
    };
    document.getElementById('stop').onclick = () => {
      transcriber.stopRecording();
      document.getElementById('start').disabled = false;
      document.getElementById('stop').disabled = true;
    };
  </script>
</body>
</html>
Performance Considerations
- Transcription is CPU-intensive
- Larger models provide better accuracy but require more processing power
- Quantized models (Q5_1) offer a good balance between size and quality
- The first model load downloads between 31 MB and 142 MB, depending on the model; later loads use the cached copy
Technical Details
Built using:
- whisper.cpp compiled to WebAssembly
- Web Audio API for microphone access
- IndexedDB for model caching
- Service Worker for Cross-Origin Isolation
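A service worker can provide cross-origin isolation on hosts where response headers cannot be configured, by re-issuing each fetched response with the two isolation headers added. The sketch below shows that general technique using the standard Fetch API `Response` and `Headers` classes. It is not confirmed to match what this package ships, and `withIsolationHeaders` is a hypothetical name.

```typescript
// Sketch of the general service-worker technique; NOT confirmed to be the
// package's own implementation.
function withIsolationHeaders(response: Response): Response {
  // Opaque responses report status 0 and cannot be reconstructed.
  if (response.status === 0) return response;

  // Copy the existing headers, then add the two isolation headers.
  const headers = new Headers(response.headers);
  headers.set('Cross-Origin-Embedder-Policy', 'require-corp');
  headers.set('Cross-Origin-Opener-Policy', 'same-origin');

  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers,
  });
}

// Inside a service worker, this would typically be wired up as:
//   self.addEventListener('fetch', (event) => {
//     event.respondWith(fetch(event.request).then(withIsolationHeaders));
//   });
```

The page usually has to reload once after the service worker is first registered before the headers take effect, because the initial navigation is not intercepted.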
License
MIT
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Acknowledgments
- whisper.cpp by Georgi Gerganov
- OpenAI Whisper for the original model