Whisper Web Transcriber

Real-time audio transcription in the browser using OpenAI's Whisper model via WebAssembly. This package provides an easy-to-use API for integrating speech-to-text capabilities into web applications without any server-side processing.

Features

  • 🎙️ Real-time audio transcription from microphone
  • 🌐 Runs entirely in the browser (no server required)
  • 📦 Multiple Whisper model options (tiny, base, quantized versions)
  • 💾 Automatic model caching in IndexedDB
  • 🔧 Simple, promise-based API
  • 📱 Works on all modern browsers with WebAssembly support
  • 🌍 Platform-independent (same WASM works on all OS)

Installation

npm install whisper-web-transcriber

Or using yarn:

yarn add whisper-web-transcriber

Quick Start

import { WhisperTranscriber } from 'whisper-web-transcriber';

// Create a new transcriber instance
const transcriber = new WhisperTranscriber({
  modelSize: 'base-en-q5_1', // or 'tiny.en', 'base.en', 'tiny-en-q5_1'
  onTranscription: (text) => {
    console.log('Transcribed:', text);
    document.getElementById('transcription').textContent += text + ' ';
  },
  onProgress: (progress) => {
    console.log('Loading progress:', progress + '%');
  },
  onStatus: (status) => {
    console.log('Status:', status);
  }
});

// Load the model (downloaded on first use, then cached in the browser)
await transcriber.loadModel();

// Start recording
await transcriber.startRecording();

// Stop recording
transcriber.stopRecording();

API Reference

Constructor Options

interface WhisperConfig {
  modelUrl?: string;              // Custom model URL (optional)
  modelSize?: 'tiny.en' | 'base.en' | 'tiny-en-q5_1' | 'base-en-q5_1';
  sampleRate?: number;            // Audio sample rate (default: 16000)
  audioIntervalMs?: number;       // Audio processing interval (default: 5000ms)
  onTranscription?: (text: string) => void;
  onProgress?: (progress: number) => void;
  onStatus?: (status: string) => void;
  debug?: boolean;                // Enable debug logging (default: false)
}
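
As a rough illustration of how these options fit together, the sketch below fills in the defaults documented above and rejects unknown model names. `buildConfig` and `samplesPerInterval` are hypothetical helpers, not part of the package, and the interval arithmetic assumes each processing interval spans `audioIntervalMs` of audio at `sampleRate`.

```javascript
// Hypothetical helpers, not part of whisper-web-transcriber.
// Model names and defaults are taken from the WhisperConfig listing above.
const MODEL_SIZES = ['tiny.en', 'base.en', 'tiny-en-q5_1', 'base-en-q5_1'];

function buildConfig(overrides = {}) {
  const config = {
    sampleRate: 16000,     // documented default
    audioIntervalMs: 5000, // documented default
    debug: false,          // documented default
    ...overrides,
  };
  // modelSize has no documented default, so it is only validated when given.
  if (config.modelSize !== undefined && !MODEL_SIZES.includes(config.modelSize)) {
    throw new RangeError(`Unknown modelSize: ${config.modelSize}`);
  }
  return config;
}

// Number of audio samples covered by one processing interval (assumption:
// the interval spans audioIntervalMs of audio captured at sampleRate).
function samplesPerInterval({ sampleRate, audioIntervalMs }) {
  return Math.round((sampleRate * audioIntervalMs) / 1000);
}
```

With the documented defaults, one interval covers 16000 × 5 = 80000 samples. The result of `buildConfig({ modelSize: 'base-en-q5_1' })` could then be passed to `new WhisperTranscriber(...)` together with the callbacks.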

Methods

  • loadModel(): Promise<void> - Downloads and initializes the Whisper model
  • startRecording(): Promise<void> - Starts microphone recording and transcription
  • stopRecording(): void - Stops recording
  • destroy(): void - Cleans up resources
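
One possible way to sequence these four methods is sketched below. `transcribeFor` is a hypothetical wrapper, not part of the package. It assumes that `destroy()` may be called after `stopRecording()` and that cleanup should still happen when an earlier step fails.

```javascript
// Hypothetical wrapper: `transcriber` is any object exposing the four
// methods listed above (for example a WhisperTranscriber instance).
async function transcribeFor(transcriber, durationMs) {
  let recording = false;
  try {
    await transcriber.loadModel();
    await transcriber.startRecording();
    recording = true;
    // Keep recording for the requested duration.
    await new Promise((resolve) => setTimeout(resolve, durationMs));
  } finally {
    // Stop only if recording actually started, then always clean up.
    if (recording) transcriber.stopRecording();
    transcriber.destroy();
  }
}
```

Because the cleanup sits in a `finally` block, a rejected `loadModel()` or a denied microphone prompt still ends with `destroy()` being called, and the original error is passed on to the caller.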

Model Options

Model          Size     Description
tiny.en        75 MB    Fastest, lower accuracy
base.en        142 MB   Better accuracy, slower
tiny-en-q5_1   31 MB    Quantized tiny model, smaller size
base-en-q5_1   57 MB    Quantized base model, good balance
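
If the download size matters, for example on mobile connections, a model could be chosen from a size budget. The sketch below uses the sizes from the table above. The `pickModel` helper and the preference order (base models before tiny models) are assumptions made for illustration, not something the package provides.

```javascript
// Sizes copied from the table above. The ordering is an assumption:
// base-family models are preferred over tiny-family models for accuracy.
const MODELS_BY_PREFERENCE = [
  { name: 'base.en', sizeMB: 142 },
  { name: 'base-en-q5_1', sizeMB: 57 },
  { name: 'tiny.en', sizeMB: 75 },
  { name: 'tiny-en-q5_1', sizeMB: 31 },
];

// Return the most preferred model that fits the budget, or null if none fits.
function pickModel(maxDownloadMB) {
  const match = MODELS_BY_PREFERENCE.find((m) => m.sizeMB <= maxDownloadMB);
  return match ? match.name : null;
}
```

Under this ordering `tiny.en` is never selected by size alone, because `base-en-q5_1` is both smaller and ranked higher. It stays in the list so the order can be changed when speed matters more than accuracy.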

Browser Requirements

  • WebAssembly support
  • SharedArrayBuffer support
  • Microphone access permission
  • Modern browser (Chrome 90+, Firefox 89+, Safari 15+, Edge 90+)
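
These requirements can be checked before loading a model. The following is a minimal sketch. `missingRequirements` is a hypothetical helper, and it only detects whether the APIs are present. Microphone permission itself is requested by the browser when recording starts.

```javascript
// Hypothetical pre-flight check, not part of whisper-web-transcriber.
// `env` defaults to the global object so the function can be stubbed in tests.
function missingRequirements(env = globalThis) {
  const missing = [];
  if (typeof env.WebAssembly !== 'object') missing.push('WebAssembly');
  if (typeof env.SharedArrayBuffer !== 'function') missing.push('SharedArrayBuffer');
  // crossOriginIsolated is true only when the page is cross-origin isolated.
  if (env.crossOriginIsolated !== true) missing.push('cross-origin isolation');
  const media = env.navigator && env.navigator.mediaDevices;
  if (!media || typeof media.getUserMedia !== 'function') missing.push('getUserMedia');
  return missing;
}
```

An empty array means all checks passed. Otherwise the returned names can be shown to the user instead of starting a download that cannot be used.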

CORS and Security Headers

For SharedArrayBuffer support, your site needs specific headers:

Cross-Origin-Embedder-Policy: require-corp
Cross-Origin-Opener-Policy: same-origin
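
For a custom Node-based server, the two headers could be attached to every response with a small wrapper. This is a sketch only. `withIsolationHeaders` is a hypothetical helper, and the wrapped handler is whatever already serves your files.

```javascript
// The header values are the ones listed above; the wrapper is hypothetical.
const ISOLATION_HEADERS = {
  'Cross-Origin-Embedder-Policy': 'require-corp',
  'Cross-Origin-Opener-Policy': 'same-origin',
};

// Wrap a Node-style (req, res) handler so each response carries both headers.
function withIsolationHeaders(handler) {
  return (req, res) => {
    for (const [name, value] of Object.entries(ISOLATION_HEADERS)) {
      res.setHeader(name, value);
    }
    return handler(req, res);
  };
}
```

With Node's built-in `http` module, the wrapped handler can be passed straight to `http.createServer(...)`. Other servers and hosting platforms have their own configuration for response headers.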

The included demo server can be started with:

npm run demo

Example HTML

<!DOCTYPE html>
<html>
<head>
  <title>Whisper Transcriber Demo</title>
</head>
<body>
  <button id="load">Load Model</button>
  <button id="start" disabled>Start</button>
  <button id="stop" disabled>Stop</button>
  <div id="status"></div>
  <div id="progress"></div>
  <div id="transcription"></div>

  <script type="module">
    import { WhisperTranscriber } from 'whisper-web-transcriber';

    const transcriber = new WhisperTranscriber({
      onTranscription: (text) => {
        document.getElementById('transcription').textContent += text + ' ';
      },
      onProgress: (progress) => {
        document.getElementById('progress').textContent = progress + '%';
      },
      onStatus: (status) => {
        document.getElementById('status').textContent = status;
      }
    });

    document.getElementById('load').onclick = async () => {
      await transcriber.loadModel();
      document.getElementById('start').disabled = false;
    };

    document.getElementById('start').onclick = async () => {
      await transcriber.startRecording();
      document.getElementById('start').disabled = true;
      document.getElementById('stop').disabled = false;
    };

    document.getElementById('stop').onclick = () => {
      transcriber.stopRecording();
      document.getElementById('start').disabled = false;
      document.getElementById('stop').disabled = true;
    };
  </script>
</body>
</html>

Performance Considerations

  • Transcription is CPU-intensive
  • Larger models provide better accuracy but require more processing power
  • Quantized models (Q5_1) offer a good balance between size and quality
  • First-time model loading may take time (models are cached afterward)
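
Since models are cached in the browser after the first download, it can be worth checking the available storage beforehand. The sketch below uses the standard `navigator.storage.estimate()` API. `hasRoomFor` is a hypothetical helper, and treating 1 MB as 1024 × 1024 bytes is an assumption that slightly overestimates the space needed.

```javascript
// Hypothetical helper, not part of whisper-web-transcriber.
// Resolves to true/false, or null when the Storage API is unavailable.
async function hasRoomFor(modelMB, storage = globalThis.navigator?.storage) {
  if (!storage || typeof storage.estimate !== 'function') return null;
  const { quota = 0, usage = 0 } = await storage.estimate();
  const neededBytes = modelMB * 1024 * 1024; // conservative MB-to-bytes conversion
  return quota - usage >= neededBytes;
}
```

The values returned by `estimate()` are approximations, so a `true` result is a hint rather than a guarantee that caching will succeed.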

Technical Details

Built using:

  • whisper.cpp compiled to WebAssembly
  • Web Audio API for microphone access
  • IndexedDB for model caching
  • Service Worker for Cross-Origin Isolation

License

MIT

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Acknowledgments