JSPM


Real-time audio transcription in the browser using OpenAI's Whisper model via WebAssembly

Package Exports

  • whisper-web-transcriber
  • whisper-web-transcriber/dist/index.esm.js
  • whisper-web-transcriber/dist/index.js

This package does not declare an exports field, so the exports above were automatically detected and optimized by JSPM instead. If a package subpath is missing, consider filing an issue with the original package (whisper-web-transcriber) requesting support for the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.
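For reference, an "exports" field covering the detected subpaths might look like the following sketch. The mapping of the ESM build to the "import" condition and the other build to "require" is an assumption based on the file names, not something the package declares:

```json
{
  "name": "whisper-web-transcriber",
  "exports": {
    ".": {
      "import": "./dist/index.esm.js",
      "require": "./dist/index.js"
    },
    "./dist/index.esm.js": "./dist/index.esm.js",
    "./dist/index.js": "./dist/index.js"
  }
}
```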

Readme

Whisper Web Transcriber

Real-time audio transcription in the browser using OpenAI's Whisper model via WebAssembly. This package provides an easy-to-use API for integrating speech-to-text capabilities into web applications without any server-side processing.

Live Demo 🎙️

Features

  • 🎙️ Real-time audio transcription from microphone
  • 🌐 Runs entirely in the browser (no server required)
  • 📦 Multiple Whisper model options (tiny, base, quantized versions)
  • 💾 Automatic model caching in IndexedDB
  • 🔧 Simple, promise-based API
  • 📱 Works on all modern browsers with WebAssembly support
  • 🌍 Platform-independent (same WASM works on all OS)

Installation

npm install whisper-web-transcriber

Or using yarn:

yarn add whisper-web-transcriber

Quick Start

import { WhisperTranscriber } from 'whisper-web-transcriber';

// Create a new transcriber instance
const transcriber = new WhisperTranscriber({
  modelSize: 'base-en-q5_1', // or 'tiny.en', 'base.en', 'tiny-en-q5_1'
  onTranscription: (text) => {
    console.log('Transcribed:', text);
    document.getElementById('transcription').textContent += text + ' ';
  },
  onProgress: (progress) => {
    console.log('Loading progress:', progress + '%');
  },
  onStatus: (status) => {
    console.log('Status:', status);
  }
});

// Load the model (only needed once, cached in browser)
await transcriber.loadModel();

// Start recording
await transcriber.startRecording();

// Stop recording
transcriber.stopRecording();

API Reference

Constructor Options

interface WhisperConfig {
  modelUrl?: string;              // Custom model URL (optional)
  modelSize?: 'tiny.en' | 'base.en' | 'tiny-en-q5_1' | 'base-en-q5_1';
  sampleRate?: number;            // Audio sample rate (default: 16000)
  audioIntervalMs?: number;       // Audio processing interval (default: 5000ms)
  onTranscription?: (text: string) => void;
  onProgress?: (progress: number) => void;
  onStatus?: (status: string) => void;
  debug?: boolean;                // Enable debug logging (default: false)
}
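To put the defaults in perspective: at the default 16 kHz sample rate and 5000 ms audioIntervalMs, each processing chunk is 80,000 samples, roughly 312 KiB of Float32 data. A quick back-of-the-envelope check:

```javascript
// Samples and bytes per processing chunk at the documented default config values.
const sampleRate = 16000;  // Hz (default sampleRate)
const intervalMs = 5000;   // ms (default audioIntervalMs)

const samplesPerChunk = sampleRate * (intervalMs / 1000); // 80000 samples
const bytesPerChunk = samplesPerChunk * 4;                // Float32 = 4 bytes each
```

Raising audioIntervalMs trades transcription latency for fewer, larger chunks handed to the model.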

Methods

  • loadModel(): Promise<void> - Downloads and initializes the Whisper model
  • startRecording(): Promise<void> - Starts microphone recording and transcription
  • stopRecording(): void - Stops recording
  • destroy(): void - Cleans up resources when the transcriber is no longer needed

Model Options

Model          Size     Description
tiny.en        75 MB    Fastest, lower accuracy
base.en        142 MB   Better accuracy, slower
tiny-en-q5_1   31 MB    Quantized tiny model, smaller size
base-en-q5_1   57 MB    Quantized base model, good balance
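As a rough illustration of the size/accuracy trade-off, here is a small hypothetical helper (not part of the package API) that picks the largest listed model fitting a download budget:

```javascript
// Hypothetical helper: choose the largest Whisper model that fits a size budget.
// The sizes mirror the table above; this is illustration only, not package API.
const MODELS = [
  { name: 'tiny-en-q5_1', sizeMB: 31 },
  { name: 'base-en-q5_1', sizeMB: 57 },
  { name: 'tiny.en', sizeMB: 75 },
  { name: 'base.en', sizeMB: 142 },
];

function pickModel(budgetMB) {
  // MODELS is sorted by size ascending; take the largest one that fits.
  const fitting = MODELS.filter((m) => m.sizeMB <= budgetMB);
  return fitting.length ? fitting[fitting.length - 1].name : null;
}
```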

Browser Requirements

  • WebAssembly support
  • SharedArrayBuffer support
  • Microphone access permission
  • Modern browser (Chrome 90+, Firefox 89+, Safari 15+, Edge 90+)
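A quick capability check along these lines can run before loadModel() to fail early with a clear message (a sketch; the globals probed are standard browser APIs, and crossOriginIsolated is only true once the headers in the next section are set):

```javascript
// Report which of the requirements above the current environment satisfies.
function checkSupport() {
  return {
    webAssembly: typeof WebAssembly === 'object',
    sharedArrayBuffer: typeof SharedArrayBuffer === 'function',
    crossOriginIsolated:
      typeof crossOriginIsolated !== 'undefined' && crossOriginIsolated === true,
    microphone:
      typeof navigator !== 'undefined' &&
      !!navigator.mediaDevices &&
      typeof navigator.mediaDevices.getUserMedia === 'function',
  };
}
```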

CORS and Security Headers

SharedArrayBuffer requires cross-origin isolation, so your site must be served with these headers:

Cross-Origin-Embedder-Policy: require-corp
Cross-Origin-Opener-Policy: same-origin

If you're using the included demo server:

npm run demo

Example HTML

<!DOCTYPE html>
<html>
<head>
  <title>Whisper Transcriber Demo</title>
</head>
<body>
  <button id="load">Load Model</button>
  <button id="start" disabled>Start</button>
  <button id="stop" disabled>Stop</button>
  <div id="status"></div>
  <div id="progress"></div>
  <div id="transcription"></div>

  <script type="module">
    import { WhisperTranscriber } from 'whisper-web-transcriber';

    const transcriber = new WhisperTranscriber({
      onTranscription: (text) => {
        document.getElementById('transcription').textContent += text + ' ';
      },
      onProgress: (progress) => {
        document.getElementById('progress').textContent = progress + '%';
      },
      onStatus: (status) => {
        document.getElementById('status').textContent = status;
      }
    });

    document.getElementById('load').onclick = async () => {
      await transcriber.loadModel();
      document.getElementById('start').disabled = false;
    };

    document.getElementById('start').onclick = async () => {
      await transcriber.startRecording();
      document.getElementById('start').disabled = true;
      document.getElementById('stop').disabled = false;
    };

    document.getElementById('stop').onclick = () => {
      transcriber.stopRecording();
      document.getElementById('start').disabled = false;
      document.getElementById('stop').disabled = true;
    };
  </script>
</body>
</html>

Performance Considerations

  • Transcription is CPU-intensive
  • Larger models provide better accuracy but require more processing power
  • Quantized models (Q5_1) offer a good balance between size and quality
  • First-time model loading may take time (models are cached afterward)

Technical Details

Built using:

  • whisper.cpp compiled to WebAssembly
  • Web Audio API for microphone access
  • IndexedDB for model caching
  • Service Worker for Cross-Origin Isolation

License

MIT

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Acknowledgments