JSPM


Expo plugin for OpenAI Whisper speech-to-text integration with React Native

Package Exports

  • expo-whisper
  • expo-whisper/build/index.js

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, consider filing an issue with the original package (expo-whisper) asking for "exports" field support. If that is not possible, create a JSPM override to customize the exports field for this package.
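
For reference, a declared exports field matching the detected subpaths might look like the following (a sketch, not the package's actual metadata):

{
  "name": "expo-whisper",
  "main": "build/index.js",
  "exports": {
    ".": "./build/index.js"
  }
}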

Readme

expo-whisper

High-performance speech-to-text transcription for React Native/Expo apps using OpenAI's Whisper model


Features

  • Real-time transcription with streaming audio support
  • Cross-platform - Works on iOS, Android, and Web
  • High accuracy using OpenAI's Whisper models
  • Native performance with C++ integration
  • Easy integration with React hooks
  • Multi-language support (100+ languages)
  • Flexible configuration for different use cases
  • Offline capable - No internet required

Getting Started

Prerequisites

Before you begin, make sure you have:

  • Expo SDK 54+ or React Native 0.74+
  • Development build environment set up (expo-whisper cannot run in Expo Go)
  • Android Studio (for Android builds) or Xcode (for iOS builds)

Step 1: Installation

# Install the package
npm install expo-whisper

# Install required peer dependencies
npx expo install expo-build-properties

Step 2: Configure app.json

Add the expo-whisper plugin to your app.json configuration:

{
  "expo": {
    "name": "Your App",
    "plugins": [
      [
        "expo-build-properties",
        {
          "android": {
            "minSdkVersion": 21,
            "compileSdkVersion": 34,
            "targetSdkVersion": 34,
            "buildToolsVersion": "34.0.0",
            "enableProguardInReleaseBuilds": true,
            "packagingOptions": {
              "pickFirst": ["**/libc++_shared.so", "**/libjsc.so"]
            }
          },
          "ios": {
            "deploymentTarget": "11.0"
          }
        }
      ],
      "expo-whisper"
    ],
    "android": {
      "permissions": [
        "android.permission.RECORD_AUDIO",
        "android.permission.READ_EXTERNAL_STORAGE",
        "android.permission.WRITE_EXTERNAL_STORAGE"
      ]
    },
    "ios": {
      "infoPlist": {
        "NSMicrophoneUsageDescription": "This app needs access to microphone for speech recognition."
      }
    }
  }
}

Step 3: Download a Whisper Model

Choose a model based on your performance and accuracy needs:

Model      Size    Speed    Accuracy  Use Case
tiny.en    39 MB   Fastest  Basic     Real-time, mobile
base.en    74 MB   Fast     Good      Mobile apps
small.en   244 MB  Medium   Better    General purpose
medium.en  769 MB  Slow     High      High accuracy needed

Download options:

# Option 1: Download to assets folder
mkdir -p assets/models
curl -L "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin" -o assets/models/ggml-base.en.bin

# Option 2: Use react-native-fs to download at runtime
npm install react-native-fs
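
For Option 2, a minimal runtime-download sketch using react-native-fs (the model URL matches Option 1; the destination path is an assumption):

import RNFS from 'react-native-fs';

const MODEL_URL =
  'https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin';

// Download the model once and cache it in the app's document directory.
export async function ensureModel(): Promise<string> {
  const destination = `${RNFS.DocumentDirectoryPath}/ggml-base.en.bin`;
  if (!(await RNFS.exists(destination))) {
    await RNFS.downloadFile({ fromUrl: MODEL_URL, toFile: destination }).promise;
  }
  return destination; // pass this path to loadModel()
}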

Step 4: Build Your App

Important: expo-whisper requires a development build:

# For Android
npx expo run:android

# For iOS  
npx expo run:ios

# Or use EAS Build
eas build --platform android --profile development
eas build --platform ios --profile development

Cannot use Expo Go - Native libraries require a development build

Step 5: Basic Implementation

import React, { useEffect, useState } from 'react';
import { View, Button, Text, Alert } from 'react-native';
import { useWhisper } from 'expo-whisper';
import { Audio } from 'expo-av';

export default function App() {
  const {
    isModelLoaded,
    isTranscribing,
    lastResult,
    error,
    loadModel,
    transcribeFile,
    clearError,
  } = useWhisper();

  const [recording, setRecording] = useState<Audio.Recording>();

  useEffect(() => {
    // Load model when app starts
    const initializeWhisper = async () => {
      try {
        // Path to your downloaded model
        await loadModel('/path/to/assets/models/ggml-base.en.bin');
        Alert.alert('Success', 'Whisper model loaded successfully!');
      } catch (error) {
        Alert.alert('Error', `Failed to load model: ${error.message}`);
      }
    };

    initializeWhisper();
  }, []);

  const startRecording = async () => {
    try {
      const permission = await Audio.requestPermissionsAsync();
      if (permission.status !== 'granted') {
        Alert.alert('Permission required', 'Microphone access is needed');
        return;
      }

      await Audio.setAudioModeAsync({
        allowsRecordingIOS: true,
        playsInSilentModeIOS: true,
      });

      const { recording } = await Audio.Recording.createAsync(
        Audio.RecordingOptionsPresets.HIGH_QUALITY
      );
      setRecording(recording);
    } catch (err) {
      console.error('Failed to start recording', err);
    }
  };

  const stopRecording = async () => {
    if (!recording) return;

    setRecording(undefined);
    await recording.stopAndUnloadAsync();
    
    const uri = recording.getURI();
    if (uri) {
      try {
        const result = await transcribeFile(uri, {
          language: 'en',
          temperature: 0.0,
        });
        Alert.alert('Transcription', result.text);
      } catch (error) {
        Alert.alert('Error', `Transcription failed: ${error.message}`);
      }
    }
  };

  return (
    <View style={{ flex: 1, padding: 20, justifyContent: 'center' }}>
      <Text style={{ fontSize: 18, marginBottom: 20, textAlign: 'center' }}>
        Whisper Speech-to-Text
      </Text>

      <Text style={{ marginBottom: 10 }}>
        Model Status: {isModelLoaded ? 'Loaded' : 'Loading...'}
      </Text>

      <Button
        title={recording ? 'Stop Recording' : 'Start Recording'}
        onPress={recording ? stopRecording : startRecording}
        disabled={!isModelLoaded || isTranscribing}
      />

      {lastResult && (
        <View style={{ marginTop: 20, padding: 10, backgroundColor: '#f0f0f0' }}>
          <Text style={{ fontWeight: 'bold' }}>Last Transcription:</Text>
          <Text>{lastResult.text}</Text>
        </View>
      )}

      {error && (
        <View style={{ marginTop: 20 }}>
          <Text style={{ color: 'red' }}>Error: {error}</Text>
          <Button title="Clear Error" onPress={clearError} />
        </View>
      )}
    </View>
  );
}

Step 6: Test Your Implementation

  1. Build and install your development build
  2. Grant microphone permissions when prompted
  3. Tap "Start Recording" and speak clearly
  4. Tap "Stop Recording" to see transcription results

Configuration Options

Advanced app.json Setup

For production apps, you may want additional configuration:

{
  "expo": {
    "plugins": [
      [
        "expo-build-properties",
        {
          "android": {
            "minSdkVersion": 21,
            "compileSdkVersion": 34,
            "targetSdkVersion": 34,
            "proguardMinifyEnabled": true,
            "enableProguardInReleaseBuilds": true,
            "packagingOptions": {
              "pickFirst": [
                "**/libc++_shared.so",
                "**/libjsc.so",
                "**/libfbjni.so"
              ]
            }
          },
          "ios": {
            "deploymentTarget": "11.0",
            "bundler": "metro"
          }
        }
      ],
      [
        "expo-whisper",
        {
          "modelPath": "assets/models/ggml-base.en.bin",
          "enableMicrophone": true,
          "enableAudioSession": true
        }
      ]
    ]
  }
}

EAS Build Configuration

Create eas.json for cloud builds:

{
  "cli": {
    "version": ">= 5.9.0"
  },
  "build": {
    "development": {
      "developmentClient": true,
      "distribution": "internal",
      "android": {
        "buildType": "developmentBuild"
      },
      "ios": {
        "buildConfiguration": "Debug"
      }
    },
    "preview": {
      "distribution": "internal",
      "android": {
        "buildType": "apk"
      }
    },
    "production": {
      "android": {
        "buildType": "app-bundle"
      }
    }
  }
}

Development Build Required

expo-whisper uses native libraries and cannot run in Expo Go. You must use a development build:

Why Development Build is Required:

  • Native Libraries: expo-whisper includes compiled C++ libraries (libwhisper.so)
  • Custom Native Code: Direct integration with Whisper C++ implementation
  • Platform-specific Optimizations: Hardware-accelerated audio processing

Setting Up Development Build:

# Install development build tools
npx expo install expo-dev-client

# Build for development
npx expo run:android  # Local Android build
npx expo run:ios      # Local iOS build

# Or use EAS Build (recommended)
eas build --platform android --profile development
eas build --platform ios --profile development

Next Steps

After completing the setup:

  1. Explore the API: Check out the API Reference section
  2. Real-time Streaming: Learn about streaming transcription
  3. Performance Optimization: Read our Audio Guide
  4. Real-world Examples: See REAL_WORLD_EXAMPLE.md

API Reference

Hooks

useWhisper()

The main hook for managing Whisper transcription state and actions.

Returns:

{
  // State
  isModelLoaded: boolean;
  isLoading: boolean;
  isTranscribing: boolean;
  error: string | null;
  lastResult: WhisperResult | null;

  // Actions
  loadModel: (modelPath: string) => Promise<void>;
  transcribe: (audioPath: string, config?: WhisperConfig) => Promise<WhisperResult>;
  transcribeFile: (audioPath: string, config?: WhisperConfig) => Promise<WhisperResult>;
  transcribePCM: (pcmData: Uint8Array, config?: WhisperPCMConfig) => Promise<WhisperResult>;
  releaseModel: () => Promise<void>;
  getModelInfo: () => Promise<string>;
  getSupportedFormats: () => Promise<string>;
  clearError: () => void;
}
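
releaseModel is easy to forget; the sketch below (the useWhisperModel wrapper name is ours, not part of the package) loads a model on mount and frees the native resources on unmount:

import { useEffect } from 'react';
import { useWhisper } from 'expo-whisper';

// Hypothetical convenience wrapper around useWhisper().
export function useWhisperModel(modelPath: string) {
  const whisper = useWhisper();
  const { loadModel, releaseModel } = whisper;

  useEffect(() => {
    loadModel(modelPath).catch(console.error);
    return () => {
      // Free the native model when the consuming component unmounts.
      releaseModel().catch(console.error);
    };
  }, [modelPath]);

  return whisper;
}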

Core API

ExpoWhisper.loadModel(modelPath: string)

Load a Whisper model from the given file path.

ExpoWhisper.transcribeFile(audioPath: string, config?: WhisperConfig)

Transcribe an audio file. Supports WAV, MP3, M4A formats.

ExpoWhisper.transcribePCM(pcmData: Uint8Array, config?: WhisperPCMConfig)

Transcribe raw PCM audio data for real-time streaming.
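
A minimal end-to-end sketch of the core API. The import form is an assumption (the README references these names but not the module export); adjust it to the package's actual export:

import { ExpoWhisper } from 'expo-whisper'; // import form assumed

async function transcribeClip(modelPath: string, audioPath: string) {
  await ExpoWhisper.loadModel(modelPath);
  const result = await ExpoWhisper.transcribeFile(audioPath, { language: 'en' });
  console.log(result.text);
}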

Configuration Options

WhisperConfig

interface WhisperConfig {
  language?: string;           // Language code (e.g., 'en', 'es', 'auto')
  temperature?: number;        // Sampling temperature (0.0 - 1.0)
  maxTokens?: number;         // Maximum tokens to generate
  beamSize?: number;          // Beam search size
  bestOf?: number;            // Number of candidates
  patience?: number;          // Beam search patience
  lengthPenalty?: number;     // Length penalty
  suppressTokens?: number[];  // Tokens to suppress
  initialPrompt?: string;     // Initial prompt text
  wordTimestamps?: boolean;   // Include word-level timestamps
  prependPunctuations?: string; // Punctuation merged onto the following word
  appendPunctuations?: string;  // Punctuation merged onto the preceding word
}

WhisperPCMConfig

interface WhisperPCMConfig {
  sampleRate: number;         // Audio sample rate (16000 recommended)
  channels: number;           // Number of channels (1 for mono)
  realtime?: boolean;         // Enable real-time processing
  chunkSize?: number;         // Processing chunk size
}
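
transcribePCM takes raw bytes, so microphone samples usually need packing first. A hypothetical helper, assuming the module expects 16-bit little-endian mono PCM (that byte layout is our assumption, not documented here):

// Pack Float32 samples in [-1, 1] into 16-bit little-endian PCM bytes.
function floatTo16BitPCM(samples: Float32Array): Uint8Array {
  const view = new DataView(new ArrayBuffer(samples.length * 2));
  samples.forEach((s, i) => {
    const clamped = Math.max(-1, Math.min(1, s));
    view.setInt16(i * 2, clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff, true);
  });
  return new Uint8Array(view.buffer);
}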

Audio Format Support

Supported Formats:

  • WAV (recommended): 16kHz, 16-bit PCM, mono/stereo
  • MP3: Various bitrates and sample rates
  • M4A: AAC encoded audio
  • Raw PCM: For streaming applications

Optimal Settings for Best Performance:

  • Sample Rate: 16kHz
  • Bit Depth: 16-bit
  • Channels: Mono (1 channel)
  • Format: WAV or raw PCM for streaming
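
One way to request these settings when recording with expo-av (enum names are expo-av's; Android's MediaRecorder cannot write WAV, so this sketch falls back to 16 kHz mono M4A there):

import { Audio } from 'expo-av';

// Recording options tuned for Whisper: 16 kHz, mono, 16-bit where the platform allows.
const whisperRecordingOptions: Audio.RecordingOptions = {
  isMeteringEnabled: false,
  android: {
    extension: '.m4a',
    outputFormat: Audio.AndroidOutputFormat.MPEG_4,
    audioEncoder: Audio.AndroidAudioEncoder.AAC,
    sampleRate: 16000,
    numberOfChannels: 1,
    bitRate: 64000,
  },
  ios: {
    extension: '.wav',
    outputFormat: Audio.IOSOutputFormat.LINEARPCM,
    audioQuality: Audio.IOSAudioQuality.HIGH,
    sampleRate: 16000,
    numberOfChannels: 1,
    bitRate: 256000,
    linearPCMBitDepth: 16,
    linearPCMIsBigEndian: false,
    linearPCMIsFloat: false,
  },
  web: {
    mimeType: 'audio/webm',
    bitsPerSecond: 128000,
  },
};

// Usage: const { recording } = await Audio.Recording.createAsync(whisperRecordingOptions);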

Advanced Usage

Real-time Streaming

import { useWhisper } from 'expo-whisper';

function StreamingComponent() {
  const { transcribePCM } = useWhisper();

  const processAudioChunk = async (audioData: Uint8Array) => {
    const result = await transcribePCM(audioData, {
      sampleRate: 16000,
      channels: 1,
      realtime: true,
    });

    console.log('Live transcription:', result.text);
  };

  // Wire processAudioChunk to your audio source; UI omitted in this sketch.
  return null;
}

Batch Processing

// transcribeFile here comes from useWhisper(), as shown above
const processMultipleFiles = async (filePaths: string[]) => {
  const results = [];

  for (const path of filePaths) {
    try {
      const result = await transcribeFile(path, {
        language: 'auto',
        temperature: 0.0,
      });
      results.push({ path, text: result.text });
    } catch (error) {
      results.push({ path, error: error.message });
    }
  }

  return results;
};

Custom Configuration Presets

// High accuracy preset
const highAccuracyConfig = {
  temperature: 0.0,
  beamSize: 5,
  bestOf: 5,
  wordTimestamps: true,
};

// Fast processing preset
const fastConfig = {
  temperature: 0.1,
  beamSize: 1,
  bestOf: 1,
  maxTokens: 224,
};

// Real-time streaming preset
const streamingConfig = {
  sampleRate: 16000,
  channels: 1,
  realtime: true,
  chunkSize: 1024,
};

Platform Support

Platform  Status     Native Library
iOS       Supported  libwhisper.a
Android   Supported  libwhisper.so
Web       Planned    WASM

Development

Building from Source

# Clone the repository
git clone https://github.com/poovarasan4046/expo-whisper.git
cd expo-whisper

# Install dependencies
npm install

# Build the package
npm run build

# Run tests
npm test

Native Dependencies

The package includes pre-compiled Whisper libraries:

  • Android: libwhisper.so (ARM64, ARMv7)
  • iOS: libwhisper.a (ARM64, x86_64)

Contributing

We welcome contributions! Please see our Contributing Guide for details.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Support


Made with ❤️ for the React Native community