NeXt-TDNN Speaker Verification for Web
Real-time speaker verification in the browser using NeXt-TDNN models. Compare two audio samples to determine if they're from the same speaker.
🎯 Live Demo
Try it now: https://jaehyun-ko.github.io/node-speaker-verification/
Simple and intuitive speaker verification:
- 🎤 Record audio directly from microphone
- 📁 Upload audio files
- 🔍 Get similarity score instantly
🚀 Quick Start (Simple API)
API Methods
- `initialize(model, options?)` - Initialize with a model
- `compareAudio(audio1, audio2)` - Compare two audio samples
- `getEmbedding(audio)` - Extract speaker embedding from audio
- `compareEmbeddings(embedding1, embedding2)` - Compare pre-computed embeddings
- `cleanup()` - Release resources
CDN Usage (Simplest - Just Three Library Calls!)
<!DOCTYPE html>
<html>
<head>
<!-- IMPORTANT: Load ONNX Runtime first -->
<script src="https://cdn.jsdelivr.net/npm/onnxruntime-web@1.16.3/dist/ort.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/@jaehyun-ko/speaker-verification@5.0.0/dist/speaker-verification.js"></script>
</head>
<body>
<input type="file" id="audio1" accept="audio/*">
<input type="file" id="audio2" accept="audio/*">
<button onclick="compareSpeakers()">Compare</button>
<script>
// Create verifier instance
const verifier = new SpeakerVerification();
async function compareSpeakers() {
// 1. Initialize (only needed once)
await verifier.initialize('standard-256');
// 2. Get audio files
const file1 = document.getElementById('audio1').files[0];
const file2 = document.getElementById('audio2').files[0];
// 3. Compare! That's it!
const result = await verifier.compareAudio(file1, file2);
console.log('Similarity:', (result.similarity * 100).toFixed(1) + '%');
console.log('Same speaker?', result.similarity > 0.5); // You decide the threshold!
}
</script>
</body>
</html>
NPM Installation
# Install both ONNX Runtime and the speaker verification library
npm install onnxruntime-web @jaehyun-ko/speaker-verification
import * as ort from 'onnxruntime-web';
import { SpeakerVerification } from '@jaehyun-ko/speaker-verification';
// Optional: Configure ONNX Runtime WASM paths if needed
// ort.env.wasm.wasmPaths = 'https://cdn.jsdelivr.net/npm/onnxruntime-web@1.16.3/dist/';
// Create instance
const verifier = new SpeakerVerification();
// Initialize with model (auto-downloads from Hugging Face)
await verifier.initialize('standard-256'); // or 'mobile-128' for smaller/faster
// Compare any audio format (File, Blob, ArrayBuffer, Float32Array)
const result = await verifier.compareAudio(audio1, audio2);
console.log(result);
// {
// similarity: 0.92, // 0.0 to 1.0 (higher = more similar)
// processingTime: 523 // milliseconds
// }
// You decide what threshold to use
const isSameSpeaker = result.similarity > 0.5; // Common threshold: 0.5
Available Models (Quick Reference)
// Standard models (best accuracy)
'standard-256' // 28MB - Recommended
'standard-128' // 7.5MB - Faster
'standard-192' // 16MB
'standard-384' // 32MB - Highest accuracy
// Mobile models (optimized for size/speed)
'mobile-128' // 5MB - Smallest
'mobile-256' // 20MB - Best mobile balance
📱 Microphone Recording
// With the simple API, just pass the recorded blob
const verifier = new SpeakerVerification();
await verifier.initialize('standard-256');
// Record audio using browser API
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const mediaRecorder = new MediaRecorder(stream);
const chunks = [];
mediaRecorder.ondataavailable = (e) => chunks.push(e.data);
mediaRecorder.onstop = async () => {
const audioBlob = new Blob(chunks, { type: 'audio/webm' });
// Compare with another audio
const result = await verifier.compareAudio(audioBlob, anotherAudio);
console.log('Similarity:', result.similarity);
};
mediaRecorder.start();
setTimeout(() => mediaRecorder.stop(), 3000); // Record for 3 seconds
🎛️ Available Models
All models are hosted on Hugging Face.
Simple API Model Keys
Key | Size | Channels | Description |
---|---|---|---|
standard-256 | 28MB | 256 | Recommended - best balance |
standard-128 | 7.5MB | 128 | Compact, faster processing |
standard-192 | 16MB | 192 | Medium size and accuracy |
standard-384 | 32MB | 384 | Highest accuracy |
mobile-128 | 5MB | 128 | Smallest, mobile-optimized |
mobile-256 | 20MB | 256 | Best mobile balance |
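Model choice is left to the caller. One way to automate it is a small helper that picks a key from the table above based on available device memory; note that the 4GB cutoff and the `pickModelKey` helper are our own assumptions for illustration, not part of the library:

```javascript
// Pick a model key from the table above. The 4GB cutoff is an arbitrary
// heuristic of ours, not a recommendation from the library.
function pickModelKey(deviceMemoryGB) {
  if (deviceMemoryGB !== undefined && deviceMemoryGB < 4) {
    return 'mobile-128'; // smallest download for constrained devices
  }
  return 'standard-256'; // recommended default
}

// In Chromium-based browsers, navigator.deviceMemory reports an approximate
// device memory in GB (it is undefined elsewhere):
// await verifier.initialize(pickModelKey(navigator.deviceMemory));
```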
Full Model Names (for advanced usage)
Model | Size | Description |
---|---|---|
NeXt_TDNN_C256_B3_K65_7_cosine | 28MB | Standard 256-channel |
NeXt_TDNN_C128_B3_K65_7_cosine | 7.5MB | Compact 128-channel |
NeXt_TDNN_C192_B1_K65_7_cosine | 16MB | Medium 192-channel |
NeXt_TDNN_C384_B1_K65_7_cosine | 32MB | Large 384-channel |
NeXt_TDNN_light_C128_B3_K65_7_cosine | 5MB | Mobile 128-channel |
NeXt_TDNN_light_C256_B3_K65_7_cosine | 20MB | Mobile 256-channel |
📊 Understanding Results
- Similarity Score: 0.0 to 1.0 (higher = more similar)
- Recommended Threshold: 0.5
- Adjust the threshold to your needs:
- Higher threshold (0.7 or above) = stricter, fewer false positives
- Lower threshold (0.3 or below) = more permissive, fewer false negatives
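The guidance above can be wrapped in a tiny helper; the named presets simply reuse the example thresholds from this section:

```javascript
// Map a similarity score (0.0 - 1.0) to a same-speaker decision.
// Preset thresholds mirror the guidance above:
// strict = 0.7, default = 0.5, permissive = 0.3.
const THRESHOLDS = { strict: 0.7, default: 0.5, permissive: 0.3 };

function isSameSpeaker(similarity, mode = 'default') {
  const threshold = THRESHOLDS[mode];
  if (threshold === undefined) throw new Error(`Unknown mode: ${mode}`);
  return similarity > threshold;
}
```

For example, `isSameSpeaker(0.6)` is `true` with the default threshold but `isSameSpeaker(0.6, 'strict')` is `false`.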
🛠️ Advanced Usage
Custom Model Loading with Simple API
// Load custom model from ArrayBuffer
const modelData = await fetch('path/to/custom-model.onnx').then(r => r.arrayBuffer());
const verifier = new SpeakerVerification();
await verifier.initialize('standard-256', { modelData });
// Or disable caching for development
await verifier.initialize('standard-256', { cacheModel: false });
Batch Processing
const verifier = new SpeakerVerification();
await verifier.initialize('standard-256');
// Compare multiple audio pairs
const results = [];
for (let i = 0; i < audioFiles.length - 1; i++) {
const result = await verifier.compareAudio(audioFiles[i], audioFiles[i + 1]);
results.push(result);
}
// Get average similarity
const avgSimilarity = results.reduce((sum, r) => sum + r.similarity, 0) / results.length;
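The loop above only compares consecutive files. To compare every file against every other, it can help to generate the index pairs first so the comparison loop stays simple; the `allPairs` helper below is our own, not part of the library:

```javascript
// Generate all unordered index pairs [i, j] with i < j for n items.
function allPairs(n) {
  const pairs = [];
  for (let i = 0; i < n; i++) {
    for (let j = i + 1; j < n; j++) {
      pairs.push([i, j]);
    }
  }
  return pairs;
}

// Then, with an initialized verifier:
// for (const [i, j] of allPairs(audioFiles.length)) {
//   const result = await verifier.compareAudio(audioFiles[i], audioFiles[j]);
//   console.log(`${i} vs ${j}:`, result.similarity);
// }
```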
Working with Embeddings
You can extract and compare speaker embeddings directly:
const verifier = new SpeakerVerification();
await verifier.initialize('standard-256');
// Extract embeddings from audio
const embedding1 = await verifier.getEmbedding(audio1);
const embedding2 = await verifier.getEmbedding(audio2);
console.log('Embedding 1:', embedding1);
// {
// embedding: Float32Array(192), // Normalized speaker vector
// processingTime: 245 // milliseconds
// }
// Compare pre-computed embeddings
const similarity = verifier.compareEmbeddings(embedding1.embedding, embedding2.embedding);
console.log('Similarity:', similarity); // 0.0 to 1.0
// Store embeddings for later use
const embeddingData = Array.from(embedding1.embedding); // Convert to regular array for storage
localStorage.setItem('speaker1', JSON.stringify(embeddingData));
// Load and use stored embeddings
const storedData = JSON.parse(localStorage.getItem('speaker1'));
const storedEmbedding = new Float32Array(storedData);
const similarity2 = verifier.compareEmbeddings(storedEmbedding, embedding2.embedding);
This is useful for:
- Building speaker databases
- Caching embeddings for performance
- Analyzing speaker characteristics
- Custom similarity metrics
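As one sketch of the "speaker database" use case, stored embeddings can be matched against a new one. The `SpeakerDatabase` class and `cosineSimilarity` scorer below are our own stand-ins so the example is self-contained; in real code you would score with `verifier.compareEmbeddings` instead (cosine similarity is an assumption, suggested by the `_cosine` suffix in the model names, and its raw range is -1 to 1 rather than the library's 0.0 to 1.0):

```javascript
// Stand-in scorer: replace with verifier.compareEmbeddings in real code.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Minimal in-memory speaker database keyed by name.
class SpeakerDatabase {
  constructor() { this.entries = new Map(); }
  add(name, embedding) { this.entries.set(name, embedding); }
  // Return the best-matching name, or null if nothing clears the threshold.
  identify(embedding, threshold = 0.5) {
    let best = null, bestScore = -Infinity;
    for (const [name, stored] of this.entries) {
      const score = cosineSimilarity(embedding, stored);
      if (score > bestScore) { best = name; bestScore = score; }
    }
    return bestScore > threshold ? best : null;
  }
}
```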
📝 License
Apache License 2.0
🤝 Credits
Based on NeXt-TDNN architecture for speaker verification.
📚 Citation
If you use this library in your research, please cite:
@INPROCEEDINGS{10447037,
author={Heo, Hyun-Jun and Shin, Ui-Hyeop and Lee, Ran and Cheon, YoungJu and Park, Hyung-Min},
booktitle={ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
title={NeXt-TDNN: Modernizing Multi-Scale Temporal Convolution Backbone for Speaker Verification},
year={2024},
volume={},
number={},
pages={11186-11190},
keywords={Convolution;Speech recognition;Transformers;Acoustics;Task analysis;Speech processing;speaker recognition;speaker verification;TDNN;ConvNeXt;multi-scale},
doi={10.1109/ICASSP48485.2024.10447037}}