Audio Duplicates
A high-performance audio duplicate detection library built with native C++ and Chromaprint fingerprinting technology. Quickly find duplicate audio files across large collections with robust detection that handles different encodings, bitrates, and formats.
✨ Features
- 🚀 High Performance: Native C++ implementation, ~200x faster than a pure JavaScript implementation
- 🎵 Format Support: MP3, WAV, FLAC, OGG, M4A, AAC, WMA, and more
- ⚡ Fast Matching: Inverted index over fingerprint values for average O(1) candidate lookups
- 🔧 Robust Detection: Handles different bitrates, sample rates, and encodings
- 💻 CLI Tool: Full-featured command-line interface for batch processing
- 📝 TypeScript Support: Complete TypeScript definitions included
- 🌍 Cross-Platform: Windows, macOS, and Linux support
- 📊 Progress Reporting: Real-time progress bars and statistics
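The inverted-index idea behind fast matching can be illustrated in a few lines of JavaScript. This is a conceptual sketch only, not the library's C++ internals: it assumes a fingerprint is an array of 32-bit integers (as Chromaprint produces) and uses a hypothetical FingerprintIndex class that maps each value to the IDs of files containing it, so finding candidates costs one hash lookup per value instead of a scan over every indexed file.

```javascript
// Conceptual sketch of an inverted index over fingerprint values.
// FingerprintIndex is hypothetical; it is not part of the audio-duplicates API.
class FingerprintIndex {
  constructor() {
    this.postings = new Map(); // fingerprint value -> Set of file IDs
    this.nextId = 0;
  }

  // Register a fingerprint (array of 32-bit ints) and return its new file ID.
  add(fingerprint) {
    const id = this.nextId++;
    for (const value of fingerprint) {
      let ids = this.postings.get(value);
      if (!ids) {
        ids = new Set();
        this.postings.set(value, ids);
      }
      ids.add(id);
    }
    return id;
  }

  // Count how many distinct values each indexed file shares with the query.
  // Each value costs one average O(1) Map lookup.
  candidates(fingerprint) {
    const counts = new Map(); // file ID -> shared value count
    for (const value of new Set(fingerprint)) {
      const ids = this.postings.get(value);
      if (!ids) continue;
      for (const id of ids) {
        counts.set(id, (counts.get(id) || 0) + 1);
      }
    }
    return counts;
  }
}

const index = new FingerprintIndex();
const a = index.add([10, 20, 30, 40]);
const b = index.add([10, 20, 99, 98]);
const hits = index.candidates([10, 20, 30, 77]);
console.log(hits.get(a), hits.get(b)); // 3 2
```

In a complete matcher, files with high shared-value counts would only be candidates; each would still be verified with a full fingerprint comparison.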
📦 Installation
Prerequisites
Install the required system libraries first:
macOS
brew install chromaprint libsndfile
Ubuntu/Debian
sudo apt-get update
sudo apt-get install libchromaprint-dev libsndfile1-dev
Windows
Download and install Chromaprint and libsndfile.
Install the Package
Global Installation (Recommended for CLI)
npm install -g audio-duplicates
Local Installation (for API usage)
npm install audio-duplicates
The package automatically uses prebuilt binaries when available, falling back to source compilation if needed.
🚀 Quick Start
CLI Usage
Scan for Duplicates
# Scan a single directory
audio-duplicates scan /path/to/music
# Scan multiple directories
audio-duplicates scan /music/collection1 /music/collection2
# Scan with custom threshold
audio-duplicates scan /path/to/music --threshold 0.9
# Save results to file
audio-duplicates scan /path/to/music --output duplicates.json --format json
Compare Two Files
audio-duplicates compare song1.mp3 song2.mp3
Generate Fingerprint
# Generate and display fingerprint
audio-duplicates fingerprint song.mp3
# Save fingerprint to file
audio-duplicates fingerprint song.mp3 --output fingerprint.json
API Usage
Basic Duplicate Detection
const audioDuplicates = require('audio-duplicates');
async function findDuplicates() {
  // Scan directory for duplicates
  const duplicates = await audioDuplicates.scanDirectoryForDuplicates('/path/to/music', {
    threshold: 0.85,
    onProgress: (progress) => {
      console.log(`Processing: ${progress.current}/${progress.total} - ${progress.file}`);
    }
  });
  // Display results
  duplicates.forEach((group, index) => {
    console.log(`\nDuplicate Group ${index + 1}:`);
    group.files.forEach(file => {
      console.log(` ${file.path} (similarity: ${file.similarity})`);
    });
  });
}
findDuplicates().catch(console.error);
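The onProgress callback receives current, total, and file, which is enough to draw a simple text progress bar. The formatProgress function below is a hypothetical helper, not part of the package; the bar width and characters are arbitrary choices.

```javascript
// Hypothetical helper: render the progress object passed to onProgress
// ({ current, total, file }) as a one-line text progress bar.
function formatProgress({ current, total, file }, width = 20) {
  // Guard against division by zero and clamp the ratio to 0..1.
  const ratio = total > 0 ? Math.min(Math.max(current / total, 0), 1) : 0;
  const filled = Math.round(ratio * width);
  const bar = '#'.repeat(filled) + '-'.repeat(width - filled);
  const percent = Math.floor(ratio * 100);
  return `[${bar}] ${percent}% (${current}/${total}) ${file}`;
}

console.log(formatProgress({ current: 5, total: 10, file: 'song.mp3' }, 10));
// [#####-----] 50% (5/10) song.mp3
```

Passed as onProgress: (p) => process.stdout.write('\r' + formatProgress(p)), it redraws a single line in the terminal instead of printing one line per file.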
Manual Fingerprint Comparison
const audioDuplicates = require('audio-duplicates');
async function compareFiles() {
  // Generate fingerprints
  const fp1 = await audioDuplicates.generateFingerprint('file1.mp3');
  const fp2 = await audioDuplicates.generateFingerprint('file2.mp3');
  // Compare fingerprints
  const result = await audioDuplicates.compareFingerprints(fp1, fp2);
  console.log('Similarity Score:', result.similarityScore);
  console.log('Are Duplicates:', result.isDuplicate);
  console.log('Confidence:', result.confidence);
}
compareFiles().catch(console.error);
Batch Processing with Index
const audioDuplicates = require('audio-duplicates');
async function batchProcess() {
  // Initialize index for batch processing
  await audioDuplicates.initializeIndex();
  // Add files to index
  const files = ['song1.mp3', 'song2.mp3', 'song3.mp3'];
  for (const file of files) {
    const fileId = await audioDuplicates.addFileToIndex(file);
    console.log(`Added ${file} with ID: ${fileId}`);
  }
  // Find all duplicates in the index
  const duplicateGroups = await audioDuplicates.findAllDuplicates();
  console.log('Found', duplicateGroups.length, 'duplicate groups');
  // Get index statistics
  const stats = await audioDuplicates.getIndexStats();
  console.log('Index Stats:', stats);
  // Clear index when done
  await audioDuplicates.clearIndex();
}
batchProcess().catch(console.error);
TypeScript Usage
import * as audioDuplicates from 'audio-duplicates';
import { DuplicateGroup, ScanOptions, Fingerprint } from 'audio-duplicates';
async function findDuplicatesTyped(): Promise<DuplicateGroup[]> {
  const options: ScanOptions = {
    threshold: 0.85,
    maxDuration: 300, // 5 minutes max
    onProgress: (progress: { current: number; total: number; file: string }) => {
      console.log(`${progress.current}/${progress.total}: ${progress.file}`);
    }
  };
  return await audioDuplicates.scanDirectoryForDuplicates('/path/to/music', options);
}
async function generateTypedFingerprint(filePath: string): Promise<Fingerprint> {
  return await audioDuplicates.generateFingerprint(filePath);
}
📖 API Reference
Core Functions
generateFingerprint(filePath: string): Promise<Fingerprint>
Generate an audio fingerprint from a file.
const fingerprint = await audioDuplicates.generateFingerprint('song.mp3');
console.log('Duration:', fingerprint.duration);
console.log('Sample Rate:', fingerprint.sampleRate);
generateFingerprintLimited(filePath: string, maxDuration: number): Promise<Fingerprint>
Generate a fingerprint from at most the first maxDuration seconds of audio.
// Only fingerprint first 30 seconds
const fingerprint = await audioDuplicates.generateFingerprintLimited('song.mp3', 30);
compareFingerprints(fp1: Fingerprint, fp2: Fingerprint): Promise<MatchResult>
Compare two fingerprints and return similarity metrics.
const result = await audioDuplicates.compareFingerprints(fp1, fp2);
console.log('Similarity:', result.similarityScore); // 0.0 to 1.0
console.log('Is Duplicate:', result.isDuplicate); // boolean
console.log('Confidence:', result.confidence); // 0.0 to 1.0
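This README does not document how compareFingerprints computes similarityScore. A common approach for Chromaprint-style fingerprints (arrays of 32-bit integers) is to count the bits that differ between aligned values and report 1 minus the bit error rate. The sketch below shows that technique under the simplifying assumption that both fingerprints start at the same time offset; it is illustrative and not necessarily the library's algorithm, and bitSimilarity is a hypothetical name.

```javascript
// Count set bits in a 32-bit value, treated as unsigned.
function popcount32(x) {
  let v = x >>> 0;
  let count = 0;
  while (v !== 0) {
    count += v & 1;
    v >>>= 1;
  }
  return count;
}

// Illustrative similarity: 1 - (differing bits / compared bits) over the
// overlapping prefix of two fingerprints. Assumes the fingerprints are
// already aligned; practical matchers also search over time offsets.
function bitSimilarity(fp1, fp2) {
  const n = Math.min(fp1.length, fp2.length);
  if (n === 0) return 0;
  let differingBits = 0;
  for (let i = 0; i < n; i++) {
    differingBits += popcount32(fp1[i] ^ fp2[i]);
  }
  return 1 - differingBits / (n * 32);
}

console.log(bitSimilarity([0x0f0f0f0f], [0x0f0f0f0f])); // 1
console.log(bitSimilarity([0b1111], [0]));             // 0.875
```

Because re-encoding at a different bitrate flips only a fraction of the fingerprint bits, a bit-level score like this degrades gradually instead of failing outright, which is what makes a tunable threshold meaningful.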
Index Management
initializeIndex(): Promise<boolean>
Initialize the fingerprint index for batch processing.
addFileToIndex(filePath: string): Promise<number>
Add a file to the index and return its unique ID.
findAllDuplicates(): Promise<DuplicateGroup[]>
Find all duplicate groups in the current index.
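findAllDuplicates returns groups rather than pairs. One straightforward way to turn pairwise matches into groups is union-find (disjoint-set union): every matching pair is merged, and each resulting set with two or more files becomes a group. The sketch below assumes file IDs are the integers 0 to n-1 and that the matching pairs are already known; groupMatches is a hypothetical helper and not necessarily how the library forms its groups.

```javascript
// Group pairwise matches into duplicate groups with union-find.
// fileCount: number of files; IDs are assumed to be 0..fileCount-1.
// pairs: array of [idA, idB] pairs judged to be duplicates.
function groupMatches(fileCount, pairs) {
  const parent = Array.from({ length: fileCount }, (_, i) => i);

  // Find the set representative, halving the path as we go.
  const find = (x) => {
    while (parent[x] !== x) {
      parent[x] = parent[parent[x]];
      x = parent[x];
    }
    return x;
  };

  for (const [a, b] of pairs) {
    const rootA = find(a);
    const rootB = find(b);
    if (rootA !== rootB) parent[rootB] = rootA;
  }

  // Collect the members of each set; keep only sets with 2+ files.
  const sets = new Map();
  for (let id = 0; id < fileCount; id++) {
    const root = find(id);
    if (!sets.has(root)) sets.set(root, []);
    sets.get(root).push(id);
  }
  return [...sets.values()].filter((group) => group.length > 1);
}

console.log(groupMatches(6, [[0, 1], [1, 2], [3, 4]]));
// [ [ 0, 1, 2 ], [ 3, 4 ] ]
```

Note the design consequence: union-find makes matching transitive, so if A matches B and B matches C, all three land in one group even when A and C alone would fall below the threshold.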
getIndexStats(): Promise<IndexStats>
Get statistics about the current index.
const stats = await audioDuplicates.getIndexStats();
console.log('Files:', stats.fileCount);
console.log('Index Size:', stats.indexSize);
console.log('Load Factor:', stats.loadFactor);
clearIndex(): Promise<boolean>
Clear the current index and free memory.
Configuration
setSimilarityThreshold(threshold: number): Promise<boolean>
Set the similarity threshold (0.0 to 1.0) for duplicate detection.
await audioDuplicates.setSimilarityThreshold(0.9); // Stricter matching
High-Level Utilities
scanDirectoryForDuplicates(directory: string, options?: ScanOptions): Promise<DuplicateGroup[]>
Scan a directory for duplicates with progress reporting.
Options:
- threshold?: number - Similarity threshold (default: 0.85)
- maxDuration?: number - Max duration to fingerprint, in seconds
- onProgress?: (progress) => void - Progress callback
- recursive?: boolean - Scan subdirectories (default: true)
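With recursive enabled, a scan has to walk subdirectories and pick out audio files. The sketch below shows one way to do that with Node's built-in fs module; collectAudioFiles is a hypothetical helper, and matching on the extensions from the Features list is an assumption (the package may detect formats differently).

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Extensions taken from the Features list; the real package may accept more.
const AUDIO_EXTENSIONS = new Set(['.mp3', '.wav', '.flac', '.ogg', '.m4a', '.aac', '.wma']);

// Hypothetical helper: list audio files under `directory`,
// descending into subdirectories when `recursive` is true.
function collectAudioFiles(directory, { recursive = true } = {}) {
  const results = [];
  for (const entry of fs.readdirSync(directory, { withFileTypes: true })) {
    const fullPath = path.join(directory, entry.name);
    if (entry.isDirectory()) {
      if (!recursive) continue;
      // Append child results one by one to avoid huge spread argument lists.
      for (const file of collectAudioFiles(fullPath, { recursive })) {
        results.push(file);
      }
    } else if (entry.isFile() && AUDIO_EXTENSIONS.has(path.extname(entry.name).toLowerCase())) {
      results.push(fullPath);
    }
  }
  return results.sort();
}
```

For example, collectAudioFiles('/path/to/music', { recursive: false }) lists only the audio files directly inside that directory. Extension matching is case-insensitive, so SONG.FLAC is picked up as well.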
🖥️ CLI Reference
Commands
scan <directories...>
Scan directories for duplicate audio files.
# Basic scan
audio-duplicates scan /music
# Advanced options
audio-duplicates scan /music \
--threshold 0.9 \
--format json \
--output results.json \
--max-duration 180 \
--no-progress
Options:
- --threshold <number> - Similarity threshold (0.0-1.0, default: 0.85)
- --format <format> - Output format: json, csv, or text (default: text)
- --output <file> - Output file path
- --max-duration <seconds> - Maximum duration to fingerprint
- --no-progress - Disable progress bar
- --recursive - Scan subdirectories (default: true)
compare <file1> <file2>
Compare two audio files directly.
audio-duplicates compare song1.mp3 song2.wav --max-duration 60
fingerprint <file>
Generate and display fingerprint for an audio file.
audio-duplicates fingerprint song.mp3 --output fingerprint.json
Global Options
- -v, --verbose - Verbose output with detailed information
- --threshold <number> - Global similarity threshold
- --format <format> - Global output format
📊 Performance
Benchmarks
On a modern CPU (Apple M1):
- Fingerprint Generation: 2-5x real-time (faster than playback)
- Index Lookup: ~1ms per query
- Full Comparison: 10-50ms depending on file length
- Memory Usage: ~4KB per minute of audio
- Scalability: Efficiently handles 10,000+ files
Example Performance
Collection Size: 10,000 files (50GB)
Scan Time: ~8 minutes
Memory Usage: ~200MB
Duplicates Found: 847 groups (2,341 files)
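The memory figures above are consistent with each other: at ~4KB per minute of audio, 10,000 files averaging 5 minutes each come to about 10,000 × 5 × 4KB = 200,000KB, or ~200MB. The 5-minute average track length is an assumption for this arithmetic, not a figure from the benchmark. A back-of-the-envelope helper (hypothetical, not part of the package) that also accounts for a maxDuration cap:

```javascript
// Rough fingerprint memory estimate based on the ~4KB-per-minute figure above.
// Uses decimal units (1MB = 1000KB). The average track length is an input
// assumption; process and runtime overhead are not included.
function estimateFingerprintMemoryMB(fileCount, avgMinutesPerFile, { kbPerMinute = 4, maxDurationSeconds } = {}) {
  // maxDuration caps how much of each file gets fingerprinted.
  const minutes = maxDurationSeconds === undefined
    ? avgMinutesPerFile
    : Math.min(avgMinutesPerFile, maxDurationSeconds / 60);
  return (fileCount * minutes * kbPerMinute) / 1000;
}

console.log(estimateFingerprintMemoryMB(10000, 5)); // 200
console.log(estimateFingerprintMemoryMB(10000, 5, { maxDurationSeconds: 180 })); // 120
```

The second call shows why a duration limit is an effective memory lever: capping fingerprints at 180 seconds cuts the estimate for 5-minute tracks by 40%.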
🔧 Advanced Usage
Custom Similarity Thresholds
// Exact duplicates only (very strict)
await audioDuplicates.setSimilarityThreshold(0.95);
// Similar versions (more permissive)
await audioDuplicates.setSimilarityThreshold(0.75);
// Near-identical files (default)
await audioDuplicates.setSimilarityThreshold(0.85);
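The three presets above can be read as tiers on the 0.0 to 1.0 similarity scale. The hypothetical classifySimilarity helper below applies them to a score such as result.similarityScore from compareFingerprints; the tier labels mirror the comments above, and treating each preset as an inclusive lower bound is an assumption.

```javascript
// Tier boundaries copied from the threshold presets above,
// checked from strictest to most permissive.
const TIERS = [
  { min: 0.95, label: 'exact duplicate' },
  { min: 0.85, label: 'near-identical' },
  { min: 0.75, label: 'similar version' },
];

// Hypothetical helper: map a similarity score (0.0 to 1.0) to a tier label.
function classifySimilarity(score) {
  for (const tier of TIERS) {
    if (score >= tier.min) return tier.label;
  }
  return 'not a duplicate';
}

console.log(classifySimilarity(0.97)); // exact duplicate
console.log(classifySimilarity(0.8));  // similar version
```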
Handling Large Collections
const audioDuplicates = require('audio-duplicates');
async function processLargeCollection(directories) {
  await audioDuplicates.initializeIndex();
  for (const dir of directories) {
    console.log(`Processing directory: ${dir}`);
    // Scan one directory at a time to keep memory use in check
    const duplicates = await audioDuplicates.scanDirectoryForDuplicates(dir, {
      threshold: 0.85,
      maxDuration: 300, // Limit to 5 minutes per file
      onProgress: (progress) => {
        // Log every 100th file instead of every file
        if (progress.current % 100 === 0) {
          console.log(`Processed ${progress.current}/${progress.total} files`);
        }
      }
    });
    console.log(`Found ${duplicates.length} duplicate groups in ${dir}`);
  }
  // Get final results
  const allDuplicates = await audioDuplicates.findAllDuplicates();
  console.log(`Total duplicate groups: ${allDuplicates.length}`);
  await audioDuplicates.clearIndex();
}
Output Formats
JSON Output
audio-duplicates scan /music --format json --output results.json
{
  "summary": {
    "totalFiles": 1500,
    "duplicateGroups": 23,
    "duplicateFiles": 67,
    "spaceWasted": "1.2GB"
  },
  "duplicateGroups": [
    {
      "groupId": 1,
      "avgSimilarity": 0.94,
      "files": [
        {
          "path": "/music/song1.mp3",
          "size": 5242880,
          "similarity": 1.0
        },
        {
          "path": "/music/copy/song1.mp3",
          "size": 5242880,
          "similarity": 0.94
        }
      ]
    }
  ]
}
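Because the JSON output lists each file's size in bytes, the results are easy to post-process. The sketch below assumes the structure shown above and a hypothetical keep-one policy: in each group it keeps the file with the highest similarity (earlier files win ties) and sums the sizes of the rest as reclaimable space. reclaimableBytes is a hypothetical helper; it only reads the report and deletes nothing.

```javascript
// Sum the bytes that could be freed by keeping one file per duplicate group.
// Assumes the JSON layout shown above: duplicateGroups[].files[].{size, similarity}.
function reclaimableBytes(report) {
  let total = 0;
  for (const group of report.duplicateGroups) {
    // Keep the file with the highest similarity; earlier files win ties.
    let keepIndex = 0;
    group.files.forEach((file, i) => {
      if (file.similarity > group.files[keepIndex].similarity) keepIndex = i;
    });
    group.files.forEach((file, i) => {
      if (i !== keepIndex) total += file.size;
    });
  }
  return total;
}

// Sample data mirroring the JSON example above.
const sample = {
  duplicateGroups: [
    {
      files: [
        { path: '/music/song1.mp3', size: 5242880, similarity: 1.0 },
        { path: '/music/copy/song1.mp3', size: 5242880, similarity: 0.94 },
      ],
    },
  ],
};
console.log(reclaimableBytes(sample)); // 5242880
```

To run it on a saved report, load the file with JSON.parse(fs.readFileSync('results.json', 'utf8')) and pass the result in.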
CSV Output
audio-duplicates scan /music --format csv --output results.csv
🐛 Troubleshooting
Common Issues
Build Errors
# macOS: Install dependencies
brew install chromaprint libsndfile
# Ubuntu: Install dependencies
sudo apt-get install libchromaprint-dev libsndfile1-dev
# Clear npm cache and rebuild
npm cache clean --force
npm rebuild
Runtime Errors
"Could not locate bindings file"
npm run build
"Failed to open audio file"
- Check file format is supported
- Verify file permissions
- Ensure file is not corrupted
"Index not initialized"
// Always initialize before using index functions
await audioDuplicates.initializeIndex();
Performance Optimization
For large collections:
- Use maxDuration to limit fingerprint length
- Process directories in batches
- Increase similarity threshold for faster results
- Use SSD storage for audio files
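Processing in batches can be as simple as splitting a work list into fixed-size chunks and handling one chunk at a time, so only one batch of work is in flight. In the sketch below, chunk and processInBatches are hypothetical helpers, and handleBatch stands in for whatever per-batch work you do (for example, calling scanDirectoryForDuplicates for each directory in the batch).

```javascript
// Split an array into consecutive chunks of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Run an async handler on one batch at a time, in order.
// handleBatch is a caller-supplied (hypothetical) function.
async function processInBatches(items, size, handleBatch) {
  const results = [];
  for (const batch of chunk(items, size)) {
    results.push(await handleBatch(batch));
  }
  return results;
}

console.log(chunk(['a', 'b', 'c', 'd', 'e'], 2)); // [ [ 'a', 'b' ], [ 'c', 'd' ], [ 'e' ] ]
```

Awaiting each batch before starting the next trades some throughput for a predictable peak in open files and in-memory fingerprints.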
🤝 Contributing
- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Make your changes and add tests
- Run the test suite: npm test
- Submit a pull request
Development Setup
git clone https://github.com/mcande21/audio-duplicates.git
cd audio-duplicates
npm install
npm run build
npm test
📄 License
MIT License - see LICENSE file for details.
🙏 Acknowledgments
- Chromaprint - Audio fingerprinting library
- libsndfile - Audio file I/O library
- Node-API - Native addon interface
🔗 Related Projects
- AcoustID - Audio identification service
- fpcalc - Command-line fingerprinting tool
- MusicBrainz - Music metadata database
Happy duplicate hunting! 🎵