JSPM

audio-finder

1.0.2
    • License MIT

    High-performance audio processing library using C++ with Node.js bindings

    Package Exports

    • audio-finder
    • audio-finder/lib/js/index.js

    This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (audio-finder) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

    Readme

    Audio Finder

    A high-performance audio processing library that combines the speed of C++ with the convenience of Node.js/TypeScript. This project provides fast audio analysis, FFT processing, and audio fingerprinting capabilities.

    Features

    • 🚀 High Performance: Core audio processing in C++ for maximum speed
    • 🔧 Easy Integration: TypeScript/JavaScript API with yarn package management
    • 📊 Audio Analysis: FFT, MFCC, pitch detection, onset detection
    • 🎵 Audio Fingerprinting: Generate and compare audio fingerprints
    • Duplicate Detection: Find duplicate audio files across directories
    • 🛠 Cross-Platform: Works on macOS, Linux, and Windows
    • 📦 Modular: Use only the components you need
    • ⚡ CLI Tool: Command-line interface for batch processing

    Project Structure

    audio_finder/
    ├── src/
    │   ├── cpp/                 # C++ source code
    │   │   ├── audio/          # Core audio processing
    │   │   └── bindings/       # Node.js native bindings
    │   └── js/                 # TypeScript/JavaScript wrapper
    ├── include/                # C++ header files
    ├── tests/                  # Test files
    ├── scripts/               # Build and setup scripts
    ├── build/                 # CMake build output
    ├── lib/                   # Compiled JavaScript output
    └── docs/                  # Documentation

    Quick Start

    Prerequisites

    • Node.js 18+ and yarn
    • CMake 3.15+
    • C++ compiler with C++17 support
    • Audio libraries (automatically installed):
      • PortAudio (for audio I/O)
      • FFTW3 (for FFT processing)

    Installation

    1. Clone and set up the project:

      git clone <repository-url>
      cd audio_finder
      ./scripts/setup.sh
    2. Or install dependencies manually:

      yarn install
      yarn install:deps
      yarn build

    Usage

    TypeScript/JavaScript

    import { AudioAnalyzer, AudioUtils } from 'audio-finder';
    
    // Create an analyzer
    const analyzer = new AudioAnalyzer();
    
    // Generate a test signal (440 Hz sine wave)
    const samples = AudioUtils.generateSineWave(440, 1.0, 44100);
    
    // Analyze the audio
    const features = await analyzer.analyzeAudio(samples, 44100);
    
    console.log('RMS:', features.rms);
    console.log('Pitch:', features.pitch);
    console.log('Spectrum length:', features.spectrum.length);
    console.log('MFCC coefficients:', features.mfcc);

    Basic Audio Processing

    import { AudioProcessor, AudioUtils } from 'audio-finder';
    
    const processor = new AudioProcessor();
    
    // Load or generate audio samples
    const samples = AudioUtils.generateSineWave(220, 0.5, 44100);
    
    // Process the audio
    const rms = processor.processAudio(samples);
    const spectrum = processor.getSpectrum(samples);
    const pitch = processor.detectPitch(samples, 44100);
    
    console.log(`RMS: ${rms}, Detected Pitch: ${pitch} Hz`);

    Audio Fingerprinting

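    import { AudioAnalyzer, AudioUtils } from 'audio-finder';
    
    const analyzer = new AudioAnalyzer();
    
    // Two test signals to compare (substitute your own decoded samples)
    const samples1 = AudioUtils.generateSineWave(440, 1.0, 44100);
    const samples2 = AudioUtils.generateSineWave(880, 1.0, 44100);
    
    // Generate fingerprints for the two audio samples
    const fp1 = analyzer.generateFingerprint(samples1, 44100);
    const fp2 = analyzer.generateFingerprint(samples2, 44100);
    
    // Compare fingerprints (returns similarity 0-1)
    const similarity = analyzer.compareFingerprints(fp1, fp2);
    console.log(`Similarity: ${(similarity * 100).toFixed(1)}%`);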

    Duplicate Audio Detection

    The library includes a powerful duplicate detection system that can find identical or similar audio files across directories, even when they have different filenames.

    CLI Usage

    The easiest way to find duplicate audio files is using the command-line interface:

    # Find duplicates between two directories
    npx duplicate find ./music-collection ./downloaded-music
    
    # With enhanced landmark algorithm for cross-sample-rate detection
    npx duplicate find ./dir1 ./dir2 --algorithm landmark --landmark-threshold 0.08
    
    # Use hybrid mode (default) with custom thresholds
    npx duplicate find ./dir1 ./dir2 --algorithm hybrid --threshold 0.85
    
    # Traditional mode for same-sample-rate files
    npx duplicate find ./dir1 ./dir2 --algorithm traditional --threshold 0.90
    
    # Compare two specific files with detailed analysis
    npx duplicate compare ./song1.mp3 ./song2.wav --algorithm landmark --verbose
    
    # Save results to JSON
    npx duplicate find ./dir1 ./dir2 --output results.json
    
    # Save results to CSV with enhanced metrics
    npx duplicate find ./dir1 ./dir2 --csv duplicates.csv --algorithm hybrid
    
    # Scan a single directory
    npx duplicate scan ./music-library
    
    # Generate fingerprint for a single file
    npx duplicate fingerprint ./song.mp3

    Programmatic Usage

    import { DuplicateDetector } from 'audio-finder';
    
    // Create detector with custom configuration
    const detector = new DuplicateDetector({
        similarityThreshold: 0.85,  // 85% similarity required
        parallelProcessing: true,   // Use multiple threads
        maxThreads: 4,             // Limit concurrent threads
        verbose: true              // Enable detailed logging
    });
    
    // Find duplicates between directories
    const matches = await detector.findDuplicates('./music', './downloads');
    
    console.log(`Found ${matches.length} potential duplicates:`);
    matches.forEach((match, index) => {
        console.log(`${index + 1}. ${(match.similarity * 100).toFixed(1)}% similarity`);
        console.log(`   File A: ${match.fileA.filePath}`);
        console.log(`   File B: ${match.fileB.filePath}`);
        console.log(`   Size diff: ${Math.abs(match.fileA.fileSize - match.fileB.fileSize)} bytes`);
    });

    Progress Tracking

    Monitor detection progress with event listeners:

    detector.on('progress', (progress) => {
        console.log(`${progress.phase}: ${progress.filesProcessed}/${progress.totalFiles}`);
        if (progress.currentFile) {
            console.log(`Processing: ${progress.currentFile}`);
        }
    });
    
    detector.on('match', (match) => {
        console.log(`Found match: ${match.similarity.toFixed(3)} similarity`);
    });

    Detection Algorithm

    The duplicate detection system uses perceptual audio fingerprinting:

    1. Audio Loading: Supports multiple formats (MP3, WAV, FLAC, OGG, M4A, etc.)
    2. Spectral Analysis: Analyzes 2-second chunks using FFT
    3. Feature Extraction: Extracts 32 frequency bands for robust comparison
    4. Perceptual Hashing: Creates compact fingerprints resistant to encoding differences
    5. Similarity Matching: Uses Hamming distance for fast comparison
    6. Threshold Filtering: Configurable similarity thresholds for accuracy tuning
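
    Steps 3 to 5 above can be sketched in a few lines. This is a minimal illustration, not the library's C++ implementation: the helper names (popcount32, hashFrame, hammingSimilarity) are hypothetical, and the hashing rule (one bit per band, set when the band is above the frame's mean energy) is an assumption chosen only to show how a 32-band frame could map to a 32-bit hash that is then compared by Hamming distance.

```typescript
// Hypothetical helpers for illustration; none of these names are part of audio-finder's API.

/** Count the set bits in a 32-bit unsigned integer. */
function popcount32(x: number): number {
  let v = x >>> 0;
  let count = 0;
  while (v !== 0) {
    v = (v & (v - 1)) >>> 0; // clear the lowest set bit
    count++;
  }
  return count;
}

/**
 * Turn one frame of band energies (up to 32 bands) into a 32-bit hash.
 * Assumed rule: bit b is set when band b is above the frame's mean energy.
 */
function hashFrame(bandEnergies: number[]): number {
  const mean = bandEnergies.reduce((a, b) => a + b, 0) / bandEnergies.length;
  let bits = 0;
  for (let b = 0; b < Math.min(32, bandEnergies.length); b++) {
    if (bandEnergies[b] > mean) bits |= 1 << b;
  }
  return bits >>> 0;
}

/** Similarity in [0, 1]: the fraction of bits that agree across the overlapping frames. */
function hammingSimilarity(fpA: number[], fpB: number[]): number {
  const frames = Math.min(fpA.length, fpB.length);
  if (frames === 0) return 0;
  let differing = 0;
  for (let i = 0; i < frames; i++) {
    differing += popcount32(fpA[i] ^ fpB[i]);
  }
  return 1 - differing / (frames * 32);
}

console.log(hammingSimilarity([0b1011], [0b1011])); // 1 (identical hashes)
console.log(hammingSimilarity([0], [1]));           // 0.96875 (1 of 32 bits differs)
```

    Because each comparison is an XOR plus a bit count per frame, matching stays cheap even for large collections; the similarity threshold in step 6 is then a cutoff on this score.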

    Performance Characteristics

    • Accuracy: >95% true positive rate, <1% false positive rate (at 0.85 threshold)
    • Speed: ~50ms per minute of audio on modern hardware
    • Memory: ~100KB fingerprint storage per hour of audio
    • Scalability: Parallel processing across multiple CPU cores
    • Formats: Automatic detection of 20+ audio formats
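
    As a rough worked example, the speed and memory figures above can be turned into a budget for a whole collection. The function name estimateLibraryCost and the threads parameter are hypothetical, and dividing by the thread count assumes ideal linear scaling, which real workloads rarely reach. The estimate covers fingerprint generation only, not the pairwise comparison step.

```typescript
// Figures quoted in the list above; treat them as rough, hardware-dependent estimates.
const MS_PER_AUDIO_MINUTE = 50;
const KB_PER_AUDIO_HOUR = 100;

interface LibraryEstimate {
  fingerprintSeconds: number; // time to fingerprint the whole collection
  storageKB: number;          // space needed to keep all fingerprints
}

/** Hypothetical helper: estimate fingerprinting time and storage for a collection. */
function estimateLibraryCost(totalAudioHours: number, threads: number = 1): LibraryEstimate {
  const minutes = totalAudioHours * 60;
  return {
    // Assumes perfect scaling across threads (an optimistic simplification).
    fingerprintSeconds: (minutes * MS_PER_AUDIO_MINUTE) / 1000 / Math.max(1, threads),
    storageKB: totalAudioHours * KB_PER_AUDIO_HOUR,
  };
}

// 500 hours of audio: 30,000 minutes * 50 ms = 1,500 s on one thread,
// and 500 * 100 KB = 50,000 KB (about 50 MB) of fingerprints.
const estimate = estimateLibraryCost(500);
console.log(estimate.fingerprintSeconds, estimate.storageKB); // 1500 50000
```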
    
    Development

    Building

    # Build everything
    yarn build
    
    # Build only C++ components
    yarn build:cpp
    
    # Build only TypeScript components
    yarn build:js
    
    # Build Node.js native addon
    yarn build:addon

    Testing

    # Run all tests
    yarn test
    
    # Run only C++ tests
    yarn test:cpp
    
    # Run only JavaScript tests
    yarn test:js

    Development Workflow

    # Clean build artifacts
    yarn clean
    
    # Development build and test
    yarn dev
    
    # Format code
    yarn format
    
    # Lint TypeScript code
    yarn lint

    API Reference

    AudioAnalyzer

    The main class for audio analysis and feature extraction.

    class AudioAnalyzer {
      // Frequency domain analysis
      analyzeFrequencySpectrum(samples: number[], sampleRate: number): number[]
      
      // Feature extraction
      extractMFCC(samples: number[], sampleRate: number, numCoeffs?: number): number[]
      calculateRMS(samples: number[]): number
      detectPitch(samples: number[], sampleRate: number): number
      detectOnsets(samples: number[], sampleRate: number): number[]
      
      // Audio fingerprinting
      generateFingerprint(samples: number[], sampleRate: number): number[]
      compareFingerprints(fp1: number[], fp2: number[]): number
      
      // Configuration
      setWindowSize(size: number): void
      setHopSize(size: number): void
      setOverlapRatio(ratio: number): void
      
      // High-level analysis
      analyzeAudio(samples: number[], sampleRate: number): Promise<AudioFeatures>
    }
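
    To make the feature-extraction methods more concrete, here is a self-contained sketch of what an RMS calculation and a simple pitch detector compute. It is not the library's code: the function names are hypothetical, and plain time-domain autocorrelation is used here only because it is short. The native implementation may use a different method.

```typescript
/** Root mean square level of a block of samples. */
function rms(samples: number[]): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (const s of samples) sumSquares += s * s;
  return Math.sqrt(sumSquares / samples.length);
}

/**
 * Naive autocorrelation pitch estimate in Hz (returns 0 when nothing is found).
 * The minHz/maxHz search range is an assumed default, not a library setting.
 */
function autocorrelationPitch(
  samples: number[],
  sampleRate: number,
  minHz: number = 50,
  maxHz: number = 1000
): number {
  const minLag = Math.floor(sampleRate / maxHz);
  const maxLag = Math.min(Math.floor(sampleRate / minHz), samples.length - 1);
  let bestLag = 0;
  let bestScore = 0;
  for (let lag = minLag; lag <= maxLag; lag++) {
    // Correlate the signal with a copy of itself shifted by `lag` samples.
    let score = 0;
    for (let i = 0; i + lag < samples.length; i++) {
      score += samples[i] * samples[i + lag];
    }
    if (score > bestScore) {
      bestScore = score;
      bestLag = lag;
    }
  }
  return bestLag > 0 ? sampleRate / bestLag : 0;
}

// 4096 samples of a 440 Hz sine at 44.1 kHz
const tone: number[] = [];
for (let i = 0; i < 4096; i++) {
  tone.push(Math.sin((2 * Math.PI * 440 * i) / 44100));
}
console.log("RMS:", rms(tone).toFixed(3));
console.log("Pitch (Hz):", autocorrelationPitch(tone, 44100).toFixed(1));
```

    The estimate lands close to 440 Hz rather than exactly on it, because only whole-sample lags are tested. Production pitch detectors usually interpolate around the peak to recover the fractional period.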

    DuplicateDetector

    The main class for finding duplicate audio files.

    class DuplicateDetector extends EventEmitter {
      constructor(config?: DuplicateDetectionConfig)
      
      // Primary detection method
      findDuplicates(directoryA: string, directoryB: string): Promise<DuplicateMatch[]>
      
      // Utility methods
      scanDirectory(directory: string): Promise<AudioFileInfo[]>
      generateFingerprint(filePath: string): Promise<AudioFingerprint>
      isNativeAvailable(): boolean
      getLastRunStatistics(): DetectionStatistics | null
      
      // Event handling
      on(event: 'progress', listener: (progress: DetectionProgress) => void): this
      on(event: 'match', listener: (match: DuplicateMatch) => void): this
      on(event: 'error', listener: (error: Error) => void): this
    }
    
    interface DuplicateMatch {
      fileA: AudioFileInfo;
      fileB: AudioFileInfo;
      similarity: number;
      hammingDistance: number;
    }
    
    interface DuplicateDetectionConfig {
      similarityThreshold?: number;    // 0.0-1.0, default 0.85
      parallelProcessing?: boolean;    // default true
      maxThreads?: number;            // default 0 (auto)
      verbose?: boolean;              // default false
    }
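
    The defaults documented in DuplicateDetectionConfig can be illustrated with a small resolver that fills in omitted options. This is a sketch only: resolveConfig and DEFAULT_CONFIG are hypothetical names, the range check is an assumption, and the interface is repeated here so the example compiles on its own.

```typescript
// Copied from the interface above so this sketch is self-contained.
interface DuplicateDetectionConfig {
  similarityThreshold?: number; // 0.0-1.0, default 0.85
  parallelProcessing?: boolean; // default true
  maxThreads?: number;          // default 0 (auto)
  verbose?: boolean;            // default false
}

// Defaults as documented in the interface comments.
const DEFAULT_CONFIG: Required<DuplicateDetectionConfig> = {
  similarityThreshold: 0.85,
  parallelProcessing: true,
  maxThreads: 0,
  verbose: false,
};

/** Hypothetical helper: merge user options with the documented defaults. */
function resolveConfig(
  config: DuplicateDetectionConfig = {}
): Required<DuplicateDetectionConfig> {
  const resolved: Required<DuplicateDetectionConfig> = {
    similarityThreshold: config.similarityThreshold ?? DEFAULT_CONFIG.similarityThreshold,
    parallelProcessing: config.parallelProcessing ?? DEFAULT_CONFIG.parallelProcessing,
    maxThreads: config.maxThreads ?? DEFAULT_CONFIG.maxThreads,
    verbose: config.verbose ?? DEFAULT_CONFIG.verbose,
  };
  // Assumed validation: the threshold is documented as a value in 0.0-1.0.
  if (resolved.similarityThreshold < 0 || resolved.similarityThreshold > 1) {
    throw new RangeError("similarityThreshold must be between 0.0 and 1.0");
  }
  return resolved;
}

console.log(resolveConfig({ maxThreads: 4 }));
```

    Using ?? rather than object spread means an option explicitly passed as undefined still falls back to its default.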

    AudioUtils

    Utility functions for audio processing.

    class AudioUtils {
      // Signal generation
      static generateSineWave(frequency: number, duration: number, sampleRate: number, amplitude?: number): number[]
      static generateWhiteNoise(duration: number, sampleRate: number, amplitude?: number): number[]
      
      // Audio processing
      static normalize(samples: number[]): number[]
      static applyGain(samples: number[], gainDB: number): number[]
      
      // Utility functions
      static calculateZeroCrossingRate(samples: number[]): number
      static dbToLinear(db: number): number
      static linearToDb(linear: number): number
    }
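
    For readers who want to see what these utilities compute, the standalone functions below sketch plausible equivalents. They are assumptions, not the library's source: in particular, the decibel helpers use the amplitude convention (20 · log10), and the zero-crossing rate is normalised by the number of adjacent sample pairs.

```typescript
/** Convert decibels to a linear amplitude ratio (assumed amplitude convention: 20 * log10). */
function dbToLinear(db: number): number {
  return Math.pow(10, db / 20);
}

/** Convert a linear amplitude ratio to decibels. */
function linearToDb(linear: number): number {
  return 20 * Math.log10(linear);
}

/** Generate a sine wave: frequency in Hz, duration in seconds. */
function generateSineWave(
  frequency: number,
  duration: number,
  sampleRate: number,
  amplitude: number = 1
): number[] {
  const length = Math.round(duration * sampleRate);
  const out: number[] = new Array<number>(length);
  for (let i = 0; i < length; i++) {
    out[i] = amplitude * Math.sin((2 * Math.PI * frequency * i) / sampleRate);
  }
  return out;
}

/** Fraction of adjacent sample pairs whose signs differ. */
function zeroCrossingRate(samples: number[]): number {
  if (samples.length < 2) return 0;
  let crossings = 0;
  for (let i = 1; i < samples.length; i++) {
    if (samples[i - 1] >= 0 !== samples[i] >= 0) crossings++;
  }
  return crossings / (samples.length - 1);
}

const wave = generateSineWave(440, 1.0, 44100);
console.log("Samples:", wave.length); // 44100
console.log("Zero-crossing rate:", zeroCrossingRate(wave).toFixed(4));
console.log("+20 dB as linear gain:", dbToLinear(20));
```

    A pure tone crosses zero twice per cycle, so its zero-crossing rate is roughly 2 · frequency / sampleRate, which makes the measure a cheap proxy for how bright or noisy a signal is.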

    Performance

    This library is designed for high-performance audio processing:

    • C++ Core: Critical audio processing algorithms implemented in optimized C++
    • SIMD Instructions: Uses -march=native for CPU-specific optimizations
    • FFTW: Industry-standard FFT library for frequency domain analysis
    • Memory Efficient: Minimal memory allocations in hot paths
    • Zero-Copy: Efficient data transfer between JavaScript and C++

    Benchmarks

    On a typical modern CPU (Apple M1), processing times for common operations:

    • 1024-sample FFT: ~10 µs
    • Pitch detection (4096 samples): ~100 µs
    • MFCC extraction (4096 samples): ~200 µs
    • Audio fingerprinting (1 second, 44.1kHz): ~5 ms

    Contributing

    1. Fork the repository
    2. Create a feature branch
    3. Make your changes
    4. Add tests for new functionality
    5. Run the test suite: yarn test
    6. Submit a pull request

    Development Guidelines

    • Follow the existing code style
    • Add tests for new features
    • Update documentation for API changes
    • Use meaningful commit messages
    • Ensure cross-platform compatibility

    License

    MIT License - see LICENSE file for details.

    Dependencies

    Runtime Dependencies

    • node-addon-api: Node.js native addon interface

    Development Dependencies

    • typescript: TypeScript compiler
    • jest: Testing framework
    • eslint: JavaScript/TypeScript linting
    • node-gyp: Native addon build tool

    System Dependencies

    • PortAudio: Cross-platform audio I/O library
    • FFTW3: Fast Fourier Transform library
    • CMake: Build system for C++ components

    Troubleshooting

    Common Issues

    1. Native module not found: Run yarn build:addon to rebuild the native addon
    2. Audio libraries missing: Run ./scripts/install-audio-libs.sh to install dependencies
    3. CMake errors: Ensure CMake 3.15+ is installed
    4. Compiler errors: Ensure you have a C++17 compatible compiler

    Getting Help

    • Check the documentation
    • Look at the examples
    • Open an issue on GitHub
    • Check existing issues for solutions