Package Exports
- maxsim-web
- maxsim-web/baseline
- maxsim-web/optimized
- maxsim-web/wasm
maxsim-web
⚡ High-performance JavaScript/WASM MaxSim for ColBERT and late-interaction retrieval
WASM+SIMD implementation with adaptive batching for variable-length documents. 5x faster than pure JavaScript.
Live Demo • API Guide • Examples
Installation

```bash
npm install maxsim-web
```

Quick Start
Standard API (2D Arrays)
```javascript
import { MaxSim } from 'maxsim-web';

const maxsim = await MaxSim.create(); // Auto-selects best backend

// Single document
const score = maxsim.maxsim(queryEmbedding, docEmbedding);

// Batch processing
const scores = maxsim.maxsimBatch(queryEmbedding, [doc1, doc2, doc3]);
```

Flat API (High Performance)
For large batches (100+ docs) or when embeddings are already flat:
```javascript
import { MaxSimWasm } from 'maxsim-web/wasm';

const maxsim = new MaxSimWasm();
await maxsim.init();

// Zero-copy batch processing - up to 16x faster!
const scores = maxsim.maxsimBatchFlat(
    queryFlat,                       // Float32Array
    queryTokens,                     // number
    docsFlat,                        // Float32Array (all docs concatenated)
    new Uint32Array(docTokenCounts), // tokens per doc
    embeddingDim                     // number
);
```

Why use the Flat API?
- ✅ Eliminates conversion overhead (~260ms for 1000 docs)
- ✅ Direct WASM calls - no intermediate allocations
- ✅ Perfect for embeddings from transformers.js, ONNX, etc.
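If your embeddings start out as nested arrays, they can be packed into the flat layout once up front and reused across queries. A minimal sketch (`flattenDocs` is a hypothetical helper, not part of maxsim-web):

```javascript
// Hypothetical helper: pack per-document token embeddings
// (an array of docs, each an array of dim-length vectors)
// into the single Float32Array + token-count layout that
// a flat batch API expects.
function flattenDocs(docs, dim) {
  const tokenCounts = new Uint32Array(docs.length);
  let totalTokens = 0;
  docs.forEach((doc, i) => {
    tokenCounts[i] = doc.length;
    totalTokens += doc.length;
  });

  const flat = new Float32Array(totalTokens * dim);
  let offset = 0;
  for (const doc of docs) {
    for (const token of doc) {
      flat.set(token, offset); // copy one token vector
      offset += dim;
    }
  }
  return { flat, tokenCounts };
}
```

Doing this conversion once, outside the scoring loop, is what avoids paying the ~260ms overhead on every batch call.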
Complete API Guide | Performance Comparison
Performance
Benchmark: 100 docs, 128-256 tokens (variable length)
| Implementation | Time | Speedup vs Baseline | 
|---|---|---|
| WASM+SIMD | 11.8ms | 5.2x faster ⚡ | 
| JS Baseline | 61.0ms | 1.0x (baseline) | 
| JS Optimized | 61.8ms | ~1.0x (no improvement in browsers*) | 
*JS Optimized shows a 1.25x speedup in Node.js but not in browsers
Why Flat API Matters
When processing 1000 documents:
- 2D Array API: ~320ms (61ms WASM + 260ms conversion overhead)
- Flat API: ~61ms (zero conversion)
- Savings: ~260ms (4.2x faster by avoiding conversions)
API Reference
Core Methods
```javascript
// Standard (2D arrays)
maxsim.maxsim(query, doc)                    // Single doc
maxsim.maxsimBatch(query, docs)              // Batch
maxsim.maxsim_normalized(query, doc)         // Normalized scores

// Flat API (Float32Array - faster)
maxsim.maxsimFlat(queryFlat, qTokens, docFlat, dTokens, dim)
maxsim.maxsimBatchFlat(queryFlat, qTokens, docsFlat, tokenCounts, dim)
maxsim.maxsimFlat_normalized(...)            // Normalized variant
```

When to Use Each
| Your Data | API | Why | 
|---|---|---|
| From transformers.js | Flat API | Already flat - zero overhead | 
| 100+ documents | Flat API | Eliminates conversion time | 
| 2D arrays | Standard API | Convenient | 
| Small batches (<100) | Either | Similar performance | 
Key Features
- ⚡ Adaptive batching - Optimizes for variable-length documents
- Zero dependencies - Lightweight installation
- Universal - Browser + Node.js
- Progressive - Auto-selects fastest backend
- TypeScript - Full type definitions
Use Cases
- Browser search - Client-side semantic retrieval
- Extensions - Real-time document similarity
- Node.js APIs - Fast similarity endpoints
- PWAs - Offline-capable search
Important Notes
⚠️ Normalized embeddings required: All methods expect L2-normalized embeddings. Modern models (ColBERT, BGE, E5) output normalized embeddings by default.
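If you are unsure whether your model's outputs are unit-length, you can normalize them yourself before scoring. A minimal sketch (`l2Normalize` is a hypothetical helper, not part of the package):

```javascript
// Sketch: L2-normalize one token embedding in place,
// so that dot products become cosine similarities.
function l2Normalize(vec) {
  let norm = 0;
  for (const x of vec) norm += x * x;
  norm = Math.sqrt(norm) || 1; // guard against the zero vector
  for (let i = 0; i < vec.length; i++) vec[i] /= norm;
  return vec;
}
```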
Two scoring variants:
- maxsim() - Official ColBERT scoring (raw sum); for ranking within a single query
- maxsim_normalized() - Averaged scores; for cross-query comparison
Both produce identical rankings within a query; only the absolute values differ.
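To make the two variants concrete, here is a plain-JS reference of the MaxSim computation (illustrative only; `maxsimScore` is a hypothetical name, and the package's WASM path is what you'd use in practice). It assumes L2-normalized embeddings, per the note above:

```javascript
// Reference MaxSim: for each query token, take the max dot
// product over all doc tokens, then sum (raw, ColBERT-style)
// or average over query tokens (normalized variant).
function maxsimScore(query, doc, normalized = false) {
  let sum = 0;
  for (const q of query) {
    let best = -Infinity;
    for (const d of doc) {
      let dot = 0;
      for (let i = 0; i < q.length; i++) dot += q[i] * d[i];
      if (dot > best) best = dot;
    }
    sum += best;
  }
  return normalized ? sum / query.length : sum;
}
```

Since the normalized variant only divides by a per-query constant, the relative order of documents for a given query is unchanged, which is why both variants rank identically.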
Documentation
- Complete API Guide - Detailed usage, migration guide
- API Comparison Example - Performance demo
- Benchmark Results - Detailed performance measurements
- Performance Analysis - Optimization deep-dive
- Interactive Benchmarks - Run benchmarks in your browser
Related
- mixedbread-ai/maxsim-cpu - Original C++/Python implementation
- ColBERT - Late interaction retrieval
- sentence-transformers - Embedding models
License
MIT
Quick tip: Use the Flat API to avoid conversion overhead (~4x faster for large batches) and the WASM+SIMD backend for computation (5x faster than JS). Combined effect for 1000 docs: ~5x total speedup.