
Package Exports

  • maxsim-web
  • maxsim-web/baseline
  • maxsim-web/optimized
  • maxsim-web/wasm

Readme

maxsim-web

⚡ High-performance MaxSim scoring with WebAssembly and SIMD for ColBERT-style retrieval


🚀 Live Demo · 📊 Performance Report · 📚 API Guide


Features

  • 🚀 3-5x faster than JavaScript - WASM+SIMD optimized implementation
  • 📦 45KB gzipped - Tiny bundle size, 12% smaller than v0.5.0
  • ⚡ Preloading API - Load documents once, search thousands of times
  • 🎯 Zero dependencies - Pure WASM implementation
  • 🔧 Simple API - Works with any L2-normalized embeddings


Performance (v0.6.0)

Latest benchmarks:

Scenario          Documents   Query Tokens   Performance   vs JavaScript
Variable Large    1000 docs   32 tokens      265ms         3.55x faster 🔥
Variable Medium   1000 docs   13 tokens      134ms         3.34x faster 🔥

With preloading:

  • Load documents once: ~230ms (one-time cost)
  • Each search: ~265ms (vs 479ms non-preloaded)
  • 1.81x faster per search + zero conversion overhead
  • Break-even after 2 searches (230 + 265×n ms with preloading vs 479×n ms without, so preloading wins from the second search on)

See Performance Report for detailed benchmarks and analysis.


What's New in v0.6.0

🚀 Major performance improvement: 2.8-5.2x faster!

  • Replaced cosine-similarity scoring with pure dot products (3-5x faster per operation)
  • Binary size: 51KB → 45KB (12% smaller)
  • WASM now 3-5x faster than JavaScript (was 1.0-1.2x)
  • No breaking changes - all APIs work the same

See Release Notes for details.


Installation

npm install maxsim-web

Quick Start

Basic Usage

import { createMaxSim } from 'maxsim-web';

// Initialize (auto-detects best: WASM+SIMD, WASM, or JS fallback)
const maxsim = await createMaxSim();

// Prepare embeddings (must be L2-normalized!)
const queryEmbedding = [[0.1, 0.2, ...], ...];  // [query_tokens, embedding_dim]
const docEmbeddings = [
  [[0.3, 0.4, ...], ...],  // Doc 1
  [[0.5, 0.6, ...], ...],  // Doc 2
];

// Compute MaxSim scores
const scores = maxsim.maxsimBatch(queryEmbedding, docEmbeddings);
console.log(scores);  // Float32Array of scores (one per document)

Preloading API

Use case: Search the same document set repeatedly with different queries.

import { createMaxSim } from 'maxsim-web';

const maxsim = await createMaxSim();

// Step 1: Prepare documents as flat arrays (one-time conversion)
const embeddingDim = 256;
const docTokenCounts = new Uint32Array([doc1.length, doc2.length, ...]);

// Flatten all document embeddings into single Float32Array
const allEmbeddings = new Float32Array(totalTokens * embeddingDim);
// ... copy embeddings into allEmbeddings ...

// Step 2: Load documents (one-time, ~230ms for 1000 docs)
await maxsim.loadDocuments(allEmbeddings, docTokenCounts, embeddingDim);

// Step 3: Search repeatedly (fast! ~265ms per search)
const queryFlat = new Float32Array(queryTokens * embeddingDim);
// ... copy query into queryFlat ...

const scores1 = maxsim.wasmInstance.search_preloaded(queryFlat, queryTokens);
const scores2 = maxsim.wasmInstance.search_preloaded(queryFlat2, queryTokens2);
// ... search 1000s of times with zero conversion overhead!
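
The "... copy embeddings into allEmbeddings ..." steps above are plain typed-array copies. A minimal sketch of such a helper, assuming each document is an array of token embedding vectors (the flattenEmbeddings name is illustrative, not part of the maxsim-web API):

// Illustrative helper (not part of maxsim-web): flatten per-document
// token embeddings into one Float32Array and record each document's
// token count for loadDocuments().
function flattenEmbeddings(docs, embeddingDim) {
  const docTokenCounts = new Uint32Array(docs.map(doc => doc.length));
  const totalTokens = docTokenCounts.reduce((sum, count) => sum + count, 0);
  const allEmbeddings = new Float32Array(totalTokens * embeddingDim);

  let offset = 0;
  for (const doc of docs) {
    for (const tokenVector of doc) {
      allEmbeddings.set(tokenVector, offset);
      offset += embeddingDim;
    }
  }
  return { allEmbeddings, docTokenCounts };
}

Its return values map directly onto the loadDocuments(allEmbeddings, docTokenCounts, embeddingDim) call above.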

Performance benefit:

  • First search: 230ms (load) + 265ms (search) = 495ms
  • Subsequent searches: 265ms each (vs 479ms non-preloaded)
  • Recommended for 10+ searches on same document set

API Reference

Main Methods

maxsimBatch(queryEmbedding, docEmbeddings)

Compute MaxSim scores for multiple documents (raw sum).

Parameters:

  • queryEmbedding: Array of query token embeddings [query_tokens][embedding_dim]
  • docEmbeddings: Array of document embeddings [num_docs][doc_tokens][embedding_dim]

Returns: Float32Array of scores (one per document)

Use case: Ranking documents for a single query

Example:

const query = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]];  // 2 tokens
const docs = [
  [[0.7, 0.8, 0.9]],           // Doc 1: 1 token
  [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  // Doc 2: 2 tokens
];
const scores = maxsim.maxsimBatch(query, docs);
// scores = Float32Array [score1, score2]

maxsimBatch_normalized(queryEmbedding, docEmbeddings)

Same as maxsimBatch but returns averaged scores (score / query_tokens).

Use case: Comparing scores across queries with different lengths

Example:

// Query A: 10 tokens, raw score = 25.0 → normalized = 2.5
// Query B: 20 tokens, raw score = 40.0 → normalized = 2.0
// Query A ranks higher once normalized, even though its raw score is lower.
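
A short usage sketch, reusing the query and docs arrays from the maxsimBatch example above:

const normalizedScores = maxsim.maxsimBatch_normalized(query, docs);
// Each entry is the raw maxsimBatch score divided by the number of query
// tokens (2 here), so scores stay comparable across queries of different lengths.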

loadDocuments(embeddingsFlat, docTokenCounts, embeddingDim)

Load documents for repeated searching (preloading API).

Parameters:

  • embeddingsFlat: Float32Array - all embeddings concatenated
  • docTokenCounts: Uint32Array - token count per document
  • embeddingDim: number - embedding dimension

Returns: Promise<void>

Use case: Search the same documents with 10+ different queries

Example:

const dim = 256;
const docs = [
  new Float32Array(10 * dim),  // Doc 1: 10 tokens
  new Float32Array(20 * dim),  // Doc 2: 20 tokens
];

const allEmbeddings = new Float32Array(30 * dim);
allEmbeddings.set(docs[0], 0);
allEmbeddings.set(docs[1], 10 * dim);

const docTokenCounts = new Uint32Array([10, 20]);

await maxsim.loadDocuments(allEmbeddings, docTokenCounts, dim);

search_preloaded(queryFlat, queryTokens)

Search preloaded documents (fast!).

Parameters:

  • queryFlat: Float32Array - query embeddings (flattened)
  • queryTokens: number - number of query tokens

Returns: Float32Array of scores

Use case: After calling loadDocuments(), search repeatedly with zero overhead

Example:

const queryFlat = new Float32Array(5 * 256);  // 5 tokens × 256 dims
// ... fill queryFlat with embeddings ...

const scores = maxsim.wasmInstance.search_preloaded(queryFlat, 5);

Utility Methods

numDocumentsLoaded()

Get number of preloaded documents.

Returns: number

Example:

console.log(`Loaded ${maxsim.numDocumentsLoaded()} documents`);

getInfo()

Get implementation details (SIMD support, version, etc.).

Returns: string

Example:

console.log(maxsim.getInfo());
// "MaxSim WASM v0.6.0 (SIMD: true, adaptive_batching: true, ...)"

Requirements

⚠️ Important: maxsim-web requires L2-normalized embeddings

Modern embedding models (ColBERT, BGE, E5, Jina, etc.) output normalized embeddings by default.

Why this matters:

For L2-normalized embeddings (unit vectors):

dot_product(a, b) === cosine_similarity(a, b)

This allows maxsim-web to use efficient dot product operations (3-5x faster than cosine similarity).

Verify your embeddings are normalized:

function isNormalized(embedding) {
  const magnitude = Math.sqrt(
    embedding.reduce((sum, val) => sum + val * val, 0)
  );
  return Math.abs(magnitude - 1.0) < 0.01;  // Within 1%
}

console.log(isNormalized(embedding));  // Should be true
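
If your embeddings are not already unit length, normalize them before scoring. A minimal sketch, where l2Normalize is an illustrative helper rather than part of the library:

function l2Normalize(embedding) {
  const magnitude = Math.sqrt(
    embedding.reduce((sum, val) => sum + val * val, 0)
  );
  // Leave all-zero vectors untouched to avoid dividing by zero
  return magnitude === 0 ? embedding : embedding.map(val => val / magnitude);
}

// Normalize every token vector of a query or document
const normalizedQuery = queryEmbedding.map(l2Normalize);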

Browser Compatibility

Browser        WASM+SIMD        WASM             JavaScript
Chrome 91+     3-5x faster ✅    2-3x faster ✅    Baseline
Edge 91+       3-5x faster ✅    2-3x faster ✅    Baseline
Firefox 89+    3-5x faster ✅    2-3x faster ✅    Baseline
Safari 16.4+   3-5x faster ✅    2-3x faster ✅    Baseline
Node.js 16+    3-5x faster ✅    2-3x faster ✅    Baseline

maxsim-web automatically detects and uses the best available implementation.


Use Cases

1. Dense Retrieval (ColBERT-style)

import { createMaxSim } from 'maxsim-web';

const maxsim = await createMaxSim();

// Embed query and documents
const queryEmb = await embedModel.encode(query);
const docEmbs = await Promise.all(docs.map(doc => embedModel.encode(doc)));

// Rank by MaxSim similarity
const scores = maxsim.maxsimBatch(queryEmb, docEmbs);
const topK = Array.from(scores)
  .map((score, idx) => ({ score, idx }))
  .sort((a, b) => b.score - a.score)
  .slice(0, 10);

console.log('Top 10 documents:', topK);

2. Re-ranking Search Results

import { createMaxSim } from 'maxsim-web';

const maxsim = await createMaxSim();

// Load candidate documents from first-stage retrieval
const candidates = await firstStageSearch(query, { topK: 100 });
const candidateEmbs = candidates.map(doc => doc.embedding);

// Flatten and load
await maxsim.loadDocuments(flattenEmbeddings(candidateEmbs), docTokens, dim);

// Re-rank for each user query variation
const queries = [originalQuery, expandedQuery1, expandedQuery2];
const allScores = queries.map(q =>
  maxsim.wasmInstance.search_preloaded(flattenQuery(q), q.length)
);

// Combine scores (e.g., max, average, weighted)
const finalScores = combineScores(allScores);
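
combineScores above is a placeholder for whatever fusion strategy you prefer. A minimal sketch that keeps each document's best score across the query variations (illustrative, not part of the library):

// Illustrative fusion: element-wise maximum over the per-query score arrays.
// allScores is an array of Float32Array, each holding one score per candidate.
function combineScores(allScores) {
  const combined = Float32Array.from(allScores[0]);
  for (const scores of allScores.slice(1)) {
    for (let i = 0; i < scores.length; i++) {
      combined[i] = Math.max(combined[i], scores[i]);
    }
  }
  return combined;
}

Averaging or weighted sums work the same way; max fusion simply favors documents that match any phrasing of the query.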

3. Semantic Search at Scale

import express from 'express';
import { createMaxSim } from 'maxsim-web';

const app = express();
const maxsim = await createMaxSim();

// Preload document collection (once at startup)
const docs = await loadDocuments();  // e.g., { items, embeddings, tokenCounts } for 100K documents
await maxsim.loadDocuments(docs.embeddings, docs.tokenCounts, 256);

// Handle search requests (fast!)
app.get('/search', async (req, res) => {
  const queryEmb = await embedModel.encode(req.query.q);
  const scores = maxsim.wasmInstance.search_preloaded(
    flattenQuery(queryEmb),
    queryEmb.length
  );

  const topK = getTopK(scores, 10);
  res.json({ results: topK.map(idx => docs.items[idx]) });
});

app.listen(3000);  // ~265ms per search for 1000 docs!
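
getTopK here (and in the batch pipeline below) stands in for a small ranking helper. A minimal sketch that returns the indices of the k highest-scoring documents (illustrative, not part of the library):

function getTopK(scores, k) {
  return Array.from(scores)                  // Float32Array → regular array
    .map((score, idx) => ({ score, idx }))   // keep each document's original index
    .sort((a, b) => b.score - a.score)       // highest score first
    .slice(0, k)
    .map(entry => entry.idx);
}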

4. Batch Processing Pipeline

import { createMaxSim } from 'maxsim-web';

async function processQueries(queries, documents) {
  const maxsim = await createMaxSim();

  // Load documents once
  await maxsim.loadDocuments(documents.embeddings, documents.tokenCounts, 256);

  // Process all queries (fast!)
  const results = queries.map(query => {
    const scores = maxsim.wasmInstance.search_preloaded(
      query.embedding,
      query.tokens
    );
    return getTopK(scores, 10);
  });

  return results;
}

// Process 1000 queries in ~4.4 minutes (vs 12.4 minutes with v0.5.0)
const results = await processQueries(queries, documents);

Performance Tips

  1. Use preloading for 10+ searches on the same document set

    • Break-even after 2 searches
    • Maximum benefit at 100+ searches
  2. Pre-flatten embeddings to Float32Array

    • Avoid 2D array conversion overhead
    • Direct WASM memory access
  3. Use WASM+SIMD browsers for best performance

    • Chrome/Edge 91+, Firefox 89+, Safari 16.4+
    • 3-5x faster than JavaScript fallback
  4. Batch documents together

    • Process multiple documents per call
    • Amortize function call overhead
  5. Profile your specific use case

    • Use browser DevTools Performance tab
    • Measure actual query/document sizes

Examples

See examples/ directory for complete working examples:

  • basic-usage.js - Simple MaxSim scoring
  • preloading-api.js - Preloading for repeated searches
  • colbert-integration.js - Integration with ColBERT models
  • batch-processing.js - Large-scale batch processing
  • nodejs-server.js - Express server with MaxSim

Benchmarks

Run benchmarks yourself:

git clone https://github.com/joe32140/maxsim-web
cd maxsim-web
npm install
npm run benchmark
# Open http://localhost:8080/benchmark/

Or view online: MaxSim Web Benchmarks


Development

Build from Source

# Install Rust and wasm-pack
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
cargo install wasm-pack

# Clone repository
git clone https://github.com/joe32140/maxsim-web
cd maxsim-web

# Install dependencies
npm install

# Build WASM
cd src/rust
RUSTFLAGS="-C target-feature=+simd128" wasm-pack build --target web --out-dir ../../dist/wasm

# Run benchmarks
cd ../..
npm run benchmark

Project Structure

maxsim-web/
├── src/
│   ├── rust/           # Rust WASM implementation
│   │   ├── src/lib.rs  # Core MaxSim algorithm
│   │   └── Cargo.toml
│   └── js/             # JavaScript wrappers
│       ├── maxsim-wasm.js
│       ├── maxsim-baseline.js
│       └── maxsim-optimized.js
├── dist/               # Built artifacts
│   └── wasm/           # WASM binaries
├── docs/               # Documentation
├── benchmark/          # Browser benchmarks
└── examples/           # Usage examples

Contributing

Contributions welcome! Please:

  1. Add tests for new features
  2. Run benchmarks to verify performance
  3. Update documentation
  4. Follow code style (rustfmt, prettier)

See CONTRIBUTING.md for details.


License

MIT - see LICENSE file


Citation

If you use maxsim-web in your research, please cite:

@software{maxsim_web,
  title = {maxsim-web: High-performance MaxSim scoring with WebAssembly},
  author = {Hsu, Joe},
  year = {2025},
  email = {joe32140@gmail.com},
  url = {https://github.com/joe32140/maxsim-web},
  version = {0.6.0}
}


Acknowledgments

  • Inspired by ColBERT's MaxSim scoring algorithm (Khattab & Zaharia, 2020)
  • Built with Rust and wasm-bindgen
  • SIMD optimizations based on modern browser capabilities
  • Performance testing inspired by fast-plaid benchmarks

Support


Made with ⚡ by Joe Hsu (2025)