
Package Exports

  • maxsim-web
  • maxsim-web/baseline
  • maxsim-web/optimized
  • maxsim-web/wasm

Readme

maxsim-web

⚡ High-performance MaxSim scoring with WebAssembly and SIMD for ColBERT-style retrieval


🚀 Live Demo · 📊 Performance Report · 📚 API Guide


Features

  • 🚀 3-5x faster than JavaScript - WASM+SIMD optimized implementation
  • 📦 45KB gzipped - Tiny bundle size, 12% smaller than v0.5.0
  • ⚡ Preloading API - Load documents once, search thousands of times
  • 🎯 Zero dependencies - Pure WASM implementation
  • 🔧 Simple API - Works with any L2-normalized embeddings


Performance (v0.6.0)

Latest benchmarks:

Scenario          Documents   Query Tokens   Performance   vs JavaScript
Variable Large    1000 docs   32 tokens      265ms         3.55x faster 🔥
Variable Medium   1000 docs   13 tokens      134ms         3.34x faster 🔥

With preloading:

  • Load documents once: ~230ms (one-time cost)
  • Each search: ~265ms (vs 479ms non-preloaded)
  • 1.81x faster per search + zero conversion overhead
  • Break-even after 2 searches (230 + 265×n ms with preloading vs 479×n ms without, so preloading wins from the second search on)

See Performance Report for detailed benchmarks and analysis.


What's New in v0.6.0

🚀 Major performance improvement: 2.8-5.2x faster!

  • Replaced cosine-similarity scoring with pure dot products (3-5x faster per operation)
  • Binary size: 51KB → 45KB (12% smaller)
  • WASM now 3-5x faster than JavaScript (was 1.0-1.2x)
  • No breaking changes - all APIs work the same

See Release Notes for details.


Installation

npm install maxsim-web

Quick Start

Basic Usage

import { createMaxSim } from 'maxsim-web';

// Initialize (auto-detects best: WASM+SIMD, WASM, or JS fallback)
const maxsim = await createMaxSim();

// Prepare embeddings (must be L2-normalized!)
const queryEmbedding = [[0.1, 0.2, ...], ...];  // [query_tokens, embedding_dim]
const docEmbeddings = [
  [[0.3, 0.4, ...], ...],  // Doc 1
  [[0.5, 0.6, ...], ...],  // Doc 2
];

// Compute MaxSim scores
const scores = maxsim.maxsimBatch(queryEmbedding, docEmbeddings);
console.log(scores);  // Float32Array of scores (one per document)

Preloading API

Use case: Search the same document set repeatedly with different queries.

import { createMaxSim } from 'maxsim-web';

const maxsim = await createMaxSim();

// Step 1: Prepare documents as flat arrays (one-time conversion)
const embeddingDim = 256;
const docTokenCounts = new Uint32Array([doc1.length, doc2.length, ...]);

// Flatten all document embeddings into single Float32Array
const allEmbeddings = new Float32Array(totalTokens * embeddingDim);
// ... copy embeddings into allEmbeddings ...

// Step 2: Load documents (one-time, ~230ms for 1000 docs)
await maxsim.loadDocuments(allEmbeddings, docTokenCounts, embeddingDim);

// Step 3: Search repeatedly (fast! ~265ms per search)
const queryFlat = new Float32Array(queryTokens * embeddingDim);
// ... copy query into queryFlat ...

const scores1 = maxsim.wasmInstance.search_preloaded(queryFlat, queryTokens);
const scores2 = maxsim.wasmInstance.search_preloaded(queryFlat2, queryTokens2);
// ... search 1000s of times with zero conversion overhead!
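
The "... copy embeddings into allEmbeddings ..." steps above are plain typed-array copies. A minimal sketch of such a helper, assuming each document is an array of token embedding vectors (the flattenEmbeddings name is illustrative, not part of the maxsim-web API):

// Illustrative helper (not part of maxsim-web): flatten per-document
// token embeddings into one Float32Array and record each document's
// token count for loadDocuments().
function flattenEmbeddings(docs, embeddingDim) {
  const docTokenCounts = new Uint32Array(docs.map(doc => doc.length));
  const totalTokens = docTokenCounts.reduce((sum, count) => sum + count, 0);
  const allEmbeddings = new Float32Array(totalTokens * embeddingDim);

  let offset = 0;
  for (const doc of docs) {
    for (const tokenVector of doc) {
      allEmbeddings.set(tokenVector, offset);
      offset += embeddingDim;
    }
  }
  return { allEmbeddings, docTokenCounts };
}

Its return values map directly onto the loadDocuments(allEmbeddings, docTokenCounts, embeddingDim) call above.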

Performance benefit:

  • First search: 230ms (load) + 265ms (search) = 495ms
  • Subsequent searches: 265ms each (vs 479ms non-preloaded)
  • Recommended for 10+ searches on same document set

API Reference

Main Methods

maxsimBatch(queryEmbedding, docEmbeddings)

Compute MaxSim scores for multiple documents (raw sum).

Parameters:

  • queryEmbedding: Array of query token embeddings [query_tokens][embedding_dim]
  • docEmbeddings: Array of document embeddings [num_docs][doc_tokens][embedding_dim]

Returns: Float32Array of scores (one per document)

Use case: Ranking documents for a single query

Example:

const query = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]];  // 2 tokens
const docs = [
  [[0.7, 0.8, 0.9]],           // Doc 1: 1 token
  [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  // Doc 2: 2 tokens
];
const scores = maxsim.maxsimBatch(query, docs);
// scores = Float32Array [score1, score2]

maxsimBatch_normalized(queryEmbedding, docEmbeddings)

Same as maxsimBatch but returns averaged scores (score / query_tokens).

Use case: Comparing scores across queries with different lengths

Example:

// Query A: 10 tokens, raw score = 25.0 → normalized = 2.5
// Query B: 20 tokens, raw score = 40.0 → normalized = 2.0
// Query A ranks higher once normalized, even though its raw score is lower.
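
A short usage sketch, reusing the query and docs arrays from the maxsimBatch example above:

const normalizedScores = maxsim.maxsimBatch_normalized(query, docs);
// Each entry is the raw maxsimBatch score divided by the number of query
// tokens (2 here), so scores stay comparable across queries of different lengths.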

loadDocuments(embeddingsFlat, docTokenCounts, embeddingDim)

Load documents for repeated searching (preloading API).

Parameters:

  • embeddingsFlat: Float32Array - all embeddings concatenated
  • docTokenCounts: Uint32Array - token count per document
  • embeddingDim: number - embedding dimension

Returns: Promise<void>

Use case: Search the same documents with 10+ different queries

Example:

const dim = 256;
const docs = [
  new Float32Array(10 * dim),  // Doc 1: 10 tokens
  new Float32Array(20 * dim),  // Doc 2: 20 tokens
];

const allEmbeddings = new Float32Array(30 * dim);
allEmbeddings.set(docs[0], 0);
allEmbeddings.set(docs[1], 10 * dim);

const docTokenCounts = new Uint32Array([10, 20]);

await maxsim.loadDocuments(allEmbeddings, docTokenCounts, dim);

search_preloaded(queryFlat, queryTokens)

Search preloaded documents (fast!).

Parameters:

  • queryFlat: Float32Array - query embeddings (flattened)
  • queryTokens: number - number of query tokens

Returns: Float32Array of scores

Use case: After calling loadDocuments(), search repeatedly with zero overhead

Example:

const queryFlat = new Float32Array(5 * 256);  // 5 tokens × 256 dims
// ... fill queryFlat with embeddings ...

const scores = maxsim.wasmInstance.search_preloaded(queryFlat, 5);

Utility Methods

numDocumentsLoaded()

Get number of preloaded documents.

Returns: number

Example:

console.log(`Loaded ${maxsim.numDocumentsLoaded()} documents`);

getInfo()

Get implementation details (SIMD support, version, etc.).

Returns: string

Example:

console.log(maxsim.getInfo());
// "MaxSim WASM v0.6.0 (SIMD: true, adaptive_batching: true, ...)"

Requirements

⚠️ Important: maxsim-web requires L2-normalized embeddings

Modern embedding models (ColBERT, BGE, E5, Jina, etc.) output normalized embeddings by default.

Why this matters:

For L2-normalized embeddings (unit vectors):

dot_product(a, b) === cosine_similarity(a, b)

This allows maxsim-web to use efficient dot product operations (3-5x faster than cosine similarity).

Verify your embeddings are normalized:

function isNormalized(embedding) {
  const magnitude = Math.sqrt(
    embedding.reduce((sum, val) => sum + val * val, 0)
  );
  return Math.abs(magnitude - 1.0) < 0.01;  // Within 1%
}

console.log(isNormalized(embedding));  // Should be true
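
If your embeddings are not already unit length, normalize them before scoring. A minimal sketch, where l2Normalize is an illustrative helper rather than part of the library:

function l2Normalize(embedding) {
  const magnitude = Math.sqrt(
    embedding.reduce((sum, val) => sum + val * val, 0)
  );
  // Leave all-zero vectors untouched to avoid dividing by zero
  return magnitude === 0 ? embedding : embedding.map(val => val / magnitude);
}

// Normalize every token vector of a query or document
const normalizedQuery = queryEmbedding.map(l2Normalize);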

Browser Compatibility

Browser        WASM+SIMD        WASM             JavaScript
Chrome 91+     3-5x faster ✅    2-3x faster ✅    Baseline
Edge 91+       3-5x faster ✅    2-3x faster ✅    Baseline
Firefox 89+    3-5x faster ✅    2-3x faster ✅    Baseline
Safari 16.4+   3-5x faster ✅    2-3x faster ✅    Baseline
Node.js 16+    3-5x faster ✅    2-3x faster ✅    Baseline

maxsim-web automatically detects and uses the best available implementation.


Use Cases

1. Dense Retrieval (ColBERT-style)

import { createMaxSim } from 'maxsim-web';

const maxsim = await createMaxSim();

// Embed query and documents
const queryEmb = await embedModel.encode(query);
const docEmbs = await Promise.all(docs.map(doc => embedModel.encode(doc)));

// Rank by MaxSim similarity
const scores = maxsim.maxsimBatch(queryEmb, docEmbs);
const topK = Array.from(scores)
  .map((score, idx) => ({ score, idx }))
  .sort((a, b) => b.score - a.score)
  .slice(0, 10);

console.log('Top 10 documents:', topK);

2. Re-ranking Search Results

import { createMaxSim } from 'maxsim-web';

const maxsim = await createMaxSim();

// Load candidate documents from first-stage retrieval
const candidates = await firstStageSearch(query, { topK: 100 });
const candidateEmbs = candidates.map(doc => doc.embedding);

// Flatten and load
await maxsim.loadDocuments(flattenEmbeddings(candidateEmbs), docTokens, dim);

// Re-rank for each user query variation
const queries = [originalQuery, expandedQuery1, expandedQuery2];
const allScores = queries.map(q =>
  maxsim.wasmInstance.search_preloaded(flattenQuery(q), q.length)
);

// Combine scores (e.g., max, average, weighted)
const finalScores = combineScores(allScores);
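
combineScores above is a placeholder for whatever fusion strategy you prefer. A minimal sketch that keeps each document's best score across the query variations (illustrative, not part of the library):

// Illustrative fusion: element-wise maximum over the per-query score arrays.
// allScores is an array of Float32Array, each holding one score per candidate.
function combineScores(allScores) {
  const combined = Float32Array.from(allScores[0]);
  for (const scores of allScores.slice(1)) {
    for (let i = 0; i < scores.length; i++) {
      combined[i] = Math.max(combined[i], scores[i]);
    }
  }
  return combined;
}

Averaging or weighted sums work the same way; max fusion simply favors documents that match any phrasing of the query.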

3. Semantic Search at Scale

import express from 'express';
import { createMaxSim } from 'maxsim-web';

const app = express();
const maxsim = await createMaxSim();

// Preload document collection (once at startup)
const docs = await loadDocuments();  // e.g., { items, embeddings, tokenCounts } for 100K documents
await maxsim.loadDocuments(docs.embeddings, docs.tokenCounts, 256);

// Handle search requests (fast!)
app.get('/search', async (req, res) => {
  const queryEmb = await embedModel.encode(req.query.q);
  const scores = maxsim.wasmInstance.search_preloaded(
    flattenQuery(queryEmb),
    queryEmb.length
  );

  const topK = getTopK(scores, 10);
  res.json({ results: topK.map(idx => docs.items[idx]) });
});

app.listen(3000);  // ~265ms per search for 1000 docs!
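
getTopK here (and in the batch pipeline below) stands in for a small ranking helper. A minimal sketch that returns the indices of the k highest-scoring documents (illustrative, not part of the library):

function getTopK(scores, k) {
  return Array.from(scores)                  // Float32Array → regular array
    .map((score, idx) => ({ score, idx }))   // keep each document's original index
    .sort((a, b) => b.score - a.score)       // highest score first
    .slice(0, k)
    .map(entry => entry.idx);
}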

4. Batch Processing Pipeline

import { createMaxSim } from 'maxsim-web';

async function processQueries(queries, documents) {
  const maxsim = await createMaxSim();

  // Load documents once
  await maxsim.loadDocuments(documents.embeddings, documents.tokenCounts, 256);

  // Process all queries (fast!)
  const results = queries.map(query => {
    const scores = maxsim.wasmInstance.search_preloaded(
      query.embedding,
      query.tokens
    );
    return getTopK(scores, 10);
  });

  return results;
}

// Process 1000 queries in ~4.4 minutes (vs 12.4 minutes with v0.5.0)
const results = await processQueries(queries, documents);

Performance Tips

  1. Use preloading for 10+ searches on the same document set

    • Break-even after 2 searches
    • Maximum benefit at 100+ searches
  2. Pre-flatten embeddings to Float32Array

    • Avoid 2D array conversion overhead
    • Direct WASM memory access
  3. Use WASM+SIMD browsers for best performance

    • Chrome/Edge 91+, Firefox 89+, Safari 16.4+
    • 3-5x faster than JavaScript fallback
  4. Batch documents together

    • Process multiple documents per call
    • Amortize function call overhead
  5. Profile your specific use case

    • Use browser DevTools Performance tab
    • Measure actual query/document sizes

Examples

See examples/ directory for complete working examples:

  • basic-usage.js - Simple MaxSim scoring
  • preloading-api.js - Preloading for repeated searches
  • colbert-integration.js - Integration with ColBERT models
  • batch-processing.js - Large-scale batch processing
  • nodejs-server.js - Express server with MaxSim

Benchmarks

Run benchmarks yourself:

git clone https://github.com/joe32140/maxsim-web
cd maxsim-web
npm install
npm run benchmark
# Open http://localhost:8080/benchmark/

Or view online: MaxSim Web Benchmarks


Development

Build from Source

# Install Rust and wasm-pack
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
cargo install wasm-pack

# Clone repository
git clone https://github.com/joe32140/maxsim-web
cd maxsim-web

# Install dependencies
npm install

# Build WASM
cd src/rust
RUSTFLAGS="-C target-feature=+simd128" wasm-pack build --target web --out-dir ../../dist/wasm

# Run benchmarks
cd ../..
npm run benchmark

Project Structure

maxsim-web/
├── src/
│   ├── rust/           # Rust WASM implementation
│   │   ├── src/lib.rs  # Core MaxSim algorithm
│   │   └── Cargo.toml
│   └── js/             # JavaScript wrappers
│       ├── maxsim-wasm.js
│       ├── maxsim-baseline.js
│       └── maxsim-optimized.js
├── dist/               # Built artifacts
│   └── wasm/           # WASM binaries
├── docs/               # Documentation
├── benchmark/          # Browser benchmarks
└── examples/           # Usage examples

Contributing

Contributions welcome! Please:

  1. Add tests for new features
  2. Run benchmarks to verify performance
  3. Update documentation
  4. Follow code style (rustfmt, prettier)

See CONTRIBUTING.md for details.


License

MIT - see LICENSE file


Citation

If you use maxsim-web in your research, please cite:

@software{maxsim_web,
  title = {maxsim-web: High-performance MaxSim scoring with WebAssembly},
  author = {Hsu, Joe},
  year = {2025},
  email = {joe32140@gmail.com},
  url = {https://github.com/joe32140/maxsim-web},
  version = {0.6.0}
}


Acknowledgments

  • Inspired by ColBERT's MaxSim scoring algorithm (Khattab & Zaharia, 2020)
  • Built with Rust and wasm-bindgen
  • SIMD optimizations based on modern browser capabilities
  • Performance testing inspired by fast-plaid benchmarks

Support


Made with ⚡ by Joe Hsu (2025)