@ekaone/hamming
Hamming distance utilities and LSH binary projection for TypeScript. Zero dependencies. Works in Node.js, Bun, Deno, and Edge runtimes.
Key Differences vs pgvector
This is similar to pgvector but with some key differences:
Similarities:
- Both enable similarity search for AI/ML applications
- Both work with vector embeddings (like OpenAI's 1536-dimensional vectors)
- Both support approximate nearest neighbor search for performance
| Feature | @ekaone/hamming | pgvector |
|---|---|---|
| Storage | In-memory binary codes (8 bytes) | PostgreSQL database |
| Distance Metric | Hamming distance (binary) | Cosine/Euclidean/L2 (float) |
| Performance | XOR + popcount (extremely fast) | Float vector math |
| Memory | ~768x smaller (8-byte code vs 1536 float32s ≈ 6 KB) | Full float vectors |
| Persistence | Application-level | Database-level |
| Scaling | Limited by RAM | Scales with database |
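The "XOR + popcount" row means a comparison is a single bitwise XOR followed by a set-bit count. A self-contained sketch of that operation for 32-bit codes (an illustration, not the library's internals):

```ts
// Count set bits in a 32-bit integer (classic SWAR popcount).
function popcount32(x: number): number {
  x = x - ((x >>> 1) & 0x55555555);
  x = (x & 0x33333333) + ((x >>> 2) & 0x33333333);
  x = (x + (x >>> 4)) & 0x0f0f0f0f;
  return Math.imul(x, 0x01010101) >>> 24;
}

// Hamming distance between two 32-bit codes: XOR, then count the 1-bits.
function hamming32(a: number, b: number): number {
  return popcount32((a ^ b) >>> 0);
}

hamming32(0b1011101, 0b1001001); // → 2
```

Both steps compile to a handful of ALU instructions, which is why comparing binary codes is so much cheaper than float vector math.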
Designed for use cases like:
- Semantic caching for AI applications
- Near-duplicate detection
- Embedding similarity search (OpenAI, etc.)
- Fast approximate nearest neighbor search
Installation
```sh
npm install @ekaone/hamming
pnpm add @ekaone/hamming
yarn add @ekaone/hamming
```
API
hammingString(a, b)
Hamming distance between two equal-length strings. Counts positions where characters differ.
```ts
import { hammingString } from "@ekaone/hamming";

hammingString("karolin", "kathrin"); // → 3
hammingString("1011101", "1001001"); // → 2
hammingString("abc", "abc"); // → 0
```
Throws RangeError if strings have different lengths.
hammingStringNorm(a, b)
Normalized distance between two equal-length strings. Returns 0.0 (identical) to 1.0 (completely different).
```ts
import { hammingStringNorm } from "@ekaone/hamming";

hammingStringNorm("karolin", "kathrin"); // → 0.428...
hammingStringNorm("abc", "abc"); // → 0
hammingStringNorm("abc", "xyz"); // → 1
```
hammingBits(a, b)
Hamming distance between two 32-bit integers via XOR + popcount.
```ts
import { hammingBits } from "@ekaone/hamming";

hammingBits(0b1011101, 0b1001001); // → 2
hammingBits(0x00000000, 0xffffffff); // → 32
```
hammingBigInt(a, b)
Hamming distance between two BigInt values. Useful for 64-bit or wider binary codes.
```ts
import { hammingBigInt } from "@ekaone/hamming";

hammingBigInt(0b1011101n, 0b1001001n); // → 2
hammingBigInt(0xffffffffffffffffn, 0x0n); // → 64
```
hammingBuffer(a, b)
Hamming distance between two Uint8Array buffers of equal length. Counts differing bits across all bytes.
```ts
import { hammingBuffer } from "@ekaone/hamming";

const a = new Uint8Array([0b11111111, 0b00000000]);
const b = new Uint8Array([0b00000000, 0b11111111]);
hammingBuffer(a, b); // → 16
```
Throws RangeError if buffers have different lengths.
hammingBufferNorm(a, b)
Normalized Hamming distance between two Uint8Array buffers. Returns 0.0 to 1.0.
```ts
import { hammingBufferNorm } from "@ekaone/hamming";

const a = new Uint8Array([0xff]);
const b = new Uint8Array([0x00]);
hammingBufferNorm(a, b); // → 1.0
```
generatePlanes(dims, inputDims)
Generate a random projection matrix for LSH (locality-sensitive hashing). Call this once and reuse the result — the same planes must be used for all toBinaryCode calls within a single context.
```ts
import { generatePlanes } from "@ekaone/hamming";

// 64 output bits, 1536-dimensional input (OpenAI embeddings)
const planes = generatePlanes(64, 1536);
```
| Parameter | Description |
|---|---|
| dims | Number of output bits (e.g. 64) |
| inputDims | Dimensionality of the input float vector (e.g. 1536) |
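Because the same planes must be shared by every toBinaryCode call, you may want to derive them deterministically instead of persisting them. A self-contained sketch using a seeded PRNG, with planes represented as number[][] (an illustration, not the library's API; uniform components are used for brevity, though Gaussian components are the classical choice for cosine LSH):

```ts
// Mulberry32: a tiny deterministic PRNG, sufficient for illustration.
function mulberry32(seed: number): () => number {
  return () => {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Generate `dims` hyperplanes for `inputDims`-dimensional vectors.
// Same seed → identical planes, so codes stay comparable across restarts.
function seededPlanes(dims: number, inputDims: number, seed: number): number[][] {
  const rand = mulberry32(seed);
  return Array.from({ length: dims }, () =>
    Array.from({ length: inputDims }, () => rand() * 2 - 1)
  );
}

const a = seededPlanes(64, 1536, 42);
const b = seededPlanes(64, 1536, 42);
// a and b are identical element for element
```

Deterministic generation avoids having to ship a serialized plane matrix alongside your stored binary codes.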
toBinaryCode(vector, planes)
Project a float embedding vector onto random hyperplanes, producing a compact binary code (Uint8Array). Each bit is the sign of the dot product with one plane.
```ts
import { generatePlanes, toBinaryCode } from "@ekaone/hamming";

const planes = generatePlanes(64, 1536);
const embedding = await getEmbedding("How do I reset my password?");
const code = toBinaryCode(embedding, planes);
// → Uint8Array(8) — 64 bits packed into 8 bytes
```
The binary code approximates cosine similarity: vectors that are close in embedding space will have similar binary codes and a low Hamming distance between them.
binaryDistance(a, b)
Hamming distance between two binary codes produced by toBinaryCode. Lower = more similar.
```ts
import { generatePlanes, toBinaryCode, binaryDistance } from "@ekaone/hamming";

const planes = generatePlanes(64, 1536);
const codeA = toBinaryCode(embeddingA, planes);
const codeB = toBinaryCode(embeddingB, planes);
binaryDistance(codeA, codeB); // → 0 (identical) to 64 (opposite)
```
Real-world use cases
Typo detection in CLI tools
```ts
import { hammingStringNorm } from "@ekaone/hamming";

const commands = ["init", "build", "publish", "release"];
const input = "publich"; // user typo

const suggestions = commands
  .filter(cmd => cmd.length === input.length)
  // convert distance (0 = identical) into similarity (1 = identical)
  .map(cmd => ({ cmd, score: 1 - hammingStringNorm(cmd, input) }))
  .filter(({ score }) => score >= 0.7)
  .sort((a, b) => b.score - a.score);

console.log(`Did you mean: ${suggestions[0].cmd}?`);
// → Did you mean: publish?
```
Near-duplicate detection
```ts
import { hammingStringNorm } from "@ekaone/hamming";

const flags = ["enable_cache", "enable_cacha", "disable_cache", "feature_x"];

const deduped = flags.reduce<string[]>((acc, flag) => {
  const isDup = acc.some(existing =>
    existing.length === flag.length &&
    // similarity above 0.85 → treat as a near-duplicate
    1 - hammingStringNorm(existing, flag) > 0.85
  );
  return isDup ? acc : [...acc, flag];
}, []);
// → ["enable_cache", "disable_cache", "feature_x"]
```
Semantic cache for AI apps
```ts
import { generatePlanes, toBinaryCode, binaryDistance } from "@ekaone/hamming";

const planes = generatePlanes(64, 1536);
const cache = new Map<string, string>();

async function cachedAnswer(embedding: number[], prompt: string) {
  const code = toBinaryCode(embedding, planes);
  const key = code.join(",");

  for (const [cachedKey, response] of cache) {
    const cachedCode = new Uint8Array(cachedKey.split(",").map(Number));
    if (binaryDistance(code, cachedCode) <= 4) {
      return response; // cache hit
    }
  }

  const response = await callLLM(prompt);
  cache.set(key, response);
  return response;
}
```
For a production-ready semantic cache with LRU/LFU eviction and TTL, see @ekaone/semantic-cache.
How LSH works
toBinaryCode uses random hyperplane projection (a form of locality-sensitive hashing). For each random plane, it checks which side of the plane the input vector falls on — producing a 1 or 0. With 64 planes you get a 64-bit fingerprint.
The key property: vectors that are close in the original high-dimensional space (high cosine similarity) will tend to land on the same side of most planes, resulting in similar binary codes with a low Hamming distance. This lets you approximate expensive float vector comparisons with a fast XOR + popcount.
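The process above can be sketched end to end without the library (project and distance are hypothetical helper names, not the package's API):

```ts
// Sign-of-dot-product projection: one bit per plane, packed into bytes.
function project(vec: number[], planes: number[][]): Uint8Array {
  const code = new Uint8Array(Math.ceil(planes.length / 8));
  planes.forEach((plane, i) => {
    let dot = 0;
    for (let d = 0; d < vec.length; d++) dot += vec[d] * plane[d];
    if (dot >= 0) code[i >> 3] |= 1 << (i & 7); // bit i = side of plane i
  });
  return code;
}

// Hamming distance between packed codes: XOR each byte, count set bits.
function distance(a: Uint8Array, b: Uint8Array): number {
  let bits = 0;
  for (let i = 0; i < a.length; i++) {
    let x = a[i] ^ b[i];
    while (x) { bits += x & 1; x >>= 1; }
  }
  return bits;
}

// Two nearby vectors land on the same side of most planes; an opposite
// vector lands on the other side of all of them.
const planes = [[1, 0], [0, 1], [1, 1], [1, -1]];
const close = distance(project([0.9, 0.1], planes), project([0.8, 0.2], planes));
const far = distance(project([0.9, 0.1], planes), project([-0.9, -0.1], planes));
// close = 0, far = 4
```

With 64 planes instead of 4, the same idea yields the 64-bit fingerprints described above.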
```
"forgot my password"  → [0.21, -0.84, ...] → 0b10110010...
"reset my password"   → [0.22, -0.83, ...] → 0b10110110...
                        Hamming distance = 2  ✓

"cancel subscription" → [-0.60, 0.41, ...] → 0b01001101...
                        Hamming distance = 29 ✗
```
License
MIT © Eka Prasetia
Links
⭐ If this library helps you, please consider giving it a star on GitHub!