JSPM

@ekaone/hamming

0.1.1

Hamming distance utilities and LSH binary projection for TypeScript. Zero dependencies. Works in Node.js, Bun, Deno, and Edge runtimes.

Key Differences vs pgvector

This library serves a similar purpose to pgvector, but differs in several important ways.

Similarities:

  • Both enable similarity search for AI/ML applications
  • Both work with vector embeddings (like OpenAI's 1536-dimensional vectors)
  • Both support approximate nearest neighbor search for performance

Differences:

| Feature | @ekaone/hamming | pgvector |
| --- | --- | --- |
| Storage | In-memory binary codes (8 bytes) | PostgreSQL database |
| Distance metric | Hamming distance (binary) | Cosine or Euclidean (L2) distance (float) |
| Performance | XOR + popcount (extremely fast) | Float vector math |
| Memory | About 768x smaller (8 bytes vs 6,144 bytes for 1536 float32 values) | Full float vectors |
| Persistence | Application-level | Database-level |
| Scaling | Limited by RAM | Scales with database |

Designed for use cases such as:

  • Semantic caching for AI applications
  • Near-duplicate detection
  • Embedding similarity search (OpenAI, etc.)
  • Fast approximate nearest neighbor search

Installation

npm install @ekaone/hamming
pnpm add @ekaone/hamming
yarn add @ekaone/hamming

API

hammingString(a, b)

Hamming distance between two equal-length strings. Counts positions where characters differ.

import { hammingString } from "@ekaone/hamming";

hammingString("karolin", "kathrin"); // → 3
hammingString("1011101", "1001001"); // → 2
hammingString("abc",     "abc");     // → 0

Throws RangeError if strings have different lengths.
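
To make the definition concrete, here is a minimal sketch of how such a function could be written. The name hammingStringSketch is hypothetical and this is not the library's source; in particular, it assumes comparison by UTF-16 code unit, which the README does not specify.

```typescript
// Hypothetical re-implementation for illustration; not the library's code.
function hammingStringSketch(a: string, b: string): number {
  if (a.length !== b.length) {
    throw new RangeError(`Length mismatch: ${a.length} vs ${b.length}`);
  }
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    // Assumption: compare UTF-16 code units, so a character outside the
    // Basic Multilingual Plane counts as two positions.
    if (a.charCodeAt(i) !== b.charCodeAt(i)) distance++;
  }
  return distance;
}

hammingStringSketch("karolin", "kathrin"); // → 3
```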


hammingStringNorm(a, b)

Normalized distance between two equal-length strings. Returns 0.0 (identical) to 1.0 (completely different).

import { hammingStringNorm } from "@ekaone/hamming";

hammingStringNorm("karolin", "kathrin"); // → 0.428...
hammingStringNorm("abc",     "abc");     // → 0
hammingStringNorm("abc",     "xyz");     // → 1

hammingBits(a, b)

Hamming distance between two 32-bit integers via XOR + popcount.

import { hammingBits } from "@ekaone/hamming";

hammingBits(0b1011101, 0b1001001); // → 2
hammingBits(0x00000000, 0xffffffff); // → 32
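
The "XOR + popcount" technique can be sketched as follows. XOR leaves a 1 in every position where the inputs differ, and the population count adds those bits up. The function names are hypothetical and the library's internals may differ; this version uses the well-known parallel bit-counting trick.

```typescript
// Hypothetical sketch of XOR + popcount; not the library's code.

// Count set bits in a 32-bit integer by summing in 2-, 4-, then 8-bit groups.
function popcount32(x: number): number {
  x = x >>> 0; // treat the input as unsigned 32-bit
  x = x - ((x >>> 1) & 0x55555555);
  x = (x & 0x33333333) + ((x >>> 2) & 0x33333333);
  x = (x + (x >>> 4)) & 0x0f0f0f0f;
  // Multiplying by 0x01010101 sums the four byte counts into the top byte.
  return Math.imul(x, 0x01010101) >>> 24;
}

function hammingBitsSketch(a: number, b: number): number {
  return popcount32(a ^ b); // XOR marks the differing bit positions
}

hammingBitsSketch(0b1011101, 0b1001001); // → 2
```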

hammingBigInt(a, b)

Hamming distance between two BigInt values. Useful for 64-bit or wider binary codes.

import { hammingBigInt } from "@ekaone/hamming";

hammingBigInt(0b1011101n, 0b1001001n);           // → 2
hammingBigInt(0xffffffffffffffffn, 0x0n);         // → 64
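
For arbitrary-width values the same idea applies, but the fixed-width bit tricks no longer fit. One possible approach, shown here as a sketch with a hypothetical name, is Kernighan's loop, which clears the lowest set bit on each iteration. Rejecting negative inputs is an assumption of this sketch; the README does not say how the library treats them.

```typescript
// Hypothetical sketch for BigInt inputs; not the library's code.
function hammingBigIntSketch(a: bigint, b: bigint): number {
  if (a < 0n || b < 0n) {
    // Assumption: negative BigInts have no fixed width, so reject them.
    throw new RangeError("Inputs must be non-negative");
  }
  let diff = a ^ b; // 1 bits mark positions where a and b differ
  let count = 0;
  while (diff > 0n) {
    diff &= diff - 1n; // clear the lowest set bit
    count++;
  }
  return count;
}

hammingBigIntSketch(0xffffffffffffffffn, 0x0n); // → 64
```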

hammingBuffer(a, b)

Hamming distance between two Uint8Array buffers of equal length. Counts differing bits across all bytes.

import { hammingBuffer } from "@ekaone/hamming";

const a = new Uint8Array([0b11111111, 0b00000000]);
const b = new Uint8Array([0b00000000, 0b11111111]);

hammingBuffer(a, b); // → 16

Throws RangeError if buffers have different lengths.
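
A byte-wise version can be sketched with a 256-entry lookup table that stores the bit count of every possible byte. This is one common way to do it, not necessarily what the library does, and the names below are hypothetical.

```typescript
// Hypothetical sketch; not the library's code.

// POPCOUNT_TABLE[v] holds the number of set bits in the byte value v.
const POPCOUNT_TABLE = new Uint8Array(256);
for (let v = 1; v < 256; v++) {
  POPCOUNT_TABLE[v] = (v & 1) + POPCOUNT_TABLE[v >> 1];
}

function hammingBufferSketch(a: Uint8Array, b: Uint8Array): number {
  if (a.length !== b.length) {
    throw new RangeError(`Length mismatch: ${a.length} vs ${b.length}`);
  }
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    distance += POPCOUNT_TABLE[a[i] ^ b[i]]; // differing bits in byte i
  }
  return distance;
}

hammingBufferSketch(new Uint8Array([0xff, 0x00]), new Uint8Array([0x00, 0xff])); // → 16
```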


hammingBufferNorm(a, b)

Normalized Hamming distance between two Uint8Array buffers. Returns 0.0 to 1.0.

import { hammingBufferNorm } from "@ekaone/hamming";

const a = new Uint8Array([0xff]);
const b = new Uint8Array([0x00]);

hammingBufferNorm(a, b); // → 1.0

generatePlanes(dims, inputDims)

Generate a random projection matrix for LSH (locality-sensitive hashing). Call this once and reuse the result — the same planes must be used for all toBinaryCode calls within a single context.

import { generatePlanes } from "@ekaone/hamming";

// 64 output bits, 1536-dimensional input (OpenAI embeddings)
const planes = generatePlanes(64, 1536);

| Parameter | Description |
| --- | --- |
| dims | Number of output bits (e.g. 64) |
| inputDims | Dimensionality of the input float vector (e.g. 1536) |
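
Because the same planes must be reused for every code, it can help to see what "generating planes" might involve. The sketch below is an assumption-heavy illustration, not the library's implementation: it draws each plane normal from a standard Gaussian and takes a seed so the planes can be regenerated. The seed parameter, the Float32Array[] return type, and every function name here are hypothetical.

```typescript
// Hypothetical sketch; the library's generator, signature, and return type may differ.

// Small seeded PRNG (mulberry32): the same seed always yields the same sequence.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal sample via the Box-Muller transform.
function gaussian(rand: () => number): number {
  const u1 = 1 - rand(); // shift into (0, 1] so Math.log never receives 0
  const u2 = rand();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// One random normal vector per output bit.
function generatePlanesSketch(bits: number, inputDims: number, seed: number): Float32Array[] {
  const rand = mulberry32(seed);
  const planes: Float32Array[] = [];
  for (let i = 0; i < bits; i++) {
    const plane = new Float32Array(inputDims);
    for (let j = 0; j < inputDims; j++) plane[j] = gaussian(rand);
    planes.push(plane);
  }
  return planes;
}

const sketchPlanes = generatePlanesSketch(64, 1536, 42); // 64 planes of 1536 floats each
```

With a seeded generator, only the seed needs to be persisted to reproduce the planes after a restart. If the planes come from an unseeded source, the matrix itself has to be stored alongside the codes.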

toBinaryCode(vector, planes)

Project a float embedding vector onto random hyperplanes, producing a compact binary code (Uint8Array). Each bit is the sign of the dot product with one plane.

import { generatePlanes, toBinaryCode } from "@ekaone/hamming";

const planes = generatePlanes(64, 1536);
// getEmbedding is a placeholder for your embedding provider call
const embedding = await getEmbedding("How do I reset my password?");

const code = toBinaryCode(embedding, planes);
// → Uint8Array(8) — 64 bits packed into 8 bytes

The binary code approximates cosine similarity: vectors that are close in embedding space will have similar binary codes and a low Hamming distance between them.
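
The projection step described above can be sketched in a few lines. This is an illustration under stated assumptions, not the library's code: the name toBinaryCodeSketch is hypothetical, a positive dot product is mapped to 1, and bits are packed most-significant-bit first. The README specifies neither the tie-breaking rule nor the bit order.

```typescript
// Hypothetical sketch of sign-based random projection; not the library's code.
function toBinaryCodeSketch(
  vector: ArrayLike<number>,
  planes: ArrayLike<number>[]
): Uint8Array {
  const code = new Uint8Array(Math.ceil(planes.length / 8));
  for (let i = 0; i < planes.length; i++) {
    const plane = planes[i];
    if (plane.length !== vector.length) {
      throw new RangeError(`Plane ${i} has ${plane.length} dims, vector has ${vector.length}`);
    }
    let dot = 0;
    for (let j = 0; j < vector.length; j++) dot += plane[j] * vector[j];
    // Assumption: positive side of the plane → 1, packed MSB-first within each byte.
    if (dot > 0) code[i >> 3] |= 0x80 >> (i & 7);
  }
  return code;
}

// Tiny hand-written example: 8 planes in 2 dimensions.
const demoPlanes = [[1, 0], [0, 1], [-1, 0], [0, -1], [1, 1], [1, -1], [-1, 1], [-1, -1]];
const demoCode = toBinaryCodeSketch([1, 2], demoPlanes);
// Dot products: 1, 2, -1, -2, 3, -1, 1, -3 → bits 11001010
```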


binaryDistance(a, b)

Hamming distance between two binary codes produced by toBinaryCode. Lower = more similar.

import { generatePlanes, toBinaryCode, binaryDistance } from "@ekaone/hamming";

const planes = generatePlanes(64, 1536);

const codeA = toBinaryCode(embeddingA, planes);
const codeB = toBinaryCode(embeddingB, planes);

binaryDistance(codeA, codeB); // → 0 (identical) to 64 (opposite)

Real-world use cases

Typo detection in CLI tools

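import { hammingStringNorm } from "@ekaone/hamming";

const commands = ["init", "build", "publish", "release"];
const input = "publich"; // user typo

// hammingStringNorm returns a distance: 0 = identical, 1 = completely different.
// Keep only close candidates and sort the closest first.
const suggestions = commands
  .filter(cmd => cmd.length === input.length) // Hamming distance needs equal lengths
  .map(cmd => ({ cmd, distance: hammingStringNorm(cmd, input) }))
  .filter(({ distance }) => distance <= 0.3)
  .sort((a, b) => a.distance - b.distance);

if (suggestions.length > 0) {
  console.log(`Did you mean: ${suggestions[0].cmd}?`);
}
// → Did you mean: publish?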

Near-duplicate detection

import { hammingStringNorm } from "@ekaone/hamming";

// Hamming distance only detects substitutions between equal-length strings;
// a dropped or inserted character changes the length and is not caught.
const flags = ["enable_cache", "enable_cashe", "disable_cache", "feature_x"];

const deduped = flags.reduce<string[]>((acc, flag) => {
  const isDup = acc.some(existing =>
    existing.length === flag.length &&
    hammingStringNorm(existing, flag) < 0.15 // low distance = near-duplicate
  );
  return isDup ? acc : [...acc, flag];
}, []);

// → ["enable_cache", "disable_cache", "feature_x"]

Semantic cache for AI apps

import { generatePlanes, toBinaryCode, binaryDistance } from "@ekaone/hamming";

const planes = generatePlanes(64, 1536);
const cache = new Map<string, string>();

async function cachedAnswer(embedding: number[], prompt: string) {
  const code = toBinaryCode(embedding, planes);
  const key = code.join(",");

  for (const [cachedKey, response] of cache) {
    const cachedCode = new Uint8Array(cachedKey.split(",").map(Number));
    if (binaryDistance(code, cachedCode) <= 4) {
      return response; // cache hit
    }
  }

  const response = await callLLM(prompt);
  cache.set(key, response);
  return response;
}

For a production-ready semantic cache with LRU/LFU eviction and TTL, see @ekaone/semantic-cache.


How LSH works

toBinaryCode uses random hyperplane projection (a form of locality-sensitive hashing). For each random plane, it checks which side of the plane the input vector falls on — producing a 1 or 0. With 64 planes you get a 64-bit fingerprint.

The key property: vectors that are close in the original high-dimensional space (high cosine similarity) will tend to land on the same side of most planes, resulting in similar binary codes with a low Hamming distance. This lets you approximate expensive float vector comparisons with a fast XOR + popcount.

"forgot my password"   → [0.21, -0.84, ...]  → 0b10110010...
"reset my password"    → [0.22, -0.83, ...]  → 0b10110110...
                                                 Hamming distance = 2 ✓

"cancel subscription"  → [-0.60, 0.41, ...]  → 0b01001101...
                                                 Hamming distance = 29 ✗
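
The link between Hamming distance and cosine similarity can be made quantitative. For plane normals drawn from a rotationally symmetric distribution such as a Gaussian (an assumption here, since the README does not state how planes are sampled), two vectors separated by angle θ disagree on any single bit with probability θ/π. The expected distance over n bits is therefore n·θ/π, which can be inverted to estimate cosine similarity. The function name below is hypothetical, and the estimate is noisy for small n.

```typescript
// Hypothetical helper; assumes each bit differs with probability theta / pi.
function estimateCosineFromHamming(distance: number, bits: number): number {
  if (bits <= 0 || distance < 0 || distance > bits) {
    throw new RangeError("distance must be within [0, bits] and bits must be positive");
  }
  const theta = (Math.PI * distance) / bits; // estimated angle in radians
  return Math.cos(theta);
}

estimateCosineFromHamming(2, 64);  // ≈ 0.995 (nearly parallel)
estimateCosineFromHamming(29, 64); // ≈ 0.147 (close to orthogonal)
```

Under this model, a cache threshold expressed as a Hamming distance corresponds to an approximate angle: a distance of 4 out of 64 bits maps to roughly 11 degrees.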

License

MIT © Eka Prasetia

⭐ If this library helps you, please consider giving it a star on GitHub!