Package Exports
- namespace-guard
- namespace-guard/adapters/drizzle
- namespace-guard/adapters/knex
- namespace-guard/adapters/kysely
- namespace-guard/adapters/mikro-orm
- namespace-guard/adapters/mongoose
- namespace-guard/adapters/prisma
- namespace-guard/adapters/raw
- namespace-guard/adapters/sequelize
- namespace-guard/adapters/typeorm
- namespace-guard/composability-vectors
- namespace-guard/confusable-weights
- namespace-guard/font-specific-weights
- namespace-guard/profanity-en
Readme
namespace-guard
The world's first library that detects confusable characters across non-Latin scripts. Slug claimability, Unicode anti-spoofing, and LLM Denial of Spend defence in one zero-dependency package.
- Live demo: https://paultendo.github.io/namespace-guard/
- Blog post: https://paultendo.github.io/posts/namespace-guard-launch/
Cross-script confusable detection
Existing confusable standards (TR39, IDNA) map non-Latin characters to Latin equivalents. They have zero coverage for confusable pairs between two non-Latin scripts.
namespace-guard ships 3,525 cross-script pairs from confusable-vision (measured across 245 system fonts using vector-outline raycasting — RaySpace). This catches attacks that no other library detects:
import { areConfusable, detectCrossScriptRisk } from "namespace-guard";
import { CONFUSABLE_WEIGHTS } from "namespace-guard/confusable-weights";
// Hangul ᅵ and Han 丨 are visually identical (ray distance 0.004, Arial Unicode MS)
areConfusable("\u1175", "\u4E28", { weights: CONFUSABLE_WEIGHTS }); // true
// Greek Τ and Han 丅 are near-identical (multiple fonts)
areConfusable("\u03A4", "\u4E05", { weights: CONFUSABLE_WEIGHTS }); // true
// Cyrillic І and Greek Ι are identical outlines (62 fonts)
areConfusable("\u0406", "\u0399", { weights: CONFUSABLE_WEIGHTS }); // true
// Without weights, only skeleton-based detection (TR39 coverage)
areConfusable("\u1175", "\u4E28"); // false
// Analyze an identifier for cross-script risk
const risk = detectCrossScriptRisk("\u1175\u4E28", { weights: CONFUSABLE_WEIGHTS });
// { riskLevel: "high", scripts: ["han", "hangul"], crossScriptPairs: [...] }
4,174 total confusable pairs scored by visual measurement (3,111 TR39-confirmed, 1,063 novel). Each pair carries a danger score (0–1) representing geometric similarity across fonts; the shipped dataset uses a 0.5 floor. For higher precision, filter at danger > 0.7 (574 pairs). Cross-script data licensed CC-BY-4.0.
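The precision filter mentioned above can be sketched as a plain threshold over scored pairs. This is a minimal illustration only: the ScoredPair shape, the filterByDanger helper, and the sample entries are hypothetical, and the real layout of CONFUSABLE_WEIGHTS may differ.

```typescript
// Hypothetical record shape: the real layout of CONFUSABLE_WEIGHTS may differ.
interface ScoredPair {
  source: string;
  target: string;
  danger: number; // geometric similarity in the range 0..1
}

// Keep only pairs whose danger score is strictly above the threshold.
function filterByDanger(pairs: ScoredPair[], threshold: number): ScoredPair[] {
  return pairs.filter((pair) => pair.danger > threshold);
}

// Placeholder entries with made-up scores, not measured values.
const samplePairs: ScoredPair[] = [
  { source: "glyph-a", target: "glyph-b", danger: 0.95 },
  { source: "glyph-c", target: "glyph-d", danger: 0.72 },
  { source: "glyph-e", target: "glyph-f", danger: 0.7 },
  { source: "glyph-g", target: "glyph-h", danger: 0.55 },
];

// Only entries scored strictly above 0.7 survive the filter.
const highPrecision = filterByDanger(samplePairs, 0.7);
console.log(highPrecision.map((pair) => pair.source));
```

Raising the threshold trades recall for precision: fewer pairs are flagged, but the ones that remain are closer to visually identical.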
Installation
npm install namespace-guard
Quick Start (60 seconds)
import { createNamespaceGuardWithProfile } from "namespace-guard";
import { createPrismaAdapter } from "namespace-guard/adapters/prisma";
import { PrismaClient } from "@prisma/client";
const prisma = new PrismaClient();
const guard = createNamespaceGuardWithProfile(
"consumer-handle",
{
reserved: ["admin", "api", "settings", "dashboard", "login", "signup"],
sources: [
{ name: "user", column: "handleCanonical", scopeKey: "id" },
{ name: "organization", column: "slugCanonical", scopeKey: "id" },
],
},
createPrismaAdapter(prisma)
);
await guard.assertClaimable("acme-corp");
For race-safe writes, use claim():
const result = await guard.claim(input.handle, async (canonical) => {
return prisma.user.create({
data: {
handle: input.handle,
handleCanonical: canonical,
},
});
});
if (!result.claimed) {
return { error: result.message };
}
What You Get
- Cross-script confusable detection with 3,525 measured pairs between non-Latin scripts
- Cross-table collision checks (users, orgs, teams, etc.)
- Reserved-name blocking with category-aware messages
- Unicode anti-spoofing (NFKC + confusable detection + mixed-script/risk controls)
- Invisible character detection (zero-width joiners, direction overrides, and other hidden bytes)
- Optional profanity/evasion validation
- Suggestion strategies for taken names
- CLI for red-team generation, calibration, drift, and CI gates
LLM Pipeline Preprocessing
Confusable characters are pixel-identical to Latin letters but encode as multi-byte BPE tokens. A 95-line contract that costs 881 tokens in clean ASCII costs 4,567 tokens when flooded with confusables: 5.2x the API bill. The model reads it correctly. The invoice does not care.
We tested this across 4 frontier models, 8 attack types, and 130+ API calls. Zero meaning flips. Every substituted clause was correctly interpreted. But the billing attack succeeds. We call it Denial of Spend: the confusable analogue of DDoS, where the attacker cannot degrade the service but can inflate the cost of running it.
canonicalise() recovered every substituted term across all 12 attack variants, collapsing the 5.2x inflation to 1.0x. Processing a 10,000-character document takes under 1ms.
import { canonicalise, scan, isClean } from "namespace-guard";
const raw = "The seller аssumes аll liаbility.";
const report = scan(raw); // detailed findings + risk level
const clean = canonicalise(raw); // "The seller assumes all liability."
const ok = isClean(raw); // false (mixed-script confusable detected)
// For known-Latin documents (e.g. English contracts), use strategy: "all"
// to also catch words where every character was substituted:
canonicalise("поп-refundable", { strategy: "all" }); // "non-refundable"
Research:
- Denial of Spend: https://paultendo.github.io/posts/confusable-vision-llm-attack-tests/
- Launch: https://paultendo.github.io/posts/namespace-guard-launch/
- NFKC/TR39 composability: https://paultendo.github.io/posts/unicode-confusables-nfkc-conflict/
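To see why the preprocessing example distinguishes the default behaviour from strategy: "all", it helps to classify words by the scripts they contain. The sketch below is not the library's implementation; profileWord and its helpers are hypothetical names, and only Basic Latin letters and the main Cyrillic block are recognised.

```typescript
type WordProfile = "latin" | "cyrillic" | "mixed" | "neither";

// Basic Latin letters only (A-Z, a-z); a deliberate simplification.
function isBasicLatinLetter(code: number): boolean {
  return (code >= 0x41 && code <= 0x5a) || (code >= 0x61 && code <= 0x7a);
}

// Main Cyrillic block only (U+0400..U+04FF); supplements are ignored here.
function isInCyrillicBlock(code: number): boolean {
  return code >= 0x0400 && code <= 0x04ff;
}

// Report which of the two scripts appear in a single word.
function profileWord(word: string): WordProfile {
  let sawLatin = false;
  let sawCyrillic = false;
  for (let i = 0; i < word.length; i++) {
    const code = word.charCodeAt(i);
    if (isBasicLatinLetter(code)) sawLatin = true;
    if (isInCyrillicBlock(code)) sawCyrillic = true;
  }
  if (sawLatin && sawCyrillic) return "mixed";
  if (sawCyrillic) return "cyrillic";
  if (sawLatin) return "latin";
  return "neither";
}

console.log(profileWord("\u0430ssumes")); // Cyrillic а followed by Latin "ssumes"
console.log(profileWord("\u043F\u043E\u043F")); // "поп", every letter Cyrillic
```

A word with one substituted letter is mixed-script and easy to flag. A word where every letter was substituted looks like ordinary single-script text, which is the case strategy: "all" is described as covering for known-Latin documents.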
Advanced Security Primitives
Low-level helpers for custom scoring, pairwise checks, and cross-script risk analysis:
import { skeleton, areConfusable, confusableDistance } from "namespace-guard";
skeleton("pa\u0443pal"); // "paypal" skeleton form
areConfusable("paypal", "pa\u0443pal"); // true
confusableDistance("paypal", "pa\u0443pal"); // graded similarity + chainDepth + explainable steps
For measured visual scoring, pass the optional weights from confusable-vision (4,174 pairs scored across 245 fonts using vector-outline raycasting, including 3,525 cross-script pairs). Each pair has a danger score (0–1); the default 0.5 floor favours recall, use danger > 0.7 for precision. The context filter restricts to identifier-valid, domain-valid, or all pairs.
import { confusableDistance } from "namespace-guard";
import { CONFUSABLE_WEIGHTS } from "namespace-guard/confusable-weights";
const result = confusableDistance("paypal", "pa\u0443pal", {
weights: CONFUSABLE_WEIGHTS,
context: "identifier",
});
// result.similarity, result.steps (including "visual-weight" reason for novel pairs)
Realistic Domain Spoof Detection
For domain name validation, isDomainSpoof() only flags threats that could produce registrable domain names. ICANN registrars enforce single-script labels, so mixed-script spoofs (e.g., one Cyrillic letter in a Latin domain) are excluded because they cannot actually be registered.
import { isDomainSpoof } from "namespace-guard";
import { CONFUSABLE_WEIGHTS } from "namespace-guard/confusable-weights";
// Full-Cyrillic lookalike — registrable and deceptive
isDomainSpoof("\u0440\u0430\u0443\u0440\u0430\u04CF", "paypal", { weights: CONFUSABLE_WEIGHTS });
// { spoof: true, script: "cyrillic", danger: 0.91, substitutions: [...] }
// Mixed-script — not registrable, not flagged
isDomainSpoof("\u0440aypal", "paypal", { weights: CONFUSABLE_WEIGHTS });
// { spoof: false }
// Known-legitimate non-Latin domain — skip via allowlist
isDomainSpoof("\u0430\u0441\u0435", "ace", {
weights: CONFUSABLE_WEIGHTS,
allowlist: ["\u0430\u0441\u0435"],
});
// { spoof: false }
The danger score (0–1) is always returned when a script match is found, even if below the minDanger threshold (default 0.5). Set minDanger: 0.7 for higher precision.
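The single-script rule behind the examples above can be sketched without the library. This is an illustration under stated assumptions, not how isDomainSpoof() is implemented: sketchDomainSpoof is a hypothetical helper, and the lookalike table is a tiny hand-written sample rather than the measured weights.

```typescript
// Tiny hand-written Cyrillic-to-Latin lookalike table, for illustration only.
const CYRILLIC_LOOKALIKES: { [ch: string]: string } = {
  "\u0430": "a",
  "\u0441": "c",
  "\u0435": "e",
  "\u043E": "o",
  "\u0440": "p",
  "\u0443": "y",
  "\u04CF": "l",
};

// Main Cyrillic block only (U+0400..U+04FF).
function isCyrillicChar(ch: string): boolean {
  const code = ch.charCodeAt(0);
  return code >= 0x0400 && code <= 0x04ff;
}

// Flag a label only when every character is Cyrillic and the mapped
// result equals the Latin target. Mixed-script labels return false.
function sketchDomainSpoof(label: string, target: string): boolean {
  if (label.length === 0 || label.length !== target.length) return false;
  let mapped = "";
  for (let i = 0; i < label.length; i++) {
    const ch = label.charAt(i);
    if (!isCyrillicChar(ch)) return false; // not a single-script Cyrillic label
    const latin = CYRILLIC_LOOKALIKES[ch];
    if (latin === undefined) return false; // no known lookalike for this letter
    mapped += latin;
  }
  return mapped === target;
}

console.log(sketchDomainSpoof("\u0440\u0430\u0443\u0440\u0430\u04CF", "paypal")); // full-Cyrillic label
console.log(sketchDomainSpoof("\u0440aypal", "paypal")); // mixed-script label
```

The sketch omits danger scoring, minDanger, and the allowlist; it only shows why a full-script substitution is treated differently from a single swapped letter.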
Research
Two research tracks feed the library:
Visual measurement. 4,174 confusable pairs measured across 245 system fonts using vector-outline raycasting (RaySpace). 3,525 of these are cross-script pairs between non-Latin scripts (Hangul/Han, Cyrillic/Greek, Cyrillic/Arabic, and more) with zero coverage in any existing standard. Each pair carries a danger score (0–1) representing geometric similarity; the shipped floor is 0.5 (for higher precision, try 0.7). Full dataset published as confusable-vision (CC-BY-4.0).
Normalisation composability. 31 characters where Unicode's confusables.txt and NFKC normalisation disagree. Two production maps (CONFUSABLE_MAP for NFKC-first, CONFUSABLE_MAP_FULL for raw-input pipelines), a benchmark corpus, and composability vectors wired into CLI drift baselines. Submitted to Unicode public review (PRI #540) and published in accumulated feedback.
- Technical reference: docs/reference.md#how-the-anti-spoofing-pipeline-works
- Launch write-up: https://paultendo.github.io/posts/namespace-guard-launch/
- Denial of Spend: https://paultendo.github.io/posts/confusable-vision-llm-attack-tests/
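The composability problem can be shown with one character. The sketch below assumes that confusables.txt maps the long s (U+017F) to "f", while NFKC folds it to "s"; the single-entry map and the function names are hypothetical and are not the library's CONFUSABLE_MAP or CONFUSABLE_MAP_FULL.

```typescript
// Single illustrative entry. Assumption: confusables.txt maps U+017F (long s) to "f".
const RAW_CONFUSABLE_SKETCH: { [ch: string]: string } = { "\u017F": "f" };

// Replace each character that has an entry in the sketch map.
function applyMap(input: string): string {
  let out = "";
  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    const mapped = RAW_CONFUSABLE_SKETCH[ch];
    out += mapped !== undefined ? mapped : ch;
  }
  return out;
}

// Pipeline A: normalise first, then apply the confusable map.
function nfkcFirst(input: string): string {
  return applyMap(input.normalize("NFKC"));
}

// Pipeline B: apply the confusable map to raw input, then normalise.
function mapFirst(input: string): string {
  return applyMap(input).normalize("NFKC");
}

const sample = "\u017Fun"; // long s followed by "un"
console.log(nfkcFirst(sample), mapFirst(sample)); // the two pipelines disagree
```

Because the order of operations changes the result, a map intended to run after NFKC and a map intended for raw input cannot be the same table, which is the motivation for shipping two.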
Built-in Profiles
Use createNamespaceGuardWithProfile(profile, overrides, adapter):
- consumer-handle: strict defaults for public handles
- org-slug: workspace/org slugs
- developer-id: technical IDs with looser numeric rules
Profiles are defaults, not lock-in. Override only what you need.
Zero-Dependency Moderation Integration
Core stays zero-dependency. You can use built-ins or plug in any external library.
import {
createNamespaceGuard,
createPredicateValidator,
} from "namespace-guard";
import { createEnglishProfanityValidator } from "namespace-guard/profanity-en";
const guard = createNamespaceGuard(
{
sources: [
{ name: "user", column: "handleCanonical", scopeKey: "id" },
{ name: "organization", column: "slugCanonical", scopeKey: "id" },
],
validators: [
createEnglishProfanityValidator({ mode: "evasion" }),
createPredicateValidator((identifier) => thirdPartyFilter.has(identifier)),
],
},
adapter
);
CLI Workflow
# 1) Generate realistic attack variants
npx namespace-guard attack-gen paypal --json
# 2) Calibrate thresholds and CI gate suggestions from your dataset
npx namespace-guard recommend ./risk-dataset.json
# 3) Preflight canonical collisions before adding DB unique constraints
npx namespace-guard audit-canonical ./users-export.json --json
# 4) Compare TR39-full vs NFKC-filtered behaviour
npx namespace-guard drift --json
Adapter Support
- Prisma
- Drizzle
- Kysely
- Knex
- TypeORM
- MikroORM
- Sequelize
- Mongoose
- Raw SQL
Adapter setup examples and migration guidance: docs/reference.md#adapters
Production Recommendation: Canonical Uniqueness
For full protection against Unicode/canonicalization edge cases, enforce uniqueness on canonical columns (for example handleCanonical, slugCanonical) and point sources[*].column there.
Migration guides per adapter: docs/reference.md#canonical-uniqueness-migration-per-adapter
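The reason to put the unique constraint on the canonical column rather than the raw one can be sketched in memory. Everything here is a stand-in: sketchCanonical only trims and lowercases (the library's pipeline does more), and CanonicalIndexSketch plays the role of a database UNIQUE index.

```typescript
// Stand-in canonicaliser (trim + lowercase); the library's real pipeline does more.
function sketchCanonical(raw: string): string {
  return raw.trim().toLowerCase();
}

interface ClaimResult {
  claimed: boolean;
  canonical: string;
}

// In-memory stand-in for a UNIQUE index on a canonical column.
class CanonicalIndexSketch {
  // Null-prototype object so keys like "__proto__" behave as ordinary entries.
  private owners: { [canonical: string]: string } = Object.create(null);

  claim(raw: string): ClaimResult {
    const canonical = sketchCanonical(raw);
    if (canonical in this.owners) {
      return { claimed: false, canonical };
    }
    this.owners[canonical] = raw;
    return { claimed: true, canonical };
  }
}

const handleIndex = new CanonicalIndexSketch();
const firstClaim = handleIndex.claim("Acme-Corp");
const secondClaim = handleIndex.claim("acme-corp ");
console.log(firstClaim.claimed, secondClaim.claimed); // the second claim collides on the canonical form
```

The two raw strings differ, so a uniqueness check on the raw column would accept both. Keying on the canonical form rejects the second.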
Documentation Map
- Full reference: docs/reference.md
- Config reference: docs/reference.md#configuration
- Validators (profanity, homoglyph, invisible): docs/reference.md#async-validators
- Canonical preflight audit (audit-canonical): docs/reference.md#audit-canonical-command
- Anti-spoofing pipeline and composability vectors: docs/reference.md#how-the-anti-spoofing-pipeline-works
- LLM preprocessing (canonicalise, scan, isClean): docs/reference.md#llm-pipeline-preprocessing
- Benchmark corpus (confusable-bench.v1): docs/reference.md#confusable-benchmark-corpus-artifact
- Advanced primitives (skeleton, areConfusable, confusableDistance): docs/reference.md#advanced-security-primitives
- Confusable weights (scored pairs, including cross-script): docs/reference.md#confusable-weights-subpath
- Cross-script detection: docs/reference.md#cross-script-detection
- CLI reference: docs/reference.md#cli
- API reference: docs/reference.md#api-reference
- Framework integration (Next.js/Express/tRPC): docs/reference.md#framework-integration
Support
If namespace-guard helped you, please star the repo. It helps the project a lot.
- GitHub Sponsors: https://github.com/sponsors/paultendo
- Buy me a coffee: https://buymeacoffee.com/paultendo
Contributing
Contributions welcome. Please open an issue first to discuss larger changes.