JSPM

utf8-sanitize

1.0.2
  • Downloads 1292
  • License MIT

A performant zero-dependency utility to clean UTF-8 text, fix mojibake from latin1, verify string length, and sanitize input

Package Exports

  • utf8-sanitize
  • utf8-sanitize/index.js

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (utf8-sanitize) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

utf8-sanitize


UTF-8 Sanitization and Repair Utility

A lightweight, performant, zero-dependency utility for sanitizing and repairing UTF-8 text. Features:

  • 🛠️ Repairs mojibake introduced when converting between single-byte latin1 and multi-byte UTF-8
  • 📏 Checks whether a string's length matches its expected length or the safe 32-bit length limit
  • 🚫 Cleans a string by removing or escaping characters according to a configurable sanitization mode
  • 🔄 Provides a full repair-and-sanitization pipeline through FullSanitize
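
To make the mojibake bullet concrete, here is a minimal sketch of one common repair technique: re-encode the garbled text as latin1 bytes, then decode those bytes as strict UTF-8. The helper name `repairLatin1Mojibake` is hypothetical, and this is not necessarily how the package implements the repair; it only illustrates the idea using Node's built-in `Buffer` and `TextDecoder`.

```javascript
// Illustrative sketch only: a hypothetical helper, NOT utf8-sanitize's API or implementation.
function repairLatin1Mojibake(text) {
  // A latin1 decode can only produce code points up to U+00FF,
  // so anything above that means the input is not latin1 mojibake.
  for (const ch of text) {
    if (ch.codePointAt(0) > 0xff) return text;
  }

  // Recover the original bytes: each character maps back to exactly one byte.
  const bytes = Buffer.from(text, 'latin1');

  try {
    // fatal: true makes the decoder throw on invalid UTF-8
    // instead of silently inserting U+FFFD replacement characters.
    return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch {
    // The bytes are not valid UTF-8, so the text was probably fine already.
    return text;
  }
}

// "caf" + U+00C3 U+00A9 is what "café" looks like after a wrong latin1 decode.
console.log(repairLatin1Mojibake('caf\u00C3\u00A9')); // café
console.log(repairLatin1Mojibake('caf\u00E9'));       // café (already correct, left unchanged)
```

This sketch handles strict latin1 only. Text garbled through Windows-1252 contains characters above U+00FF (for example curly quotes) and passes through unchanged.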

Install

npm install utf8-sanitize

Usage

Import:

const { FullSanitize } = require('utf8-sanitize'); // Import the pipeline only
// ...or import everything (use one of the two statements, not both):
const { FullSanitize, FixLatin1Corrupt, VerifyByteLength, SanitizeInput, MAX_SAFE_CHAR_LIMIT } = require('utf8-sanitize');

Functions:

FullSanitize() // => string
// Full pipeline: repairs latin1-to-UTF-8 mojibake, verifies the expected string length, and sanitizes the string

FixLatin1Corrupt() // => string
// Repairs mojibake corruption from latin1 single-byte to multi-byte UTF-8 character conversion

VerifyByteLength() // => boolean
// Checks whether a string's length matches its expected length or the safe 32-bit length limit

SanitizeInput() // => string
// Cleans a string by removing or escaping characters according to the sanitization mode set via options (alphanumeric, html, filename)

MAX_SAFE_CHAR_LIMIT // => number
// Max safe V8 32-bit character string limit used by VerifyByteLength()
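
The README does not show argument lists for these functions, so the following is a standalone sketch of the ideas rather than calls into the package. The helpers `hasExpectedByteLength` and `sanitizeSketch`, their parameters, and the exact character sets removed in each mode are assumptions made for illustration. The package's real behavior may differ.

```javascript
// Illustrative sketch only: hypothetical helpers, NOT utf8-sanitize's API or implementation.

// Characters escaped in the assumed "html" mode.
const HTML_ESCAPES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };

// Compare a string's UTF-8 encoded size with an expected byte count.
// Note that text.length counts UTF-16 code units, not bytes.
function hasExpectedByteLength(text, expectedBytes) {
  return Buffer.byteLength(text, 'utf8') === expectedBytes;
}

// Remove or escape characters depending on the chosen mode.
function sanitizeSketch(text, mode) {
  switch (mode) {
    case 'alphanumeric':
      // Keep Unicode letters, digits, and spaces; drop everything else.
      return text.replace(/[^\p{L}\p{N} ]/gu, '');
    case 'html':
      // Escape the characters that are significant in HTML markup.
      return text.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
    case 'filename':
      // Drop characters that are reserved in Windows or POSIX file names,
      // plus ASCII control characters.
      return text.replace(/[<>:"\/\\|?*\u0000-\u001f]/g, '');
    default:
      throw new TypeError(`Unknown sanitization mode: ${mode}`);
  }
}

console.log(hasExpectedByteLength('caf\u00E9', 5));           // true ("é" takes two bytes in UTF-8)
console.log(sanitizeSketch('<b>Tom & Jerry</b>', 'html'));    // &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;
console.log(sanitizeSketch('report: v1/2?.txt', 'filename')); // report v12.txt
```

Throwing on an unknown mode is a design choice of this sketch. It surfaces a mistyped mode name rather than silently returning unsanitized text.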

See USAGE_EX.MD for in-depth usage examples of each function.

Testing

Basic assert tests are included in /test/index.test.js.

Credit

Author: charuse

License: MIT

Made with the help of Google Gemini 2.5 Pro