JSPM

zh-chardet

1.0.0
    • License MIT

    Chinese character encoding detection for GB18030, Big5, UTF-8, UTF-16

    Package Exports

    • zh-chardet
    • zh-chardet/dist/index.js

    This package does not declare an exports field, so JSPM has detected and optimized the exports above automatically. If a package subpath is missing, open an issue on the original package (zh-chardet) asking it to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

    Readme

    zh-chardet

    Chinese character encoding detection for GB18030, Big5, UTF-8, and UTF-16.

    Features

    • Detects GB18030, Big5, UTF-8, UTF-16LE, and UTF-16BE encodings
    • Works in both Node.js and browsers
    • TypeScript support included
    • Confidence scoring for detection results
    • Multiple detection result options

    Installation

    npm install zh-chardet

    Usage

    Basic Usage

    import { detect, detectBest, detectEncoding } from 'zh-chardet';
    
    // Get all possible encodings with confidence scores
    const results = detect('你好世界');
    console.log(results);
    // [{ encoding: 'UTF-8', confidence: 0.90 }, ...]
    
    // Get the best match only
    const best = detectBest('你好世界');
    console.log(best);
    // { encoding: 'UTF-8', confidence: 0.90 }
    
    // Get just the encoding name
    const encoding = detectEncoding('你好世界');
    console.log(encoding);
    // 'UTF-8'

    Working with Binary Data

    import { detect } from 'zh-chardet';
    
    // From Uint8Array (these four bytes are "你好" encoded in GB18030)
    const bytes = new Uint8Array([0xC4, 0xE3, 0xBA, 0xC3]);
    const results = detect(bytes);
    
    // From Buffer (Node.js)
    const buffer = Buffer.from([0xC4, 0xE3, 0xBA, 0xC3]);
    const results2 = detect(buffer);
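
    Once an encoding has been detected, the bytes still need to be decoded into a string. The following is a minimal sketch of one way to do that with the standard TextDecoder API; it does not call zh-chardet, and decodeBytes and isValidUtf8 are hypothetical helper names, not part of the package. It assumes a runtime whose TextDecoder supports legacy encodings (browsers, or Node.js built with full ICU). The encoding names listed in this README are all valid TextDecoder labels, since labels are matched case-insensitively.

```typescript
// Hypothetical helpers for illustration; not part of zh-chardet.

// Decode bytes with a known encoding label, e.g. one reported by a detector.
// fatal: true makes the decoder throw a TypeError on malformed input
// instead of silently emitting U+FFFD replacement characters.
function decodeBytes(bytes: Uint8Array, encoding: string): string {
  return new TextDecoder(encoding, { fatal: true }).decode(bytes);
}

// Strict UTF-8 validity check built on the same decoder.
function isValidUtf8(bytes: Uint8Array): boolean {
  try {
    decodeBytes(bytes, 'utf-8');
    return true;
  } catch {
    return false;
  }
}

// The sample bytes from the example above.
const sample = new Uint8Array([0xc4, 0xe3, 0xba, 0xc3]);

const decoded = decodeBytes(sample, 'gb18030'); // '你好'
const sampleIsUtf8 = isValidUtf8(sample);       // false: 0xE3 is not a continuation byte
```

    The strict UTF-8 check also shows why detection is needed at all: the same four bytes are well-formed GB18030 but malformed UTF-8.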

    Options

    import { detect } from 'zh-chardet';
    
    const text = '你好世界';

    // Filter results by minimum confidence
    const results = detect(text, { minimumConfidence: 0.5 });

    API

    detect(input, options?): DetectionResult[]

    Returns all possible encoding matches with confidence scores.

    • input: string | Buffer | Uint8Array - Text or binary data to analyze
    • options.minimumConfidence: number - Exclude results whose confidence is below this value (default: 0.1)
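
    The effect of minimumConfidence can be pictured as a plain filter over the result list. The sketch below is not the package's implementation: filterByConfidence is a hypothetical name, the candidate scores are made-up sample data, and treating the threshold as inclusive is an assumption. Only the { encoding, confidence } shape and the 0.1 default come from this README.

```typescript
// Result shape as shown in the usage examples of this README.
interface DetectionResult {
  encoding: string;
  confidence: number;
}

// Hypothetical helper: keep results at or above the threshold.
// Whether the real option is inclusive (>=) or exclusive (>) is an assumption.
function filterByConfidence(
  results: DetectionResult[],
  minimumConfidence = 0.1,
): DetectionResult[] {
  return results.filter((r) => r.confidence >= minimumConfidence);
}

// Made-up sample scores, for illustration only.
const candidates: DetectionResult[] = [
  { encoding: 'UTF-8', confidence: 0.9 },
  { encoding: 'GB18030', confidence: 0.3 },
  { encoding: 'Big5', confidence: 0.05 },
];

const strict = filterByConfidence(candidates, 0.5); // only the UTF-8 entry remains
const lenient = filterByConfidence(candidates);     // default 0.1 drops only Big5
```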

    detectBest(input, options?): DetectionResult | null

    Returns the highest-confidence encoding match, or null if no encoding matches.

    detectEncoding(input, options?): Encoding | null

    Returns only the encoding name of the best match, or null if no encoding matches.
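
    The three functions can be read as successive reductions of the same result list. The sketch below illustrates that relationship under stated assumptions: pickBest and pickEncoding are hypothetical stand-ins, not the package's code, and how the package breaks ties between equal scores is not documented here (this sketch keeps the earlier entry).

```typescript
// Result shape as shown in the usage examples of this README.
interface DetectionResult {
  encoding: string;
  confidence: number;
}

// Hypothetical stand-in for the detectBest step:
// the highest-confidence result, or null for an empty list.
function pickBest(results: DetectionResult[]): DetectionResult | null {
  let best: DetectionResult | null = null;
  for (const r of results) {
    // Strict > keeps the earlier entry on ties (an assumption).
    if (best === null || r.confidence > best.confidence) {
      best = r;
    }
  }
  return best;
}

// Hypothetical stand-in for the detectEncoding step:
// only the name of the best result, or null.
function pickEncoding(results: DetectionResult[]): string | null {
  return pickBest(results)?.encoding ?? null;
}

// Made-up sample scores, for illustration only.
const scored: DetectionResult[] = [
  { encoding: 'Big5', confidence: 0.4 },
  { encoding: 'GB18030', confidence: 0.7 },
];

const bestResult = pickBest(scored);   // the GB18030 entry
const bestName = pickEncoding(scored); // 'GB18030'
const nothing = pickEncoding([]);      // null
```

    Because both functions can return null, callers should check the result before reading encoding or passing it to a decoder.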

    Supported Encodings

    • UTF-8: Unicode encoding
    • UTF-16LE: UTF-16 Little Endian (with/without BOM)
    • UTF-16BE: UTF-16 Big Endian (with/without BOM)
    • GB18030: Chinese national standard encoding
    • Big5: Traditional Chinese encoding
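
    The "with/without BOM" note refers to the byte order mark, an optional signature at the start of a Unicode stream. When present it identifies the encoding directly: EF BB BF for UTF-8, FF FE for UTF-16LE, and FE FF for UTF-16BE. The following sketch shows a BOM check in isolation; sniffBom is a hypothetical helper, not part of zh-chardet, and it says nothing about how the package handles input without a BOM.

```typescript
type BomEncoding = 'UTF-8' | 'UTF-16LE' | 'UTF-16BE';

// Hypothetical helper: report the encoding implied by a leading BOM,
// or null when the input does not start with one.
function sniffBom(bytes: Uint8Array): BomEncoding | null {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return 'UTF-8';
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return 'UTF-16LE';
  }
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return 'UTF-16BE';
  }
  return null;
}

// "你" (U+4F60) as UTF-16LE with a BOM: FF FE, then 60 4F.
const withBom = sniffBom(new Uint8Array([0xff, 0xfe, 0x60, 0x4f])); // 'UTF-16LE'

// GB18030 bytes carry no BOM, so statistical detection is still required.
const withoutBom = sniffBom(new Uint8Array([0xc4, 0xe3, 0xba, 0xc3])); // null
```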

    License

    MIT