zh-chardet
Chinese character encoding detection for GB18030, Big5, UTF-8, and UTF-16.
Features
- Detects GB18030, Big5, UTF-8, UTF-16LE, and UTF-16BE encodings
- Works in both Node.js and browsers
- TypeScript support included
- Confidence scoring for detection results
- Multiple result formats: all candidates, the best match, or just the encoding name
Installation
npm install zh-chardet
Usage
Basic Usage
import { detect, detectBest, detectEncoding } from 'zh-chardet';
// Get all possible encodings with confidence scores
const results = detect('你好世界');
console.log(results);
// [{ encoding: 'UTF-8', confidence: 0.90 }, ...]
// Get the best match only
const best = detectBest('你好世界');
console.log(best);
// { encoding: 'UTF-8', confidence: 0.90 }
// Get just the encoding name
const encoding = detectEncoding('你好世界');
console.log(encoding);
// 'UTF-8'
Working with Binary Data
import { detect } from 'zh-chardet';
// From Uint8Array
const bytes = new Uint8Array([0xC4, 0xE3, 0xBA, 0xC3]);
const results = detect(bytes);
// From Buffer (Node.js)
const buffer = Buffer.from([0xC4, 0xE3, 0xBA, 0xC3]);
const results2 = detect(buffer);
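Once an encoding name is known, the bytes still have to be decoded into a string. The following is a minimal sketch, assuming the runtime's standard `TextDecoder` accepts the encoding names listed in this README as labels (this holds for WHATWG-compliant browsers and Node.js builds with full ICU). `decodeWith` is a hypothetical helper and is not part of zh-chardet.

```typescript
// Hypothetical helper, not part of zh-chardet: decode bytes with a known encoding name.
// Assumes the runtime's TextDecoder supports the label (e.g. 'GB18030', 'Big5', 'UTF-8').
function decodeWith(bytes: Uint8Array, encoding: string): string {
  // fatal: true makes malformed input throw instead of silently yielding U+FFFD
  const decoder = new TextDecoder(encoding.toLowerCase(), { fatal: true });
  return decoder.decode(bytes);
}

// The same four bytes used in the binary example, decoded as GB18030
const sampleBytes = new Uint8Array([0xC4, 0xE3, 0xBA, 0xC3]);
console.log(decodeWith(sampleBytes, 'GB18030'));
```

Using `fatal: true` is a design choice: a wrong guess then surfaces as an exception rather than as replacement characters in the output.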
Options
import { detect } from 'zh-chardet';
// Sample input; any string, Buffer, or Uint8Array works here
const text = '你好世界';
// Drop results below a confidence of 0.5
const results = detect(text, { minimumConfidence: 0.5 });
API
detect(input, options?): DetectionResult[]
Returns all possible encoding matches with confidence scores.
- input (string | Buffer | Uint8Array): Text or binary data to analyze
- options.minimumConfidence (number): Filter results below this confidence (default: 0.1)
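To make the effect of `minimumConfidence` concrete, here is a small sketch of a threshold filter over a result list. It is illustrative only and does not reproduce the library's internals: the `DetectionResult` shape is taken from the example output in this README, `filterByConfidence` and the candidate data are made up, and treating the threshold as inclusive is an assumption.

```typescript
// Result shape taken from the example output in this README
interface DetectionResult {
  encoding: string;
  confidence: number;
}

// Illustrative only: one way a minimum-confidence filter could behave.
// Whether the library's threshold is inclusive is an assumption here.
function filterByConfidence(
  results: DetectionResult[],
  minimumConfidence = 0.1,
): DetectionResult[] {
  return results.filter((r) => r.confidence >= minimumConfidence);
}

// Made-up candidate list for demonstration
const demoCandidates: DetectionResult[] = [
  { encoding: 'UTF-8', confidence: 0.9 },
  { encoding: 'GB18030', confidence: 0.3 },
  { encoding: 'Big5', confidence: 0.05 },
];

console.log(filterByConfidence(demoCandidates).map((r) => r.encoding));
// [ 'UTF-8', 'GB18030' ]
console.log(filterByConfidence(demoCandidates, 0.5).map((r) => r.encoding));
// [ 'UTF-8' ]
```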
detectBest(input, options?): DetectionResult | null
Returns the highest confidence encoding match.
detectEncoding(input, options?): Encoding | null
Returns just the encoding name of the best match.
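The relationship between the three functions can be sketched in terms of a result list: the best match is the entry with the highest confidence, and the encoding name is that entry's `encoding` field. The helpers below (`pickBest`, `pickEncoding`) are hypothetical and only mirror the documented return types; they are not the library's code.

```typescript
// Result shape taken from the example output in this README
interface DetectionResult {
  encoding: string;
  confidence: number;
}

// Hypothetical helper: highest-confidence entry, or null for an empty list
function pickBest(results: DetectionResult[]): DetectionResult | null {
  if (results.length === 0) return null;
  return results.reduce((best, r) => (r.confidence > best.confidence ? r : best));
}

// Hypothetical helper: encoding name of the best entry, or null
function pickEncoding(results: DetectionResult[]): string | null {
  return pickBest(results)?.encoding ?? null;
}

// Made-up candidate list for demonstration
const ranked: DetectionResult[] = [
  { encoding: 'GB18030', confidence: 0.4 },
  { encoding: 'UTF-8', confidence: 0.9 },
];

console.log(pickBest(ranked));     // { encoding: 'UTF-8', confidence: 0.9 }
console.log(pickEncoding(ranked)); // UTF-8
console.log(pickEncoding([]));     // null
```

Because both documented functions can return `null`, callers will usually want an explicit fallback, for example `detectEncoding(input) ?? 'UTF-8'`; the choice of fallback encoding is application-specific.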
Supported Encodings
- UTF-8: Unicode encoding
- UTF-16LE: UTF-16 Little Endian (with/without BOM)
- UTF-16BE: UTF-16 Big Endian (with/without BOM)
- GB18030: Chinese national standard encoding
- Big5: Traditional Chinese encoding
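The "with/without BOM" note refers to the byte order mark, an optional signature at the start of Unicode text. The sketch below shows only the BOM check for the three Unicode encodings listed above; detection without a BOM requires statistical heuristics that are not shown here. `sniffBom` is a hypothetical name, and this is not claimed to be how zh-chardet implements it.

```typescript
type BomEncoding = 'UTF-8' | 'UTF-16LE' | 'UTF-16BE';

// Hypothetical helper: identify an encoding from a leading byte order mark.
// Returns null when no known BOM is present.
function sniffBom(bytes: Uint8Array): BomEncoding | null {
  // UTF-8 BOM: EF BB BF
  if (bytes.length >= 3 && bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF) {
    return 'UTF-8';
  }
  // UTF-16 little endian BOM: FF FE
  if (bytes.length >= 2 && bytes[0] === 0xFF && bytes[1] === 0xFE) {
    return 'UTF-16LE';
  }
  // UTF-16 big endian BOM: FE FF
  if (bytes.length >= 2 && bytes[0] === 0xFE && bytes[1] === 0xFF) {
    return 'UTF-16BE';
  }
  return null;
}

// U+4F60 (你) encoded as UTF-16LE with a BOM
console.log(sniffBom(new Uint8Array([0xFF, 0xFE, 0x60, 0x4F]))); // UTF-16LE
// GB18030 bytes carry no Unicode BOM
console.log(sniffBom(new Uint8Array([0xC4, 0xE3, 0xBA, 0xC3]))); // null
```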
License
MIT