Determine the Encoding of an HTML Byte Stream
This package implements the HTML Standard's encoding sniffing algorithm in all its glory. The most interesting part of this is how it pre-scans the first 1024 bytes in order to search for certain <meta charset>-related patterns.
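To give a feel for what "pre-scanning the first 1024 bytes" means, here is a deliberately naive sketch. It is not the HTML Standard's prescan and not this package's code: the real algorithm tokenizes tags and attributes, skips comments, handles http-equiv/content pairs properly, and maps the found label to a canonical encoding name. The function name naiveMetaCharsetPrescan is hypothetical, and a regular expression stands in for the real tokenizer.

```javascript
// Deliberately naive illustration of the *idea* of the prescan, not the
// HTML Standard's algorithm: it just regex-searches for "charset=" inside a
// <meta> tag, and returns the raw (lowercased) label, not a canonical name.
function naiveMetaCharsetPrescan(bytes) {
  // Look at no more than the first 1024 bytes.
  const head = bytes.subarray(0, 1024);

  // Map each byte to one character (Latin-1 style) so the ASCII-range
  // markup we are searching for is preserved byte-for-byte.
  let text = "";
  for (const byte of head) {
    text += String.fromCharCode(byte);
  }

  // Find a <meta ...> tag containing charset=..., quoted or not, and capture
  // the label up to the next quote, whitespace, ">", ";" or "/".
  const match = /<meta\s[^>]*?charset\s*=\s*["']?\s*([^\s"'>;\/]+)/i.exec(text);
  return match ? match[1].toLowerCase() : null;
}

const sample = new TextEncoder().encode('<!doctype html><meta charset="Shift_JIS"><title>x</title>');
console.log(naiveMetaCharsetPrescan(sample)); // shift_jis
```

A declaration that starts after byte 1024 is simply not seen, which is why documents are expected to put their <meta charset> early.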
const htmlEncodingSniffer = require("html-encoding-sniffer");
const fs = require("fs");
const htmlBytes = fs.readFileSync("./html-page.html");
const sniffedEncoding = htmlEncodingSniffer(htmlBytes);
The passed bytes are given as a Uint8Array; the Node.js Buffer subclass of Uint8Array will also work, as shown above.
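To make the input requirement concrete, this plain Node.js snippet (it does not call the package) builds the same bytes both ways and checks their types. The sample markup is made up for illustration.

```javascript
// Two ways to get bytes in Node.js; both are Uint8Array instances.
const markup = '<meta charset="utf-8"><p>hi</p>';

// Buffer: what fs.readFileSync returns when no encoding argument is given.
const fromBuffer = Buffer.from(markup);
// Plain Uint8Array: what TextEncoder.prototype.encode returns.
const fromUint8Array = new TextEncoder().encode(markup);

console.log(fromBuffer instanceof Uint8Array);     // true
console.log(fromUint8Array instanceof Uint8Array); // true
console.log(Buffer.isBuffer(fromUint8Array));      // false
```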
The returned value will be a canonical encoding name such as "windows-1252" (not a label such as "latin1"). You might then combine this with the whatwg-encoding package to decode the bytes:
const whatwgEncoding = require("whatwg-encoding");
const htmlString = whatwgEncoding.decode(htmlBytes, sniffedEncoding);
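As an alternative that this README does not describe, the WHATWG TextDecoder built into browsers and Node.js also accepts canonical encoding names. The sketch below rests on a few assumptions: decodeHtmlBytes is a hypothetical helper, the encoding name is hard-coded rather than sniffed, and decoding legacy encodings such as windows-1252 in Node.js relies on a build with full ICU data (the default for official builds). Unlike whatwg-encoding's decode, TextDecoder does not switch encodings when it sees a BOM, which is acceptable here because the sniffer already gives a BOM top priority.

```javascript
// Alternative sketch: decode with the built-in TextDecoder instead of
// whatwg-encoding. In practice encodingName would come from
// htmlEncodingSniffer(htmlBytes).
function decodeHtmlBytes(bytes, encodingName) {
  // Defaults: a matching BOM is stripped, malformed sequences become U+FFFD.
  return new TextDecoder(encodingName).decode(bytes);
}

const htmlBytes = new TextEncoder().encode("<p>héllo</p>");
const sniffedEncoding = "UTF-8"; // hard-coded for illustration
console.log(decodeHtmlBytes(htmlBytes, sniffedEncoding)); // <p>héllo</p>
```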
Options
You can pass two options to htmlEncodingSniffer:
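const sniffedEncoding = htmlEncodingSniffer(htmlBytes, {
  transportLayerEncodingLabel, // optional, e.g. the charset from an HTTP Content-Type header
  defaultEncoding              // optional, defaults to "windows-1252"
});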
These represent two possible inputs into the encoding sniffing algorithm:
- transportLayerEncodingLabel is an encoding label obtained from the "transport layer" (probably an HTTP Content-Type header), which overrides everything but a BOM.
- defaultEncoding is the ultimate fallback encoding, used if no valid encoding is supplied by the transport layer and no encoding is sniffed from the bytes. It defaults to "windows-1252", as recommended by the algorithm's table of suggested defaults for "All other locales" (including the en locale).
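Read together, these rules imply a priority order: a BOM, then a valid transport-layer label, then the <meta> prescan, then defaultEncoding. The following is a hypothetical sketch of that order, not the package's source. The names sketchSniff, labelToName and sniffBom are illustrative, the label table is a small excerpt of real WHATWG labels (the full table is much larger), and the prescan is passed in as a stub.

```javascript
// Hypothetical sketch of the priority order; not the package's source.

// Small excerpt of the WHATWG label -> canonical name mapping.
const LABELS = new Map([
  ["utf-8", "UTF-8"],
  ["utf8", "UTF-8"],
  ["latin1", "windows-1252"],
  ["iso-8859-1", "windows-1252"],
  ["windows-1252", "windows-1252"],
  ["shift_jis", "Shift_JIS"],
  ["sjis", "Shift_JIS"],
]);

function labelToName(label) {
  // Labels are matched after trimming surrounding whitespace and lowercasing.
  return LABELS.get(label.trim().toLowerCase()) ?? null;
}

// Byte order mark detection: returns an encoding name or null.
function sniffBom(bytes) {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return "UTF-8";
  }
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return "UTF-16BE";
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return "UTF-16LE";
  }
  return null;
}

function sketchSniff(bytes, { transportLayerEncodingLabel, defaultEncoding = "windows-1252" } = {}, prescan = () => null) {
  // 1. A BOM overrides everything.
  const fromBom = sniffBom(bytes);
  if (fromBom !== null) return fromBom;

  // 2. A *valid* transport-layer label comes next; an unknown label is ignored.
  if (transportLayerEncodingLabel !== undefined) {
    const fromTransport = labelToName(transportLayerEncodingLabel);
    if (fromTransport !== null) return fromTransport;
  }

  // 3. Then the <meta charset> prescan (stubbed here; returns a name or null).
  const fromPrescan = prescan(bytes);
  if (fromPrescan !== null) return fromPrescan;

  // 4. Finally, the configured fallback.
  return defaultEncoding;
}

const noBom = new TextEncoder().encode("<p>hi</p>");
console.log(sketchSniff(noBom, { transportLayerEncodingLabel: "ISO-8859-1" })); // windows-1252
console.log(sketchSniff(noBom, { transportLayerEncodingLabel: "bogus" }, () => "Shift_JIS")); // Shift_JIS
```

Note that an unrecognized transport-layer label falls through to the prescan rather than winning outright, which matches the "no valid encoding is supplied by the transport layer" wording above.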
Credits
This package was originally based on the excellent work of @nicolashenry, in jsdom. It has since been pulled out into this separate package.