Detect-File-Encoding-and-Language

Functionality
Installation
Example
Supported Languages
Used Encodings
Confidence Score
License

Functionality

Determine the encoding and language of any text file!

Detects 40 languages as well as the appropriate encoding
Works best with large inputs
Completely free, no API key required

For reliable encoding and language detection, use files containing 500 words or more. Smaller inputs can work too, but the results may be less accurate and, in some cases, incorrect.
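To act on the 500-word guideline before running detection, a rough pre-check can be sketched as follows. The helper names `countWords` and `hasEnoughText` are hypothetical and not part of this package, and splitting on whitespace is only an approximation: it will undercount text in languages that do not separate words with spaces, such as Chinese, Japanese, or Thai.

```javascript
// Hypothetical helpers, not part of detect-file-encoding-and-language.

// Count whitespace-separated words in an already decoded string.
function countWords(text) {
  const words = text.trim().split(/\s+/);
  // Splitting an empty string yields [""], which is zero words.
  return words[0] === "" ? 0 : words.length;
}

// Check the input against the recommended minimum of 500 words.
function hasEnoughText(text, minWords = 500) {
  return countWords(text) >= minWords;
}

const sample = "This sample is far too short for reliable detection.";
if (!hasEnoughText(sample)) {
  console.warn(`Only ${countWords(sample)} words; results may be less accurate.`);
}
```

A short input can still be passed to the detector; the check is only meant to flag results that deserve less trust.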

Feel free to test the functionality of this npm package in the online demo. Upload your own files and see whether the encoding and language are detected correctly!

Installation

npm install detect-file-encoding-and-language

Example

// index.html

<input type="file" id="my-input-field" >

// app.js

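// Assumes a bundler (for example webpack or Browserify) resolves require() for the browser.
const languageEncoding = require("detect-file-encoding-and-language");

// Run detection whenever the user picks a file.
document.getElementById("my-input-field").addEventListener("change", inputHandler);

function inputHandler(e) {
    const file = e.target.files[0];
    if (!file) return; // No file was selected.

    languageEncoding(file)
        .then(fileInfo => console.log(fileInfo))
        .catch(err => console.error(err));
    // Possible result: { language: "english", encoding: "UTF-8", confidence: 0.99 }
}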
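Once the promise resolves, the result can be checked before it is used. The sketch below assumes the result has the shape shown in the example above (`language`, `encoding`, and a numeric `confidence`). The helper `summarizeDetection` and the default threshold of 0.8 are hypothetical choices for illustration, not part of the package.

```javascript
// Hypothetical helper, not part of detect-file-encoding-and-language.
// Assumes fileInfo looks like { language, encoding, confidence }.
function summarizeDetection(fileInfo, minConfidence = 0.8) {
  const { language, encoding, confidence } = fileInfo;
  // Treat a missing or non-numeric confidence as unreliable.
  const reliable = typeof confidence === "number" && confidence >= minConfidence;
  const summary = `${language} (${encoding}), confidence ${confidence}`;
  return reliable ? summary : `uncertain: ${summary}`;
}

// Example with a hand-written result object:
console.log(summarizeDetection({ language: "english", encoding: "UTF-8", confidence: 0.99 }));
```

In the browser example, this could be called inside the `.then` callback in place of the plain `console.log`.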

Supported Languages

polish
czech
hungarian
romanian
slovak
slovenian
albanian
russian
ukrainian
bulgarian
english
french
portuguese
spanish
german
italian
danish
norwegian
swedish
dutch
finnish
serbo-croatian
estonian
icelandic
malay-indonesian
greek
turkish
hebrew
arabic
farsi-persian
lithuanian
chinese-simplified
chinese-traditional
japanese
korean
thai
bengali
hindi
urdu
vietnamese

Used Encodings

UTF-8
CP1250
CP1251
CP1252
CP1253
CP1254
CP1255
CP1256
CP1257
GB18030
BIG5
Shift-JIS
EUC-KR
TIS-620

Confidence Score

The confidence score ranges from 0.5 to 1 and reflects the ratio between the two highest-scoring languages/encodings.

A confidence score of 0.5 means the two highest-scoring candidates had the same number of matches, so there is a one-in-two chance that the language/encoding has been detected correctly.

A confidence score of 0.8 means that the detected language/encoding had four times as many matches as the second-highest-scoring language/encoding.
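Both data points above (equal matches give 0.5, a 4:1 ratio gives 0.8) are consistent with dividing the top candidate's matches by the combined matches of the top two candidates. The sketch below illustrates that reading. It is an inference from this section, not the package's actual source, and `confidenceFromMatches` is a hypothetical name.

```javascript
// Hypothetical illustration of the score described above; the package's
// real implementation may differ.
function confidenceFromMatches(topMatches, runnerUpMatches) {
  const total = topMatches + runnerUpMatches;
  // Assumption: with no matches at all, treat the two candidates as tied.
  if (total === 0) return 0.5;
  return topMatches / total;
}

console.log(confidenceFromMatches(400, 100)); // 0.8 (four times as many matches)
console.log(confidenceFromMatches(250, 250)); // 0.5 (a tie)
```

Under this reading, a score of 1 would mean the runner-up had no matches at all.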

License

This project is licensed under the MIT License.