JSPM

  • Downloads 86924
  • License MIT

An NPM package to detect the encoding and language of a file

Package Exports

  • detect-file-encoding-and-language

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (detect-file-encoding-and-language) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

Detect-File-Encoding-and-Language

Maintenance

Please install version 1.6.1! It's the latest stable version.

The newest version 1.6.7 does not work!

I am currently working on a fix!

Functionality

Determine the encoding and language of any text file!

  • Detects 40 languages as well as the appropriate encoding
  • Works best with large inputs
  • Completely free, no API key required

For reliable encoding and language detection, use files containing 500 words or more. Smaller inputs can work as well, but the results might be less accurate and, in some cases, incorrect.
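The 500-word guideline can be checked before running detection. The following is a minimal sketch, not part of this package: `countWords` and `isLargeEnough` are hypothetical helpers. It counts whitespace-separated runs on the raw bytes, which assumes an ASCII-compatible encoding such as UTF-8, the CP125x family, or Shift-JIS (it does not hold for UTF-16). Languages written without spaces, such as Chinese, Japanese, or Thai, will be undercounted.

```javascript
// Hypothetical helpers, not part of detect-file-encoding-and-language.
// Space, tab, LF and CR keep their ASCII byte values in ASCII-compatible
// encodings, so words can be counted before the encoding is known.
function countWords(bytes) {
    let words = 0;
    let insideWord = false;
    for (const byte of bytes) {
        const isWhitespace =
            byte === 0x20 || byte === 0x09 || byte === 0x0a || byte === 0x0d;
        if (isWhitespace) {
            insideWord = false;
        } else if (!insideWord) {
            insideWord = true;
            words += 1;
        }
    }
    return words;
}

// 500 is the guideline given in this README.
function isLargeEnough(bytes, minWords = 500) {
    return countWords(bytes) >= minWords;
}

const sample = Buffer.from("The quick brown fox\njumps over the lazy dog");
console.log(countWords(sample)); // 9
console.log(isLargeEnough(sample)); // false
```

A `false` result does not mean detection will fail, only that the reported confidence deserves a closer look.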

Feel free to test the functionality of this NPM package here. Upload your own files and see if the encoding and language are detected correctly!

Usage (JavaScript)

Installation

$ npm install detect-file-encoding-and-language

In the browser

// index.html

<input type="file" id="my-input-field">

Note: This should work fine with frameworks such as React. If you're using plain JavaScript, make sure to use a bundler such as Browserify, because browsers do not provide require()!

// app.js

const languageEncoding = require("detect-file-encoding-and-language");

// Run the detection whenever the user picks a file.
document.getElementById("my-input-field").addEventListener("change", inputHandler);

function inputHandler(e) {
    const file = e.target.files[0];
    if (!file) return; // No file was selected.

    languageEncoding(file)
        .then(fileInfo => console.log(fileInfo))
        .catch(error => console.error(error));
    // Possible result: { language: english, encoding: UTF-8, confidence: 0.97 }
}

In Node.js

// index.js

const languageEncoding = require("detect-file-encoding-and-language");

const pathToFile = "/home/username/documents/my-text-file.txt";

languageEncoding(pathToFile).then(fileInfo => console.log(fileInfo));
// Possible result: { language: japanese, encoding: Shift-JIS, confidence: 1 }
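When several files need checking, the same call can be wrapped in a loop with async/await. This is a minimal sketch under stated assumptions: `detectAll` is a hypothetical helper, not part of this package, and it assumes that a failed detection surfaces as a rejected promise, which this README does not state. The detection function is passed in as a parameter so the helper can be tried with any function of the same shape.

```javascript
// Hypothetical helper, not part of detect-file-encoding-and-language.
// `detect` is any function that takes a path and returns a Promise of
// { language, encoding, confidence }, for example the function exported
// by this package.
async function detectAll(detect, paths) {
    const results = [];
    for (const path of paths) {
        try {
            const fileInfo = await detect(path);
            results.push({ path, ...fileInfo });
        } catch (error) {
            // Assumption: failures surface as a rejected promise.
            // Record them instead of aborting the whole batch.
            results.push({ path, error: String(error) });
        }
    }
    return results;
}

// Usage, with the package installed:
// const languageEncoding = require("detect-file-encoding-and-language");
// detectAll(languageEncoding, ["a.txt", "b.srt"]).then(console.log);
```

Files are processed one at a time, which keeps the output order predictable at the cost of speed.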

Usage (CLI)

Installation

$ npm install -g detect-file-encoding-and-language

In the terminal

Use the command dfeal to retrieve the encoding and language of your file:

$ dfeal "/home/user name/Documents/subtitle file.srt"
# Possible result: { language: french, encoding: CP1252, confidence: 0.99 }

or without quotation marks, using backslashes to escape spaces:

$ dfeal /home/user\ name/Documents/subtitle\ file.srt
# Possible result: { language: french, encoding: CP1252, confidence: 0.99 }

Supported Languages

  • Polish
  • Czech
  • Hungarian
  • Romanian
  • Slovak
  • Slovenian
  • Albanian
  • Russian
  • Ukrainian
  • Bulgarian
  • English
  • French
  • Portuguese
  • Spanish
  • German
  • Italian
  • Danish
  • Norwegian
  • Swedish
  • Dutch
  • Finnish
  • Serbo-Croatian
  • Estonian
  • Icelandic
  • Malay-Indonesian
  • Greek
  • Turkish
  • Hebrew
  • Arabic
  • Farsi-Persian
  • Lithuanian
  • Chinese-Simplified
  • Chinese-Traditional
  • Japanese
  • Korean
  • Thai
  • Bengali
  • Hindi
  • Urdu
  • Vietnamese

Supported Encodings

  • UTF-8
  • CP1250
  • CP1251
  • CP1252
  • CP1253
  • CP1254
  • CP1255
  • CP1256
  • CP1257
  • GB18030
  • BIG5
  • Shift-JIS
  • EUC-KR
  • TIS-620

Confidence Score

The confidence score ranges from 0 to 1. It is based on the number of matches that were found for a particular language and the frequency of those matches. If you want to learn more about how it all works, check out the Wiki entry!
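One way to use the score is to accept a result only above some cutoff and fall back to a default otherwise. This is a minimal sketch: `pickEncoding` is a hypothetical helper, the cutoff of 0.8 and the UTF-8 fallback are arbitrary assumptions rather than recommendations from this package, and it assumes `confidence` is a single number as in the example results above.

```javascript
// Hypothetical helper, not part of detect-file-encoding-and-language.
// Returns the detected encoding only when the confidence reaches the
// cutoff; otherwise returns the fallback.
function pickEncoding(fileInfo, minConfidence = 0.8, fallback = "UTF-8") {
    const confident =
        typeof fileInfo.confidence === "number" &&
        fileInfo.confidence >= minConfidence;
    return confident ? fileInfo.encoding : fallback;
}

console.log(pickEncoding({ language: "french", encoding: "CP1252", confidence: 0.99 })); // CP1252
console.log(pickEncoding({ language: "polish", encoding: "CP1250", confidence: 0.42 })); // UTF-8
```

A suitable cutoff depends on the inputs; short files tend to need a more cautious one.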

Known Issues

  • Unable to detect Shift-JIS encoded Japanese text files when using Node.js. Solutions are welcome!
  • Unable to detect UTF-16-LE encoded files when using Node.js. Solutions are welcome!
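Until the UTF-16-LE issue is resolved, a partial workaround is to look for a byte order mark (BOM) before calling the package. The sketch below is not part of this package: `detectBom` is a hypothetical helper. It only helps for files that actually start with a BOM, so UTF-16-LE files saved without one are still not recognized.

```javascript
// Hypothetical helper, not part of detect-file-encoding-and-language.
// Inspects the first bytes of a file for a byte order mark.
// Returns an encoding name, or null when no known BOM is present.
function detectBom(bytes) {
    if (bytes.length >= 3 &&
        bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
        return "UTF-8";
    }
    // Caveat: FF FE followed by 00 00 is also the UTF-32LE BOM,
    // which this sketch does not distinguish.
    if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
        return "UTF-16LE";
    }
    if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
        return "UTF-16BE";
    }
    return null;
}

// "h" encoded as UTF-16LE, preceded by its BOM.
console.log(detectBom(Buffer.from([0xff, 0xfe, 0x68, 0x00]))); // UTF-16LE
console.log(detectBom(Buffer.from("plain ascii"))); // null

// Usage in Node.js:
// const fs = require("fs");
// const bom = detectBom(fs.readFileSync(pathToFile));
```

When `detectBom` returns `null`, the file can be passed on to the package as usual.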

License

This project is licensed under the MIT License.