# efrt
trie-based compression of word-data

```sh
npm i efrt
```

efrt is a prefix/suffix trie optimised for compression of English words.

It is based on mckoss/lookups by Mike Koss and bits.js by Steve Hanov.

  • squeeze a list of words into a very compact form
  • significantly reduce filesize and bandwidth
  • keep unpacking overhead negligible
  • treat word-lookups as the critical path

By doing the expensive work ahead-of-time, efrt lets you ship much larger word-lists to the client-side, without much hassle.

```js
var efrt = require('efrt');

var words = [
  'coolage',
  'cool',
  'cool cat',
  'cool.com',
  'coolamungo'
];

// pack these words as tightly as possible
var compressed = efrt.pack(words);
// 'cool0;! cat,.com,a0;ge,mungo'

// create a lookup-trie
var trie = efrt.unpack(compressed);

// hit it!
console.log(trie.has('cool'));        // true
console.log(trie.has('miles davis')); // false
```

Demo!

The words you input should be pretty normalized. Spaces and unicode are good, but numbers, case-sensitivity, and some punctuation are not (yet) supported.
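A minimal normalization pass might look like the sketch below. The exact rules are up to the caller; lowercasing and stripping digits and punctuation are assumptions here, not part of efrt's API:

```js
// Sketch: normalize words before handing them to efrt.pack().
// Spaces and unicode are fine, but mixed case, digits, and some
// punctuation are not supported - so strip those first.
function normalize(word) {
  return word
    .toLowerCase()
    .replace(/[0-9]/g, '')                            // drop digits
    .replace(/[!"#$%&()*+,:;<=>?@\[\]^_`{|}~]/g, '')  // drop risky punctuation
    .trim();
}

console.log(normalize('Cool Cat!')); // 'cool cat'
console.log(normalize('Route 66'));  // 'route'
```

Note the dot is deliberately kept, since words like 'cool.com' are valid input.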

## Performance

There are two modes efrt can run in, depending on what you want to optimise for. By default, the trie is ready instantly, but must look words up by their prefixes, which is not super-fast. If you want lookups to go faster, you can call trie.cache() first, to pre-compute the queries. Things will run much faster after this:

```js
var compressed = efrt.pack(skateboarders); // 1k words (timings on a macbook)
var trie = efrt.unpack(compressed);

trie.has('tony hawk');
// trie-lookup: 1.1ms

trie.cache();
// caching-step: 5.1ms

trie.has('tony hawk');
// cached-lookup: 0.02ms
```

The trie.cache() command spins the trie into a plain JavaScript object, for faster lookups. Building it takes some time, though.

In this example, with 1k words, it makes sense to call .cache() if you are going to do more than 5 lookups on the trie, but your mileage may vary. You can access the object via trie.toObject(), or trie.toArray(), if you'd like to use it directly.
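That break-even point of 5 lookups follows directly from the benchmark numbers above; a quick back-of-the-envelope check (all timings are the assumed figures from the example, not measured here):

```js
// Timings from the benchmark above (milliseconds)
var trieLookup = 1.1;     // uncached lookup
var cachedLookup = 0.02;  // lookup after .cache()
var cacheCost = 5.1;      // one-time cost of .cache()

// .cache() pays off once its one-time cost is recouped
// by the time saved on each subsequent lookup
var savedPerLookup = trieLookup - cachedLookup;        // ~1.08ms
var breakEven = Math.ceil(cacheCost / savedPerLookup); // 5.1 / 1.08 -> 5

console.log(breakEven); // 5 lookups
```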

## Size

efrt will pack filesize down as much as possible, depending upon the redundancy of the prefixes/suffixes in the words, and the size of the list.

  • list of countries - 1.5k -> 0.8k (46% compressed)
  • all adverbs in wordnet - 58k -> 24k (58% compressed)
  • all adjectives in wordnet - 265k -> 99k (62% compressed)
  • all nouns in wordnet - 1,775k -> 692k (61% compressed)

But there are some things to consider:

  • bigger files compress further (see 🎈 birthday problem)
  • using efrt will reduce gains from gzip compression, which most webservers quietly use
  • English is more suffix-redundant than prefix-redundant, so non-English words may benefit from other styles

## Use

Works in IE9+:

```html
<script src="https://unpkg.com/efrt@latest/builds/efrt.min.js"></script>
<script>
  var smaller = efrt.pack(['larry', 'curly', 'moe']);
  var trie = efrt.unpack(smaller);
  console.log(trie.has('moe')); // true
</script>
```

If you're doing the second step in the client, you can load just the unpack-half of the library (~3k):

```html
<script src="https://unpkg.com/efrt@latest/builds/efrt-unpack.min.js"></script>
<script>
  var trie = unpack(compressedStuff);
  trie.has('miles davis');
</script>
```

MIT