Readme
retext is an extensible natural language system, by default using parse-latin to transform natural language into NLCST. It provides a pluggable system for analysing and manipulating natural language in JavaScript, and runs in Node.js and the browser. Tests provide 100% coverage.
Rather than being a do-all library for Natural Language Processing (such as NLTK or OpenNLP), retext aims to be useful for more practical use cases (such as censoring profane words or decoding emoticons, but the possibilities are endless) instead of more academic goals (research purposes). retext is inherently modular: it uses plugins (similar to rework for CSS) instead of providing everything out of the box (such as Natural). This makes retext a viable tool for use on the web.
Installation
npm:
npm install retext
Component:
component install wooorm/retext
Bower:
bower install retext
Duo:
var Retext = require('wooorm/retext');
UMD (globals/AMD/CommonJS) (uncompressed and compressed):
<script src="path/to/retext.js"></script>
<script>
var retext = new Retext();
</script>
Usage
The following example uses retext-emoji (to show emoji) and retext-smartypants (for smart punctuation).
Require dependencies:
var retext = require('retext');
var emoji = require('retext-emoji');
var smartypants = require('retext-smartypants');
Create an instance using retext-emoji and retext-smartypants:
var processor = retext().use(smartypants).use(emoji, {
    'convert' : 'encode'
});
Process a document:
var doc = processor.process(
'The three wise monkeys [. . .] sometimes called the ' +
'three mystic apes--are a pictorial maxim. Together ' +
'they embody the proverbial principle to ("see no evil, ' +
'hear no evil, speak no evil"). The three monkeys are ' +
'Mizaru (:see_no_evil:), covering his eyes, who sees no ' +
'evil; Kikazaru (:hear_no_evil:), covering his ears, ' +
'who hears no evil; and Iwazaru (:speak_no_evil:), ' +
'covering his mouth, who speaks no evil.'
);
Yields (you need a browser which supports emoji to see them):
The three wise monkeys […] sometimes called the three
mystic apes—are a pictorial maxim. Together they
embody the proverbial principle to (“see no evil,
hear no evil, speak no evil”). The three monkeys are
Mizaru (🙈), covering his eyes, who sees no evil;
Kikazaru (🙉), covering his ears, who hears no evil;
and Iwazaru (🙊), covering his mouth, who speaks no evil.
API
retext.use(plugin[, options])
Change the way retext works by using a plugin.
Signatures
processor = retext.use(plugin, options?);
processor = retext.use(plugins).
Parameters
plugin (Function) - A plugin;
plugins (Array.<Function>) - A list of plugins;
options (Object?) - Passed to the plugin. Specified by its documentation.
Returns
Object: an instance of Retext. The returned object functions just like retext (it has the same methods), but caches the used plugins. This provides the ability to chain use calls to use multiple plugins, but ensures the functioning of the retext module does not change for other dependents.
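The caching behaviour described here can be illustrated without retext itself. The following is a minimal sketch, not retext's actual source: createProcessor and its plugins property are hypothetical names, used only to show how a use method can return a new object that remembers its plugins while leaving the original untouched.

```javascript
// Hypothetical stand-in for retext, for illustration only.
function createProcessor(plugins) {
    var processor = {
        // Plugins attached to this instance (copied, never shared).
        'plugins': plugins.slice(),

        // Return a NEW processor with the extra plugin(s) attached.
        'use': function (plugin, options) {
            var list = Array.isArray(plugin) ? plugin : [plugin];
            var added = list.map(function (fn) {
                return {'plugin': fn, 'options': options};
            });

            return createProcessor(processor.plugins.concat(added));
        }
    };

    return processor;
}

// Empty placeholder plugins (assumptions, not the real packages).
function smartypants() {}
function emoji() {}

var base = createProcessor([]);
var chained = base.use(smartypants).use(emoji, {'convert': 'encode'});

console.log(base.plugins.length); // 0: the base is unchanged
console.log(chained.plugins.length); // 2: both plugins are cached
```

Because every call produces a fresh object, two modules that both depend on the same base can attach different plugins without affecting each other.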
retext.process(value[, done])
Parse a text document, apply plugins to it, and compile it into something else.
Signatures
doc = retext.process(value[, done]).
Parameters
value (string) - Text document;
done (function(err, doc, file), optional) - Callback invoked when the output is generated, with either an error or a result. Only strictly needed when async plugins are used.
Returns
string or null: A document. Formatted in whatever plugins generate. The result is null if a plugin is asynchronous, in which case the callback done should've been passed (don't worry: plugin creators make sure you know it's async).
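The return contract (a string when everything is synchronous, null plus a callback otherwise) can be sketched as follows. This is a toy model, not retext's implementation: processDocument, upperCase, and delayed are hypothetical, the transformers work on plain strings rather than trees, and treating a function as asynchronous when it declares a next parameter is an assumption made for this sketch.

```javascript
// Toy model of the sync/async contract; not retext's real code.
function processDocument(value, transformers, done) {
    var index = 0;
    var isAsync = false;
    var result = null;

    function run(doc) {
        var transformer = transformers[index++];

        if (!transformer) {
            // Every transformer has run: record and report the result.
            result = doc;
            if (done) done(null, doc);
            return;
        }

        if (transformer.length > 1) {
            // Declares `next`: treated as asynchronous in this sketch.
            isAsync = true;
            transformer(doc, function (err, output) {
                if (err) {
                    if (done) done(err);
                } else {
                    run(output);
                }
            });
        } else {
            run(transformer(doc));
        }
    }

    run(value);

    // Once an async step is involved, only the callback gets the result.
    return isAsync ? null : result;
}

function upperCase(doc) {
    return doc.toUpperCase();
}

function delayed(doc, next) {
    setTimeout(function () {
        next(null, doc + '!');
    }, 0);
}

var syncResult = processDocument('some text', [upperCase]);
var asyncResult = processDocument('some text', [upperCase, delayed],
    function (err, doc) {
        console.log(err, doc);
    }
);

console.log(syncResult);
console.log(asyncResult);
```

In practice this means it is safe to always pass done: synchronous pipelines still invoke it, and asynchronous ones require it.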
plugin
A plugin is simply a function, with function(retext[, options]) as its signature. The first argument is the Retext instance a user attached the plugin to. The plugin is invoked when a user uses the plugin (not when a document is parsed) and enables the plugin to modify retext.
The plugin can return another function: function(NLCSTNode, file[, next]). This function is invoked when a document is parsed.
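A plugin following this shape might look like the sketch below. countWords and the hand-built tree are hypothetical: the tree only imitates the general shape of an NLCST node (a type plus children or a value), storing the result on file is an assumption, and the plugin is invoked by hand rather than through retext so the example stays self-contained.

```javascript
// Hypothetical plugin: counts WordNodes in a tree.
function countWords(retext, options) {
    var settings = options || {};

    // This outer function runs once, when the plugin is attached.
    // The returned function runs for every parsed document.
    return function (node, file) {
        var count = 0;

        function visit(child) {
            if (child.type === 'WordNode') {
                count++;
            }

            (child.children || []).forEach(visit);
        }

        visit(node);

        // Assumption: results are stored on the file object.
        file[settings.key || 'wordCount'] = count;
    };
}

// A hand-written stand-in for a parsed document.
var tree = {
    'type': 'RootNode',
    'children': [{
        'type': 'SentenceNode',
        'children': [
            {'type': 'WordNode', 'children': [
                {'type': 'TextNode', 'value': 'Hello'}
            ]},
            {'type': 'WhiteSpaceNode', 'value': ' '},
            {'type': 'WordNode', 'children': [
                {'type': 'TextNode', 'value': 'world'}
            ]},
            {'type': 'PunctuationNode', 'value': '.'}
        ]
    }]
};

var file = {};
var transformer = countWords(null, {'key': 'words'});

transformer(tree, file);

console.log(file.words); // 2
```

Keeping configuration in the outer function and per-document work in the returned function mirrors the two moments described above: attachment and parsing.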
Plugins
retext-content - Append, prepend, remove, and replace content into/from Retext nodes;
retext-cst - (demo) - Encoding and decoding between AST (JSON) and TextOM object model;
retext-directionality - (demo) - Detect the direction text is written in;
retext-dom - (demo) - Create a (living) DOM tree from a TextOM tree;
retext-double-metaphone - (demo) - Implementation of the Double Metaphone algorithm;
retext-emoji - (demo) - Encode or decode Gemojis;
retext-find - Easily find nodes;
retext-inspect - (demo) - Nicely display nodes in console.log calls;
retext-keywords - (demo) - Extract keywords and keyphrases;
retext-lancaster-stemmer - (demo) - Implementation of the Lancaster (Paice/Husk) algorithm;
retext-language - (demo) - Detect the language of text;
retext-link - (demo) - Detect links in text;
retext-live - Change a node based on a (new?) value;
retext-metaphone - (demo) - Implementation of the Metaphone algorithm;
retext-porter-stemmer - (demo) - Implementation of the Porter stemming algorithm;
retext-pos - (demo) - Part-of-speech tagger;
retext-range - Sequences of content within a TextOM tree between two points;
retext-search - (demo) - Search in a TextOM tree;
retext-sentiment - (demo) - Detect sentiment in text;
retext-smartypants - (demo) - Implementation of SmartyPants;
retext-soundex - (demo) - Implementation of the Soundex algorithm;
retext-syllable - (demo) - Syllable count;
retext-visit - (demo) - Visit nodes, optionally by type;
retext-walk - Walk trees, optionally by type.
Desired Plugins
Hey! Want to create one of the following, or any other plugin, for retext but not sure where to start? I suggest reading retext-visit's source code first to see how it's built (it's probably the most straightforward one to learn from), and going from there. If you still have questions, send me feedback or raise an issue.
retext-date - Detect time and date in text;
retext-frequent-words - Like retext-keywords, but based on frequency and stop-words instead of a POS-tagger;
retext-hyphen - Insert soft-hyphens where needed; this might have to be implemented with some sort of node which doesn't stringify;
retext-location - Track the position of nodes (line, column);
retext-no-pants - Opposite of retext-smartypants;
retext-no-break - Inserts non-breaking spaces between things like “100 km”;
retext-profanity - Censor profane words;
retext-punctuation-pair - Detect which opening or initial punctuation mark belongs to which closing or final punctuation mark (and vice versa);
retext-summary - Summarise text;
retext-sync - Detect changes in a textarea (or contenteditable?), sync the diffs over to a retext tree, let plugins modify the content, and sync the diffs back to the textarea;
retext-typography - Applies typographic enhancements, like (or using?) retext-smartypants and retext-hyphen;
retraverse - Like Estraverse.
Parsers
parse-latin (demo) - default;
parse-english (demo) - Specifically for English;
parse-dutch (demo) - Specifically for Dutch.
Benchmark
On a MacBook Air, it parses about 3 big articles, 33 sections, or 325 paragraphs per second.
retext.parse(value, callback);
325 op/s » A paragraph (5 sentences, 100 words)
33 op/s » A section (10 paragraphs, 50 sentences, 1,000 words)
3 op/s » An article (100 paragraphs, 500 sentences, 10,000 words)
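To compare the three results on a common scale, each rate can be converted to words per second by multiplying it by the word count listed for that document size. A short sketch of that arithmetic (the variable names are illustrative):

```javascript
// Rates and sizes copied from the benchmark results above.
var results = [
    {'name': 'paragraph', 'opsPerSecond': 325, 'words': 100},
    {'name': 'section', 'opsPerSecond': 33, 'words': 1000},
    {'name': 'article', 'opsPerSecond': 3, 'words': 10000}
];

// words per second = operations per second * words per operation
var wordsPerSecond = results.map(function (result) {
    return result.opsPerSecond * result.words;
});

console.log(wordsPerSecond); // [ 32500, 33000, 30000 ]
```

All three land at roughly 30,000 words per second, which suggests that parsing time grows about linearly with document length on this machine.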
License
MIT © Titus Wormer