wink-bm25-text-search
Configurable BM25 Text Search Engine with simple semantic search support

wink-bm25-text-search is part of wink, a family of machine learning NPM packages. They consist of simple and/or higher-order functions that can be combined with NodeJS streams and child processes to create recipes for analytics-driven business solutions.
Easily add in-memory semantic search to your application using wink-bm25-text-search. It is based on BM25F, one of the most popular text-retrieval algorithms, built on the Probabilistic Relevance Framework (PRF) for document retrieval. It accepts structured JSON documents as input for creating the model. The following is an example of the document structure used by the sample data JSON contained in this package:
{
  title: 'Barack Obama',
  body: 'Barack Hussein Obama II born August 4, 1961 is an American politician...',
  tags: 'democratic nobel peace prize columbia michelle...'
}
The sample data is created using excerpts from Wikipedia articles such as one on Barack Obama.
Its API offers a rich set of features:
- Configure text preparation tasks such as amplify negation, tokenize, stem, remove stop words, and propagate negation using wink-nlp-utils or any other package of your choice.
- Add semantic flavor to the search by:
  - Defining the text preparation tasks separately for (a) each field (e.g. body or tags), (b) the search string, and (c) a default for everything else.
  - Assigning a different degree of importance to every field in terms of a numerical weight.
- Configure all the BM25 parameters: (a) k1 to control TF saturation, (b) b to control the degree of normalization, and (c) k to manage IDF.
- Export and import learnings from the added documents in a JSON format that can be easily saved on hard disk.
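To make the idea of per-field weights concrete, here is a minimal sketch of one plausible way a weight can fold into a term count: every occurrence of a token contributes the weight of the field it was found in. The helper names `weightedTermCounts` and `simpleTokenize` are hypothetical, and this is an illustration of the concept rather than the package's actual internals.

```javascript
// Hypothetical helper: fold per-field token occurrences into one weighted count.
// Illustrates the idea of field weights; it is NOT the package's own code.
function weightedTermCounts( doc, fldWeights, tokenize ) {
  var counts = Object.create( null );
  Object.keys( fldWeights ).forEach( function ( field ) {
    var text = doc[ field ];
    if ( typeof text !== 'string' ) return; // skip fields the doc lacks
    tokenize( text ).forEach( function ( token ) {
      // Each occurrence adds the weight of the field it appears in.
      counts[ token ] = ( counts[ token ] || 0 ) + fldWeights[ field ];
    } );
  } );
  return counts;
}

// Naive tokenizer for this sketch only: lower-case, split on non-alphanumerics.
function simpleTokenize( text ) {
  return text.toLowerCase().split( /[^a-z0-9]+/ ).filter( Boolean );
}

var counts = weightedTermCounts(
  {
    title: 'Barack Obama',
    body: 'Barack Hussein Obama II is an American politician',
    tags: 'democratic nobel'
  },
  { title: 4, body: 1, tags: 2 },
  simpleTokenize
);

console.log( counts.barack, counts.nobel, counts.american );
// -> 5 2 1
```

With these weights, a match in the title counts four times as much as a match in the body, which is why `barack` (one title hit plus one body hit) ends up at 5.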
Installation
Use npm to install:
npm install wink-bm25-text-search --save
Example 
// Load wink-bm25-text-search
var bm25 = require( 'wink-bm25-text-search' )();
// Load NLP utilities
var nlp = require( 'wink-nlp-utils' );
// Load sample data (load any other JSON data instead of sample)
var docs = require( 'wink-bm25-text-search/sample-data/data-for-wink-bm25.json' );
// Set up preparatory tasks for 'body' field
bm25.definePrepTasks( [
  nlp.string.lowerCase,
  nlp.string.removeExtraSpaces,
  nlp.string.tokenize0,
  nlp.tokens.propagateNegations,
  nlp.tokens.removeWords,
  nlp.tokens.stem
], 'body' );
// Set up 'default' preparatory tasks i.e. for everything else
bm25.definePrepTasks( [
  nlp.string.lowerCase,
  nlp.string.removeExtraSpaces,
  nlp.string.tokenize0,
  nlp.tokens.propagateNegations,
  nlp.tokens.stem
] );
// Define BM25 configuration
bm25.defineConfig( {
  fldWeights: { title: 4, body: 1, tags: 2 },
  bm25Params: { k1: 1.2, k: 1, b: 0.75 }
} );
// Add documents now...
docs.forEach( function ( doc, i ) {
  // Note, 'i' becomes the unique id for 'doc'
  bm25.addDoc( doc, i );
} );
// Consolidate before searching
bm25.consolidate();
// All set, start searching!
var results = bm25.search( 'who is married to barack' );
// results is an array of [ doc-id, score ], sorted by score
// results[ 0 ][ 0 ] i.e. the top result is:
console.log( docs[ results[ 0 ][ 0 ] ].body );
// -> Michelle LaVaughn Robinson Obama (born January 17, 1964) is...
API
definePrepTasks( tasks [, field ] )
Defines the text preparation tasks that transform raw incoming text into the array of tokens required during addDoc() and search() operations. The tasks should be an array of functions. The first function in this array must accept a string as input, and the last function must return an array of tokens as JavaScript Strings. Each function must accept one input argument and return a single value. definePrepTasks returns the count of tasks. The second argument, field, is optional. It defines the field of the document for which the tasks will be defined; in the absence of this argument, the tasks become the default for everything else.
As illustrated in the usage, wink-nlp-utils offers a rich set of such functions.
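The contract described above (string in, array of string tokens out, one argument per function) can also be met with hand-written functions. The following is a minimal sketch under that assumption; `lowerCase`, `tokenize`, `removeStopWords`, `STOP_WORDS`, and the `runPrepTasks` runner are all hypothetical names made up for illustration, and the runner only mimics how a task chain could be applied, not how the package applies it.

```javascript
// Each task accepts exactly one argument and returns exactly one value.
function lowerCase( text ) {
  return text.toLowerCase();
}

// First token-producing step: string in, array of strings out.
function tokenize( text ) {
  return text.split( /[^a-z0-9]+/ ).filter( Boolean );
}

// A tiny, made-up stop word list for this sketch only.
var STOP_WORDS = new Set( [ 'a', 'an', 'is', 'the', 'to', 'who' ] );

function removeStopWords( tokens ) {
  return tokens.filter( function ( token ) {
    return !STOP_WORDS.has( token );
  } );
}

// Hypothetical runner: feeds the output of each task into the next one.
function runPrepTasks( tasks, text ) {
  return tasks.reduce( function ( value, task ) {
    return task( value );
  }, text );
}

var prepTasks = [ lowerCase, tokenize, removeStopWords ];

console.log( runPrepTasks( prepTasks, 'Who is married to Barack' ) );
// -> [ 'married', 'barack' ]
```

An array shaped like `prepTasks` is the kind of value definePrepTasks() expects as its first argument, for example `bm25.definePrepTasks( prepTasks, 'body' )`.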
defineConfig( config )
Defines the configuration from the config object. This object must define 2 properties: (a) fldWeights and (b) bm25Params. fldWeights is an object where each key is a document field name and the value is its numerical weight, i.e. the importance of that field. bm25Params is also an object; it defines up to 3 keys, viz. k1, b, and k. Their default values are 1.2, 0.75, and 1 respectively.
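The roles of k1, b, and k can be sketched with the textbook BM25 formulas. The function names `saturatedTf` and `idf` are hypothetical, and the exact placement of k inside the IDF logarithm is an assumption rather than something this README states, so the package's internal computation may differ.

```javascript
// Term-frequency component: grows with tf but saturates towards ( k1 + 1 ).
// b (between 0 and 1) controls how strongly long documents are penalised.
function saturatedTf( tf, docLength, avgDocLength, k1, b ) {
  var lengthNorm = 1 - b + b * ( docLength / avgDocLength );
  return ( tf * ( k1 + 1 ) ) / ( tf + k1 * lengthNorm );
}

// IDF component. ASSUMPTION: k is added inside the logarithm, which keeps
// the value positive even for terms found in most documents.
function idf( totalDocs, docsWithTerm, k ) {
  return Math.log( ( totalDocs - docsWithTerm + 0.5 ) / ( docsWithTerm + 0.5 ) + k );
}

var params = { k1: 1.2, b: 0.75, k: 1 }; // the default values listed above

// Raising tf yields diminishing returns for an average-length document.
[ 1, 2, 5, 50 ].forEach( function ( tf ) {
  console.log( tf, saturatedTf( tf, 100, 100, params.k1, params.b ).toFixed( 3 ) );
} );

// A term present in 9 of 10 documents: negative IDF with k = 0, positive with k = 1.
console.log( idf( 10, 9, 0 ) < 0, idf( 10, 9, params.k ) > 0 );
// -> true true
```

In this sketch, a larger k1 lets repeated terms keep adding to the score for longer, b = 0 switches length normalization off entirely, and k lifts the IDF of very common terms.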
addDoc( doc, uniqueId )
Adds the doc with the uniqueId to the BM25 model. If the input is a JavaScript String, then definePrepTasks() must be called before learning. Similarly, defineConfig() must also be called before this operation.
It has an alias, learn( doc, uniqueId ), to maintain API-level uniformity across various wink packages such as wink-naive-bayes-text-classifier.
consolidate()
Consolidates the BM25 model for all the added documents. It is a prerequisite for search().
search( text [, limit ] )
Searches for the text and returns up to the limit number of results. The result is an array of [ uniqueId, relevanceScore ] pairs, sorted on the relevanceScore. The default value of limit is 10.
Like addDoc(), it also has an alias, predict( text [, limit ] ), to maintain API-level uniformity across various wink packages such as wink-naive-bayes-text-classifier.
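Since search() returns only [ uniqueId, relevanceScore ] pairs, the caller has to map each id back to its document. A minimal sketch of that step follows; `resolveResults` is a hypothetical helper, and the `docs` and `results` values are made-up stand-ins for real documents and real search output.

```javascript
// Hypothetical helper: turn [ uniqueId, relevanceScore ] pairs into records
// that carry the matching document alongside its score.
function resolveResults( results, docsById ) {
  return results.map( function ( pair ) {
    return { id: pair[ 0 ], score: pair[ 1 ], doc: docsById[ pair[ 0 ] ] };
  } );
}

// Made-up data: array indexes play the role of unique ids, as in the example
// where 'i' is passed to addDoc(). The scores are invented for illustration.
var docs = [ { title: 'A' }, { title: 'B' }, { title: 'C' } ];
var results = [ [ 2, 3.1 ], [ 0, 1.4 ] ];

var hits = resolveResults( results, docs );

console.log( hits.map( function ( hit ) { return hit.doc.title; } ) );
// -> [ 'C', 'A' ]
```

If the unique ids are not array indexes, the same helper works with a plain object keyed by id passed as `docsById`.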
exportJSON()
The BM25 model can be exported as JSON text that may be saved in a file.
importJSON( json )
An existing JSON BM25 model can be imported for search. It is essential to call definePrepTasks() and consolidate() before attempting to search.
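Saving a model to disk and loading it back could look roughly like the sketch below. It assumes that exportJSON() returns the JSON text as a string and that importJSON() accepts that same string; `saveModel`, `loadModel`, and the file name in the comments are hypothetical, and the helpers work with any object exposing those two methods.

```javascript
var fs = require( 'fs' );

// Hypothetical helper: write the exported JSON text to a file.
// ASSUMPTION: engine.exportJSON() returns a string.
function saveModel( engine, filePath ) {
  fs.writeFileSync( filePath, engine.exportJSON(), 'utf8' );
}

// Hypothetical helper: read the JSON text back and hand it to the engine.
function loadModel( engine, filePath ) {
  engine.importJSON( fs.readFileSync( filePath, 'utf8' ) );
  return engine;
}

// Intended usage with a search engine instance (not executed here):
//   saveModel( bm25, 'bm25-model.json' );
//   ...later, possibly in another process...
//   loadModel( bm25, 'bm25-model.json' );
//   bm25.definePrepTasks( [ /* same tasks as before */ ] );
//   bm25.consolidate();
//   bm25.search( 'who is married to barack' );
```

The prep tasks are functions and so cannot be part of the JSON, which fits the requirement to define them again after importing.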
reset()
It completely resets the BM25 model by re-initializing all the variables, except the preparatory tasks.
Need Help?
If you spot a bug and the same has not yet been reported, raise a new issue or consider fixing it and sending a pull request.
Copyright & License
wink-bm25-text-search is copyright 2017 GRAYPE Systems Private Limited.
It is licensed under the terms of the GNU Affero General Public License as published by the Free Software Foundation, version 3 of the License.