Package Exports
- stopwords-json
- stopwords-json/dist/en
- stopwords-json/stopwords-all.json
This package does not declare an "exports" field, so JSPM detected and optimized the exports listed above automatically. If a subpath you need is missing, consider opening an issue on the original package (stopwords-json) asking it to add an "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.
Readme
stopwords-json

Stopwords for various languages in JSON format. Per Wikipedia:
> Stop words are words which are filtered out prior to, or after, processing of natural language data [...] these are some of the most common, short function words, such as the, is, at, which, and on.
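The filtering that this definition describes can be sketched in TypeScript as follows. This is a minimal illustration, not part of the package. It assumes a stopword list is an array of strings. The names `SAMPLE_EN_STOPWORDS`, `tokenize`, and `removeStopwords` are hypothetical, and the sample list holds only the five example words from the quote above, not the real contents of en.json.

```typescript
// Hypothetical sample list: the five example words from the quoted definition,
// NOT the contents of the package's English file.
const SAMPLE_EN_STOPWORDS: readonly string[] = ["the", "is", "at", "which", "on"];

// Naive ASCII-only tokenizer (an assumption for this English example): it
// lowercases the text and splits on any run of characters other than a-z,
// 0-9, or an apostrophe. Other scripts in the package need a different tokenizer.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9']+/)
    .filter((token) => token.length > 0); // drop empty strings at the edges
}

// Remove every token found in the stopword list. Tokens are expected to be
// lowercase already (tokenize() does this); the list is lowercased defensively.
// A Set gives average O(1) membership checks instead of scanning the array.
function removeStopwords(tokens: readonly string[], stopwords: readonly string[]): string[] {
  const stop = new Set(stopwords.map((word) => word.toLowerCase()));
  return tokens.filter((token) => !stop.has(token));
}

const words = tokenize("The cat is sitting on the mat at noon");
console.log(removeStopwords(words, SAMPLE_EN_STOPWORDS));
// logs the array ["cat", "sitting", "mat", "noon"]
```

Converting the list to a Set once is the main design choice here. Several of the lists run to hundreds of entries, so repeated linear scans would add up on long texts.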
To use every language at once, load stopwords-all.json, which is keyed by ISO 639-1 language code. For a single language, use the individual file listed in the table below.
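Selecting one language from the combined file amounts to a keyed lookup. The sketch below assumes that stopwords-all.json parses to an object mapping each ISO 639-1 code to an array of strings. `sampleAll` is a hypothetical two-language stand-in for the real file, and `stopwordsFor` is an illustrative helper, not an API exported by the package.

```typescript
// Assumed shape of stopwords-all.json: ISO 639-1 code -> list of stopwords.
type StopwordsByLanguage = Record<string, string[]>;

// Hypothetical two-language sample standing in for the real combined file.
const sampleAll: StopwordsByLanguage = {
  en: ["the", "is", "at", "which", "on"],
  de: ["der", "die", "das", "und"],
};

// Return the stopword list for a language code, or throw if it is absent.
// The own-property check stops inherited keys such as "constructor" from
// being mistaken for a language.
function stopwordsFor(all: StopwordsByLanguage, isoCode: string): string[] {
  const key = isoCode.toLowerCase();
  if (!Object.prototype.hasOwnProperty.call(all, key)) {
    throw new Error(`No stopwords for language code "${isoCode}"`);
  }
  return all[key];
}

console.log(stopwordsFor(sampleAll, "EN").length); // 5
```

In a real project you would replace `sampleAll` with the parsed contents of the actual file. For example, you could call `JSON.parse` on the file text or use a bundler that supports JSON imports.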
Languages
There are a total of 50 supported languages:
| Language | Stopword count | Filename |
|---|---|---|
| Afrikaans | 51 | af.json |
| Arabic | 162 | ar.json |
| Armenian | 45 | hy.json |
| Basque | 98 | eu.json |
| Bengali | 116 | bn.json |
| Breton | 126 | br.json |
| Bulgarian | 259 | bg.json |
| Catalan | 218 | ca.json |
| Chinese | 542 | zh.json |
| Croatian | 179 | hr.json |
| Czech | 346 | cs.json |
| Danish | 101 | da.json |
| Dutch | 275 | nl.json |
| English | 570 | en.json |
| Esperanto | 173 | eo.json |
| Estonian | 35 | et.json |
| Finnish | 772 | fi.json |
| French | 606 | fr.json |
| Galician | 160 | gl.json |
| German | 596 | de.json |
| Greek | 75 | el.json |
| Hausa | 39 | ha.json |
| Hebrew | 194 | he.json |
| Hindi | 225 | hi.json |
| Hungarian | 781 | hu.json |
| Indonesian | 355 | id.json |
| Irish | 109 | ga.json |
| Italian | 623 | it.json |
| Japanese | 109 | ja.json |
| Korean | 679 | ko.json |
| Latin | 49 | la.json |
| Latvian | 161 | lv.json |
| Marathi | 99 | mr.json |
| Norwegian | 172 | no.json |
| Persian | 332 | fa.json |
| Polish | 260 | pl.json |
| Portuguese | 408 | pt.json |
| Romanian | 282 | ro.json |
| Russian | 539 | ru.json |
| Slovak | 110 | sk.json |
| Slovenian | 446 | sl.json |
| Somali | 30 | so.json |
| Southern Sotho | 31 | st.json |
| Spanish | 577 | es.json |
| Swahili | 74 | sw.json |
| Swedish | 401 | sv.json |
| Thai | 115 | th.json |
| Turkish | 279 | tr.json |
| Yoruba | 60 | yo.json |
| Zulu | 29 | zu.json |
Sources
- Apache Lucene - Apache 2.0 License
- Carrot2 - License
- cue.language - Apache 2.0 License
- Jacques Savoy - BSD License
- SMART Information Retrieval System: ftp://ftp.cs.cornell.edu/pub/smart/
- ASP Stoplist Project - CC-BY and Apache 2.0
License and Copyright
Copyright (c) 2016 Peter Graham, contributors. Released under the Apache-2.0 license.