English natural language parser

Package Exports

  • parse-english

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (parse-english) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

parse-english


English language parser for retext producing nlcst nodes.

Install

This package is ESM only: Node 12+ is needed to use it, and it must be loaded with import rather than require().

npm:

npm install parse-english

Use

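// unist-util-inspect exposes a named export, not a default one.
import {inspect} from 'unist-util-inspect'
import {ParseEnglish} from 'parse-english'

// Parse the text into an nlcst tree (root > paragraph > sentence > words).
const tree = new ParseEnglish().parse(
  'Mr. Henry Brown: A hapless but friendly City of London worker.'
)

// Print a readable outline of the tree with positions.
console.log(inspect(tree))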

Yields:

RootNode[1] (1:1-1:63, 0-62)
└─ ParagraphNode[1] (1:1-1:63, 0-62)
   └─ SentenceNode[23] (1:1-1:63, 0-62)
      ├─ WordNode[2] (1:1-1:4, 0-3)
      │  ├─ TextNode: "Mr" (1:1-1:3, 0-2)
      │  └─ PunctuationNode: "." (1:3-1:4, 2-3)
      ├─ WhiteSpaceNode: " " (1:4-1:5, 3-4)
      ├─ WordNode[1] (1:5-1:10, 4-9)
      │  └─ TextNode: "Henry" (1:5-1:10, 4-9)
      ├─ WhiteSpaceNode: " " (1:10-1:11, 9-10)
      ├─ WordNode[1] (1:11-1:16, 10-15)
      │  └─ TextNode: "Brown" (1:11-1:16, 10-15)
      ├─ PunctuationNode: ":" (1:16-1:17, 15-16)
      ├─ WhiteSpaceNode: " " (1:17-1:18, 16-17)
      ├─ WordNode[1] (1:18-1:19, 17-18)
      │  └─ TextNode: "A" (1:18-1:19, 17-18)
      ├─ WhiteSpaceNode: " " (1:19-1:20, 18-19)
      ├─ WordNode[1] (1:20-1:27, 19-26)
      │  └─ TextNode: "hapless" (1:20-1:27, 19-26)
      ├─ WhiteSpaceNode: " " (1:27-1:28, 26-27)
      ├─ WordNode[1] (1:28-1:31, 27-30)
      │  └─ TextNode: "but" (1:28-1:31, 27-30)
      ├─ WhiteSpaceNode: " " (1:31-1:32, 30-31)
      ├─ WordNode[1] (1:32-1:40, 31-39)
      │  └─ TextNode: "friendly" (1:32-1:40, 31-39)
      ├─ WhiteSpaceNode: " " (1:40-1:41, 39-40)
      ├─ WordNode[1] (1:41-1:45, 40-44)
      │  └─ TextNode: "City" (1:41-1:45, 40-44)
      ├─ WhiteSpaceNode: " " (1:45-1:46, 44-45)
      ├─ WordNode[1] (1:46-1:48, 45-47)
      │  └─ TextNode: "of" (1:46-1:48, 45-47)
      ├─ WhiteSpaceNode: " " (1:48-1:49, 47-48)
      ├─ WordNode[1] (1:49-1:55, 48-54)
      │  └─ TextNode: "London" (1:49-1:55, 48-54)
      ├─ WhiteSpaceNode: " " (1:55-1:56, 54-55)
      ├─ WordNode[1] (1:56-1:62, 55-61)
      │  └─ TextNode: "worker" (1:56-1:62, 55-61)
      └─ PunctuationNode: "." (1:62-1:63, 61-62)
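The tree above is plain nlcst: parent nodes carry children, and literal nodes (TextNode, PunctuationNode, WhiteSpaceNode) carry a value. As a minimal sketch of consuming that structure, the hypothetical helpers collectWords and toText below walk a tree and return the text of every WordNode. A small hand-built tree stands in for parser output so the example runs without dependencies; passing the result of new ParseEnglish().parse(...) should work the same way, assuming the node shapes shown in the output above.

```javascript
// Hypothetical helper: concatenate the values of all literal descendants.
function toText(node) {
  if (typeof node.value === 'string') return node.value
  return (node.children || []).map(toText).join('')
}

// Hypothetical helper: depth-first walk collecting the text of each WordNode.
function collectWords(node, words = []) {
  if (node.type === 'WordNode') {
    words.push(toText(node))
    return words
  }
  for (const child of node.children || []) collectWords(child, words)
  return words
}

// Hand-built fragment shaped like the start of the tree above
// (position data omitted for brevity).
const tree = {
  type: 'RootNode',
  children: [{
    type: 'ParagraphNode',
    children: [{
      type: 'SentenceNode',
      children: [
        {type: 'WordNode', children: [
          {type: 'TextNode', value: 'Mr'},
          {type: 'PunctuationNode', value: '.'}
        ]},
        {type: 'WhiteSpaceNode', value: ' '},
        {type: 'WordNode', children: [{type: 'TextNode', value: 'Henry'}]},
        {type: 'WhiteSpaceNode', value: ' '},
        {type: 'WordNode', children: [{type: 'TextNode', value: 'Brown'}]},
        {type: 'PunctuationNode', value: ':'}
      ]
    }]
  }]
}

console.log(collectWords(tree)) // → [ 'Mr.', 'Henry', 'Brown' ]
```

"Mr." comes back with its period because the parser keeps an abbreviation's period inside the WordNode, as the first WordNode in the output above shows.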

API

This package exports the following identifiers: ParseEnglish. There is no default export.

parse-english has the same API as parse-latin.

Algorithm

All of parse-latin is included, plus the following support for English:

  • Unit abbreviations (tsp., tbsp., oz., ft., and more)
  • Time references (sec., min., tues., thu., feb., and more)
  • Business abbreviations (Inc. and Ltd.)
  • Social titles (Mr., Mmes., Sr., and more)
  • Rank and academic titles (Dr., Rep., Gen., Prof., Pres., and more)
  • Geographical abbreviations (Ave., Blvd., Ft., Hwy., and more)
  • American state abbreviations (Ala., Minn., La., Tex., and more)
  • Canadian province abbreviations (Alta., Qué., Yuk., and more)
  • English county abbreviations (Beds., Leics., Shrops., and more)
  • Common elisions, where letters are omitted (’n’, ’o, ’em, ’twas, ’80s, and more)
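These abbreviation lists matter because a naive splitter treats every period followed by whitespace as a sentence end, so "Mr. Henry Brown" would be cut after "Mr.". The sketch below is a simplified, string-based illustration of that idea, not parse-english's own implementation; the splitSentences function and its tiny ABBREVIATIONS set are hypothetical.

```javascript
// Hypothetical, deliberately tiny abbreviation list (lowercased, no period).
// The real package recognizes far more; see the list above.
const ABBREVIATIONS = new Set(['mr', 'mrs', 'dr', 'prof', 'inc', 'ltd', 'tsp', 'oz', 'ave'])

// Hypothetical splitter: break on ".", "?" or "!" followed by whitespace or
// end of input, unless the word right before a "." is a known abbreviation.
function splitSentences(text) {
  const sentences = []
  const terminator = /[.?!](?=\s|$)/g
  let start = 0
  let match
  while ((match = terminator.exec(text)) !== null) {
    const end = match.index + 1
    if (match[0] === '.') {
      // Look at the word immediately preceding the period.
      const word = text.slice(start, match.index).split(/\s+/).pop().toLowerCase()
      if (ABBREVIATIONS.has(word)) continue
    }
    sentences.push(text.slice(start, end).trim())
    start = end
  }
  // Keep any trailing text that lacks a terminator.
  const rest = text.slice(start).trim()
  if (rest) sentences.push(rest)
  return sentences
}

console.log(splitSentences('Mr. Brown works at Acme Inc. in London. He is friendly.'))
// → [ 'Mr. Brown works at Acme Inc. in London.', 'He is friendly.' ]
```

The trade-off shows in the design: a sentence that genuinely ends with a listed abbreviation ("...works at Acme Inc. Then...") is not split, which is one reason a fuller parser needs more context than a word list.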

License

MIT © Titus Wormer