Package Exports

parse-latin
parse-latin/index.js
parse-latin/lib/plugin/break-implicit-sentences
parse-latin/lib/plugin/break-implicit-sentences.js
parse-latin/lib/plugin/make-final-white-space-siblings
parse-latin/lib/plugin/make-final-white-space-siblings.js
parse-latin/lib/plugin/make-initial-white-space-siblings
parse-latin/lib/plugin/make-initial-white-space-siblings.js
parse-latin/lib/plugin/merge-affix-exceptions
parse-latin/lib/plugin/merge-affix-exceptions.js
parse-latin/lib/plugin/merge-affix-symbol
parse-latin/lib/plugin/merge-affix-symbol.js
parse-latin/lib/plugin/merge-initial-digit-sentences
parse-latin/lib/plugin/merge-initial-digit-sentences.js
parse-latin/lib/plugin/merge-initial-lower-case-letter-sentences
parse-latin/lib/plugin/merge-initial-lower-case-letter-sentences.js
parse-latin/lib/plugin/merge-non-word-sentences
parse-latin/lib/plugin/merge-non-word-sentences.js
parse-latin/lib/plugin/merge-remaining-full-stops
parse-latin/lib/plugin/merge-remaining-full-stops.js
parse-latin/lib/plugin/patch-position
parse-latin/lib/plugin/patch-position.js
parse-latin/lib/plugin/remove-empty-nodes
parse-latin/lib/plugin/remove-empty-nodes.js

This package does not declare an exports field, so JSPM detected and optimized the exports above automatically. If a package subpath is missing, open an issue on the original package (parse-latin) asking it to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

parse-latin

A natural language parser, for Latin-script languages, that produces nlcst.

What is this?
When should I use this?
Install
Use
API
- ParseLatin()
Algorithm
Types
Compatibility
Related
Contribute
Security
License

What is this?

This package exposes a parser that takes Latin-script natural language and produces a syntax tree.

When should I use this?

If you want to handle natural language as syntax trees manually, use this.

Alternatively, you can use the retext plugin retext-latin, which wraps this project to also parse natural language at a higher-level (easier) abstraction.

Whether the text is Old English (“þā gewearþ þǣm hlāforde and þǣm hȳrigmannum wiþ ānum penninge”), Icelandic (“Hvað er að frétta”), or French (“Où sont les toilettes?”), this project does a good job of tokenizing it.

For English and Dutch, you can instead use parse-english and parse-dutch.

You can also use this, with less reliable results, for Latin-like scripts such as Cyrillic (“Добро пожаловать!”), Georgian (“როგორა ხარ?”), and Armenian (“Շատ հաճելի է”).

Install

This package is ESM only. In Node.js (version 14.14+ and 16.0+), install with npm:

npm install parse-latin

In Deno with esm.sh:

import {ParseLatin} from 'https://esm.sh/parse-latin@6'

In browsers with esm.sh:

<script type="module">
  import {ParseLatin} from 'https://esm.sh/parse-latin@6?bundle'
</script>

Use

import {inspect} from 'unist-util-inspect'
import {ParseLatin} from 'parse-latin'

const tree = new ParseLatin().parse('A simple sentence.')

console.log(inspect(tree))

Yields:

RootNode[1] (1:1-1:19, 0-18)
└─0 ParagraphNode[1] (1:1-1:19, 0-18)
    └─0 SentenceNode[6] (1:1-1:19, 0-18)
        ├─0 WordNode[1] (1:1-1:2, 0-1)
        │   └─0 TextNode "A" (1:1-1:2, 0-1)
        ├─1 WhiteSpaceNode " " (1:2-1:3, 1-2)
        ├─2 WordNode[1] (1:3-1:9, 2-8)
        │   └─0 TextNode "simple" (1:3-1:9, 2-8)
        ├─3 WhiteSpaceNode " " (1:9-1:10, 8-9)
        ├─4 WordNode[1] (1:10-1:18, 9-17)
        │   └─0 TextNode "sentence" (1:10-1:18, 9-17)
        └─5 PunctuationNode "." (1:18-1:19, 17-18)
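
The tree is plain data, so it can be walked with ordinary functions. The sketch below hand-copies the tree shown above (without the position fields) so that it runs without installing anything. The helpers `textOf` and `wordsOf` are hypothetical names used for illustration only and are not part of parse-latin.

```javascript
// Hand-written copy of the tree printed above, position fields omitted.
const tree = {
  type: 'RootNode',
  children: [
    {
      type: 'ParagraphNode',
      children: [
        {
          type: 'SentenceNode',
          children: [
            {type: 'WordNode', children: [{type: 'TextNode', value: 'A'}]},
            {type: 'WhiteSpaceNode', value: ' '},
            {type: 'WordNode', children: [{type: 'TextNode', value: 'simple'}]},
            {type: 'WhiteSpaceNode', value: ' '},
            {type: 'WordNode', children: [{type: 'TextNode', value: 'sentence'}]},
            {type: 'PunctuationNode', value: '.'}
          ]
        }
      ]
    }
  ]
}

// Hypothetical helper: concatenate the text of a node and all its descendants.
function textOf(node) {
  if ('value' in node) return node.value
  return node.children.map(textOf).join('')
}

// Hypothetical helper: collect the text of every WordNode, depth first.
function wordsOf(node) {
  if (node.type === 'WordNode') return [textOf(node)]
  return (node.children || []).flatMap(wordsOf)
}

console.log(wordsOf(tree)) // [ 'A', 'simple', 'sentence' ]
console.log(textOf(tree)) // A simple sentence.
```

With a real parser, the literal would be replaced by the result of `new ParseLatin().parse(…)`. The helpers rely only on the `type`, `children`, and `value` fields visible in the output above.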

API

This package exports the identifier ParseLatin. There is no default export.

`ParseLatin()`

Create a new parser.

`ParseLatin#parse(value)`

Turn natural language into a syntax tree.

Parameters

value (string, optional) — value to parse

Returns

Tree (RootNode).

Algorithm

👉 Note: The easiest way to see how parse-latin parses is to use the online parser demo, which shows the syntax tree corresponding to the typed text.

parse-latin splits text into white space, punctuation, symbol, and word tokens:

“word” is one or more unicode letters or numbers
“white space” is one or more unicode white space characters
“punctuation” is one or more unicode punctuation characters
“symbol” is one or more of anything else
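
The four token classes above can be approximated with Unicode property escapes. This is a minimal sketch under stated assumptions, not parse-latin's own code. The pattern and the function name `classify` are made up for illustration, and the real expressions may treat some characters (combining marks, for example) differently.

```javascript
// Assumed approximation of the four token classes (needs ES2018+ for \p{…}).
// Group 1: letters or numbers, group 2: white space, group 3: punctuation,
// group 4: anything else.
const tokenPattern =
  /([\p{L}\p{N}]+)|(\p{White_Space}+)|(\p{P}+)|([^\p{L}\p{N}\p{White_Space}\p{P}]+)/gu

// Hypothetical helper: split `value` into typed tokens.
function classify(value) {
  const tokens = []
  for (const match of value.matchAll(tokenPattern)) {
    const [text, word, whiteSpace, punctuation] = match
    let type = 'symbol'
    if (word !== undefined) type = 'word'
    else if (whiteSpace !== undefined) type = 'white space'
    else if (punctuation !== undefined) type = 'punctuation'
    tokens.push({type, value: text})
  }
  return tokens
}

console.log(classify('A simple sentence.').map((token) => token.type))
// [ 'word', 'white space', 'word', 'white space', 'word', 'punctuation' ]
```

Every character falls into exactly one of the four groups, so joining the token values gives back the input unchanged.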

Then, it manipulates and merges those tokens into a syntax tree, adding sentences and paragraphs where needed.

some punctuation marks are part of the word they occur in, such as non-profit, she’s, G.I., 11:00, N/A, &c, nineteenth- and…
some periods do not mark a sentence end, such as 1., e.g., id.
although periods, question marks, and exclamation marks (sometimes) end a sentence, that end might not occur directly after the mark, such as .), ."
…and many more exceptions
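
To see why such exceptions need correction passes, consider a naive splitter that breaks after every terminal mark and then repairs one kind of mistake. This is only a sketch on plain strings. The function names are hypothetical and the merge rule is an assumption loosely suggested by the plugin name merge-initial-lower-case-letter-sentences in the package exports. The actual plugins work on tree nodes and may behave differently.

```javascript
// Naive first pass: break after ".", "!" or "?" when white space follows.
function naiveSentences(value) {
  return value.split(/(?<=[.!?])\s+/u)
}

// Assumed correction rule: a "sentence" that starts with a lower-case letter
// was probably cut off after an abbreviation, so merge it into the previous one.
// Note: this sketch rejoins with a single space, which loses the original
// white space; a tree-based parser keeps it as nodes instead.
function mergeLowerCaseStarts(sentences) {
  const result = []
  for (const sentence of sentences) {
    if (result.length > 0 && /^\p{Ll}/u.test(sentence)) {
      result[result.length - 1] += ' ' + sentence
    } else {
      result.push(sentence)
    }
  }
  return result
}

const naive = naiveSentences('Use a parser, e.g. this one. It works.')
console.log(naive)
// [ 'Use a parser, e.g.', 'this one.', 'It works.' ]
console.log(mergeLowerCaseStarts(naive))
// [ 'Use a parser, e.g. this one.', 'It works.' ]
```

A single rule like this still fails on many inputs (for example, an abbreviation followed by a proper noun), which is why several passes are combined.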

Types

This package is fully typed with TypeScript. It exports no additional types.

Compatibility

This package is at least compatible with all maintained versions of Node.js. As of now, that is Node.js 14.14+ and 16.0+. It also works in Deno and modern browsers.

Related

parse-english — English (natural language) parser
parse-dutch — Dutch (natural language) parser

Contribute

Yes please! See How to Contribute to Open Source.

Security

This package is safe.
