JSPM

  • License MIT

A tiny but elegant parser combinator library written by Mepy

Package Exports

  • nihil

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (nihil) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

nihil

Info

Author = Mepy
Github : Language = English, mainly
Blog : Language = Simplified Chinese(简体中文)

Intro

WARNING : this is currently a TOY project; do NOT use it in production

nihil is a parser combinator library for JavaScript (ES2015+).

nihil means null, nothingness, and so on, yet nihil is useful and elegant. It is published on NPM.

nihil is heavily inspired by bnb, but it is written mainly in the style of FP (Functional Programming). See Reference for details.

nihil has zero dependencies and is about 5.5 KB (exactly 5646 bytes) of raw JavaScript. I haven't tried minifying or zipping it.

nihil's source code uses some ES2015 (ES6) features, such as destructuring assignment and arrow functions, so you had better use a modern browser such as Chrome or Firefox.

The nihil library contains what you can consider a single object, nihil, so it naturally supports both the browser and Node.js.

nihil is elegant and easy to use. Want to try it? Read the Tutorial.

Reference

nihil is heavily inspired by bnb. At first, I used bnb for another toy project. However, I ran into some trouble:

  1. Creating a recursive parser is a little hard.
  2. Assigning bnb.parser.parse to a variable and calling it throws an error. Because bnb implements parsers with a class, assigning instance.parse to a variable loses the this pointer.

So I decided to implement my own library, and I read the source code of bnb. But nihil is written from scratch, so I think it should NOT be considered a derivative of bnb.

The similarity between nihil and bnb is probably limited to bnb.match(RegExp) <=> nihil(RegExp) and the chained methods .and, .or, etc. I have sent an email to the author of bnb to talk about it.
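The second problem in the list above can be reproduced without bnb. The sketch below is only an illustration: `ClassParser` and `makeParser` are hypothetical names, not code from bnb or nihil. It shows how a method detached from a class instance loses `this`, while a closure-based function can be assigned to a variable safely.

```javascript
// Hypothetical class-based parser: parse() reads its state through `this`.
class ClassParser {
  constructor(re) {
    this.re = re;
  }
  parse(source) {
    return this.re.test(source);
  }
}

// Hypothetical closure-based parser: parse() captures `re` directly.
const makeParser = (re) => ({
  parse: (source) => re.test(source),
});

const detachedClassParse = new ClassParser(/^a+$/).parse;
const detachedClosureParse = makeParser(/^a+$/).parse;

let classError = null;
try {
  detachedClassParse("aaa"); // `this` is undefined inside parse()
} catch (e) {
  classError = e;
}

console.log(classError instanceof TypeError); // true
console.log(detachedClosureParse("aaa"));     // true
```

This is presumably why a function such as .candy can be stored in a variable and called later.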

Tutorial

RegExp

Use nihil(re) to create a parser, and parse a raw string with its .candy method

nihil(/a+/)
.candy("aaaa")//{ value: [ 'aaaa' ] }

and : Sequential Parser

You can use nihil.and(re,...) to create a sequential parser

nihil.and(/a*/,/b*/,/c*/)
.candy("bbbbcccc")//{ value: [ '', 'bbbb', 'cccc' ] }

If a parser k already exists, you can use k.and(parser,...)

const k = nihil(/k+/)
const g = nihil(/g+/)
k.and(g,k)
.candy("kkkggggkkkk")//{ value: [ 'kkk', 'gggg', 'kkkk' ] }

But this is troublesome when the second parser is just a simple parser generated by nihil. In that case, you can simply pass RegExps in place of simple parsers

k.and(/g+/,k)
.candy("kkkggggkkkk")//{ value: [ 'kkk', 'gggg', 'kkkk' ] }

or : Choose Parsers

If you want to parse "a" or "b", you can of course use the RegExp /a|b/. If you want to parse [one complex language] or [another complex language], you could also write one complex RegExp, but that is error-prone. With nihil.or, you can implement the same behavior as /a|b/ in a more readable code style

const a_b = nihil.or(/a/,/b/)
a_b.candy("a")//{ value: [ 'a' ] }
a_b.candy("b")//{ value: [ 'b' ] }

.or is similar to .and, so you can also write:

const a = nihil(/a/)
const b = nihil(/b/)
const c = nihil(/c/)
a.or(b,c).candy("c")//{ value: [ 'c' ] }
a.or(b,/c/).candy("c")//{ value: [ 'c' ] }

ATTENTION : The order of the parsers matters. The combined parser tries them from left to right and uses the first one that succeeds

nihil.or(/a/,/a+/).candy("aaa")//{ error: { expected: '<eof>', location: 1 } }
nihil.or(/a+/,/a/).candy("aaa")//{ value: [ 'aaa' ] }
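The ordering rule can be illustrated with plain RegExps, independently of nihil. The following is a minimal sketch, not nihil's implementation: `matchAt`, `orderedChoice` and `parseAll` are hypothetical helpers, and the error shape only imitates the outputs shown above.

```javascript
// Try `re` at position `index` of `source` using the sticky flag.
const matchAt = (re, source, index) => {
  const sticky = new RegExp(re.source, "y");
  sticky.lastIndex = index;
  const m = sticky.exec(source);
  return m ? { value: m[0], index: index + m[0].length } : null;
};

// Ordered choice: the first alternative that matches wins; later
// alternatives are never tried once one has succeeded.
const orderedChoice = (...res) => (source, index) => {
  for (const re of res) {
    const result = matchAt(re, source, index);
    if (result) return result;
  }
  return null;
};

// Run a parser and require that it consumes the whole source.
const parseAll = (parser, source) => {
  const result = parser(source, 0);
  if (!result) return { error: [{ expected: "<match>", location: 0 }] };
  if (result.index !== source.length) {
    return { error: [{ expected: "<eof>", location: result.index }] };
  }
  return { value: [result.value] };
};

console.log(parseAll(orderedChoice(/a/, /a+/), "aaa"));
// { error: [ { expected: '<eof>', location: 1 } ] }
console.log(parseAll(orderedChoice(/a+/, /a/), "aaa"));
// { value: [ 'aaa' ] }
```

In the first call /a/ succeeds after one character, so /a+/ is never tried and the leftover "aa" causes the end-of-input error.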

You may think this is just another way to write a RegExp and doubt whether the .or method is worth learning. But what if the complex language is not a regular language but a non-regular LL(1) language? You cannot implement that with a RegExp! Read on, and you will be able to create an LL(m) parser (with m as big as you want, as long as your computer can compute it in time)

keep : Select Parsers

.keep is used to select the next parser according to the former value. With it, you can implement an LL(m) parser. For example (perhaps not the most fitting one), we implement an LL(1) parser which accepts strings like 'aA', 'bB', ..., 'zZ'

const parser = nihil(/[a-z]/)
.keep(([$1])=>nihil(RegExp($1.toUpperCase())))
.candy
parser("nN")//{ value: [ 'n', 'N' ] }
parser("iJ")//{ error: [ { expected: '/I/', location: 1 } ] }

ATTENTION : in .keep, you must pass a function which returns a parser, NOT a RegExp; this differs from .and and .or
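To make the idea of selecting a parser from the former value concrete, here is a minimal sketch in plain JavaScript. It is not nihil's source code: `regex` and `keep` are hypothetical stand-ins, parsers are modelled as plain functions, and values are stored in source order for simplicity.

```javascript
// A parser here is a function (source, index) => { values, index } or null.
const regex = (re) => (source, index) => {
  const sticky = new RegExp(re.source, "y");
  sticky.lastIndex = index;
  const m = sticky.exec(source);
  return m ? { values: [m[0]], index: index + m[0].length } : null;
};

// keep: run `first`, pass its values to `selector` to obtain the next
// parser, then KEEP the earlier values in front of the new ones.
const keep = (first, selector) => (source, index) => {
  const a = first(source, index);
  if (!a) return null;
  const b = selector(a.values)(source, a.index);
  return b ? { values: [...a.values, ...b.values], index: b.index } : null;
};

// Accept a lowercase letter followed by the same letter in uppercase.
// The selector returns a parser, not a bare RegExp.
const pair = keep(regex(/[a-z]/), ([c]) => regex(new RegExp(c.toUpperCase())));

console.log(pair("nN", 0)); // { values: [ 'n', 'N' ], index: 2 }
console.log(pair("iJ", 0)); // null
```

The second parser does not exist until the first one has produced a value, which is what a fixed RegExp cannot express.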

For LL(m), the pattern looks like this; for simplicity, m = 2

const char4 = nihil.and(/./,/./,/./,/./)
//result of char4 is an array with a length of 4 > m = 2
const parser = char4.keep(([$1,$2])=>{
        if($1=='a'){return nihil(/r/)}
        else
        {
                if($2=='b'){return nihil(/s/)}
                else{return nihil(/t/)}
        }
}).candy
//parser accepts /...ar/, /..b[^a]s/ or /..[^b][^a]t/ 
parser("wxyar")//{ value: [ 'w', 'x', 'y', 'a', 'r' ] }
parser("wxbzs")//{ value: [ 'w', 'x', 'b', 'z', 's' ] }
parser("wxyzt")//{ value: [ 'w', 'x', 'y', 'z', 't' ] }

ATTENTION : $1 is the value closest to the current parser, that is, the most recently parsed one. In the raw string, the order is "[$4][$3][$2][$1]"

REASON : Why is this method named 'keep'? Because it KEEPS the former value (e.g. the result of char4). Can we choose not to keep it? Of course: use .drop instead of .keep. In my view, .drop is rarely called directly; more often we call it indirectly through .map

map : Transform Value

Until now, the values in parser results have all been strings. Sometimes we need to transform them, e.g. transform '3' into 3

nihil(/[1-9][0-9]*/)//string of number, decimal
.map(([$1])=>[Number($1)])
.candy("3")//{ value: [ 3 ] }

Is it a little similar to .keep? Yes: it calls .drop to drop ['3'] and push [3]

ATTENTION : fn in .map(fn) must return an ARRAY of values

.map can do other things too, such as verifying the value. For example, suppose we decide to reject 114514

const num = nihil(/[1-9][0-9]*/)
.map(([$1])=>[Number($1)])
.map(([$1])=>{
        if($1==114514)
        {
                return []//or return undefined or return;
        }
        else{return [$1]}
})
num.and(/ /,num)
.candy("114514 3")//{ value: [ ' ', 3 ] } where ' ' is the result of / /

However, you might notice that the parser in fact accepts 114514 and merely drops it. What if you want to throw an error when meeting 114514? Use .drop

drop : Intercept Value (Optional)

.drop is nearly the same as .keep, except that it drops the former parser's result

It must be emphasized that fn in .drop(fn) MUST return a parser: NEITHER a RegExp, NOR an array of values as in .map

nihil.box(obj) returns a parser which does nothing but return obj, hence the name box (a parser containing a result). You might find it useful, right?

const num = nihil(/[1-9][0-9]*/)
.map(([$1])=>[Number($1)])
.drop(([$1])=>{
        if($1==114514)
        {
                return nihil.box({error:[{reason:"Found 114514"}]})
                //error : [must be an array]
        }
        else
        {
                return nihil.box({value:[$1]})
                //value : [must be an array]
        }
})
const foo = num.and(/ /,num).candy
foo("114514 3")//{ error: { reason: 'Found 114514' } }
foo("10492 3")//{ value: [ 10492, ' ', 3 ] }

However, .drop cannot report an error together with its location, because .keep and .drop use fn as a selector in the following way :

//in .keep, .drop
        ...
        const a = A(source)//using parser A to parse source into result a
        const B = selector(a.value)//using value of result to generate parser B
        ...

Reporting the location is of course essential, but it is currently not supported. TODO: the next version will support it, as well as a pretty location {index,line,column}

sep : Parsing With Separator

In the examples of .map and .drop, you saw an ugly value ' ' produced by the RegExp / /. Can we drop it? Using .map or .drop for that is bothersome. Use .sep! See the following example:

const num = nihil(/[1-9][0-9]*/)
.map(([$1])=>[Number($1)])
.sep(/ /)
.candy

num("2333")//{ value: [ [ 2333 ] ] }
num("34 15 27")//{ value: [ [ 34, 15, 27 ] ] }

ATTENTION : the value of .sep is an array of arrays (note the second pair of [])
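The shape of this result can be imitated with plain RegExps. The sketch below is an assumption-laden illustration, not how nihil implements .sep: `matchAt` and `sepBy` are hypothetical helpers, and the item RegExp is assumed never to match the empty string.

```javascript
// Match `re` at `index` using the sticky flag; return the text or null.
const matchAt = (re, source, index) => {
  const sticky = new RegExp(re.source, "y");
  sticky.lastIndex = index;
  const m = sticky.exec(source);
  return m ? m[0] : null;
};

// Parse  item (separator item)*  and return the items nested in one array.
// Assumes itemRe never matches the empty string.
const sepBy = (itemRe, separatorRe, convert) => (source) => {
  const items = [];
  let index = 0;
  let text = matchAt(itemRe, source, index);
  while (text !== null) {
    items.push(convert(text));
    index += text.length;
    const sep = matchAt(separatorRe, source, index);
    if (sep === null) break;
    // Only consume the separator if another item follows it.
    text = matchAt(itemRe, source, index + sep.length);
    if (text !== null) index += sep.length;
  }
  return index === source.length
    ? { value: [items] }
    : { error: [{ expected: "<eof>", location: index }] };
};

const numbers = sepBy(/[1-9][0-9]*/, / /, (text) => Number(text));
console.log(JSON.stringify(numbers("2333")));     // {"value":[[2333]]}
console.log(JSON.stringify(numbers("34 15 27"))); // {"value":[[34,15,27]]}
```

The separators are matched but never stored, so only the converted items appear in the inner array.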

candy & nihil : Some Explanations

By now you have used .candy many times. But you can't use nihil.candy, because nihil has no .candy. nihil(RegExp) returns a TRUE parser, which has .candy. .candy can be assigned to a variable, but that variable is not a parser and therefore has no chain-style .and, .or, etc.; .candy can only do parsing

nihil, as its name suggests, is a nihil (NULL parser, PSEUDO parser): a parser so special that it carries many useful functions, like .and, .or, etc. Still, nihil is a parser, though a special one. You can use it to do nothing... Wait, does that sound hard to grasp? See an example:

const x_y = nihil.or(/x+/,/y+/)
x_y.sep(/ /).candy("xxxyyy")  //{ value: [ [ 'xxx', 'yyy' ] ] }
x_y.sep(nihil).candy("xxxyyy")//{ value: [ [ 'xxx', 'yyy' ] ] }

Though "xxxyyy" has no space between "xxx" and "yyy", the .sep parser separates them automatically! Here the RegExp / / does nothing, just like nihil, so it acts like .sep(nihil). As a matter of fact, we implement .sep with the help of nihil. You might also notice that a .sep parser parses an indefinite number of times and returns an array. Right: we implement .sep with the help of .loop, which actually uses nihil. But don't hurry to look up .loop; let's learn .recur first.

recur : Recursively Parsing

Recall a recursive implementation of Fact, where the function Fact calls itself

const Fact = (n)=>{
        if(n==0){return 1;}
        else{return n*Fact(n-1)}
}
Fact(4)//4*3*2*1=24

Likewise, our parser can call itself to parse a string like "((a))". A universal recursive parser has 4 main parts: L, I, R and fn. To parse "((a))" into [["a"]], we let L = /\(/, I = /a/, R = /\)/ and fn = ([L,I,R])=>([[I]]). See the example:

const a_recur = nihil(/a/)
.recur(/\(/,/\)/,([L,I,R])=>([[I]]))

a_recur.candy("((a))")//{value:[[["a"]]]}

Wait, why are there TRIPLE []? Well, remember, the value MUST be an ARRAY []

a_recur.sep(nihil).candy("((a))(a)")//{value:[[[["a"]],["a"]]]}

See, [["a"]] is the actual result.
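The recursion behind this result can be sketched in plain JavaScript. This is only an illustration of the grammar M -> L M R | I under simplifying assumptions, not nihil's .recur: `matchAt` and `recur` are hypothetical helpers, and the wrapping function receives [left, inner, right] in source order.

```javascript
// Match `re` at `index` using the sticky flag; return the text or null.
const matchAt = (re, source, index) => {
  const sticky = new RegExp(re.source, "y");
  sticky.lastIndex = index;
  const m = sticky.exec(source);
  return m ? m[0] : null;
};

// Build a parser for the grammar  M -> L M R | I.
// `wrap` receives [left, inner, right] and returns the value of that level.
const recur = (L, I, R, wrap) => {
  const M = (source, index) => {
    const left = matchAt(L, source, index);
    if (left !== null) {
      const inner = M(source, index + left.length); // M calls itself
      if (inner) {
        const right = matchAt(R, source, inner.index);
        if (right !== null) {
          return {
            value: wrap([left, inner.value, right]),
            index: inner.index + right.length,
          };
        }
      }
    }
    // First alternative failed: fall back to the base case I.
    const leaf = matchAt(I, source, index);
    return leaf === null ? null : { value: leaf, index: index + leaf.length };
  };
  return M;
};

const nested = recur(/\(/, /a/, /\)/, ([, inner]) => [inner]);
console.log(JSON.stringify(nested("((a))", 0)));
// {"value":[["a"]],"index":5}
```

Each pair of parentheses adds one level of nesting, so "((a))" yields [["a"]], matching the actual result described above.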

loop : Looping Parse?

See example

nihil(/x/).loop().candy("xxxx")//{ value: [ 'x', 'x', 'x', 'x' ] }

ADVANCED : If you are familiar with recursion, you will quickly see that a loop is a special case of recursion. We write a recursive grammar like this:

M -> L M R | I

If we let I = R = nihil (and fn=x=>x, i.e. fn(x)=x, which is the default), we get a loop parser. It differs from I.recur(L,R,fn), however, because nihil is a special parser: we use nihil.recur(L)(nihil)(nihil)() to implement L.loop(). See nihil.loop in the API section and the source code for more details. WARNING : We use a few FP (Functional Programming) idioms which might trouble you

API

nihil

nihil(parser A)=>parser A
nihil(source)=>nihil.nihil
nihil(RegExp)=>parser(RE)

nihil.parser

It is the helper function of nihil. If you are new to JavaScript but familiar with OOP (Object-Oriented Programming), you can think of it as the class parser

As a matter of fact, its real job is to provide .candy, which parses a string easily and returns a proper result. The other functions are all conveniences for creating parsers with chained calls. You can get an intuitive feel for them in the Tutorial.

result format

nihil.nihil is used for loop; a result of the form {nihil:true,error:[...]} does not mean that an error occurred

nihil.nihil = {nihil:true}
nihil.eof = {eof:true}
result=
{
        value:[],
        error:[{expected,location}],
        nihil:true||undefined,//true->return by nihil(source)
        eof:true||undefined//true->reach end of file/source
}

nihil.and

Takes an array of RegExps or parsers and returns a parser which parses the source sequentially, in the order of the array

nihil.or

Takes an array of RegExps or parsers and returns a parser which tries them on the source from array[0] to array[array.length-1] (:= end). If array[k] succeeds, array[k+1],...,end are not tried.

nihil.keep

It helps implement an LL($\infty$) parser. For example, to implement an LL(m) parser, we first define a function fn which takes the array of values [$1,$2,...,$m] returned by parse as its argument and returns a parser chosen according to [$1,$2,...,$m].

var fn = ([$1,$2,...,$m])=>parser
var LLm = nihil.keep(parse)(fn)

nihil.drop

It is similar to nihil.keep, except that it DROPS the values parsed out by parse. It is used by nihil.map

nihil.box

It creates a parser which holds a value and does nothing to source

nihil.box = value=>nihil.parser(source=>value)

nihil.map

It maps the value of parse to the format you like

nihil.map(parse)(mapping)
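The relation between map, drop and box described above can be sketched as follows. This is a guess at the idea under simplifying assumptions, not nihil's actual source: parsers are modelled as plain functions, and `regex`, `box`, `drop` and `map` here are hypothetical stand-ins that only borrow the curried shape shown in this API section.

```javascript
// A parser here is (source, index) => { value, index } or { error, index }.
const regex = (re) => (source, index) => {
  const sticky = new RegExp(re.source, "y");
  sticky.lastIndex = index;
  const m = sticky.exec(source);
  return m
    ? { value: [m[0]], index: index + m[0].length }
    : { error: [{ expected: String(re), location: index }], index };
};

// box: a parser that consumes nothing and yields a fixed result.
const box = (result) => (source, index) => ({ ...result, index });

// drop: feed the former values to `selector`, then run the parser it
// returns; the former values themselves are discarded.
const drop = (parse) => (selector) => (source, index) => {
  const a = parse(source, index);
  return a.error ? a : selector(a.value)(source, a.index);
};

// map: drop the old values and box the mapped ones.
const map = (parse) => (mapping) =>
  drop(parse)((values) => box({ value: mapping(values) }));

const number = map(regex(/[1-9][0-9]*/))(([digits]) => [Number(digits)]);
console.log(number("3", 0)); // { value: [ 3 ], index: 1 }
```

Because the mapping's return value becomes the new value array, it has to be an array, which matches the ATTENTION note in the Tutorial.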

nihil.lazy

Inspired by bnb (bread-n-butter). It is used to generate recursive parsers (see the source code of nihil.recur). It takes a function fn which returns a parser, and returns a parser which calls fn lazily, at parsing time

creating time : the time when parsers are created
parsing time : the time when parsers are used to parse
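The difference between creating time and parsing time is what makes recursive definitions possible. Below is a minimal sketch of the lazy idea in plain JavaScript; `lazy`, `char`, `seq` and `either` are hypothetical helpers written for this illustration, not functions taken from nihil or bnb.

```javascript
// lazy: postpone building a parser until it is first used for parsing.
const lazy = (build) => {
  let parser = null;
  return (source, index) => {
    if (parser === null) parser = build(); // runs at parsing time
    return parser(source, index);
  };
};

// A parser here returns the index after the match, or -1 on failure.
const char = (c) => (source, index) => (source[index] === c ? index + 1 : -1);
const seq = (...parsers) => (source, index) =>
  parsers.reduce((i, p) => (i < 0 ? i : p(source, i)), index);
const either = (p, q) => (source, index) => {
  const i = p(source, index);
  return i >= 0 ? i : q(source, index);
};

// nested -> "(" nested ")" | "a"
// Without lazy, `nested` would be read at creating time, before it has
// been initialized, and a ReferenceError would be thrown.
const nested = either(
  seq(char("("), lazy(() => nested), char(")")),
  char("a")
);

console.log(nested("((a))", 0)); // 5
console.log(nested("((a)", 0));  // -1
```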

nihil.recur

It is used to implement the grammar shown below:

M -> L M R
M -> I
/* I,L,R could be terminates */

That is, you can have a parser which parses "LLLLLIRRRRR" (as many R as L). For convenience, you can pass a function f to map the result of the recursive parser to the format you like

nihil.recur(L)(I)(R)(f=x=>x)

ADVANCED : some of L, I, R may be nihil. For example, you may want a loop parser which parses "LLLL...L". ATTENTION : L is a syntactic unit; it could itself be a sentence of an LL(m) language, e.g. L=$a^nb^n$. See nihil.loop

nihil.loop

It is implemented with nihil.recur

nihil.loop = L=>nihil.recur(L)(nihil)(nihil)()

nihil.sep

It is used for parsing "1 2 3" into [1,2,3] with the separator " " (SPACE)

nihil.sep(parse)(seperator)

nihil.nest

It is used in nihil.sep. It maps [value] to [[value]]

nihil.reverse

It is used in nihil.sep and nihil.parser.candy. It reverses the value from [$1,$2,...,$m] to [$m,$(m-1),...,$2,$1]. REASON to reverse: the internal parsing result of nihil.and(parser1,parser2,...parserM) is [resultM,result(M-1),...,result1], which is in reverse order. When returning an array of values, you had better use nihil.reverse
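The reversed order can be pictured with a small sketch. It assumes, as described above, that a sequence stores its newest result first; `matchAt` and `andNewestFirst` are hypothetical helpers for illustration, not nihil's code.

```javascript
// Match `re` at `index` using the sticky flag; return the text or null.
const matchAt = (re, source, index) => {
  const sticky = new RegExp(re.source, "y");
  sticky.lastIndex = index;
  const m = sticky.exec(source);
  return m ? m[0] : null;
};

// Sequence that stores the newest result FIRST, so that a selector can
// destructure the most recent values as [$1, $2, ...].
const andNewestFirst = (...res) => (source) => {
  let values = [];
  let index = 0;
  for (const re of res) {
    const text = matchAt(re, source, index);
    if (text === null) return null;
    values = [text, ...values]; // prepend instead of append
    index += text.length;
  }
  return values;
};

const stacked = andNewestFirst(/a*/, /b*/, /c*/)("bbbbcccc");
console.log(stacked);                // [ 'cccc', 'bbbb', '' ]
console.log([...stacked].reverse()); // [ '', 'bbbb', 'cccc' ]
```

Reversing once at the end restores source order, which is the order the Tutorial outputs of .candy are shown in.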