nihil
Info
Author : Mepy
GitHub : Language = English, mainly
Blog : Language = Simplified Chinese (简体中文)
Intro
WARNING : currently a TOY project, do NOT use in production
nihil is a parser combinator library for JavaScript (ES2015+)
The name nihil means null, nothingness, etc., yet nihil is useful and elegant
NPM
nihil is heavily inspired by bnb,
but I think it is written mainly in the style of FP (Functional Programming);
see the Reference section for details
nihil has zero dependencies and is about 5.5 KB (exactly 5646 bytes) of raw JavaScript
I haven't tried minifying or zipping it yet.
The nihil source code uses some features of ES2015 (ES6),
such as destructuring assignment and arrow functions,
so you should use a modern browser such as Chrome or Firefox.
The nihil library exposes essentially a single object, nihil,
so it supports both browsers and Node.js
nihil is elegant and easy to use. Want to try it? -> Read the Tutorial section
Reference
nihil is heavily inspired by bnb.
At first, I used bnb for another toy project.
However, I ran into some trouble:
- Creating a recursive parser is a little hard
- Assigning bnb.parser.parse to a variable throws an error
This is because bnb implements parsers with a class, so assigning instance.parse to a variable loses the this pointer.
I decided to implement my own library, and I read the source code of bnb.
But nihil is written entirely from scratch, so I think it should NOT count as a derivative of bnb.
The only similarities between nihil and bnb might be
bnb.match(RegExp) <=> nihil(RegExp)
and chain methods such as .and, .or, etc.
I have sent an email to the author of bnb to talk about it.
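The second problem above (losing this when a method is assigned to a variable) is ordinary JavaScript behavior and can be reproduced without any parser library. The sketch below uses a hypothetical ClassParser and closureParser; neither is bnb's or nihil's actual code, they only illustrate the difference between a class method and a closure.

```javascript
// Hypothetical class-based parser; it is NOT bnb's actual code.
class ClassParser {
  constructor(re) {
    this.re = re;
  }
  parse(source) {
    // Relies on `this`, so it only works when called as a method.
    const m = this.re.exec(source);
    return m ? { value: [m[0]] } : { error: "no match" };
  }
}

const p = new ClassParser(/a+/);

const detached = p.parse; // `this` is no longer bound to `p`
let detachedError = null;
try {
  detached("aaa");
} catch (e) {
  detachedError = e; // reading `this.re` fails because `this` is undefined
}

// Closure-based alternative (hypothetical): nothing depends on `this`.
const closureParser = (re) => ({
  parse: (source) => {
    const m = re.exec(source);
    return m ? { value: [m[0]] } : { error: "no match" };
  },
});

const parse = closureParser(/a+/).parse;
console.log(detachedError instanceof TypeError); // true
console.log(parse("aaa")); // { value: [ 'aaa' ] }
```

A function that closes over its data can be passed around freely, which is presumably why .candy can be assigned to a variable in the Tutorial.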
Tutorial
RegExp
You can use nihil(re) to create parsers
and parse a raw string with the .candy method
nihil(/a+/)
.candy("aaaa")//{ value: [ 'aaaa' ] }
and : Sequential Parser
You can use nihil.and(re,...) to create a sequential parser
nihil.and(/a*/,/b*/,/c*/)
.candy("bbbbcccc")//{ value: [ '', 'bbbb', 'cccc' ] }
If you already have a parser k, you can use k.and(parser,...)
const k = nihil(/k+/)
const g = nihil(/g+/)
k.and(g,k)
.candy("kkkggggkkkk")//{ value: [ 'kkk', 'gggg', 'kkkk' ] }
But it is bothersome to define a named parser when the second parser is just a simple parser generated by nihil
In that case, you can simply pass RegExps in place of simple parsers
k.and(/g+/,k)
.candy("kkkggggkkkk")//{ value: [ 'kkk', 'gggg', 'kkkk' ] }
or : Choose Parsers
If you want to parse "a" or "b", you can of course use the RegExp /a|b/
But if you want to parse [one complex language] or [another complex language],
a single complex RegExp is error-prone.
With nihil.or, you can implement the same function as /a|b/
in a code style that is easy to understand
const a_b = nihil.or(/a/,/b/)
a_b.candy("a")//{ value: [ 'a' ] }
a_b.candy("b")//{ value: [ 'b' ] }
.or is similar to .and, so you can also write:
const a = nihil(/a/)
const b = nihil(/b/)
const c = nihil(/c/)
a.or(b,c).candy("c")//{ value: [ 'c' ] }
a.or(b,/c/).candy("c")//{ value: [ 'c' ] }
ATTENTION : The order of the parsers makes a difference
The combined parser tries the parsers from left to right and uses the first one that succeeds
nihil.or(/a/,/a+/).candy("aaa")//{ error: { expected: '<eof>', location: 1 } }
nihil.or(/a+/,/a/).candy("aaa")//{ value: [ 'aaa' ] }
Well, maybe you think this is just another way to write a RegExp
and doubt whether it is necessary to learn the .or method.
What if the complex language is not a regular language but an LL(1) language?
You cannot implement it with a RegExp!
Read on, and you can create a parser for LL(m)
(with m as big as you want, as long as your computer can compute it in time)
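Why the order matters can be seen in a minimal model of ordered choice. This is only a sketch under assumptions: the helpers re, or and run are hypothetical, and a parser is modeled as a plain function (source, index) => result, which is not necessarily how nihil is implemented.

```javascript
// A parser here is a function (source, index) => { value, index } | { error }.
// These helpers are illustrative and are NOT nihil's implementation.
const re = (r) => {
  const sticky = new RegExp(r.source, "y"); // match exactly at `index`
  return (source, index) => {
    sticky.lastIndex = index;
    const m = sticky.exec(source);
    return m
      ? { value: [m[0]], index: index + m[0].length }
      : { error: { expected: String(r), location: index } };
  };
};

// Ordered choice: the first alternative that succeeds wins;
// later alternatives are never tried.
const or = (...parsers) => (source, index) => {
  let last;
  for (const p of parsers) {
    last = p(source, index);
    if (!last.error) return last;
  }
  return last;
};

// Run a parser and require that it consumes the whole input.
const run = (p, source) => {
  const r = p(source, 0);
  if (r.error) return { error: r.error };
  return r.index === source.length
    ? { value: r.value }
    : { error: { expected: "<eof>", location: r.index } };
};

console.log(run(or(re(/a/), re(/a+/)), "aaa")); // { error: { expected: '<eof>', location: 1 } }
console.log(run(or(re(/a+/), re(/a/)), "aaa")); // { value: [ 'aaa' ] }
```

In the first call /a/ succeeds after one character, so /a+/ is never consulted and the remaining "aa" triggers the end-of-input error.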
keep : Select Parsers
The .keep method selects the next parser according to the previously parsed value
With this, you can implement an LL(m) parser.
As a (perhaps contrived) example, we implement an LL(1) parser
which accepts a string like 'aA', 'bB', ..., 'zZ'
const parser = nihil(/[a-z]/)
.keep(([$1])=>nihil(RegExp($1.toUpperCase())))
.candy
parser("nN")//{ value: [ 'n', 'N' ] }
parser("iJ")//{ error: [ { expected: '/I/', location: 1 } ] }
ATTENTION : in the .keep method, you must pass a function
which returns a parser, NOT a RegExp;
this differs from .and and .or
For LL(m), the pattern looks like this; here, simply m = 2
const char4 = nihil.and(/./,/./,/./,/./)
//result of char4 is an array with a length of 4 > m = 2
const parser = char4.keep(([$1,$2])=>{
  if($1=='a'){return nihil(/r/)}
  else
  {
    if($2=='b'){return nihil(/s/)}
    else{return nihil(/t/)}
  }
}).candy
//parser accepts /...ar/, /..b[^a]s/ or /..[^b][^a]t/
parser("wxyar")//{ value: [ 'w', 'x', 'y', 'a', 'r' ] }
parser("wxbzs")//{ value: [ 'w', 'x', 'b', 'z', 's' ] }
parser("wxyzt")//{ value: [ 'w', 'x', 'y', 'z', 't' ] }
ATTENTION : $1 is the value immediately to the left of the current parser, i.e. the most recently parsed one
In terms of the raw string, the correct order is "[$4][$3][$2][$1]"
REASON : Why is this method named 'keep'?
Because this method KEEPs the former value (e.g. the result of char4)
Can we choose not to keep it?
Of course, use .drop instead of .keep
But in my view, .drop is rarely used directly.
Instead, we usually call .drop indirectly through .map
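The relationship between keeping, dropping and mapping can be sketched in a few lines. The helpers below (re, box, chain, keep, drop, map) are hypothetical and only model the idea described above; nihil's own code may differ, and for simplicity the sketch stores values in source order rather than latest-first.

```javascript
// Parser model: (source, index) => { value: [...], index } | { error }.
// Hypothetical helpers for illustration; NOT nihil's real code.
const re = (r) => {
  const sticky = new RegExp(r.source, "y"); // match exactly at `index`
  return (source, index) => {
    sticky.lastIndex = index;
    const m = sticky.exec(source);
    return m
      ? { value: [m[0]], index: index + m[0].length }
      : { error: [{ expected: String(r), location: index }] };
  };
};

// box: a parser that consumes nothing and yields fixed values.
const box = (values) => (source, index) => ({ value: values, index });

// chain: run `first`, choose the next parser from its value, then either
// keep the earlier value in front of the new one or drop it.
const chain = (keepFormer) => (first, selector) => (source, index) => {
  const a = first(source, index);
  if (a.error) return a;
  const b = selector(a.value)(source, a.index);
  if (b.error) return b;
  return {
    value: keepFormer ? [...a.value, ...b.value] : b.value,
    index: b.index,
  };
};
const keep = chain(true);
const drop = chain(false);

// map expressed through drop: discard the old value, box the mapped one.
const map = (parser, fn) => drop(parser, (value) => box(fn(value)));

const pair = keep(re(/[a-z]/), ([c]) => re(new RegExp(c.toUpperCase())));
const num = map(re(/[1-9][0-9]*/), ([s]) => [Number(s)]);

console.log(pair("nN", 0).value); // [ 'n', 'N' ]
console.log(pair("iJ", 0).error); // [ { expected: '/I/', location: 1 } ]
console.log(num("42", 0).value); // [ 42 ]
```

Seen this way, keep and drop differ only in whether the earlier value survives, and map is drop combined with a parser that consumes no input.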
map : Transform Value
Until now, the values in parser results have all been strings
Sometimes, we need to transform them, e.g. transform '3' to 3
nihil(/[1-9][0-9]*/)//string of number, decimal
.map(([$1])=>[Number($1)])
.candy("3")//{ value: [ 3 ] }
Is it a little similar to .keep?
Yes, it calls the .drop method to drop ['3'] and push [3]
ATTENTION : fn, in .map(fn), must return an ARRAY of values
.map can do other things, like verifying the value.
For example, suppose we decide to reject 114514
const num = nihil(/[1-9][0-9]*/)
.map(([$1])=>[Number($1)])
.map(([$1])=>{
  if($1==114514)
  {
    return []//or return undefined or return;
  }
  else{return [$1]}
})
num.and(/ /,num)
.candy("114514 3")//{ value: [ ' ', 3 ] } where ' ' is the result of / /
However, you might realize that the parser in fact accepts 114514 but drops its value.
What if you want to throw an error when meeting 114514? Use .drop
drop : Intercept Value (Optional)
Except that .drop drops the former parser's result,
it is nearly the same as .keep
What needs to be emphasized is
that fn in .drop(fn) MUST return a parser,
NEITHER a RegExp NOR an array of values as in .map
nihil.box(obj) returns a parser
which does nothing but return obj,
hence the name box (a parser containing a result)
You might find it useful, right?
const num = nihil(/[1-9][0-9]*/)
.map(([$1])=>[Number($1)])
.drop(([$1])=>{
  if($1==114514)
  {
    return nihil.box({error:[{reason:"Found 114514"}]})
    //error : [must be an array]
  }
  else
  {
    return nihil.box({value:[$1]})
    //value : [must be an array]
  }
})
const foo = num.and(/ /,num).candy
foo("114514 3")//{ error: { reason: 'Found 114514' } }
foo("10492 3")//{ value: [ 10492, ' ', 3 ] }
However, .drop cannot report an error together with its location,
because .keep and .drop use fn as a selector that receives only the value of the result, in this format:
//in .keep, .drop
...
const a = A(source)//using parser A to parse source into result a
const B = selector(a.value)//using value of result to generate parser B
...
Reporting locations is of course essential, but it is currently not supported.
TODO: the next version will support it, as well as a pretty location {index,line,column}
sep : Parsing With Separator
In the examples of .map and .drop,
you see an ugly value ' ' produced by the RegExp / /
Can we drop it? Using .map or .drop is bothersome
Use .sep! See the following example:
const num = nihil(/[1-9][0-9]*/)
.map(([$1])=>[Number($1)])
.sep(/ /)
.candy
num("2333")//{ value: [ [ 2333 ] ] }
num("34 15 27")//{ value: [ [ 34, 15, 27 ] ] }
ATTENTION : the value of .sep is an array of arrays (note the second [])
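To make the nested-array shape concrete, here is a small sketch of a separator combinator. It is written under assumptions: re, map and sep are hypothetical stand-ins built on a simple (source, index) => result model, the separator is treated as optional, and nihil itself builds .sep differently.

```javascript
// Parser model: (source, index) => { value: [...], index } | { error }.
// Hypothetical helpers for illustration; NOT nihil's real code.
const re = (r) => {
  const sticky = new RegExp(r.source, "y"); // match exactly at `index`
  return (source, index) => {
    sticky.lastIndex = index;
    const m = sticky.exec(source);
    return m
      ? { value: [m[0]], index: index + m[0].length }
      : { error: [{ expected: String(r), location: index }] };
  };
};

const map = (parser, fn) => (source, index) => {
  const r = parser(source, index);
  return r.error ? r : { value: fn(r.value), index: r.index };
};

// sep: one item, then (optional separator + item) repeatedly.
// All item values are collected into ONE inner array.
const sep = (item, separator) => (source, index) => {
  const first = item(source, index);
  if (first.error) return first;
  const values = [...first.value];
  let at = first.index;
  for (;;) {
    const s = separator(source, at);
    const next = item(source, s.error ? at : s.index);
    // Stop when no further item parses or no input was consumed.
    if (next.error || next.index === at) break;
    values.push(...next.value);
    at = next.index;
  }
  return { value: [values], index: at };
};

const num = map(re(/[1-9][0-9]*/), ([s]) => [Number(s)]);
const nums = sep(num, re(/ /));

console.log(JSON.stringify(nums("2333", 0).value)); // [[2333]]
console.log(JSON.stringify(nums("34 15 27", 0).value)); // [[34,15,27]]
```

Wrapping the collected items in one inner array keeps the whole list as a single value, so it can sit next to other values when the parser is combined further.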
candy & nihil : Some Explanations
By now, you have used .candy many times
But you can't use nihil.candy, because nihil has no .candy
nihil(RegExp) returns a TRUE parser, which therefore has .candy
.candy can be assigned to a variable,
but that variable is not a parser, so it has no chain-style .and, .or, etc.
.candy can only do parsing
nihil, as its name suggests, is a nihil (NULL parser, PSEUDO parser),
a parser so special that it carries many useful functions, like .and, .or, etc.
Anyway, nihil is also a parser, though a special one
You can use it to do nothing... Wait, does that sound confusing?
See an example:
const x_y = nihil.or(/x+/,/y+/)
x_y.sep(/ /).candy("xxxyyy") //{ value: [ [ 'xxx', 'yyy' ] ] }
x_y.sep(nihil).candy("xxxyyy")//{ value: [ [ 'xxx', 'yyy' ] ] }
Though "xxxyyy" has no space between "xxx" and "yyy",
the .sep parser separates them automatically!
Here .sep(/ /) acts like .sep(nihil): the RegExp / / does nothing, just as nihil does
As a matter of fact, we implement .sep with the help of nihil
You might also realize that a .sep parser parses an indefinite number of times and returns an array
Right, we implement .sep with the help of .loop, which actually uses nihil
But don't hurry to look up .loop; let's learn .recur first.
recur : Recursively Parsing
Recall a recursive implementation of Fact, where the function Fact calls itself
const Fact = (n)=>{
  if(n==0){return 1;}
  else{return n*Fact(n-1)}
}
Fact(4)//4*3*2*1=24
Our parser can likewise call itself to parse a string like "((a))"
A universal recursive parser has 4 main parts: L, I, R and fn
To parse "((a))" into [["a"]], we let L = /\(/, I = /a/, R = /\)/ and fn = ([L,I,R])=>([[I]])
See example:
const a_recur = nihil(/a/)
.recur(/\(/,/\)/,([L,I,R])=>([[I]]))
a_recur.candy("((a))")//{value:[[["a"]]]}
Wait, why are there TRIPLE []? Well, remember, the value MUST be an ARRAY []
a_recur.sep(nihil).candy("((a))(a)")//{value:[[[["a"]],["a"]]]}
See, [["a"]] is the actual result.
loop : Looping Parse?
See example
nihil(/x/).loop().candy("xxxx")//{ value: [ 'x', 'x', 'x', 'x' ] }
ADVANCED : If you are familiar with recursion, you will quickly see that a loop is a special case of recursion
We write a recursive grammar like this:
M -> L M R | I
If we let I = R = nihil (and fn=x=>x, i.e. fn(x)=x, which is the default),
then we get a loop parser
But this differs from I.recur(L,R,fn), because nihil is a special parser
We use nihil.recur(L)(nihil)(nihil)() to implement L.loop()
See API.nihil.loop and the source code for a better understanding
WARNING : We use a few FP (Functional Programming) styles which might trouble you
API
nihil
nihil(parser A)=>parser A
nihil(source)=>nihil.nihil
nihil(RegExp)=>parser(RE)
nihil.parser
It is the helper function of nihil
If you are new to JavaScript but familiar with OOP (Object Oriented Programming),
you can think of it as the class parser
As a matter of fact, its real function is .candy,
which parses a string easily and returns a proper result
The other functions are all conveniences for creating parsers
with chained calls
You can get an intuitive feel for it in the Tutorial.
result format
nihil.nihil is used for loop; a result={nihil:true,error:[...]}
doesn't mean that an error occurred
nihil.nihil = {nihil:true}
nihil.eof = {eof:true}
result=
{
  value:[],
  error:[{expected,location}],
  nihil:true||undefined,//true->returned by nihil(source)
  eof:true||undefined//true->reached end of file/source
}
nihil.and
Takes an array of RegExps or parsers and returns a parser which parses the source sequentially in the order of the array
nihil.or
Takes an array of RegExps or parsers and returns a parser which tries to parse the source with array[0] through array[array.length-1] (=: end). If array[k] succeeds, array[k+1],...,end are not used to parse the source.
nihil.keep
It helps implement an LL($\infty$) parser
For example, suppose we want to implement an LL(m) parser.
We first define a function fn
that takes an array of values [$1,$2,...,$m], returned by parse, as its argument
and returns a parser chosen according to [$1,$2,...,$m].
var fn = ([$1,$2,...,$m])=>parser
var LLm = nihil.keep(parse)(fn)
nihil.drop
It is similar to nihil.keep, with the difference
that it DROPs the values parsed out by parse
It is used to implement nihil.map
nihil.box
It creates a parser that holds a value and does nothing to source
nihil.box = value=>nihil.parser(source=>value)
nihil.map
Used to map the value of parse to the format you like
nihil.map(parse)(mapping)
nihil.lazy
Inspired by bnb (bread-n-butter)
It is used to generate recursive parsers (see the source code of nihil.recur)
Input : a function fn=>parser
Return : a parser which calls fn lazily, at parsing time
creating time : the time when parsers are created
parsing time : the time when parsers are used to parse
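The distinction between creating time and parsing time is what makes self-reference possible: a const parser cannot mention itself while it is still being created (that would throw a ReferenceError), but a function that returns it can be called later. The following is a sketch under assumptions, with hypothetical helpers re, and, or, map and lazy built on a simple (source, index) => result model; it is not nihil's or bnb's actual code.

```javascript
// Parser model: (source, index) => { value: [...], index } | { error }.
// Hypothetical helpers for illustration; NOT nihil's real code.
const re = (r) => {
  const sticky = new RegExp(r.source, "y"); // match exactly at `index`
  return (source, index) => {
    sticky.lastIndex = index;
    const m = sticky.exec(source);
    return m
      ? { value: [m[0]], index: index + m[0].length }
      : { error: [{ expected: String(r), location: index }] };
  };
};

// Sequence: every parser must succeed, values are concatenated.
const and = (...parsers) => (source, index) => {
  const value = [];
  let at = index;
  for (const p of parsers) {
    const r = p(source, at);
    if (r.error) return r;
    value.push(...r.value);
    at = r.index;
  }
  return { value, index: at };
};

// Ordered choice: the first parser that succeeds wins.
const or = (...parsers) => (source, index) => {
  let last;
  for (const p of parsers) {
    last = p(source, index);
    if (!last.error) return last;
  }
  return last;
};

const map = (parser, fn) => (source, index) => {
  const r = parser(source, index);
  return r.error ? r : { value: fn(r.value), index: r.index };
};

// lazy: call `fn` at parsing time, so `fn` may mention a parser
// that is not yet assigned at creating time.
const lazy = (fn) => (source, index) => fn()(source, index);

// Grammar: M -> "(" M ")" | "a"
const M = or(
  map(and(re(/\(/), lazy(() => M), re(/\)/)), ([, inner]) => [[inner]]),
  re(/a/)
);

console.log(JSON.stringify(M("((a))", 0).value)); // [[["a"]]]
```

Each level of parentheses adds one level of nesting, and the outermost [] is the value array itself, which matches the triple brackets discussed in the Tutorial.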
nihil.recur
It is used to implement the grammar shown below:
M -> L M R
M -> I
/* I,L,R could be terminals */
That is, you can have a parser that parses "LLLLLIRRRRR"
For convenience, you can pass a function f
to map the result of the recursive parser to the format you like
nihil.recur(L)(I)(R)(f=x=>x)
ADVANCED : some of L,I,R could be nihil
For example, when you want to implement a loop parser that parses "LLLL...L"
ATTENTION : L is a syntax; it could be a sentence of LL(m), e.g. L=$a^nb^n$
See nihil.loop
nihil.loop
It is implemented with nihil.recur
nihil.loop = nihil.recur(L)(nihil)(nihil)()
nihil.sep
It is used for parsing "1 2 3" into [1,2,3] with the separator " " (SPACE)
nihil.sep(parse)(seperator)
nihil.nest
It is used in nihil.sep
It maps [value] to [[value]]
nihil.reverse
It is used in nihil.sep and nihil.parser.candy
It reverses the value from [$1,$2,...,$m] to [$m,$(m-1),...,$2,$1]
REASON to reverse:
the parsing result of nihil.and(parser1,parser2,...parserM)
is [resultM,result(M-1),...,result1], which is in reverse order.
When returning an array of values, you'd better use nihil.reverse
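One way to picture the reversed order is a list that always receives the newest result at the front, so that $1 is the latest value. This is only an illustrative assumption about the accumulation scheme; the helper push below is hypothetical and nihil's internals may differ.

```javascript
// Hypothetical accumulation scheme; nihil's internals may differ.
// Each new result is put at the FRONT, so index 0 is always the latest value.
const push = (stack, result) => [result, ...stack];

let stack = [];
for (const piece of ["w", "x", "y", "z"]) {
  stack = push(stack, piece);
}

const [$1, $2] = stack; // latest and second-latest values
const inSourceOrder = [...stack].reverse(); // copy, then restore source order

console.log(stack); // [ 'z', 'y', 'x', 'w' ]
console.log($1, $2); // z y
console.log(inSourceOrder); // [ 'w', 'x', 'y', 'z' ]
```

Under this assumption, a selector can destructure the most recent values without knowing how many came before, and a single reverse at the end restores the order of the raw string.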