JSPM

  • Created
  • Published
  • Downloads 316
  • Score
    100M100P100Q84876F
  • License MIT Style License

Syntactic analysis toolkit for education, tracing, and parsers generation

Package Exports

  • syntax-cli
  • syntax-cli/dist/special-symbols.js

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (syntax-cli) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

syntax

Syntactic analysis toolkit for education, tracing the parsing process, and parsers generation.

Implements LR and LL parsing algorithms.

See also LL(1) parser repo (will be merged here).

Installation

The tool can be installed as an NPM module (notice, it's called syntax-cli there):

npm install -g syntax-cli

syntax-cli --help

Or for development, from the github repository. Run build command to transpile ES6 code:

git clone https://github.com/DmitrySoshnikov/syntax.git
cd syntax
npm install
npm run build

./bin/syntax --help

For development one can also use the watch command:

npm run watch

CLI usage example:

./bin/syntax --grammar examples/grammar.lr0 --parse "aabb" --mode lr0 --table --collection

Parser generation

To generate a parser module, specify the --output option, which is a path to the output parser file. Once generated, the module can normally be required, and used for parsing strings based on a given grammar.

Example for the JSON grammar:

./bin/syntax --grammar examples/json.ast.js --mode SLR1 --output json-parser.js

✓ Successfully generated: json-parser.js

Loading as a JS module:

const JSONParser = require('./json-parser');

let value = JSONParser.parse('{"x": 10, "y": [1, 2]}');

console.log(value); // JS object: {x: 10, y: [1, 2]}

Using custom tokenizer

NOTE: built-in tokenizer uses underlying regexp implementation to extract stream of tokens.

It is possible to provide a custom tokenizer if a built-in isn't sufficient. For this pass the --custom-tokenizer option, which is a path to a file that implements a tokenizer. In this case the built-in tokenizer code won't be generated.

./bin/syntax --grammar examples/json.ast.js --mode SLR1 --output json-parser.js --custom-tokenizer './my-tokenizer.js'

✓ Successfully generated: json-parser.js

In the generated code, the custom tokenizer is just required as a module: require('./my-tokenizer.js').

The tokenizer should implement the following iterator-like interface:

  • initString: initializes a parsing string;
  • hasMoreTokens: whether stream of tokens still has more tokens to consume;
  • getNextToken: returns next token in the format {type, value};

For example:

// File: ./my-tokenizer.js

const MyTokenizer = {

  initString(string) {
    this._string = string;
    this._cursor = 0;
  },

  hasMoreTokens() {
    return this._cursor < this._string.length;
  },

  getNextToken() {
    // Implement logic here.

    return {
      type: <<TOKEN_TYPE>>,
      value: <<TOKEN_VALUE>>,
    }
  },
};

module.exports = MyTokenizer;

Parsing modes

Syntax supports several LR parsing modes: LR(0), SLR(1), LALR(1), CLR(1) as well LL(1) mode. The same grammar can be analyzed in different modes, from the CLI it's controlled via the --mode option, e.g. --mode slr1.

Note: de facto standard for automatically generated parsers is usually the LALR(1) parser. The CLR(1) parser, being the most powerful, and able to parse wider grammar sets, can have much more states than LALR(1), and usually is suitable for educational purposes. As well as its less powerful counterparts, LR(0) and SLR(1) which are less used on practice (although, some production-ready grammars can also normally be parsed by SLR(1), e.g. JSON grammar).

Some grammars can be handled by one mode, but not by another. In this case a conflict will be shown in the table.

LR conflicts

In LR parsing there are two main types of conflicts: "shift-reduce" (s/r) conflict, and "reduce-reduce" (r/r) conflict. Taking as an example grammar from examples/example1.slr1, we see that the parsing table is normally constructed for SLR(1) mode, but has a "shift-reduce" conflict if ran in the LR(0) mode:

./bin/syntax --grammar examples/example1.slr1 --table
./bin/syntax --grammar examples/example1.slr1 --table --mode lr0

sl1-grammar sl1-grammar-lr0-m

Conflicts resolution

Sometimes changing parsing mode is not enough for fixing conflicts: for some grammars conflicts may stay and in the LALR(1), and even the CLR(1) modes. LR conflicts can be resolved though automatically and semi-automatically by specifying precedence and associativity of operators.

For example, the following grammar has a shift-reduce conflict:

%token id

%%

E
  : E '+' E
  | E '*' E
  | id
  ;

Therefore, a parsing is not possible using this grammar:

./bin/syntax -g examples/calculator-assoc-conflict.g -m lalr1 -w -p 'id * id + id'

Parsing mode: LALR(1).

Parsing: id * id + id

Rejected: Parse error: Found "shift-reduce" conflict at state 6, terminal '+'.

This can be fixed though using operators associativity and precedence:

%token id

%left '+'
%left '*'

%%

E
  : E '+' E
  | E '*' E
  | id
  ;

See detailed description of the conflicts resolution algorithm in this example grammar, which is can be parsed normally:

./bin/syntax -g examples/calculator-assoc.g -m lalr1 -w -p 'id * id + id'

Parsing mode: LALR(1).

Parsing: id * id + id

✓ Accepted
Module include, and parser events

The moduleInclude directive allows injecting an arbitrary code to the generated parser file. This is usually code to require needed dependencies, or to define them inline. As an example, see the corresponding example grammar, which defines all classes for AST nodes inline, and then uses them in the rule handlers.

In addition, two parser events: onParseBegin and onParseEnd allow injecting a code which is executed when the parsing starts, and ends respectively.

"moduleInclude": `
  class Node {
    constructor(type) {
      this.type = type;
    }
  }

  class BinaryExpression extends Node {
    ...
  }
`,

"onParseBegin": `
  console.log('Parsing code:', $1);
`

"onParseEnd": `
  console.log('Parsed value:', $1)
`,

...

["E + E",  "$$ = new BinaryExpression($1, $3, $2)"],