JSPM

pascal-tokenizer

0.0.3
License: MIT

A tokenizer for the Pascal programming language

Package Exports

  • pascal-tokenizer
  • pascal-tokenizer/dist/index.js

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to open an issue on the original package (pascal-tokenizer) requesting support for the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.
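
For example, the detected subpath can be imported directly as an ES module in an ESM-aware environment. This is only a sketch: the CDN URL below is an assumption based on JSPM's usual ga.jspm.io layout and the detected dist/index.js subpath, and the canonical import map should be generated with the JSPM generator.

// Assumed CDN URL (ga.jspm.io layout + the detected dist/index.js subpath); verify with the JSPM generator.
import { tokenizePascal } from "https://ga.jspm.io/npm:pascal-tokenizer@0.0.3/dist/index.js";

console.log(tokenizePascal("begin end."));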

Readme

Pascal Tokenizer

A simple and efficient tokenizer for the Pascal programming language, written in TypeScript.

Installation

Install the package using npm:

npm install pascal-tokenizer

Usage

Basic Usage

You can import the tokenizePascal function and, optionally, the PascalToken and TokenType types.

Input example:

import { tokenizePascal, PascalToken, TokenType } from "pascal-tokenizer";

const pascalCode = `
program HelloWorld;
begin
  writeln('Hello, World!'); // Example comment
end.
`;

// Tokenize the code (comments skipped by default)
const tokens: PascalToken[] = tokenizePascal(pascalCode);
console.log(tokens);

// Example: Find the first keyword token
const firstKeyword = tokens.find((token) => token.type === "KEYWORD"); // You can use the imported TokenType for type checking

// Tokenize including comments
const tokensWithComments = tokenizePascal(pascalCode, false);
console.log(tokensWithComments);

Default output:

[
  { type: "KEYWORD", value: "program" },
  { type: "IDENTIFIER", value: "HelloWorld" },
  { type: "DELIMITER_SEMICOLON", value: ";" },
  { type: "KEYWORD", value: "begin" },
  { type: "IDENTIFIER", value: "writeln" },
  { type: "DELIMITER_LPAREN", value: "(" },
  { type: "STRING_LITERAL", value: "Hello, World!" },
  { type: "DELIMITER_RPAREN", value: ")" },
  { type: "DELIMITER_SEMICOLON", value: ";" },
  { type: "KEYWORD", value: "end" },
  { type: "DELIMITER_DOT", value: "." },
  { type: "EOF", value: "" },
];

Output with comments:

[
  { type: "KEYWORD", value: "program" },
  { type: "IDENTIFIER", value: "HelloWorld" },
  { type: "DELIMITER_SEMICOLON", value: ";" },
  { type: "KEYWORD", value: "begin" },
  { type: "IDENTIFIER", value: "writeln" },
  { type: "DELIMITER_LPAREN", value: "(" },
  { type: "STRING_LITERAL", value: "Hello, World!" },
  { type: "DELIMITER_RPAREN", value: ")" },
  { type: "DELIMITER_SEMICOLON", value: ";" },
  { type: "COMMENT_LINE", value: "// Example comment" },
  { type: "KEYWORD", value: "end" },
  { type: "DELIMITER_DOT", value: "." },
  { type: "EOF", value: "" },
];

Including Comments

By default, comments are skipped. If you want to include comments in the token stream:

const tokens = tokenizePascal(pascalCode, false); // false = don't skip comments

Token Structure

Each token has the following structure:

interface PascalToken {
  type: TokenType;
  value: string;
}
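
For example, here is a minimal sketch that tallies tokens by their type field; the countByType helper is hypothetical, not something the package exports.

import { tokenizePascal, PascalToken } from "pascal-tokenizer";

// Hypothetical helper (not part of the package): count tokens by their type field.
function countByType(tokens: PascalToken[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const token of tokens) {
    counts[token.type] = (counts[token.type] ?? 0) + 1;
  }
  return counts;
}

console.log(countByType(tokenizePascal("var x: integer;")));
// e.g. { KEYWORD: 2, IDENTIFIER: 1, DELIMITER_COLON: 1, DELIMITER_SEMICOLON: 1, EOF: 1 }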

Token Types

The tokenizer recognizes the following token types:

| Token Type (TokenType) | Symbol / Example in Pascal | Description |
| --- | --- | --- |
| **Comments** | | |
| COMMENT_BLOCK_BRACE | { ... } | Comment delimited by braces |
| COMMENT_STAR | (* ... *) | Comment delimited by parenthesis/asterisk |
| COMMENT_LINE | // ... | Single-line comment (Delphi/FPC style) |
| **Operators** | | |
| OPERATOR_ASSIGN | := | Assignment |
| OPERATOR_LESS_EQUAL | <= | Less than or equal to |
| OPERATOR_GREATER_EQUAL | >= | Greater than or equal to |
| OPERATOR_NOT_EQUAL | <> | Not equal to |
| OPERATOR_RANGE | .. | Range (used in subranges, case) |
| OPERATOR_PLUS | + | Addition |
| OPERATOR_MINUS | - | Subtraction |
| OPERATOR_MULTIPLY | * | Multiplication |
| OPERATOR_DIVIDE | / | Division (real) |
| OPERATOR_EQUAL | = | Equality (comparison) |
| OPERATOR_LESS | < | Less than |
| OPERATOR_GREATER | > | Greater than |
| OPERATOR_POINTER | ^ | Pointer dereference / pointer type |
| OPERATOR_ADDRESSOF | @ | Memory address (@ operator) |
| **Delimiters** | | |
| DELIMITER_SEMICOLON | ; | Semicolon |
| DELIMITER_COMMA | , | Comma |
| DELIMITER_DOT | . | Dot (end of program, record access) |
| DELIMITER_LPAREN | ( | Left parenthesis |
| DELIMITER_RPAREN | ) | Right parenthesis |
| DELIMITER_LBRACKET | [ | Left bracket (arrays) |
| DELIMITER_RBRACKET | ] | Right bracket (arrays) |
| DELIMITER_COLON | : | Colon (type declaration, labels) |
| **Literals** | | |
| STRING_LITERAL | 'Hello', 'A' | String literal (without quotes in value) |
| BOOLEAN_LITERAL | true, false | Boolean value (case-insensitive) |
| NUMBER_REAL | 1.23, -0.5, .5 | Number with decimal part |
| NUMBER_INTEGER | 100, 0, -42 | Integer number |
| **Others** | | |
| KEYWORD | (see list below) | Language reserved word |
| IDENTIFIER | myVariable, Counter, TMyObject | User-defined name |
| EOF | | End Of File marker (internal) |
| WHITESPACE | | Unused; can be exposed for external consumption |
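
As a small usage sketch based on the type names in the table above, the following filters the operator tokens out of an expression:

import { tokenizePascal } from "pascal-tokenizer";

// Pull out every operator token from a snippet, relying on the OPERATOR_* naming above.
const tokens = tokenizePascal("if a <= b then c := a + b;");
const operators = tokens.filter((token) => token.type.startsWith("OPERATOR_"));
console.log(operators.map((token) => token.value)); // e.g. [ "<=", ":=", "+" ]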

Keywords (KEYWORD) currently recognized, according to tokenizer.ts (see the example after the list):

  • program
  • const
  • var
  • begin
  • end
  • if
  • then
  • else
  • while
  • do
  • for
  • to
  • downto
  • repeat
  • until
  • case
  • of
  • record
  • array
  • procedure
  • function
  • type
  • integer
  • real
  • char
  • string
  • boolean
  • uses
  • label
  • goto

(Note: true and false are handled specifically as BOOLEAN_LITERAL by the tokenizer, even though they may appear in an initial keyword list.)
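
As a quick check against the list above, this sketch separates keywords from identifiers in a loop header:

import { tokenizePascal } from "pascal-tokenizer";

// Keywords vs. user-defined names in a small snippet.
const tokens = tokenizePascal("for i := 1 to 10 do writeln(i);");
const keywords = tokens.filter((t) => t.type === "KEYWORD").map((t) => t.value);
const identifiers = tokens.filter((t) => t.type === "IDENTIFIER").map((t) => t.value);
console.log(keywords);    // e.g. [ "for", "to", "do" ]
console.log(identifiers); // e.g. [ "i", "writeln", "i" ]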

Considerations

Note that for writeln('Hello, World!'), the resulting string token is { type: "STRING_LITERAL", value: "Hello, World!" }. The surrounding single quotes (') are not included in the value. This is intentional, so that you can render the string however you like later on.
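
If you do need the quoted form back (for example when re-emitting source text), a minimal sketch such as the hypothetical renderToken helper below can re-add the quotes:

import { tokenizePascal, PascalToken } from "pascal-tokenizer";

// Hypothetical helper (not part of the package): turn a token back into display text,
// re-quoting string literals (embedded quotes, if any, would still need re-escaping).
function renderToken(token: PascalToken): string {
  return token.type === "STRING_LITERAL" ? `'${token.value}'` : token.value;
}

const tokens = tokenizePascal("writeln('Hello, World!');");
const rendered = tokens
  .filter((token) => token.type !== "EOF")
  .map(renderToken)
  .join(" ");
console.log(rendered); // e.g. writeln ( 'Hello, World!' ) ;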

License

MIT