Package Exports
- @digitak/grubber
- @digitak/grubber/Fragment.d.ts
- @digitak/grubber/Fragment.js
- @digitak/grubber/Language.d.ts
- @digitak/grubber/Language.js
- @digitak/grubber/Parser.d.ts
- @digitak/grubber/Parser.js
- @digitak/grubber/Rule.d.ts
- @digitak/grubber/Rule.js
- @digitak/grubber/index.d.ts
- @digitak/grubber/index.js
- @digitak/grubber/languages/c.d.ts
- @digitak/grubber/languages/c.js
- @digitak/grubber/languages/cpp.d.ts
- @digitak/grubber/languages/cpp.js
- @digitak/grubber/languages/css.d.ts
- @digitak/grubber/languages/css.js
- @digitak/grubber/languages/es.d.ts
- @digitak/grubber/languages/es.js
- @digitak/grubber/languages/index.d.ts
- @digitak/grubber/languages/index.js
- @digitak/grubber/languages/nim.d.ts
- @digitak/grubber/languages/nim.js
- @digitak/grubber/languages/py.d.ts
- @digitak/grubber/languages/py.js
- @digitak/grubber/languages/rs.d.ts
- @digitak/grubber/languages/rs.js
- @digitak/grubber/languages/sass.d.ts
- @digitak/grubber/languages/sass.js
- @digitak/grubber/languages/scss.d.ts
- @digitak/grubber/languages/scss.js
- @digitak/grubber/library/Fragment.d.ts
- @digitak/grubber/library/Fragment.js
- @digitak/grubber/library/Language.d.ts
- @digitak/grubber/library/Language.js
- @digitak/grubber/library/Parser.d.ts
- @digitak/grubber/library/Parser.js
- @digitak/grubber/library/Rule.d.ts
- @digitak/grubber/library/Rule.js
- @digitak/grubber/library/index.d.ts
- @digitak/grubber/library/index.js
- @digitak/grubber/library/languages/c.d.ts
- @digitak/grubber/library/languages/c.js
- @digitak/grubber/library/languages/cpp.d.ts
- @digitak/grubber/library/languages/cpp.js
- @digitak/grubber/library/languages/css.d.ts
- @digitak/grubber/library/languages/css.js
- @digitak/grubber/library/languages/es.d.ts
- @digitak/grubber/library/languages/es.js
- @digitak/grubber/library/languages/index.d.ts
- @digitak/grubber/library/languages/index.js
- @digitak/grubber/library/languages/nim.d.ts
- @digitak/grubber/library/languages/nim.js
- @digitak/grubber/library/languages/py.d.ts
- @digitak/grubber/library/languages/py.js
- @digitak/grubber/library/languages/rs.d.ts
- @digitak/grubber/library/languages/rs.js
- @digitak/grubber/library/languages/sass.d.ts
- @digitak/grubber/library/languages/sass.js
- @digitak/grubber/library/languages/scss.d.ts
- @digitak/grubber/library/languages/scss.js
- @digitak/grubber/library/utilities/addJsExtensions.d.ts
- @digitak/grubber/library/utilities/addJsExtensions.js
- @digitak/grubber/library/utilities/patchJsImports.d.ts
- @digitak/grubber/library/utilities/patchJsImports.js
- @digitak/grubber/library/utilities/resolveAliases.d.ts
- @digitak/grubber/library/utilities/resolveAliases.js
- @digitak/grubber/utilities/addJsExtensions.d.ts
- @digitak/grubber/utilities/addJsExtensions.js
- @digitak/grubber/utilities/patchJsImports.d.ts
- @digitak/grubber/utilities/patchJsImports.js
- @digitak/grubber/utilities/resolveAliases.d.ts
- @digitak/grubber/utilities/resolveAliases.js
Readme
Grubber is a lightweight and friendly utility to parse code with regular expressions in a 100% safe way - without having to use an AST 🐛
At a higher level, Grubber also exposes helper functions to parse the dependencies of a file in many languages (JavaScript, TypeScript, CSS, SCSS, Python, Rust, C / C++, Nim, ...).
How?
The problem with parsing a source file with regular expressions is that you cannot be sure your match is not inside a comment or a string.
For example, let's say you are looking for all const statements in a JavaScript file. You would use a regular expression similar to:
/\bconst\s+/g
But what if the file you want to parse is something like:
const x = 12;
// const y = 13;
let z = "const ";
Then you would match const three times when only one match is valid: the second occurrence is inside a comment and the third is inside a string.
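The over-matching is easy to reproduce with plain string methods, without Grubber. A minimal check (the variable names are only illustrative):

```typescript
// The sample file from above, as a string.
const naiveSource = [
  "const x = 12;",
  "// const y = 13;",
  'let z = "const ";',
].join("\n");

// A naive global search has no notion of comments or strings.
const naiveMatches = naiveSource.match(/\bconst\s+/g) ?? [];

console.log(naiveMatches.length); // 3
```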
Grubber understands what is a string, what is a comment and what is code so that you can overcome the issue very easily:
import { grub } from "@digitak/grubber";
const content = `
const x = 12
// const y = 13
let z = "const "
`;
const results = grub(content).find(/\bconst\s+/);
console.log(results.length); // will print 1 as expected
For the sake of the demonstration we used a simple regex, but remember that ECMAScript is a tricky language! Actually finding all const statements would require a more refined regex. For example, foo.const = 12 would also be matched. Languages that require a semicolon at the end of every statement, or that enforce strict indentation, are much easier to parse in a 100% safe way.
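To illustrate the foo.const case, here is a small comparison between the simple regex and a slightly stricter one. The stricter pattern is only an assumption about what "more refined" could look like; it is not taken from Grubber and still does not cover every ECMAScript edge case.

```typescript
// Simple pattern: a word boundary also exists between "." and "const".
const simpleConst = /\bconst\s+/g;
// Stricter pattern (illustration only): reject a preceding ".", "$" or word character.
const stricterConst = /(?<![.$\w])const\s+/g;

const constSample = "foo.const = 12;\nconst bar = 1;";

const simpleCount = (constSample.match(simpleConst) ?? []).length;
const stricterCount = (constSample.match(stricterConst) ?? []).length;

console.log(simpleCount, stricterCount); // 2 1
```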
Installation
Use your favorite package manager:
npm install @digitak/grubber
Grubber API
Grubber exports one main function, grub:
export function grub(
  source: string,
  languageOrRules: LanguageName | Rule[] = "es"
): {
  // find one or more expressions and return an array of fragments
  find: (...expressions: Array<string | RegExp>) => Fragment[],
  // replace one or more expressions and return the patched string
  replace: (...fromTos: Array<{
    from: string | RegExp,
    to: string | RegExp
  }>) => string,
  // find all dependencies (ex: `import` in Typescript, `use` in Rust)
  findDependencies: () => Fragment[],
  // replace all dependencies by the given value
  // you can use special replace patterns like "$1" to replace
  // with the first captured group
  replaceDependencies: (to: string) => string,
}
The find and findDependencies methods both return an array of fragments:
export type Fragment = {
  slice: string // the matched substring
  start: number // start of the matched substring
  end: number // end of the matched substring
  groups: string[] // the captured groups (an empty array by default)
};
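To make the relationship between the Fragment fields concrete, here is a minimal sketch that builds fragments from a plain regex search. The toFragments helper is hypothetical and is not part of Grubber; the sketch also assumes that end is exclusive, which the description above does not state.

```typescript
// Local mirror of the Fragment shape described above (not imported from the package).
type Fragment = {
  slice: string;
  start: number;
  end: number;
  groups: string[];
};

// Hypothetical helper: build fragments from a plain regex search.
// Assumption: `end` is exclusive, so source.slice(start, end) === slice.
function toFragments(source: string, expression: RegExp): Fragment[] {
  const flags = expression.flags.includes("g") ? expression.flags : expression.flags + "g";
  const globalExpression = new RegExp(expression.source, flags);
  return Array.from(source.matchAll(globalExpression), (match) => {
    const start = match.index ?? 0;
    return {
      slice: match[0],
      start,
      end: start + match[0].length,
      groups: match.slice(1),
    };
  });
}

const importSource = 'import a from "./a";\nimport b from "./b";';
const importFragments = toFragments(importSource, /from "(.+?)"/);

console.log(importFragments.map((fragment) => fragment.groups[0])); // [ './a', './b' ]
```

Unlike Grubber, this helper does not skip strings or comments; it only shows how slice, start, end and groups relate to a match.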
Using grubber with one of the preset languages
You can use any of the preset languages:
export type LanguageName =
| 'es' // Ecmascript (Javascript / Typescript / Haxe): the default
| 'rs' // Rust
| 'css'
| 'scss'
| 'sass'
| 'c'
| 'cpp'
| 'py' // Python
| 'nim'
;
Example:
// find all semicolons inside the Rust source code
grub(rustCodeToParse, "rs").find(";");
Using grubber with custom rules
You may define custom rules for the grubber parser, i.e. what should be ignored and treated as "not code".
A Rule has the following type:
export type Rule =
  | {
      expression: string | RegExp // the expression to ignore
      // if it returns false, the match is ignored
      onExpressionMatch?: (match: RegExpMatchArray) => boolean | void
    }
  | {
      startAt: string | RegExp // start of the expression to ignore
      stopAt: string | RegExp // stop of the expression to ignore
      // if it returns false, the match is ignored
      onStartMatch?: (match: RegExpMatchArray) => boolean | void
      onStopMatch?: (match: RegExpMatchArray) => boolean | void
    }
;
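The startAt / stopAt variant can be pictured as "find the opening delimiter, then ignore everything up to the closing one". The sketch below illustrates that idea only; DelimitedRule and findIgnoredRange are hypothetical names, callbacks are left out, and Grubber's real parser may behave differently (for instance with unterminated blocks).

```typescript
// Local, simplified mirror of the delimited rule shape (callbacks omitted).
// Assumption: both expressions are non-global regular expressions.
type DelimitedRule = { startAt: RegExp; stopAt: RegExp };

// Hypothetical helper: return the [start, end) range ignored by the rule,
// searching from `from`, or null when the rule does not start anywhere.
function findIgnoredRange(
  source: string,
  rule: DelimitedRule,
  from = 0,
): [number, number] | null {
  const startMatch = rule.startAt.exec(source.slice(from));
  if (!startMatch) return null;
  const start = from + startMatch.index;
  const afterStart = start + startMatch[0].length;
  const stopMatch = rule.stopAt.exec(source.slice(afterStart));
  // Assumption: an unterminated block runs to the end of the source.
  const end = stopMatch
    ? afterStart + stopMatch.index + stopMatch[0].length
    : source.length;
  return [start, end];
}

const blockComment: DelimitedRule = { startAt: /\/\*/, stopAt: /\*\// };
const delimitedSource = "int a; /* int b; */ int c;";
const ignoredRange = findIgnoredRange(delimitedSource, blockComment);

// prints the ignored block comment, delimiters included
console.log(ignoredRange && delimitedSource.slice(ignoredRange[0], ignoredRange[1]));
```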
For example, the rules used for the C language are:
const rules: Rule[] = [
  {
    // string
    expression: /".*?[^\\](?:\\\\)*"/,
  },
  {
    // single line comment
    expression: /\/\/.*/,
  },
  {
    // multiline comment
    expression: /\/\*((?:.|\s)*?)\*\//,
  },
];
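To see how such rules can make a search safe, here is a minimal, self-contained sketch of one possible scanning strategy: walk the source from left to right, let the rule that matches earliest claim its range, and keep only the search results that fall outside every claimed range. This is an illustration under stated assumptions, not Grubber's actual implementation; SimpleRule, findIgnoredRanges and findInCode are hypothetical names.

```typescript
// Simplified rule shape: only the `expression` variant, no callbacks.
type SimpleRule = { expression: RegExp };

// The C rules quoted above.
const cRules: SimpleRule[] = [
  { expression: /".*?[^\\](?:\\\\)*"/ }, // string
  { expression: /\/\/.*/ }, // single line comment
  { expression: /\/\*((?:.|\s)*?)\*\// }, // multiline comment
];

// Walk the source left to right; at each step the rule matching earliest wins,
// and scanning resumes after that match.
function findIgnoredRanges(source: string, rules: SimpleRule[]): Array<[number, number]> {
  const ranges: Array<[number, number]> = [];
  let position = 0;
  while (position < source.length) {
    let best: [number, number] | null = null;
    for (const rule of rules) {
      const match = rule.expression.exec(source.slice(position));
      if (!match) continue;
      const start = position + match.index;
      if (best === null || start < best[0]) best = [start, start + match[0].length];
    }
    if (best === null) break;
    ranges.push(best);
    position = Math.max(best[1], best[0] + 1); // always make progress
  }
  return ranges;
}

// Return the indices of the matches that start outside every ignored range.
function findInCode(source: string, rules: SimpleRule[], expression: RegExp): number[] {
  const ranges = findIgnoredRanges(source, rules);
  const globalExpression = new RegExp(expression.source, "g");
  return Array.from(source.matchAll(globalExpression), (match) => match.index ?? -1).filter(
    (index) => !ranges.some(([start, end]) => index >= start && index < end),
  );
}

const cCode = [
  "int a = 1; // int b = 2;",
  'char *s = "int c // not a comment";',
  "/* int d */ int e;",
].join("\n");

const intIndices = findInCode(cCode, cRules, /\bint\b/);
console.log(intIndices.length); // 2
```

Because the earliest match wins, the `//` inside the string on the second line is swallowed by the string rule and never starts a comment. The sketch re-slices the source on every step, so it favors clarity over speed.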
Rules are quite simple for most languages but get complicated for ECMAScript because of the ${...} template literal syntax. Fortunately, the job is already done for you!
🌿 🐛 🌿