Package Exports
- @luciformresearch/xmlparser
- @luciformresearch/xmlparser/diagnostics
- @luciformresearch/xmlparser/document
- @luciformresearch/xmlparser/migration
- @luciformresearch/xmlparser/sax
- @luciformresearch/xmlparser/scanner
- @luciformresearch/xmlparser/types
LR XMLParser — Modular, robust and safe XML parser
High‑performance XML parser designed for modern AI pipelines. LR XMLParser is optimized for LLM‑generated XML (a permissive mode with error recovery) while also offering a strict mode and staying traceable and secure for production workloads.
Project by LuciformResearch (Lucie Defraiteur).
French version: see README.fr.md
Getting started (npm)
Install:
npm install @luciformresearch/xmlparser
pnpm add @luciformresearch/xmlparser
Examples (ESM and CommonJS):
// ESM
import { LuciformXMLParser } from '@luciformresearch/xmlparser';
const result = new LuciformXMLParser(xml, { mode: 'luciform-permissive' }).parse();

// CommonJS
const { LuciformXMLParser } = require('@luciformresearch/xmlparser');
const result = new LuciformXMLParser(xml, { mode: 'luciform-permissive' }).parse();
Subpath exports (optional):
@luciformresearch/xmlparser/document, .../scanner, .../diagnostics, .../types, .../migration, .../sax.
License
MIT with reinforced attribution. See LICENSE for terms, attribution obligations, and allowed uses.
Overview
LR XMLParser follows a modular architecture (scanner → parser → models → diagnostics) focused on clarity, testability, and performance.
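To make the scanner stage concrete, here is a toy tokenizer that turns characters into start, end, and text tokens while tracking line and column, which is the position information the diagnostics stage later reports. It is a self-contained illustration, not the code in scanner.ts: the Token shape and tokenize function are hypothetical, and it deliberately ignores comments, CDATA, processing instructions, attribute parsing, and a ">" inside quoted attribute values.

```typescript
// Toy stateful scanner (illustrative only, not the library's scanner.ts).
// Turns raw characters into tokens, each tagged with the line/column where it starts.
type Token =
  | { kind: 'start'; name: string; selfClosing: boolean; line: number; column: number }
  | { kind: 'end'; name: string; line: number; column: number }
  | { kind: 'text'; value: string; line: number; column: number };

function tokenize(input: string): Token[] {
  const tokens: Token[] = [];
  let pos = 0;
  let line = 1;
  let column = 1;

  // Consume one character and keep line/column in sync with it.
  const advance = (): string => {
    const ch = input[pos++];
    if (ch === '\n') {
      line++;
      column = 1;
    } else {
      column++;
    }
    return ch;
  };

  while (pos < input.length) {
    const startLine = line;
    const startColumn = column;
    if (input[pos] === '<') {
      advance(); // '<'
      const isEnd = input[pos] === '/';
      if (isEnd) advance(); // '/'
      let raw = '';
      while (pos < input.length && input[pos] !== '>') raw += advance();
      if (pos < input.length) advance(); // '>' (an unterminated tag simply ends at EOF)
      const selfClosing = !isEnd && raw.endsWith('/');
      // The element name is the first whitespace-separated word inside the tag.
      const name = raw.replace(/\/$/, '').trim().split(/\s+/)[0];
      tokens.push(
        isEnd
          ? { kind: 'end', name, line: startLine, column: startColumn }
          : { kind: 'start', name, selfClosing, line: startLine, column: startColumn },
      );
    } else {
      // Everything up to the next '<' is one text token.
      let value = '';
      while (pos < input.length && input[pos] !== '<') value += advance();
      tokens.push({ kind: 'text', value, line: startLine, column: startColumn });
    }
  }
  return tokens;
}
```

Keeping position tracking inside the scanner means later stages (parser, diagnostics) never have to re-scan the input to report where a problem occurred.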
Key use cases
- Structured LLM responses ("luciform‑permissive" mode to tolerate and recover from common LLM formatting issues).
- General XML parsing with precise diagnostics (line/column) and configurable limits.
- Integration in AI pipelines (LR HMM) and larger systems (LR Hub).
Example within a hierarchical memory engine:
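import { LuciformXMLParser } from '@luciformresearch/xmlparser';

// xml: the raw XML string returned by the LLM
const parser = new LuciformXMLParser(xml, {
  mode: 'luciform-permissive',
  maxTextLength: 100_000, // upper bound on text length
});
const result = parser.parse();
if (result.success) {
  // Text of the <summary> element, or undefined if it is absent
  const summary = result.document?.findElement('summary')?.getText();
}
Code structure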
lr_xmlparser/
├── index.ts # Main parser (public API)
├── scanner.ts # Stateful tokenizer
├── document.ts # XML models (Document/Element/Node)
├── diagnostics.ts # Diagnostics (codes, messages, suggestions)
├── migration.ts # Compatibility layer (legacy → new)
├── types.ts # Shared types and interfaces
└── test-integration.ts
Why LR XMLParser
- Performance: fast on practical workloads (see test-integration.ts).
- Maintainability: focused modules with clear separation of concerns.
- Testability: isolated components, validated integration, easier debugging.
- Reusability: standalone scanner, extensible diagnostics, independent models.
- LLM‑oriented: permissive mode, error recovery, CDATA handling, format tolerance.
Edge cases covered
- Attributes and self-closing tags (<child a="1" b="two"/>)
- Unclosed comments/CDATA: permissive mode recovers and logs diagnostics
- Mismatched tags: errors with precise codes and locations
- Limits: maxDepth, maxTextLength, maxPILength
- Processing instructions and DOCTYPE handling
- BOM + whitespace tolerance
- Namespaces: xmlns / xmlns:prefix mapping, unbound prefix diagnostics
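Mismatched and unclosed tags can be detected with a stack of open elements. The sketch below runs over a simplified token list (the TagToken shape and checkNesting function are hypothetical, not the library's internals) and emits diagnostics using the MISMATCHED_TAG and UNCLOSED_TAG codes listed under Error handling. Its recovery step, closing everything up to a matching outer tag, is one plausible strategy and not necessarily what permissive mode does.

```typescript
// Sketch: stack-based detection of mismatched and unclosed tags (illustrative only).
type TagToken = { kind: 'start' | 'end'; name: string; line: number; column: number; selfClosing?: boolean };
type TagDiagnostic = { code: 'MISMATCHED_TAG' | 'UNCLOSED_TAG'; message: string; line: number; column: number };

function checkNesting(tokens: TagToken[]): TagDiagnostic[] {
  const stack: TagToken[] = [];
  const diagnostics: TagDiagnostic[] = [];

  for (const tok of tokens) {
    if (tok.kind === 'start') {
      if (!tok.selfClosing) stack.push(tok); // self-closing tags never open a scope
      continue;
    }
    const top = stack[stack.length - 1];
    if (top && top.name === tok.name) {
      stack.pop(); // well-nested close
      continue;
    }
    diagnostics.push({
      code: 'MISMATCHED_TAG',
      message: top ? `Expected </${top.name}> but found </${tok.name}>` : `Unexpected </${tok.name}>`,
      line: tok.line,
      column: tok.column,
    });
    // Simple recovery: if an outer element has this name, close everything up to and including it.
    const idx = stack.map((t) => t.name).lastIndexOf(tok.name);
    if (idx >= 0) stack.length = idx;
  }

  // Anything still open at end of input was never closed.
  for (const unclosed of stack) {
    diagnostics.push({
      code: 'UNCLOSED_TAG',
      message: `<${unclosed.name}> is never closed`,
      line: unclosed.line,
      column: unclosed.column,
    });
  }
  return diagnostics;
}
```

Reporting at the offending token's own line and column is what lets diagnostics point at the exact location rather than at the end of the document.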
Quick API reference
// Public surface (signatures only)
export class LuciformXMLParser {
  constructor(content: string, options?: ParserOptions);
  parse(): ParseResult;
}
Options include security and performance limits (depth, text length, entity expansion), plus mode: 'strict' | 'permissive' | 'luciform-permissive'.
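The entity-expansion limit guards against "billion laughs" inputs, where a few nested entity definitions expand into an enormous amount of text. A minimal sketch of such a guard, assuming a plain map of internal entities and a hypothetical maxExpansions budget (the library's actual option name and implementation are not documented here):

```typescript
// Sketch of an entity-expansion budget (illustrative; maxExpansions is a hypothetical name).
function expandEntities(text: string, entities: Record<string, string>, maxExpansions = 1000): string {
  let expansions = 0;

  const expand = (input: string, depth: number): string =>
    input.replace(/&([A-Za-z_][\w.-]*);/g, (match: string, entityName: string) => {
      // Unknown references (including predefined ones like &amp;) are left untouched here.
      if (!Object.prototype.hasOwnProperty.call(entities, entityName)) return match;
      // Count every single expansion, so nested definitions cannot multiply unchecked.
      if (++expansions > maxExpansions) {
        throw new Error(`Entity expansion limit (${maxExpansions}) exceeded`);
      }
      if (depth > 20) throw new Error('Entity nesting too deep (possible recursive definition)');
      return expand(entities[entityName], depth + 1); // expand references inside the value
    });

  return expand(text, 0);
}
```

Counting expansions globally, rather than per level, is what defeats the attack: ten entities each referencing the previous one ten times need over a billion expansions, and the budget trips long before memory does.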
Namespace-aware queries:
// Given <root xmlns:foo="urn:foo"><foo:item/></root>
const item = result.document?.findByNS('urn:foo', 'item');
const items = result.document?.findAllByNS('urn:foo', 'item');
SAX/streaming (large inputs):
import { LuciformSAX } from '@luciformresearch/xmlparser/sax';
new LuciformSAX(xml, {
onStartElement: (name, attrs) => { /* ... */ },
onEndElement: (name) => {},
onText: (text) => {},
}).run();
Namespaces
- Default namespace applies to elements, not attributes.
- Prefixed names (e.g., foo:bar) require a bound xmlns:foo in scope.
- Reserved: the xmlns prefix/name cannot be declared or rebound; the xml prefix must map to http://www.w3.org/XML/1998/namespace.
- Use findByNS(nsUri, local) / findAllByNS for ns-aware traversal.
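The rules above can be sketched as a stack of prefix scopes, pushed when an element opens and popped when it closes. NamespaceScopes is a hypothetical illustration, not the library's implementation: it pre-binds xml, rejects declaring xmlns or remapping xml, keeps unprefixed attributes out of the default namespace, treats xmlns="" as undeclaring the default, and reports unbound prefixes.

```typescript
// Sketch: namespace resolution with a scope stack (illustrative only).
const XML_NS = 'http://www.w3.org/XML/1998/namespace';

type Resolved = { uri: string | null; local: string } | { error: string };

class NamespaceScopes {
  // Each scope maps prefix -> URI; the empty prefix '' is the default namespace.
  private scopes: Map<string, string>[] = [new Map([['xml', XML_NS]])];

  // Open a scope from an element's attributes; returns any declaration errors.
  push(attrs: Record<string, string>): string[] {
    const scope = new Map<string, string>();
    const errors: string[] = [];
    for (const [attr, value] of Object.entries(attrs)) {
      if (attr === 'xmlns') {
        scope.set('', value);
      } else if (attr.startsWith('xmlns:')) {
        const prefix = attr.slice('xmlns:'.length);
        if (prefix === 'xmlns') errors.push('The xmlns prefix must not be declared');
        else if (prefix === 'xml' && value !== XML_NS) errors.push(`The xml prefix must map to ${XML_NS}`);
        else scope.set(prefix, value);
      }
    }
    this.scopes.push(scope);
    return errors;
  }

  // Close the innermost scope (the built-in xml binding is never popped).
  pop(): void {
    if (this.scopes.length > 1) this.scopes.pop();
  }

  // Innermost binding wins.
  private lookup(prefix: string): string | undefined {
    for (let i = this.scopes.length - 1; i >= 0; i--) {
      const uri = this.scopes[i].get(prefix);
      if (uri !== undefined) return uri;
    }
    return undefined;
  }

  resolve(qname: string, isAttribute = false): Resolved {
    const colon = qname.indexOf(':');
    if (colon === -1) {
      // Default namespace applies to elements only; xmlns="" maps to no namespace.
      const uri = isAttribute ? null : this.lookup('') || null;
      return { uri, local: qname };
    }
    const prefix = qname.slice(0, colon);
    const local = qname.slice(colon + 1);
    const uri = this.lookup(prefix);
    return uri === undefined ? { error: `Unbound prefix "${prefix}"` } : { uri, local };
  }
}
```

A namespace-aware lookup such as findByNS('urn:foo', 'item') can then compare the resolved (uri, local) pair instead of the raw prefixed name, so foo:item and bar:item match whenever both prefixes are bound to the same URI.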
Error handling
- Inspect result.diagnostics for structured issues (code, message, suggestion, location).
- result.success is false when errors are present; permissive mode may still return a usable document.
- Typical codes: UNCLOSED_TAG, MISMATCHED_TAG, INVALID_COMMENT, INVALID_CDATA, MAX_DEPTH_EXCEEDED, MAX_TEXT_LENGTH_EXCEEDED.
- Recovery cap: set maxRecoveries to cap automatic fixes in permissive modes. When the cap is exceeded, the parser stops further scanning, adds RECOVERY_ATTEMPTED and PARTIAL_PARSE info diagnostics, and returns a partial document. See result.recoveryReport for { attempts, capped }.
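A recovery cap amounts to a small budget the parser consults before each automatic fix. The sketch below mirrors the behaviour described above (stop once the cap is reached, emit RECOVERY_ATTEMPTED and PARTIAL_PARSE, expose { attempts, capped }); the RecoveryBudget class and its tryRecover method are hypothetical names, and in this sketch attempts counts only the fixes actually applied, which may differ from how the library counts.

```typescript
// Sketch of a recovery budget modelled on the maxRecoveries behaviour (illustrative only).
type InfoDiagnostic = { code: 'RECOVERY_ATTEMPTED' | 'PARTIAL_PARSE'; message: string };

class RecoveryBudget {
  private readonly maxRecoveries: number;
  private attempts = 0;
  private capped = false;
  readonly diagnostics: InfoDiagnostic[] = [];

  constructor(maxRecoveries: number) {
    this.maxRecoveries = maxRecoveries;
  }

  // Ask before applying a fix. false means: stop scanning and return a partial document.
  tryRecover(): boolean {
    if (this.capped) return false;
    if (this.attempts >= this.maxRecoveries) {
      this.capped = true;
      this.diagnostics.push(
        { code: 'RECOVERY_ATTEMPTED', message: `Recovery cap (${this.maxRecoveries}) reached; no further fixes applied` },
        { code: 'PARTIAL_PARSE', message: 'Scanning stopped early; the returned document is partial' },
      );
      return false;
    }
    this.attempts++;
    return true;
  }

  // Shape matches the { attempts, capped } report described above.
  report(): { attempts: number; capped: boolean } {
    return { attempts: this.attempts, capped: this.capped };
  }
}
```

Bounding recoveries keeps worst-case work predictable on badly broken LLM output: a response with hundreds of defects yields a partial document and a clear report instead of an unbounded chain of guessed fixes.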
Testing and validation
npx tsx test-integration.ts
Validated internally on:
- Valid simple XML
- Malformed XML (permissive mode)
- Complex XML with CDATA and comments
- Performance and limits
- Compatibility wrapper available
Links and integrations
- GitLab (source): https://gitlab.com/luciformresearch/lr_xmlparser
- GitHub mirror: https://github.com/LuciformResearch/LR_XMLParser
- Used by:
- LR HMM (L1/L2 memory compression, "xmlEngine")
- LR Hub (origin/base): https://gitlab.com/luciformresearch/lr_chat
Contributing
PRs welcome.
- Fork → feature branch → MR/PR
- Keep modules focused; avoid unnecessary deps
- Add tests for affected modules
Support
- Issues: open on GitLab
- Questions: GitLab discussions or direct contact
- Contact: luciedefraiteur@luciformresearch.com