A modular, open source library for converting HTML content into professional document formats. Initially focused on HTML-to-DOCX conversion, with planned support for PDF and XLSX. Built with TypeScript, it features a core HTML parsing engine and separate format-specific modules, offering a unified API for seamless integration.

html‑to‑document

Convert any HTML into production‑ready documents: DOCX today, with PDF and XLSX planned.

html‑to‑document parses HTML into an intermediate, format‑agnostic tree and then feeds that tree to adapters (e.g. DOCX, PDF).
Write HTML once and get Word documents today, with PDFs and spreadsheets planned, all through one unified TypeScript API.


How It Works

Below is a high-level overview of the conversion pipeline. The library processes the HTML input through optional middleware steps, parses it into a structured intermediate representation, and then delegates to an adapter to generate the desired output format.

[Diagram: conversion pipeline (Input → Middleware → Parser → Adapter)]

The stages are:

  • Input: Raw HTML input as a string.
  • Middleware: One or more middleware functions can inspect or transform the HTML string before parsing (e.g., sanitization, custom tags).
  • Parser: Converts the (possibly modified) HTML string into an array of DocumentElement objects, representing a structured AST.
  • Adapter: Takes the parsed DocumentElement[] and renders it into the target format (e.g., DOCX, PDF, Markdown) via a registered adapter.
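
The four stages above can be sketched end to end in a few lines. This is a toy model for illustration only, not the library's implementation: the `DocumentElement` shape, the `Middleware` and `Adapter` types, and the regex-based `parse` function are all simplified assumptions, and the real parser handles far more than flat `<h1>` and `<p>` tags.

```typescript
// Hypothetical, simplified shapes for illustration; the library's real types are richer.
type DocumentElement = { type: string; text: string };
type Middleware = (html: string) => string;
type Adapter = (elements: DocumentElement[]) => string;

// Middleware stage: strip <script> blocks before parsing.
const stripScripts: Middleware = (html) =>
  html.replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, '');

// Parser stage: a toy parser that only recognises flat <h1> and <p> tags.
function parse(html: string): DocumentElement[] {
  const elements: DocumentElement[] = [];
  const tagPattern = /<(h1|p)>([\s\S]*?)<\/\1>/gi;
  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(html)) !== null) {
    const type = match[1].toLowerCase() === 'h1' ? 'heading' : 'paragraph';
    elements.push({ type, text: match[2] });
  }
  return elements;
}

// Adapter stage: render the intermediate tree as plain text (headings upper-cased).
const textAdapter: Adapter = (elements) =>
  elements
    .map((el) => (el.type === 'heading' ? el.text.toUpperCase() : el.text))
    .join('\n');

// Wire the stages together: input -> middleware -> parser -> adapter.
function convert(html: string, middleware: Middleware[], adapter: Adapter): string {
  const prepared = middleware.reduce((current, fn) => fn(current), html);
  return adapter(parse(prepared));
}

const output = convert(
  '<h1>Hello</h1><script>alert(1)</script><p>World</p>',
  [stripScripts],
  textAdapter,
);
console.log(output);
```

Because the adapter only ever sees `DocumentElement[]`, swapping `textAdapter` for another renderer changes the output format without touching the middleware or parser stages.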

✨ Key Features

Feature                    Description
Format‑agnostic core       Converts HTML into a reusable DocumentElement[] structure
DOCX adapter (built‑in)    Powered by docx with rich style support
Pluggable adapters         Create and add your own adapter for PDF, XLSX, Markdown, etc.
Style mapping engine       Define your own CSS mappings for the adapters and set per‑format defaults
Custom tag handlers        Override or extend how any HTML tag is parsed
Middleware pipeline        Transform or sanitise HTML before parsing

📦 Installation

npm install html-to-document

🚀 Quick Start

import { init, DocxAdapter } from 'html-to-document';
import fs from 'fs';

const converter = init({
  adapters: {
    register: [
      { format: 'docx', adapter: DocxAdapter },
    ],
  },
});

const html = '<h1>Hello World</h1>';
const buffer = await converter.convert(html, 'docx');   // Buffer in Node, Blob in the browser
fs.writeFileSync('output.docx', buffer);                 // Node only: write the Buffer to disk

Registering adapters manually

import { init } from 'html-to-document';
import { DocxAdapter } from 'html-to-document-adapter-docx';

const converter = init({
  adapters: {
    register: [
      { format: 'docx', adapter: DocxAdapter },
    ],
  },
});

Tip: you can register several adapters at once (PdfAdapter below stands in for any additional adapter you supply):

register: [
  { format: 'docx', adapter: DocxAdapter },
  { format: 'pdf',  adapter: PdfAdapter },
]

The rest of the API stays the same—convert(html, 'docx'), convert(html, 'pdf'), etc.

Need just the parsed structure?

const elements = await converter.parse('<p>Some HTML</p>');
console.log(elements); // => DocumentElement[]
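
Once you have the parsed tree, you can walk it like any other nested structure. The sketch below collects the plain text from a tree; the `ElementLike` interface (with optional `text` and nested `content`) is an assumed, minimal shape used only for illustration, so check the library's exported `DocumentElement` type for the actual field names.

```typescript
// Hypothetical minimal node shape; the real DocumentElement type may differ.
interface ElementLike {
  type: string;
  text?: string;
  content?: ElementLike[];
}

// Recursively concatenate the text of every node, depth first.
function collectText(elements: ElementLike[]): string {
  return elements
    .map((el) => (el.text ?? '') + collectText(el.content ?? []))
    .join('');
}

// Hand-written sample standing in for the result of converter.parse(...).
const sample: ElementLike[] = [
  {
    type: 'paragraph',
    content: [
      { type: 'text', text: 'Some ' },
      { type: 'text', text: 'HTML' },
    ],
  },
];

const plain = collectText(sample);
console.log(plain);
```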

📚 Documentation & Demo

Resource Link
Full Docs https://html-to-document.vercel.app/
Live Demo (TinyMCE) https://html-to-document-demo.vercel.app

🛠 Extending

  • Style mappings: fine‑tune CSS → DOCX/PDF with StyleMapper
  • Tag handlers: intercept <custom-tag> → your own DocumentElement
  • Custom adapters: implement IDocumentConverter to target new formats

See the Extensibility Guide.
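
As a rough idea of what a custom adapter involves, here is a minimal Markdown renderer. Everything in it is an assumption made for illustration: `ElementLike` and `ConverterLike` are local stand-ins for the package's `DocumentElement` and `IDocumentConverter`, and the `level` field and the `convert(elements)` signature are guesses, so consult the Extensibility Guide for the real contract before relying on this shape.

```typescript
// Hypothetical local stand-ins; the package exports its own DocumentElement and IDocumentConverter.
interface ElementLike {
  type: string;
  text?: string;
  level?: number; // assumed heading depth (1-6)
  content?: ElementLike[];
}

interface ConverterLike {
  convert(elements: ElementLike[]): Promise<string>;
}

// Flatten a node and its children into a single line of text.
function inlineText(el: ElementLike): string {
  return (el.text ?? '') + (el.content ?? []).map(inlineText).join('');
}

// Render top-level elements as Markdown blocks separated by blank lines.
function renderMarkdown(elements: ElementLike[]): string {
  return elements
    .map((el) => {
      const text = inlineText(el);
      if (el.type === 'heading') {
        return '#'.repeat(el.level ?? 1) + ' ' + text;
      }
      return text; // treat everything else as a paragraph
    })
    .join('\n\n');
}

// Adapter wrapper: async to mirror the awaited convert(...) call in the Quick Start.
class MarkdownAdapter implements ConverterLike {
  async convert(elements: ElementLike[]): Promise<string> {
    return renderMarkdown(elements);
  }
}

const markdown = renderMarkdown([
  { type: 'heading', level: 2, text: 'Title' },
  { type: 'paragraph', content: [{ type: 'text', text: 'Body' }] },
]);
console.log(markdown);
```

Keeping the rendering logic in a pure function (`renderMarkdown`) and the adapter class as a thin wrapper makes the renderer easy to unit test without going through the converter.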


🧑‍💻 Contributing

Contributions are welcome!
Please read CONTRIBUTING.md and follow the Code of Conduct.


📝 Changelog

All notable changes are documented in CHANGELOG.md.


📄 License

ISC — a permissive, MIT‑style license that allows free use, modification, and distribution without requiring permission.