
High-performance HTML to Markdown converter - WebAssembly bindings

Package Exports

    This package does not declare an "exports" field, so its exports have been automatically detected and optimized by JSPM. If a package subpath is missing, consider opening an issue on the original package (html-to-markdown-wasm) asking it to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

    Readme

    html-to-markdown-wasm

    Universal HTML to Markdown converter using WebAssembly.

    Powered by the same Rust engine as the Node.js, Python, and Ruby bindings, so Markdown output stays identical regardless of runtime.

    Runs anywhere: Node.js, Deno, Bun, browsers, and edge runtimes.


    Performance

    A single WebAssembly build serves every supported JavaScript runtime, with throughput close to the native bindings (see Comparison below).

    Benchmark Results (Apple M4)

    | Document Type          | ops/sec | Notes              |
    |------------------------|--------:|--------------------|
    | Small (5 paragraphs)   |  70,300 | Simple documents   |
    | Medium (25 paragraphs) |  15,282 | Nested formatting  |
    | Large (100 paragraphs) |   3,836 | Complex structures |
    | Tables (20 tables)     |   3,748 | Table processing   |
    | Lists (500 items)      |   1,391 | Nested lists       |
    | Wikipedia (129KB)      |   1,022 | Real-world content |
    | Wikipedia (653KB)      |     147 | Large documents    |

    Average: ~15,536 ops/sec across varied workloads.
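    Throughput is often easier to budget as time per document. The conversion is plain arithmetic (milliseconds per conversion = 1000 ÷ ops/sec); this small sketch applies it to the figures copied from the table above. The numbers describe the Apple M4 run only and will differ on other hardware.

```typescript
// Benchmark rows copied from the Apple M4 table (ops/sec).
const benchmarks: ReadonlyArray<[name: string, opsPerSec: number]> = [
  ["Small (5 paragraphs)", 70_300],
  ["Medium (25 paragraphs)", 15_282],
  ["Large (100 paragraphs)", 3_836],
  ["Tables (20 tables)", 3_748],
  ["Lists (500 items)", 1_391],
  ["Wikipedia (129KB)", 1_022],
  ["Wikipedia (653KB)", 147],
];

// Average wall-clock time of one conversion, in milliseconds.
function msPerOp(opsPerSec: number): number {
  return 1000 / opsPerSec;
}

for (const [name, ops] of benchmarks) {
  console.log(`${name}: ${msPerOp(ops).toFixed(3)} ms per conversion`);
}
```

    For example, the 653KB Wikipedia page works out to roughly 6.8 ms per conversion and the small document to about 0.014 ms.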

    Comparison

    • vs Native NAPI: ~1.17× slower (WASM has minimal overhead)
    • vs Python: ~6.3× faster (no FFI overhead)
    • Best for: Universal deployment (browsers, Deno, edge runtimes, cross-platform apps)

    Installation

    npm / Yarn / pnpm

    npm install html-to-markdown-wasm
    # or
    yarn add html-to-markdown-wasm
    # or
    pnpm add html-to-markdown-wasm

    Deno

    // Via npm specifier
    import { convert } from "npm:html-to-markdown-wasm";

    Usage

    Basic Conversion

    import { convert } from 'html-to-markdown-wasm';
    
    const html = '<h1>Hello World</h1><p>This is <strong>fast</strong>!</p>';
    const markdown = convert(html);
    console.log(markdown);
    // # Hello World
    //
    // This is **fast**!

    With Options

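    import { convert } from 'html-to-markdown-wasm';
    
    const html = '<h2>Shopping list</h2><ul><li>Apples</li><li>Pears</li></ul>';
    
    const markdown = convert(html, {
      headingStyle: 'atx',         // "# Heading" style headings
      codeBlockStyle: 'backticks', // fenced code blocks
      listIndentWidth: 2,          // spaces per nested list level
      bullets: '-',                // bullet character for unordered lists
      wrap: true,                  // wrap long lines...
      wrapWidth: 80                // ...at 80 columns
    });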
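    When several call sites convert HTML, one option is to keep the settings above in a single shared object and override them per call. A minimal sketch: `ConversionOptionsSubset`, `DEFAULT_OPTIONS`, and `withDefaults` are hypothetical local names, and the interface covers only the fields used in this README (with the package itself, its exported `WasmConversionOptions` type is the better choice).

```typescript
// Hypothetical local subset of the conversion options, limited to the
// fields used in this README; the package's own type has more.
interface ConversionOptionsSubset {
  headingStyle?: "atx" | "underlined" | "atxClosed";
  codeBlockStyle?: "indented" | "backticks" | "tildes";
  listIndentWidth?: number;
  bullets?: string;
  wrap?: boolean;
  wrapWidth?: number;
}

// Project-wide defaults, mirroring the "With Options" example.
const DEFAULT_OPTIONS: Readonly<ConversionOptionsSubset> = {
  headingStyle: "atx",
  codeBlockStyle: "backticks",
  listIndentWidth: 2,
  bullets: "-",
  wrap: true,
  wrapWidth: 80,
};

// Shallow merge: per-call overrides win; the shared defaults are never mutated.
function withDefaults(overrides: ConversionOptionsSubset = {}): ConversionOptionsSubset {
  return { ...DEFAULT_OPTIONS, ...overrides };
}
```

    A call would then look like `convert(html, withDefaults({ wrap: false }))`.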

    Preserve Complex HTML (NEW in v2.5)

    import { convert } from 'html-to-markdown-wasm';
    
    const html = `
    <h1>Report</h1>
    <table>
      <tr><th>Name</th><th>Value</th></tr>
      <tr><td>Foo</td><td>Bar</td></tr>
    </table>
    `;
    
    const markdown = convert(html, {
      preserveTags: ['table'] // Keep tables as HTML
    });

    Deno

    import { convert } from "npm:html-to-markdown-wasm";
    
    const html = await Deno.readTextFile("input.html");
    const markdown = convert(html, { headingStyle: "atx" });
    await Deno.writeTextFile("output.md", markdown);

    Performance tip: on Node.js and Bun, html-to-markdown-node uses native bindings and is about 1.17× faster than this package.

    Browser (ESM)

    <!DOCTYPE html>
    <html>
    <head>
      <title>HTML to Markdown</title>
    </head>
    <body>
      <script type="module">
        import init, { convert } from 'https://unpkg.com/html-to-markdown-wasm/dist-web/html_to_markdown_wasm.js';
    
        // Initialize WASM module
        await init();
    
        const html = '<h1>Hello World</h1><p>This runs in the <strong>browser</strong>!</p>';
        const markdown = convert(html, { headingStyle: 'atx' });
    
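        console.log(markdown);
        // Use textContent, not innerHTML, so the Markdown (which may contain raw HTML) is displayed as text
        const pre = document.createElement('pre');
        pre.textContent = markdown;
        document.body.appendChild(pre);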
      </script>
    </body>
    </html>
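    The `dist-web` build has to be initialized with `await init()` before `convert` is called. When several modules on a page need the converter, they can share one initialization promise so the WASM binary is fetched and compiled only once. A minimal sketch of that pattern; `createLazyInit` is a hypothetical helper, not part of the package.

```typescript
// Wraps an async initializer so it runs at most once; concurrent callers
// receive the same promise. If initialization fails, the cached promise is
// cleared so a later call can retry (e.g. after a transient network error).
function createLazyInit(init: () => Promise<unknown>): () => Promise<void> {
  let pending: Promise<void> | null = null;
  return () => {
    if (pending !== null) return pending;
    const attempt = init().then(
      () => undefined,
      (err: unknown) => {
        pending = null; // allow a retry on the next call
        throw err;
      },
    );
    pending = attempt;
    return attempt;
  };
}
```

    Create it once in a shared module (for example `export const ready = createLazyInit(init);`) and have every caller `await ready()` before using `convert`.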

    Vite / Webpack / Bundlers

    import { convert } from 'html-to-markdown-wasm';
    
    const markdown = convert('<h1>Hello</h1>', {
      headingStyle: 'atx',
      codeBlockStyle: 'backticks'
    });

    Cloudflare Workers

    import { convert } from 'html-to-markdown-wasm';
    
    export default {
      async fetch(request: Request): Promise<Response> {
        const html = await request.text();
        const markdown = convert(html, { headingStyle: 'atx' });
    
        return new Response(markdown, {
          headers: { 'Content-Type': 'text/markdown' }
        });
      }
    };
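    The Worker above converts any request body, including an empty GET. A handler may want to accept only POSTed HTML under a size cap before calling `convert`. A minimal sketch; `checkConvertRequest` and its 1 MiB default limit are hypothetical choices, not package features. `Content-Length` is a client-supplied hint and may be absent, so the body length should still be checked after reading it.

```typescript
// Outcome of validating an incoming conversion request.
type RequestCheck = { ok: true } | { ok: false; status: number; message: string };

// Hypothetical pre-flight check, run before reading and converting the body.
function checkConvertRequest(
  method: string,
  contentType: string | null,
  contentLength: string | null,
  maxBytes = 1024 * 1024,
): RequestCheck {
  if (method !== "POST") {
    return { ok: false, status: 405, message: "Use POST" };
  }
  // Accept "text/html" with optional parameters such as "; charset=utf-8".
  const [mediaType = ""] = (contentType ?? "").split(";");
  if (mediaType.trim().toLowerCase() !== "text/html") {
    return { ok: false, status: 415, message: "Expected text/html" };
  }
  // Reject early when the declared size is already over the limit.
  if (contentLength !== null && Number(contentLength) > maxBytes) {
    return { ok: false, status: 413, message: "Body too large" };
  }
  return { ok: true };
}
```

    Inside `fetch`, it would be called with `request.method`, `request.headers.get('Content-Type')`, and `request.headers.get('Content-Length')`, returning `new Response(check.message, { status: check.status })` whenever `check.ok` is false.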

    TypeScript

    The package ships with TypeScript type definitions:
    
    import { convert, type WasmConversionOptions } from 'html-to-markdown-wasm';
    
    const options: WasmConversionOptions = {
      headingStyle: 'atx',
      codeBlockStyle: 'backticks',
      listIndentWidth: 2,
      wrap: true,
      wrapWidth: 80
    };
    
    const markdown = convert('<h1>Hello</h1>', options);

    Inline Images

    Extract and decode inline images (data URIs, SVG):

    import { convertWithInlineImages, WasmInlineImageConfig } from 'html-to-markdown-wasm';
    
    const html = '<img src="..." alt="Logo">';
    
    const config = new WasmInlineImageConfig(5 * 1024 * 1024); // 5MB max
    config.inferDimensions = true;
    config.filenamePrefix = 'img_';
    config.captureSvg = true;
    
    const result = convertWithInlineImages(html, null, config);
    
    console.log(result.markdown);
    console.log(`Extracted ${result.inlineImages.length} images`);
    
    for (const img of result.inlineImages) {
      console.log(`${img.filename}: ${img.format}, ${img.data.length} bytes`);
      // img.data is a Uint8Array - save to file or upload
    }
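    Each extracted image exposes `filename`, `format`, and `data` (a `Uint8Array`), as used above. Before saving or uploading, it can help to total the payload, for example to enforce a per-document budget or to log what was extracted. A minimal sketch; `InlineImageLike` is a hypothetical structural type covering only those three fields, not the package's own type.

```typescript
// Hypothetical structural type: only the fields used in the README example.
interface InlineImageLike {
  filename: string;
  format: string;
  data: Uint8Array;
}

interface ImageSummary {
  count: number;
  totalBytes: number;
  byFormat: Record<string, number>; // number of images per format
}

// Totals the decoded bytes and counts images per format.
function summarizeInlineImages(images: readonly InlineImageLike[]): ImageSummary {
  const byFormat: Record<string, number> = {};
  let totalBytes = 0;
  for (const img of images) {
    totalBytes += img.data.byteLength;
    byFormat[img.format] = (byFormat[img.format] ?? 0) + 1;
  }
  return { count: images.length, totalBytes, byFormat };
}
```

    Assuming `result.inlineImages` is an array of such objects, `summarizeInlineImages(result.inlineImages).totalBytes` gives the combined size.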

    Build Targets

    Three build targets are provided for different environments:

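    | Target  | Path                            | Use case                       |
    |---------|---------------------------------|--------------------------------|
    | Bundler | html-to-markdown-wasm           | Webpack, Vite, Rollup, esbuild |
    | Node.js | html-to-markdown-wasm/dist-node | Node.js, Bun (CommonJS/ESM)    |
    | Web     | html-to-markdown-wasm/dist-web  | Direct browser ESM imports     |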

    Runtime Compatibility

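    | Runtime            | Support                                    | Package                                |
    |--------------------|--------------------------------------------|----------------------------------------|
    | Node.js 18+        | Full support                               | dist-node                              |
    | Deno               | Full support                               | npm: specifier                         |
    | Bun                | Full support (html-to-markdown-node preferred) | Default entry (html-to-markdown-wasm) |
    | Browsers           | Full support                               | dist-web                               |
    | Cloudflare Workers | Full support                               | Default entry (html-to-markdown-wasm)  |
    | Deno Deploy        | Full support                               | npm: specifier                         |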

    When to Use

    Choose html-to-markdown-wasm when:

    • 🌐 Running in browsers or edge runtimes
    • 🦕 Using Deno
    • ☁️ Deploying to Cloudflare Workers, Deno Deploy
    • 📦 Building universal libraries
    • 🔄 Needing consistent behavior across all platforms

    Use html-to-markdown-node for:

    • ⚡ Maximum performance in Node.js/Bun (~1.17× faster than WASM)
    • 🖥️ Server-side-only applications

    Configuration Options

    See the TypeScript definitions for all available options:

    • Heading styles (atx, underlined, atxClosed)
    • Code block styles (indented, backticks, tildes)
    • List formatting (indent width, bullet characters)
    • Text escaping and formatting
    • Tag preservation (preserveTags) and stripping (stripTags)
    • Preprocessing for web scraping
    • hOCR table extraction
    • And more...
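    When option values come from user input (a query string, CLI flag, or config file), they can be narrowed to the accepted values before being passed to `convert`. A minimal sketch, assuming the option strings are exactly the names listed above (`atx`, `underlined`, `atxClosed`; `indented`, `backticks`, `tildes`); the package's TypeScript definitions remain the authoritative list. `pickAllowed`, `parseHeadingStyle`, and `parseCodeBlockStyle` are hypothetical helpers.

```typescript
const HEADING_STYLES = ["atx", "underlined", "atxClosed"] as const;
const CODE_BLOCK_STYLES = ["indented", "backticks", "tildes"] as const;

type HeadingStyle = (typeof HEADING_STYLES)[number];
type CodeBlockStyle = (typeof CODE_BLOCK_STYLES)[number];

// Returns the value if it is one of the allowed literals, otherwise undefined,
// so the caller can fall back to a default.
function pickAllowed<T extends string>(
  allowed: readonly T[],
  value: string | null | undefined,
): T | undefined {
  return allowed.find((candidate) => candidate === value);
}

function parseHeadingStyle(value: string | null | undefined): HeadingStyle | undefined {
  return pickAllowed(HEADING_STYLES, value);
}

function parseCodeBlockStyle(value: string | null | undefined): CodeBlockStyle | undefined {
  return pickAllowed(CODE_BLOCK_STYLES, value);
}
```

    For example, a server could pass `headingStyle: parseHeadingStyle(url.searchParams.get('heading'))`, so unknown values never reach the converter.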

    Examples

    Preserving HTML Tags

    Keep specific HTML tags in their original form:

    import { convert } from 'html-to-markdown-wasm';
    
    const html = `
    <p>Before table</p>
    <table class="data">
        <tr><th>Name</th><th>Value</th></tr>
        <tr><td>Item 1</td><td>100</td></tr>
    </table>
    <p>After table</p>
    `;
    
    const markdown = convert(html, {
      preserveTags: ['table']
    });
    
    // Result includes the table as HTML

    Combine with stripTags:

    const markdown = convert(html, {
      preserveTags: ['table', 'form'],  // Keep as HTML
      stripTags: ['script', 'style']    // Remove entirely
    });

    Deno Web Server

    import { convert } from "npm:html-to-markdown-wasm";
    
    Deno.serve(async (req) => {
      const url = new URL(req.url);
    
      if (url.pathname === "/convert" && req.method === "POST") {
        const html = await req.text();
        const markdown = convert(html, { headingStyle: "atx" });
    
        return new Response(markdown, {
          headers: { "Content-Type": "text/markdown" }
        });
      }
    
      return new Response("Not found", { status: 404 });
    });

    Browser File Conversion

    <input type="file" id="htmlFile" accept=".html">
    <button onclick="convertFile()">Convert to Markdown</button>
    <pre id="output"></pre>
    
    <script type="module">
      import init, { convert } from 'https://unpkg.com/html-to-markdown-wasm/dist-web/html_to_markdown_wasm.js';
    
      await init();
    
      window.convertFile = async () => {
        const file = document.getElementById('htmlFile').files[0];
        if (!file) return; // No file selected yet
        const html = await file.text();
        const markdown = convert(html, { headingStyle: 'atx' });
        document.getElementById('output').textContent = markdown;
      };
    </script>
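    The `accept=".html"` attribute only filters the file picker; users can still choose other files, and `file.text()` reads the whole file into memory. A minimal pre-check sketch; `FileLike` mirrors the `name` and `size` properties of the DOM `File` object so the check can run outside a browser, and the 5 MiB limit is an arbitrary, hypothetical choice.

```typescript
// Structural subset of the DOM File interface.
interface FileLike {
  name: string;
  size: number; // bytes
}

// Returns an error message, or null if the file looks acceptable.
function validateHtmlFile(file: FileLike | undefined, maxBytes = 5 * 1024 * 1024): string | null {
  if (!file) return "No file selected.";
  if (!/\.html?$/i.test(file.name)) return `${file.name} is not an .html file.`;
  if (file.size > maxBytes) return `${file.name} is larger than ${maxBytes} bytes.`;
  return null;
}
```

    At the top of `convertFile`, a non-null result could be written to `#output` instead of converting.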

    Web Scraping (Deno)

    import { convert } from "npm:html-to-markdown-wasm";
    
    const response = await fetch("https://example.com");
    if (!response.ok) throw new Error(`Request failed: ${response.status}`);
    const html = await response.text();
    
    const markdown = convert(html, {
      preprocessing: {
        enabled: true,
        preset: "aggressive",
        removeNavigation: true,
        removeForms: true
      },
      headingStyle: "atx",
      codeBlockStyle: "backticks"
    });
    
    console.log(markdown);
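    Scraped pages often use relative links and image paths, which break once the Markdown leaves the original site. One option is to resolve them against the page URL after conversion. A rough sketch using the standard `URL` constructor; `absolutizeLinks` is a hypothetical helper, and its regular expression only handles simple inline links and images (`[text](dest)`, `![alt](dest "title")`), skips angle-bracket destinations, and would also rewrite matching text inside code spans, so treat it as an approximation rather than a Markdown parser.

```typescript
// Resolves relative inline link/image destinations against baseUrl.
// Matches "](" followed by a destination containing no spaces or ")".
function absolutizeLinks(markdown: string, baseUrl: string): string {
  return markdown.replace(/\]\(([^)\s]+)/g, (match: string, dest: string) => {
    if (dest.startsWith("<")) return match; // angle-bracket form: leave untouched
    try {
      // Absolute URLs (https:, mailto:, ...) ignore the base and stay as they are.
      return `](${new URL(dest, baseUrl).href}`;
    } catch {
      return match; // not resolvable: keep the original text
    }
  });
}
```

    In the scraping example, `absolutizeLinks(markdown, response.url)` would resolve links against the final URL after any redirects.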

    Other Runtimes

    The same Rust engine also ships as a Rust crate (crates.io) and as native bindings for Node.js (html-to-markdown-node), Python (PyPI), and Ruby (RubyGems).

    License

    MIT