@deepcrawl/html-torch

0.0.4
    • License MIT

    Burn the rest of the HTML, leaving only the tags that are meaningful to an LLM.

    Package Exports

    • @deepcrawl/html-torch

    Readme

    html-torch 🔥

    html-torch is a library designed to clean up HTML by removing everything but the tags meaningful to Large Language Models (LLMs). It strips away unnecessary scripts, styles, attributes, and more to tidy up HTML content.
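
    To make the idea concrete, here is a deliberately naive sketch of this kind of cleanup. It is not html-torch's implementation: `naiveTorch` is a hypothetical name, it uses regular expressions rather than a real HTML parser, and it assumes that every attribute is unnecessary, which may be stricter than what the library does. It will also mishandle attribute values that contain `>`.

```typescript
// Illustration only: NOT html-torch's implementation or API.
// `naiveTorch` is a hypothetical helper that mimics the general idea.
function naiveTorch(html: string): string {
  return html
    // Drop <script> and <style> elements together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, '')
    // Drop HTML comments.
    .replace(/<!--[\s\S]*?-->/g, '')
    // Strip every attribute from the remaining opening tags
    // (assumption: no attribute is worth keeping).
    .replace(/<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*?(\/?)>/g, '<$1$2>');
}

const sample = '<div class="a"><script>x()</script><p id="b">Hi</p></div>';
console.log(naiveTorch(sample));
```

    A production-grade cleaner would parse the document into a tree instead of pattern-matching on text, which is why the library itself is the better choice for real pages.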

    Getting Started

    Installation

    npm install @deepcrawl/html-torch

    Usage Examples

    Here's a basic example of how to use html-torch to clean up an HTML string:

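    // Import path assumes the scoped package name shown on this page.
    import htmlTorch from '@deepcrawl/html-torch';
    
    // Placeholder input; pass the full HTML of the page you want to clean.
    const html = '<html>....</html>';
    
    // Top-level await requires an ES module context.
    const { torchedHTML, summaryJSON } = await htmlTorch(html);
    const { elements, selectors } = summaryJSON;
    
    // Sizes observed for one sample page:
    // html (Original) -> 1.4MB
    // torchedHTML (Torched) -> 179KB
    // elements (Summary JSON) -> 43KB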

    For more options and detailed usage, refer to the html-torch.ts file.
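
    To reproduce size figures like the ones in the comments above for your own pages, a small helper along these lines may be useful. The function names `byteSize`, `formatSize`, and `reduction` are hypothetical and not part of html-torch; the sketch only measures strings and does not call the library.

```typescript
// Hypothetical helpers, not part of html-torch.

// UTF-8 size of a string in bytes.
function byteSize(text: string): number {
  return new TextEncoder().encode(text).length;
}

// Format a byte count as whole KB below 1 MB, otherwise as MB with one decimal.
function formatSize(bytes: number): string {
  const kb = bytes / 1024;
  return kb < 1024 ? `${Math.round(kb)}KB` : `${(kb / 1024).toFixed(1)}MB`;
}

// Percentage saved when going from originalBytes to reducedBytes.
function reduction(originalBytes: number, reducedBytes: number): number {
  return Math.round((1 - reducedBytes / originalBytes) * 100);
}

// Example with a stand-in string; substitute your own html and torchedHTML.
const before = '<div class="wrapper"><p>Hello</p></div>';
const after = '<div><p>Hello</p></div>';
console.log(formatSize(byteSize(before)), '->', formatSize(byteSize(after)));
console.log(`${reduction(byteSize(before), byteSize(after))}% smaller`);
```

    Assuming `elements` is a plain JSON-serializable value, its size could be measured the same way with `byteSize(JSON.stringify(elements))`.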

    Node Version Management

    Before running this project locally, select the project's Node.js version with nvm and install the dependencies:

    nvm install
    nvm use
    npm install

    Running Tests

    To verify that everything works, run the tests:

    npm test

    License

    This project is licensed under the MIT License - see the LICENSE file for details.