JSPM

  • Downloads 34246507

Forgiving HTML/XML/RSS Parser for Node. This version is optimised and cleaned and provides a SAX interface.

Package Exports

  • htmlparser2
  • htmlparser2/lib
  • htmlparser2/lib/FeedHandler.js
  • htmlparser2/lib/Parser
  • htmlparser2/lib/Parser.js

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (htmlparser2) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

#htmlparser2

A forgiving HTML/XML/RSS parser written in JS for NodeJS. The parser can handle streams (chunked data) and supports custom handlers for writing custom DOMs/output.

##Installing

npm install htmlparser2

##Running Tests

node tests/00-runtests.js

This project is linked to Travis CI. The latest build status is:

Build Status

##How is this different from node-htmlparser?

This is a fork of node-htmlparser. The main difference is that this version is intended to be used with node only. In addition, the code is better structured, has less duplication, and is considerably faster than the original.

The parser also provides the interface of sax.js (originally intended for my readability port, readabilitySAX). I also fixed a couple of bugs and included some pull requests for the original project (e.g. RDF feed support).

Support for location data and verbose output was removed a couple of versions ago. It is still available in the verbose branch, should you need it.

##Usage

var htmlparser = require("htmlparser2");
var rawHtml = "Xyz <script language= javascript>var foo = '<<bar>>';< /  script><!--<!-- Waah! -- -->";
var handler = new htmlparser.DefaultHandler(function (error, dom) {
    if (error) {
        // parsing failed: handle the error here
        console.error(error);
    } else {
        // parsing done: use the resulting DOM here
        console.log(dom);
    }
});
var parser = new htmlparser.Parser(handler);
parser.write(rawHtml);
parser.done();

Output:

[{
    data: 'Xyz ',
    type: 'text'
}, {
    type: 'script',
    name: 'script',
    attribs: {
        language: 'javascript'
    },
    children: [{
        data: 'var foo = \'<bar>\';<',
        type: 'text'
    }]
}, {
    data: '<!-- Waah! -- ',
    type: 'comment'
}]
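
The structure above is a plain array of nodes, where element nodes carry their descendants in `children`. As a rough sketch of how such a tree could be consumed, the following walks the sample output depth-first and joins the data of all text nodes. The helpers `walk` and `collectText` are hypothetical names made up for this illustration and are not part of htmlparser2; the `dom` literal is copied from the output shown above.

```javascript
// Sample DOM copied from the output shown above.
var dom = [{
    data: 'Xyz ',
    type: 'text'
}, {
    type: 'script',
    name: 'script',
    attribs: {
        language: 'javascript'
    },
    children: [{
        data: 'var foo = \'<bar>\';<',
        type: 'text'
    }]
}, {
    data: '<!-- Waah! -- ',
    type: 'comment'
}];

// Hypothetical helper (not part of htmlparser2): depth-first walk that
// calls visit(node, depth) for every node in the tree.
function walk(nodes, visit, depth) {
    depth = depth || 0;
    nodes.forEach(function (node) {
        visit(node, depth);
        if (Array.isArray(node.children)) {
            walk(node.children, visit, depth + 1);
        }
    });
}

// Hypothetical helper: concatenate the data of all text nodes,
// skipping comments and the element nodes themselves.
function collectText(nodes) {
    var parts = [];
    walk(nodes, function (node) {
        if (node.type === 'text') {
            parts.push(node.data);
        }
    });
    return parts.join('');
}

console.log(collectText(dom));
```

Because the walk only assumes the `type`, `data` and `children` fields that appear in the sample, it should carry over to other trees of the same shape.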

##Streaming To Parser

while (...) {
    ...
    parser.write(chunk);
}
parser.done();
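
To make the loop above concrete, here is a minimal sketch that splits a string into fixed-size chunks and feeds them to anything exposing `write` and `done`. The helper `writeInChunks` and the `recorder` object are assumptions made for this illustration; `recorder` is only a stand-in that records what it receives, so the sketch runs without the library installed. In real use you would pass the `parser` created in the Usage section instead.

```javascript
// Hypothetical helper (not part of htmlparser2): feed `text` to `parser`
// in pieces of at most `size` characters, then signal the end of input.
// `size` must be a positive integer.
function writeInChunks(parser, text, size) {
    for (var i = 0; i < text.length; i += size) {
        parser.write(text.slice(i, i + size));
    }
    parser.done();
}

// Stand-in for a real parser: it only records the chunks it is given.
var recorder = {
    chunks: [],
    finished: false,
    write: function (chunk) {
        this.chunks.push(chunk);
    },
    done: function () {
        this.finished = true;
    }
};

var html = "<p>Hello <b>world</b></p>";
writeInChunks(recorder, html, 8);

// Chunk boundaries fall in the middle of tags, which is the situation
// a streaming parser has to cope with.
console.log(recorder.chunks);
```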

##Parsing RSS/RDF/Atom Feeds

new htmlparser.FeedHandler(function (error, feed) {
    ...
});

##Further reading