detect-is-it-html-or-xhtml

Answers the question: is the input string more HTML or XHTML (or neither)?

Purpose

As you know, XHTML is slightly different from HTML: HTML (4 and 5) does not close <img> and other single tags, while XHTML does. There is more to it than that, but it's the main difference from a developer's perspective.

When I was working on email-remove-unused-css, I was parsing the HTML and rendering it back. At the rendering-back stage, I had to identify whether the source code was HTML or XHTML, so that I could instruct the renderer to close all the single tags (or not). I couldn't find any library that analyses code and tells whether it is HTML or XHTML. That's how detect-is-it-html-or-xhtml was born.

Feed a string into this library. If it's more of an HTML, it will output the string "html". If it's more of an XHTML, it will output the string "xhtml". If the input doesn't contain any tags, or it does but there is no doctype and it's impossible to distinguish between the two, it will output null.

Install

$ npm install --save detect-is-it-html-or-xhtml

Use

var detect = require('detect-is-it-html-or-xhtml')
console.log(detect('<img src="some.jpg" width="zzz" height="zzz" border="0" style="display:block;" alt="zzz"/>'))
// => 'xhtml'

API

detect(
  htmlAsString   // Some code in string format. Or some other string.
)
// => 'html'|'xhtml'|null
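
Because the result can be null, calling code usually needs a fallback. Below is a minimal sketch of turning the three possible results into a "close single tags?" setting for a renderer. The helper name shouldCloseSingleTags and its fallback parameter are hypothetical and are not part of this library. The detect function is passed in as an argument, so the snippet runs on its own:

```javascript
// Hypothetical helper, NOT part of detect-is-it-html-or-xhtml.
// `detect` is any function that follows the contract documented above:
// it returns 'html', 'xhtml' or null.
function shouldCloseSingleTags (detect, source, fallback) {
  var type = detect(source)
  if (type === null) {
    // Nothing to go on, so use the caller's default.
    return Boolean(fallback)
  }
  // Only XHTML asks for closed single tags such as <br/>.
  return type === 'xhtml'
}
```

In real use, the first argument would be the function returned by require('detect-is-it-html-or-xhtml').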

Under the hood

The algorithm is the following:

  1. Look for a doctype. If it's recognised, Bob's your uncle.
  2. If there's no doctype, or it's messed up beyond recognition, scan all singleton tags (<img>, <br> and <hr>) and see which type is in the majority (closed or not closed).
  3. In the rare case when there is an equal number of closed and unclosed tags, lean towards html.
  4. If (there are no tags in the input) OR (there is no doctype and there are no singleton tags), return null.
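
The steps above can be sketched roughly as follows. This is an illustration only and not the library's actual source. The function name detectSketch and all the regular expressions are assumptions made for this example, and the tag matching is deliberately naive (for instance, a ">" inside an attribute value would confuse it).

```javascript
// A rough sketch of the four steps, NOT the library's real implementation.
// The name `detectSketch` and the regular expressions are assumptions.
function detectSketch (input) {
  if (typeof input !== 'string') return null

  // Step 1: look for a doctype and trust it if it is recognisable.
  var doctype = input.match(/<!doctype\s[^>]*>/i)
  if (doctype) {
    // XHTML doctypes mention "XHTML" in their public identifier.
    if (/xhtml/i.test(doctype[0])) return 'xhtml'
    // Assumed to cover HTML 4 ("<!DOCTYPE HTML PUBLIC ...") and HTML5.
    if (/<!doctype\s+html\b/i.test(doctype[0])) return 'html'
  }

  // Step 2: no usable doctype, so collect the singleton tags.
  var singletons = input.match(/<(?:img|br|hr)\b[^>]*>/gi) || []

  // Step 4: nothing to judge by.
  if (singletons.length === 0) return null

  // Count the tags that end with "/>" (closed, XHTML style).
  var closed = singletons.filter(function (tag) {
    return /\/\s*>$/.test(tag)
  }).length
  var unclosed = singletons.length - closed

  // Step 3: on a tie, lean towards html.
  return closed > unclosed ? 'xhtml' : 'html'
}

console.log(detectSketch('<img src="some.jpg" alt="zzz"/>')) // => 'xhtml'
console.log(detectSketch('<br/><br>'))                        // => 'html'
console.log(detectSketch('no tags here'))                     // => null
```

The doctype check comes first so that a recognised doctype wins even when the tags in the body are closed inconsistently.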

Contributing & testing

All contributions are welcome. This library uses Standard JavaScript notation. See test.js. It's a very minimalistic testing setup using AVA.

npm test

If you see anything incorrect whatsoever, raise an issue. PRs are welcome: fork, hack and PR.

Licence

MIT © Roy Reveltas