The WHATWG Blog

Please leave your sense of logic at the door, thanks!

Validator.nu HTML Parser 1.1.0

Monday, August 25th, 2008

I have released a new version of the Validator.nu HTML Parser (an implementation of the HTML5 parsing algorithm in Java). The new release supports SVG and MathML subtrees, is faster than the previous version, fixes bugs, is more portable, and supports applications that need to implement document.write().

The parser comes with a sample app that makes it possible to use XSLT programs written for XHTML5+SVG+MathML with text/html.

Warning! The internal APIs have changed. Please refer to the Upgrade Guide below.

Change Log

Upgrade Guide from 1.0.7 to 1.1.0

In all cases, you need to check that your application does not break when it receives SVG or MathML subtrees.
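One way to find out whether SVG or MathML subtrees reach your code is to tally incoming elements by namespace URI. The following is only a rough sketch, assuming your application consumes SAX events. NamespaceTally and its methods are hypothetical names, and the JDK's namespace-aware XML parser stands in for the HTML parser here, on the assumption that foreign elements arrive as ordinary startElement calls carrying the SVG or MathML namespace URI.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that tallies elements per namespace URI. */
final class NamespaceTally extends DefaultHandler {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";
    static final String SVG_NS = "http://www.w3.org/2000/svg";
    static final String MATHML_NS = "http://www.w3.org/1998/Math/MathML";

    private final Map<String, Integer> counts = new HashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Count every element under the namespace URI it was reported with.
        counts.merge(uri, 1, Integer::sum);
    }

    /** Returns how many elements were seen in the given namespace. */
    int count(String namespaceUri) {
        return counts.getOrDefault(namespaceUri, 0);
    }

    /** Parses XML text with the JDK parser (a stand-in for the HTML parser's SAX output). */
    static NamespaceTally tally(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        NamespaceTally handler = new NamespaceTally();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String doc = "<html xmlns='http://www.w3.org/1999/xhtml'><body><p>Circle:</p>"
            + "<svg xmlns='http://www.w3.org/2000/svg'><circle r='10'/></svg>"
            + "</body></html>";
        NamespaceTally t = tally(doc);
        System.out.println("XHTML elements: " + t.count(XHTML_NS));
        System.out.println("SVG elements: " + t.count(SVG_NS));
        System.out.println("MathML elements: " + t.count(MATHML_NS));
    }
}
```

If the SVG or MathML counts are non-zero for your typical input, any code that assumes every element is an HTML element is worth reviewing.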

If you use the parser through the SAX, DOM or XOM API and do not pass an explicit XmlViolationPolicy to the constructor of HtmlParser, HtmlDocumentBuilder or HtmlBuilder:

If you really wanted the old default behavior, you should now pass XmlViolationPolicy.FATAL to the constructor.

If you did not really want to have fatal errors by default, you do not need to do anything, since ALTER_INFOSET is now the default.

If you use the parser through the SAX, DOM or XOM API and do pass an explicit XmlViolationPolicy to the constructor of HtmlParser, HtmlDocumentBuilder or HtmlBuilder:

You do not need to change your code to upgrade.

If you have your own subclass of TreeBuilder:

The abstract methods on TreeBuilder now have additional arguments for passing the namespace URI. You should upgrade your subclass to deal with the namespace URIs. (The URI is always an interned string, so you can use == to compare.)

The entry point for passing in a SAX InputSource has moved from the Tokenizer class to the Driver class (in the io package), so you should change your references from Tokenizer to Driver.
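The note above about comparing interned namespace URIs with == can be illustrated with plain Java strings. This is a minimal sketch, not code from the parser: NamespaceKind and kindOf are hypothetical names, and the three constants are the standard XHTML, SVG and MathML namespace URIs. It relies on the fact that Java string literals are themselves interned, so an interned URI is the same object as the matching constant.

```java
/** Hypothetical helper showing identity comparison of interned namespace URIs. */
final class NamespaceKind {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";
    static final String SVG_NS = "http://www.w3.org/2000/svg";
    static final String MATHML_NS = "http://www.w3.org/1998/Math/MathML";

    /** Classifies a namespace URI by identity; only correct for interned strings. */
    static String kindOf(String internedUri) {
        if (internedUri == XHTML_NS) return "html";
        if (internedUri == SVG_NS) return "svg";
        if (internedUri == MATHML_NS) return "mathml";
        return "other";
    }

    public static void main(String[] args) {
        // Build the string at run time so it is a distinct object from the literal.
        String runtimeUri = new StringBuilder("http://www.w3.org/2000/").append("svg").toString();
        // Not interned: identity comparison does not match any constant.
        System.out.println(kindOf(runtimeUri));
        // Interned: the canonical instance is the same object as SVG_NS.
        System.out.println(kindOf(runtimeUri.intern()));
    }
}
```

The == shortcut is safe only for strings you know to be interned, such as the URIs the tree builder receives. For strings from any other source, use equals() or call intern() first.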

If you have your own implementation of TokenHandler:

Please refer to the JavaDocs of TokenHandler. Also note the new separation of Tokenizer and Driver mentioned above.
