The WHATWG Blog

Please leave your sense of logic at the door, thanks!

This Week in HTML 5 – Episode 9

Tuesday, October 14th, 2008

Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.

Most of the changes in the spec this week revolve around the <textarea> element.

Shelley Powers pointed out that I haven't mentioned the issue of distributed extensibility yet. (The clearest description of the issue is Sam Ruby's message from last year, which spawned a long discussion.) The short version: XHTML (served with the proper MIME type, application/xhtml+xml) supports embedding foreign data in arbitrary namespaces, including SVG and MathML. None of these technologies (XHTML, SVG, or MathML) have had much success on the public web. Despite Chris Wilson's assertion that "we cannot definitively say why XHTML has not been successful on the Web," I think it's pretty clear that Internet Explorer's complete lack of support for the application/xhtml+xml MIME type has something to do with it. (Chris is the project lead on Internet Explorer 8.)

Still, it is true that XHTML does support distributed extensibility, and many people believe that the web would be richer if SVG and MathML (and other as-yet-unknown technologies) could be embedded and rendered in HTML pages. The key phrase here is "as-yet-unknown technologies." In that light, the recent SVG-in-HTML proposal (which I mentioned several weeks ago) is beside the point. The point of distributed extensibility is that it does not require approval from a standards body. "Let a thousand flowers bloom" and all that, where by "flowers," I mean "namespaces." This is an unresolved issue.

Other interesting changes this week:

Around the web:

Tune in next week for another exciting episode of "This Week in HTML 5."

Posted in Weekly Review

Validator.nu HTML Parser 1.1.0

Monday, August 25th, 2008

I have released a new version of the Validator.nu HTML Parser (an implementation of the HTML5 parsing algorithm in Java). The new release supports SVG and MathML subtrees, is faster than the old version, fixes bugs, is more portable, and supports applications that need document.write().

The parser comes with a sample app that makes it possible to use XSLT programs written for XHTML5+SVG+MathML with text/html.

Warning! The internal APIs have changed. Please refer to the Upgrade Guide below.

Change Log

Upgrade Guide from 1.0.7 to 1.1.0

In all cases, you need to check that your application does not break when it receives SVG or MathML subtrees.
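One way to run that check is to walk the tree your application receives and look for elements outside the XHTML namespace. The following is only a minimal sketch: the class and method names (ForeignSubtreeCounter, countForeign, parseXml) are hypothetical, and the JDK's own XML parser stands in for the HTML parser so that the example has no external dependency. It assumes your application consumes a namespace-aware W3C DOM.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of the Validator.nu HTML Parser.
final class ForeignSubtreeCounter {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Recursively counts element nodes whose namespace URI is not the XHTML one.
    static int countForeign(Node node) {
        int count = 0;
        if (node.getNodeType() == Node.ELEMENT_NODE
                && !XHTML_NS.equals(node.getNamespaceURI())) {
            count++;
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countForeign(child);
        }
        return count;
    }

    // Stand-in for the real parser: builds a namespace-aware DOM from an XML string.
    static Document parseXml(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<html xmlns='http://www.w3.org/1999/xhtml'><body><p>Hi</p>"
                + "<svg xmlns='http://www.w3.org/2000/svg'><circle r='1'/></svg>"
                + "</body></html>";
        // The svg and circle elements are the two non-XHTML elements.
        System.out.println(countForeign(parseXml(xml))); // prints 2
    }
}
```

If a walk like this reports foreign elements on your real input, that is the place to verify that downstream code (serializers, sanitizers, XPath expressions) does not assume every element is in the XHTML namespace.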

If you use the parser through the SAX, DOM or XOM API and do not pass an explicit XmlViolationPolicy to the constructor of HtmlParser, HtmlDocumentBuilder or HtmlBuilder:

If you relied on the old default behavior (fatal errors), you should now pass XmlViolationPolicy.FATAL to the constructor explicitly.

If you did not really want to have fatal errors by default, you do not need to do anything, since ALTER_INFOSET is now the default.

If you use the parser through the SAX, DOM or XOM API and do pass an explicit XmlViolationPolicy to the constructor of HtmlParser, HtmlDocumentBuilder or HtmlBuilder:

You do not need to change your code to upgrade.

If you have your own subclass of TreeBuilder:

The abstract methods on TreeBuilder now have additional arguments for passing the namespace URI. You should upgrade your subclass to deal with the namespace URIs. (The URI is always an interned string, so you can use == to compare.)
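To illustrate what comparing interned strings with == looks like, here is a small sketch in plain Java. The class NamespaceCheck and its classify method are hypothetical names made up for this example, not part of the parser's API. The sketch relies only on the fact that Java string literals are interned, and it also shows why == is unsafe for strings that did not come from the parser.

```java
// Hypothetical helper for illustration; not part of the Validator.nu HTML Parser.
final class NamespaceCheck {
    // String literals are interned by the JVM, so these are the canonical instances.
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";
    static final String SVG_NS = "http://www.w3.org/2000/svg";
    static final String MATHML_NS = "http://www.w3.org/1998/Math/MathML";

    // Identity comparison: only correct when the argument is an interned string.
    static String classify(String ns) {
        if (ns == XHTML_NS) return "html";
        if (ns == SVG_NS) return "svg";
        if (ns == MATHML_NS) return "mathml";
        return "other";
    }

    public static void main(String[] args) {
        // new String(...) creates a distinct, non-interned object with the same characters.
        String copy = new String(SVG_NS);
        System.out.println(classify(copy.intern())); // prints svg
        System.out.println(classify(copy));          // prints other
    }
}
```

The second call shows the catch: a string you build yourself is not interned, so either call intern() on it or use equals() when the value does not come from the parser.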

The entry point for passing in a SAX InputSource has moved from the Tokenizer class to the Driver class (in the io package), so you should change your references from Tokenizer to Driver.

If you have your own implementation of TokenHandler:

Please refer to the JavaDocs of TokenHandler. Also note the new separation of Tokenizer and Driver mentioned above.

Posted in Syntax