Experience the HTML5 parsing algorithm in the Live DOM Viewer
If you’ve investigated how browsers parse HTML, you’ve probably used Hixie’s Live DOM Viewer to see what happens. Wouldn’t it be cool, though, if you could experiment with the HTML5 parsing algorithm in the same UI? Well, now you can.
I was looking for a way to experiment with document.write() in the code base of the Validator.nu HTML Parser and I was looking for a way to let people see the parse tree output of the HTML5 parsing algorithm more easily. Instead of writing a test harness fully in Java, I thought it would be better to use the Live DOM Viewer and a browser engine as the test harness. The good news is that Google Web Toolkit makes it possible to put these pieces together, and the trunk of the Validator.nu HTML parser now comes with a document.write()-aware tokenizer driver and a tree builder subclass for GWT.
The bad news is that the Java-to-JavaScript compiler of GWT has a bug that blocks me from putting the result online as JavaScript. The Hosted Mode of GWT works, though.
Here’s how you can run the Validator.nu HTML Parser in the Live DOM Viewer locally in the Hosted Mode of GWT (on Mac or Linux):
- Check out the source: svn co http://svn.versiondude.net/whattf/htmlparser/trunk/ htmlparser
- Download and untar GWT 1.5 RC1
- On Linux, install libstdc++5 and a JDK (Ubuntu's OpenJDK-based package worked for me).
- Edit the paths in HtmlParser-shell (Mac) or HtmlParser-linux (Linux) to point to the location of GWT.
- Run HtmlParser-shell (Mac) or HtmlParser-linux (Linux).
Known problems:
- The Linux version of GWT runs an outdated version of Gecko, and the rendered view doesn't work. The DOM view does.
- The Mac version of GWT runs a Web Inspector-enabled version of WebKit, but SVG does not draw.
- document.write() semantics are right only for inline scripts.
- Copying and pasting using keyboard shortcuts doesn’t work. (Use the context menu.)
- On Linux, GWT prints a lot of harmless warnings about not finding annotations. (I don’t know why that happens. The annotations should be among translatables.)
- Gecko (used by GWT on Linux) doesn't allow the creation of xmlns attributes in no namespace, so things stop working if you try to put an attribute called xmlns on HTML elements.
- The DOM view on Linux doesn't report names with colons in them per the HTML5 spec.
(Aside: This code could have applicability beyond testing the parser. If the compiler bug were fixed or worked around, a script could document.write() a math element and an svg element to sniff whether they are parsed according to HTML5. If they aren't, the script could move aside load event handlers, document.write() <plaintext style='display:none'>, wait until DOMContentLoaded, load the already created html, head and body elements onto the tree builder stack and head pointer of the HTML5 parser, reparse the content of the plaintext element as HTML5 and call the load event handlers. See Philip Taylor’s proof of concept with S-expressions.)
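
To make the sniffing step of the aside concrete, here is a minimal sketch in JavaScript. It assumes that "parsed according to HTML5" can be detected by checking that the written math and svg elements end up in the MathML and SVG namespaces, which is what the HTML5 parsing algorithm does for these two elements. The function names and the probe ids are hypothetical and not part of the Validator.nu HTML Parser. The sketch covers only the detection, not the plaintext-based reparse.

```javascript
// Namespaces that the HTML5 parsing algorithm assigns to <math> and <svg>.
const MATHML_NS = "http://www.w3.org/1998/Math/MathML";
const SVG_NS = "http://www.w3.org/2000/svg";

// Hypothetical helper: true only if both probe elements exist and were
// placed in the foreign namespaces the HTML5 algorithm requires.
function parsedPerHtml5(mathEl, svgEl) {
  return Boolean(mathEl && svgEl) &&
    mathEl.namespaceURI === MATHML_NS &&
    svgEl.namespaceURI === SVG_NS;
}

// Hypothetical wiring: write the two probes and inspect the result.
// "probe-math" and "probe-svg" are ids made up for this illustration.
// Meant to be called from an inline script while the page is being parsed.
function sniffHtml5Parsing(doc) {
  doc.write('<math id="probe-math"></math><svg id="probe-svg"></svg>');
  return parsedPerHtml5(
    doc.getElementById("probe-math"),
    doc.getElementById("probe-svg")
  );
}
```

In a page, an inline script would call sniffHtml5Parsing(document) and fall back to the script-based parser only when the result is false. A real implementation would presumably also remove the probe elements afterwards.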