Experience the HTML5 parsing algorithm in the Live DOM Viewer
If you’ve investigated how browsers parse HTML, you’ve probably used Hixie’s Live DOM Viewer to see what happens. Wouldn’t it be cool, though, if you could experiment with the HTML5 parsing algorithm in the same UI? Well, now you can.
I was looking for a way to experiment with
document.write() in the code base of the Validator.nu HTML Parser and I was looking for a way to let people see the parse tree output of the HTML5 parsing algorithm more easily. Instead of writing a test harness fully in Java, I thought it would be better to use the Live DOM Viewer and a browser engine as the test harness. The good news is that Google Web Toolkit makes it possible to put these pieces together, and the trunk of the Validator.nu HTML parser now comes with a
document.write()-aware tokenizer driver and a tree builder subclass for GWT.
Here’s how you can run the Validator.nu HTML Parser in the Live DOM Viewer locally in the Hosted Mode of GWT (on Mac or Linux):
- Check out the source: svn co http://svn.versiondude.net/whattf/htmlparser/trunk/ htmlparser
- Download and untar GWT 1.5 RC1
- On Linux, install libstdc++5 and a JDK (Ubuntu's OpenJDK-based package worked for me).
- Edit the paths in
HtmlParser-linux(Linux) to point to the location of GWT.
- The Linux version of GWT runs an outdated version of Gecko, and the rendered view doesn't work. The DOM view does.
- The Mac version of GWT runs a Web Inspector-enabled version of WebKit, but SVG does not draw.
document.write()semantics are right only for inline scripts.
- Copying and pasting using keyboard shortcuts doesn’t work. (Use the context menu.)
- On Linux, GTW prints a lot of harmless warnings about not finding annotations. (I don’t know why that happens. The annotations should be among translatables.)
- Gecko (used by GTW on Linux) doesn't allow the creation of xmlns attributes in no namespace, so things stop working if you try to put an attribute called
xmlnson HTML elements.
- The DOM view on Linux doesn't report names with colons in them per the HTML5 spec.
(Aside: This code could have applicability beyond testing the parser. If the compiler bug were fixed or worked around, a script could
math element and an
svg element to sniff if they are parsed according to HTML5 and if they aren't, move aside load event handlers,
<plaintext style='display:none'>, wait until
DOMContentLoaded, load the the already created
body elements onto the tree builder stack and head pointer of the HTML5 parser to and reparse the content of the plaintext element as HTML5 and call the load event handlers. See Philip Taylor’s proof of concept with S-expressions.)