Archive for the ‘Syntax’ Category
I gave a talk at Google on Monday demonstrating the various features of HTML5 that are implemented in browsers today. The video is now on YouTube, so now you too can watch and laugh at my lame presentation skills!
The segments of this talk are as follows. Some of the demos are available online for you to play with and are linked to from the following list:
- Drag and Drop API (29:05)
- Form Controls (40:50)
- Validation (1:07:20)
- Questions and Answers (1:09:35)
If you're very interested in watching my typos, the high quality version of the video on the YouTube site is clear enough to see the text being typed. More details about the demos can be found on the corresponding demo page.
I have released a new version of the Validator.nu HTML Parser (an implementation of the HTML5 parsing algorithm in Java). The new release supports SVG and MathML subtrees, is faster than the old version, fixes bugs, is more portable and supports applications that want to do
The parser comes with a sample app that makes it possible to use XSLT programs written for XHTML5+SVG+MathML with
Warning! The internal APIs have changed. Please refer to the Upgrade Guide below.
- Made the SAX, DOM and XOM parser entry point constructors default to altering the infoset instead of throwing when the input needs coercing to be an XML 1.0 4th ed. plus Namespaces infoset.
- Isolated Java IO dependent code from the parser core. The parser core now compiles on Google Web Toolkit.
- Refactored the tokenizer to use a switch branch per state instead of a method per state.
- Made various performance tweaks to the tokenizer.
- Implemented support for MathML and SVG foreign content. (Note that the SVG part is based on spec text that has been commented out from the spec at the request of the SVG WG.)
- Made the parser suspendable after any input character.
- Made it possible for custom TreeBuilder subclasses to request parser suspension. (Applications wishing to implement document.write() should provide their own TreeBuilder subclass and a document.write()-aware replacement of the Driver class. Look in the gwt-src/ directory for sample code.)
- Made changes to the parser core to make it more suitable for mechanical translation into other object-oriented programming languages that have C-like control structures but not necessarily a garbage collector (with focus on targeting C++). This work is not complete.
- Made the HTML serializer do the right thing when input represents a conforming XHTML+SVG+MathML tree. (Results may be bad for non-conforming input trees.)
- Developed sample programs for converting between HTML5 and XHTML5 when the input is known to be conforming.
- Provided an XML serializer so that the sample code no longer depends on the Xalan serializer.
- Improved API documentation.
- Fixed bugs in the tokenizer, tree builder and the input stream character encoding decoder.
- Made coercion to an XML infoset work according to the HTML5 spec.
- Added ID uniqueness checking.
- Various other fixes.
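To illustrate the tokenizer items in the list above, here is a minimal sketch of the "switch branch per state" style. It is not the Validator.nu code: the class name, the two states and the token format are all made up for illustration, and the grammar is drastically simplified. The point is only that when the state lives in a field and each state is a branch of one switch, the tokenizer can stop after any input character and pick up again later.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy tokenizer with one switch branch per state (hypothetical, not the real parser). */
final class SwitchTokenizerSketch {
    // Hypothetical states for a tiny text/tag grammar.
    private static final int DATA = 0;
    private static final int TAG = 1;

    // State and buffer are fields, so they survive between feed() calls.
    private int state = DATA;
    private final StringBuilder buf = new StringBuilder();
    private final List<String> tokens = new ArrayList<>();

    /** Consumes one chunk of input; the chunk may end at any character. */
    void feed(String chunk) {
        for (int i = 0; i < chunk.length(); i++) {
            char c = chunk.charAt(i);
            switch (state) {
                case DATA:
                    if (c == '<') {
                        flush("text:");
                        state = TAG;
                    } else {
                        buf.append(c);
                    }
                    break;
                case TAG:
                    if (c == '>') {
                        flush("tag:");
                        state = DATA;
                    } else {
                        buf.append(c);
                    }
                    break;
                default:
                    throw new IllegalStateException("unknown state " + state);
            }
        }
    }

    /** Signals end of input and returns the tokens, emitting any pending text. */
    List<String> end() {
        if (state == DATA) {
            flush("text:");
        }
        return tokens;
    }

    // Emits the buffered characters as one token, if there are any.
    private void flush(String prefix) {
        if (buf.length() > 0) {
            tokens.add(prefix + buf);
            buf.setLength(0);
        }
    }
}
```

Feeding "a<b" and then ">c" as two separate chunks yields the same tokens as feeding "a<b>c" in one go, because nothing about the current state is kept on the call stack.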
Upgrade Guide from 1.0.7 to 1.1.0
In all cases, you need to check that your application does not break when it receives SVG or MathML subtrees.
- If you use the parser through the SAX, DOM or XOM API and do not pass an explicit XmlViolationPolicy to the constructor of
If you really wanted the old default behavior, you should now pass XmlViolationPolicy.FATAL to the constructor.
If you did not really want to have fatal errors by default, you do not need to do anything, since ALTER_INFOSET is now the default.
- If you use the parser through the SAX, DOM or XOM API and do pass an explicit XmlViolationPolicy to the constructor of
You do not need to change your code to upgrade.
- If you have your own subclass of
The abstract methods on TreeBuilder now have additional arguments for passing the namespace URI. You should upgrade your subclass to deal with the namespace URIs. (The URI is always an interned string, so you can use == to compare.)
The entry point for passing in a SAX InputSource has moved from the Tokenizer class to the Driver class (in the io package), so you should change your references from
- If you have your own implementation of
Please refer to the JavaDocs of TokenHandler. Also note the new separation of Driver mentioned above.
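The remark in the upgrade guide about comparing namespace URIs with == relies on Java string interning. Here is a minimal sketch of why that works and when it does not. The class and method names are hypothetical and are not part of the parser API; the only assumption carried over from the guide is that the caller always hands over an interned string.

```java
/** Sketch: == on strings is safe only when both sides are interned. */
final class InternedNamespaceSketch {
    // String literals are interned by the JVM, so this constant is the canonical instance.
    static final String SVG_NS = "http://www.w3.org/2000/svg";

    /** Hypothetical check; assumes the argument is an interned string. */
    static boolean isSvg(String internedNamespaceUri) {
        // Reference comparison: cheap, but only correct for interned strings.
        return internedNamespaceUri == SVG_NS;
    }
}
```

A string built at run time, such as new String("http://www.w3.org/2000/svg"), is a different object and fails the == check until intern() is called on it. If your own code ever passes URIs from another source into such a comparison, intern them first or fall back to equals().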
Try pasting in the SVG lion or some MathML in Firefox 3 and Opera 9.5.
- use does not work in Firefox. Update: Fixed in Minefield nightlies.
- SVG does not render in Safari.
- IE does not support createElementNS and, thus, does not work at all.
A big thanks to the GWT team for making this work!
If you’ve investigated how browsers parse HTML, you’ve probably used Hixie’s Live DOM Viewer to see what happens. Wouldn’t it be cool, though, if you could experiment with the HTML5 parsing algorithm in the same UI? Well, now you can.
I was looking for a way to experiment with document.write() in the code base of the Validator.nu HTML Parser and I was looking for a way to let people see the parse tree output of the HTML5 parsing algorithm more easily. Instead of writing a test harness fully in Java, I thought it would be better to use the Live DOM Viewer and a browser engine as the test harness. The good news is that Google Web Toolkit makes it possible to put these pieces together, and the trunk of the Validator.nu HTML parser now comes with a document.write()-aware tokenizer driver and a tree builder subclass for GWT.
Here’s how you can run the Validator.nu HTML Parser in the Live DOM Viewer locally in the Hosted Mode of GWT (on Mac or Linux):
- Check out the source: svn co http://svn.versiondude.net/whattf/htmlparser/trunk/ htmlparser
- Download and untar GWT 1.5 RC1
- On Linux, install libstdc++5 and a JDK (Ubuntu's OpenJDK-based package worked for me).
- Edit the paths in HtmlParser-shell (Mac) or HtmlParser-linux (Linux) to point to the location of GWT.
HtmlParser-shell (Mac) or
- The Linux version of GWT runs an outdated version of Gecko, and the rendered view doesn't work. The DOM view does.
- The Mac version of GWT runs a Web Inspector-enabled version of WebKit, but SVG does not draw.
- document.write() semantics are right only for inline scripts.
- Copying and pasting using keyboard shortcuts doesn’t work. (Use the context menu.)
- On Linux, GWT prints a lot of harmless warnings about not finding annotations. (I don’t know why that happens. The annotations should be among translatables.)
- Gecko (used by GWT on Linux) doesn't allow the creation of xmlns attributes in no namespace, so things stop working if you try to put an attribute called xmlns on HTML elements.
- The DOM view on Linux doesn't report names with colons in them per the HTML5 spec.
(Aside: This code could have applicability beyond testing the parser. If the compiler bug were fixed or worked around, a script could
math element and an
svg element to sniff if they are parsed according to HTML5 and if they aren't, move aside load event handlers,
<plaintext style='display:none'>, wait until
DOMContentLoaded, load the already created
body elements onto the tree builder stack and head pointer of the HTML5 parser to and reparse the content of the plaintext element as HTML5 and call the load event handlers. See Philip Taylor’s proof of concept with S-expressions.)
One of the newly introduced features in HTML 5 is the ability to mark up reverse ordered lists. These are the same as ordered lists, but instead of counting up from 1, they instead count down towards 1. This can be used, for example, to count down the top 10 movies, music, or LOLCats, or anything else you want to present as a countdown list.
In previous versions of HTML, the only way to achieve this was to place a value attribute on each li element, with successively decreasing values.
<h3>Top 5 TV Series</h3>
<ol>
<li value="3">The Simpsons
<li value="2">Stargate Atlantis
<li value="1">Stargate SG-1
</ol>
The problem with that approach is that manually specifying each value can be time consuming to write and maintain, and the
value attribute was not allowed in the HTML 4.01 or XHTML 1.0 Strict DOCTYPEs (although HTML 5 fixes that problem and allows the
The new markup is very simple: just add a reversed attribute to the ol element, and optionally provide a start value. If there’s no start value provided, the browser will count the number of list items, and count down from that number to 1.
<h3>Greatest Movie Sagas of All Time</h3>
<ol reversed>
<li>Police Academy (Series)
<li>Harry Potter (Series)
<li>Back to the Future (Trilogy)
<li>Star Wars (Saga)
<li>The Lord of the Rings (Trilogy)
</ol>
Since there are 5 list items in that list, the list will count down from 5 to 1.
The reversed attribute is a boolean attribute. In HTML, the value may be omitted, but in XHTML, it needs to be written out in full as reversed="reversed".
The start attribute can be used to specify the starting number for the countdown, or the value attribute can be used on an li element. Subsequent list items will, by default, be numbered with the value of 1 less than the previous item.
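The numbering rules described so far can be sketched in a few lines of Java. This is only a model of the behaviour as described in this post, not the algorithm text from the spec, and the class and method names are invented for the illustration. A null start stands for "no start attribute", and a null entry in the array stands for an li without a value attribute.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of reversed-list numbering as described in the post (hypothetical helper). */
final class ReversedListSketch {
    /**
     * @param start  the start attribute, or null to count down from the number of items
     * @param values one entry per li: its value attribute, or null if it has none
     * @return the number shown next to each list item, in document order
     */
    static List<Integer> ordinals(Integer start, Integer[] values) {
        List<Integer> result = new ArrayList<>();
        // Without an explicit start, the countdown begins at the item count.
        int next = (start != null) ? start : values.length;
        for (Integer override : values) {
            if (override != null) {
                next = override; // a value attribute resets the countdown
            }
            result.add(next);
            next--; // each following item is one less than the previous
        }
        return result;
    }
}
```

For a list of five items with no attributes this gives 5, 4, 3, 2, 1. With a start of 100 and a value of 3 on the third item of four, it gives 100, 99, 3, 2.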
The following example starts counting down from 100, but omits a few items from the middle of the list and resumes from 3.
<h3>Top 100 Logical Fallacies Used By Creationists</h3>
<ol reversed="reversed" start="100">
<li>Appeal to Ridicule</li>
<li>Begging the Question (Circular Logic)</li>
<!-- Items omitted here -->
<li value="3">Bare Assertion Fallacy</li>
<li>Argumentum ad Ignorantiam</li>