The WHATWG Blog

Archive for the ‘Syntax’ Category

Google Tech Talk: HTML5 demos

Friday, September 26th, 2008

I gave a talk at Google on Monday demonstrating the various features of HTML5 that are implemented in browsers today. The video is now on YouTube, so now you too can watch and laugh at my lame presentation skills!

The segments of this talk are as follows. Some of the demos are available online for you to play with and are linked to from the following list:

Introduction
<video> (00:35)
postMessage() (05:40)
localStorage (15:20)
sessionStorage (21:00)
Drag and Drop API (29:05)
onhashchange (37:30)
Form Controls (40:50)
<canvas> (56:55)
Validation (1:07:20)
Questions and Answers (1:09:35)

If you're very interested in watching my typos, the high quality version of the video on the YouTube site is clear enough to see the text being typed. More details about the demos can be found on the corresponding demo page.

Posted in Browser API, Browsers, Conformance Checking, DOM, Elements, Events, Forms, Multimedia, Syntax, WHATWG | 7 Comments »

Validator.nu HTML Parser 1.1.0

Monday, August 25th, 2008

I have released a new version of the Validator.nu HTML Parser (an implementation of the HTML5 parsing algorithm in Java). The new release supports SVG and MathML subtrees, is faster than the old version, fixes bugs, is more portable and supports applications that want to do document.write().

The parser comes with a sample app that makes it possible to use XSLT programs written for XHTML5+SVG+MathML with text/html.

Warning! The internal APIs have changed. Please refer to the Upgrade Guide below.

Change Log

Made the SAX, DOM and XOM parser entry point constructors default to altering the infoset instead of throwing when the input needs coercing to be an XML 1.0 4th ed. plus Namespaces infoset.
Isolated Java IO dependent code from the parser core. The parser core now compiles on Google Web Toolkit.
Refactored the tokenizer to use a switch branch per state instead of method per state.
Made various performance tweaks to the tokenizer.
Implemented support for MathML and SVG foreign content. (Note that the SVG part is based on spec text that has been commented out from the spec at the request of the SVG WG.)
Made the parser suspendable after any input character.
Made it possible for custom TreeBuilder subclasses to request parser suspension. (Applications wishing to implement document.write() should provide their own TreeBuilder subclass and a document.write()-aware replacement of the Driver class. Look in the gwt-src/ directory for sample code.)
Made changes to the parser core to make it more suitable for mechanical translation into other object-oriented programming languages that have C-like control structures but not necessarily a garbage collector (with focus on targeting C++). This work is not complete.
Made the HTML serializer do the right thing when input represents a conforming XHTML+SVG+MathML tree. (Results may be bad for non-conforming input trees.)
Developed sample programs for converting between HTML5 and XHTML5 when the input is known to be conforming.
Provided an XML serializer so that the sample code no longer depends on the Xalan serializer.
Improved API documentation.
Fixed bugs in the tokenizer, tree builder and the input stream character encoding decoder.
Made coercion to an XML infoset work according to the HTML5 spec.
Added ID uniqueness checking.
Various other fixes.

Upgrade Guide from 1.0.7 to 1.1.0

In all cases, you need to check that your application does not break when it receives SVG or MathML subtrees.

If you use the parser through the SAX, DOM or XOM API and do not pass an explicit XmlViolationPolicy to the constructor of HtmlParser, HtmlDocumentBuilder or HtmlBuilder:

If you really wanted the old default behavior, you should now pass XmlViolationPolicy.FATAL to the constructor.

If you did not really want to have fatal errors by default, you do not need to do anything, since ALTER_INFOSET is now the default.

If you use the parser through the SAX, DOM or XOM API and do pass an explicit XmlViolationPolicy to the constructor of HtmlParser, HtmlDocumentBuilder or HtmlBuilder:

You do not need to change your code to upgrade.

If you have your own subclass of TreeBuilder:

The abstract methods on TreeBuilder now have additional arguments for passing the namespace URI. You should upgrade your subclass to deal with the namespace URIs. (The URI is always an interned string, so you can use == to compare.)

The entry point for passing in a SAX InputSource has moved from the Tokenizer class to the Driver class (in the io package), so you should change your references from Tokenizer to Driver.

If you have your own implementation of TokenHandler:

Please refer to the JavaDocs of TokenHandler. Also note the new separation of Tokenizer and Driver mentioned above.

Posted in Syntax | Comments Off on Validator.nu HTML Parser 1.1.0

HTML5 Live DOM Viewer—Now in Your Browser

Thursday, August 14th, 2008

Earlier, I blogged about running the Validator.nu HTML Parser inside Hixie’s Live DOM Viewer using the magic of the hosted mode of the Google Web Toolkit. Back then, a compiler bug in GTW 1.5 RC1 prevented the parser from running as JavaScript in the Web mode. Google has now released GWT 1.5 RC2, which contains a fix for the bug.

So without further ado, here’s Live DOM Viewer with an HTML5 parser running as JavaScript in your browser.

Try pasting in the SVG lion or some MathML in Firefox 3 and Opera 9.5.

Known problems:

SVG use does not work in Firefox. Update: Fixed in Minefield nightlies.
SVG does not render is Safari.
IE does not support createElementNS and, thus, does not work at all.

A big thanks for the GWT team for making this work!

Posted in DOM, Syntax | Comments Off on HTML5 Live DOM Viewer—Now in Your Browser

Experience the HTML5 parsing algorithm in the Live DOM Viewer

Monday, June 30th, 2008

If you’ve investigated how browsers parse HTML, you’ve probably used Hixie’s Live DOM Viewer to see what happens. Wouldn’t it be cool, though, if you could experiment with the HTML5 parsing algorithm in the same UI? Well, now you can.

I was looking for a way to experiment with document.write() in the code base of the Validator.nu HTML Parser and I was looking for a way to let people see the parse tree output of the HTML5 parsing algorithm more easily. Instead of writing a test harness fully in Java, I thought it would be better to use the Live DOM Viewer and a browser engine as the test harness. The good news is that Google Web Toolkit makes it possible to put these pieces together, and the trunk of the Validator.nu HTML parser now comes with a document.write()-aware tokenizer driver and a tree builder subclass for GWT.

The bad news is that the Java-to-JavaScript compiler of GWT has a bug that blocks me from putting the result online as JavaScript. The Hosted Mode of GWT, works, though.

Here’s how you can run the Validator.nu HTML Parser in the Live DOM Viewer locally in the Hosted Mode of GWT (on Mac or Linux):

Check out the source: svn co http://svn.versiondude.net/whattf/htmlparser/trunk/ htmlparser
Download and untar GWT 1.5 RC1
On Linux, install libstdc++5 and a JDK (Ubuntu's OpenJDK-based package worked for me).
Edit the paths in HtmlParser-shell (Mac) or HtmlParser-linux (Linux) to point to the location of GWT.
Run HtmlParser-shell (Mac) or HtmlParser-linux (Linux)

Known problems:

The Linux version of GWT runs an outdated version of Gecko, and the rendered view doesn't work. The DOM view does.
The Mac version of GWT runs a Web Inspector-enabled version of WebKit, but SVG does not draw.
document.write() semantics are right only for inline scripts.
Copying and pasting using keyboard shortcuts doesn’t work. (Use the context menu.)
On Linux, GTW prints a lot of harmless warnings about not finding annotations. (I don’t know why that happens. The annotations should be among translatables.)
Gecko (used by GTW on Linux) doesn't allow the creation of xmlns attributes in no namespace, so things stop working if you try to put an attribute called xmlns on HTML elements.
The DOM view on Linux doesn't report names with colons in them per the HTML5 spec.

(Aside: This code could have applicability beyond testing the parser. If the compiler bug were fixed or worked around, a script could document.write() a math element and an svg element to sniff if they are parsed according to HTML5 and if they aren't, move aside load event handlers, document.write() <plaintext style='display:none'>, wait until DOMContentLoaded, load the the already created html, head and body elements onto the tree builder stack and head pointer of the HTML5 parser to and reparse the content of the plaintext element as HTML5 and call the load event handlers. See Philip Taylor’s proof of concept with S-expressions.)

Posted in Syntax | 1 Comment »

Reverse Ordered Lists

Wednesday, April 23rd, 2008

One of the newly introduced features in HTML 5 is the ability to mark up reverse ordered lists. These are the same as ordered lists, but instead of counting up from 1, they instead count down towards 1. This can be used, for example, to count down the top 10 movies, music, or LOLCats, or anything else you want to present as a countdown list.

In previous versions of HTML, the only way to achieve this was to place a value attribute on each li element, with successively decreasing values.

<h3>Top 5 TV Series</h3>
<ol>
  <li value="5">Friends
  <li value="4">24
  <li value="3">The Simpsons
  <li value="2">Stargate Atlantis
  <li value="1">Stargate SG-1
</ol>

The problem with that approach is that manually specifying each value can be time consuming to write and maintain, and the value attribute was not allowed in the HTML 4.01 or XHTML 1.0 Strict DOCTYPEs (although HTML 5 fixes that problem and allows the value attribute)

The new markup is very simple: just add a reversed attribute to the ol element, and optionally provide a start value. If there’s no start value provided, the browser will count the number of list items, and count down from that number to 1.

<h3>Greatest Movies Sagas of All Time</h3>
<ol reversed>
  <li>Police Academy (Series)
  <li>Harry Potter (Series)
  <li>Back to the Future (Trilogy)
  <li>Star Wars (Saga)
  <li>The Lord of the Rings (Trilogy)
</ol>

Since there are 5 list items in that list, the list will count down from 5 to 1.

The reversed attribute is a boolean attribute. In HTML, the value may be omitted, but in XHTML, it needs to be written as: reversed="reversed".

The start attribute can be used to specify the starting number for the countdown, or the value attribute can be used on an li element. Subsequent list items will, by default, be numbered with the value of 1 less than the previous item.

The following example starts counting down from 100, but omits a few items from the middle of the list and resumes from 3.

<h3>Top 100 Logical Fallacies Used By Creationists</h3>
<ol reversed="reversed" start="100">
  <li>False Dichotomy</li>
  <li>Appeal to Ridicule</li>
  <li>Begging the Question (Circular Logic)</li>
  <!-- Items omitted here -->
  <li value="3">Strawman</li>
  <li>Bare Assertion Fallacy</li>
  <li>Argumentum ad Ignorantiam</li>
</ol>

Posted in Elements, Syntax | 26 Comments »