Archive for August, 2008
Thursday, August 28th, 2008
Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.
The big news this week is the birth of the W3C's experimental HTML 5 validator (announcement). It is based on Henri Sivonen's experimental HTML 5 validator, although there are still some integration bugs to shake out. Related discussion on Sam Ruby's blog.
SVG is back in the news. In a presentation to the Mozilla Corporation in December 2005, a Firefox developer asked me what I had against SVG. I replied, "I have nothing against SVG; make it work in HTML." Last week, Doug Schepers, on behalf of the SVG Working Group, reported that their SVG-in-HTML proposal was ready for another review, having incorporated the feedback from their first draft, released in July. Earlier today, Ian Hickson provided his review of the latest SVG-in-HTML proposal. You should read the whole thing, as it details the goals of the HTML Working Group and how they relate to the possible inclusion of SVG. Ian concluded with this:
In general, my conclusions are are [sic] somewhat negative:
- There are a lot of goals that aren't met.
- It seems to me that this proposal goes to great lengths to support some syntax (e.g. namespaces) despite evidence that doing so is not necessary, and it makes sacrifices regarding potential optimisations (like making the tokeniser case-insensitive, avoiding substring searches, avoiding attribute searches) despite evidence that browsers consider performance critical.
- It leaves some aspects quite poorly defined, such as how encoding errors are handled, exactly where parse errors are to be established as occurring, and how the XML parser is expected to interact with document.write().
- It rather poorly handles typical authoring mistakes such as copying and pasting half of an SVG or MathML fragment into an HTML page, or omitting namespace declarations altogether.
In other news, the image alt argument is finally over! Ha ha, just kidding. But Ian Hickson did summarize all of the proposed solutions to date:
- We can't require that every image have non-empty alt, because there are images that do nothing to help image-free users (A).
- We can't say that making a site like Flickr requires asking all users for alternative text, since users simply won't provide that data (B, B.1).
- We can't just omit alt="" with nothing else, since then users of image navigation will get lost (B.2.i).
- We can't use special syntax, since it hurts sites that care about accessibility more than anyone else, which just hurts the accessibility cause (B.2.ii.a, B.2.ii.b, B.2.ii.c).
- We can't introduce a new attribute because this will legitimise omitting alt far too much, again hurting the accessibility cause, and any new attribute will likely be misused to the point of making the attribute useless, due to the copy-paste mentality of authors who don't understand the spec (B.2.iii.a, B.2.iii.b, B.2.iii.c.I, B.2.iii.c.II, B.2.iii.c.III).
- We can't just use alt="" with captions instead of replacement text, as that would both give a mixed message for authors, reducing the quality of alternative text in general, and would make it harder to understand pages with a lot of images even if they used alt="" correctly, if they sometimes had to use this technique (B.2.iv).
- We can't require that all such images be links or be in a <figure>, since both of these over-constrain the author and will likely just be requirements that are ignored (B.2.v, B.2.vi).
- We don't want to have multiple levels of conformance because authors seem happy to aim for the lower level (as seen with HTML4 Transitional), and because just doing this still doesn't address the problem (we have to pick one of the other solutions for the "lesser" conformance class), and because this isn't necessarily something that is fixable (we want full conformance to be something that authors can always aim for) (B.3).
- We don't want to just say authors can punt on alternative text altogether, as that doesn't help accessibility (C).
- We don't want to not require alternative text at all, since in most cases alternative text is quite easy to add and massively helps non-image users (D).
- We don't want to ban alternative text as there is simply no other alternative for handling images these days (E).
As you might expect, this generated much followup discussion. Some accessibility experts liked it, others didn't. John Foliot still felt that alt should be required. I'd bet good money that this won't be the last word on the subject. See revisions 2106, 2110, 2113, and 2115.
Other interesting changes this week:
I will be on vacation next week, so tune in in two weeks for a special double feature of "This Week in HTML 5." Try not to break the web while I'm gone.
Posted in Weekly Review | 4 Comments »
Monday, August 25th, 2008
I have released a new version of the Validator.nu HTML Parser (an implementation of the HTML5 parsing algorithm in Java). The new release supports SVG and MathML subtrees, is faster than the old version, fixes bugs, is more portable and supports applications that want to do document.write().
The parser comes with a sample app that makes it possible to use XSLT programs written for XHTML5+SVG+MathML with text/html.
Warning! The internal APIs have changed. Please refer to the Upgrade Guide below.
Change Log
- Made the SAX, DOM and XOM parser entry point constructors default to altering the infoset instead of throwing when the input needs coercing to be an XML 1.0 4th ed. plus Namespaces infoset.
- Isolated Java IO dependent code from the parser core. The parser core now compiles on Google Web Toolkit.
- Refactored the tokenizer to use a switch branch per state instead of a method per state.
- Made various performance tweaks to the tokenizer.
- Implemented support for MathML and SVG foreign content. (Note that the SVG part is based on spec text that has been commented out from the spec at the request of the SVG WG.)
- Made the parser suspendable after any input character.
- Made it possible for custom TreeBuilder subclasses to request parser suspension. (Applications wishing to implement document.write() should provide their own TreeBuilder subclass and a document.write()-aware replacement of the Driver class. Look in the gwt-src/ directory for sample code.)
- Made changes to the parser core to make it more suitable for mechanical translation into other object-oriented programming languages that have C-like control structures but not necessarily a garbage collector (with focus on targeting C++). This work is not complete.
- Made the HTML serializer do the right thing when input represents a conforming XHTML+SVG+MathML tree. (Results may be bad for non-conforming input trees.)
- Developed sample programs for converting between HTML5 and XHTML5 when the input is known to be conforming.
- Provided an XML serializer so that the sample code no longer depends on the Xalan serializer.
- Improved API documentation.
- Fixed bugs in the tokenizer, tree builder and the input stream character encoding decoder.
- Made coercion to an XML infoset work according to the HTML5 spec.
- Added ID uniqueness checking.
- Various other fixes.
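Two items in this changelog describe the tokenizer's shape: one switch branch per state, and the ability to suspend after any input character. Both follow from keeping the state machine's position in fields rather than on the call stack. Here is a toy TypeScript sketch of that idea. It is not the Validator.nu code (which is Java), and the class name, its three states, and its text/tag tokens are hypothetical simplifications.

```typescript
// Hypothetical toy tokenizer that only distinguishes text from simple tags
// such as <b> and </b>. All state lives in fields, so feed() can stop after
// any character and a later call picks up exactly where it left off.
type Token = { kind: "text" | "tag"; value: string };
type State = "data" | "tagOpen" | "tagName";

class ToyTokenizer {
  private state: State = "data";
  private buffer = "";
  readonly tokens: Token[] = [];

  // Consume at most maxChars characters of input; returns how many were used.
  feed(input: string, maxChars: number = Infinity): number {
    let i = 0;
    for (; i < input.length && i < maxChars; i++) {
      const c = input[i];
      // One switch branch per state instead of one method per state.
      switch (this.state) {
        case "data":
          if (c === "<") {
            this.flushText();
            this.state = "tagOpen";
          } else {
            this.buffer += c;
          }
          break;
        case "tagOpen":
          this.buffer = c; // first character of the tag name (or "/")
          this.state = "tagName";
          break;
        case "tagName":
          if (c === ">") {
            this.tokens.push({ kind: "tag", value: this.buffer });
            this.buffer = "";
            this.state = "data";
          } else {
            this.buffer += c;
          }
          break;
      }
    }
    return i;
  }

  // Signal end of input: emit any pending text.
  end(): void {
    if (this.state === "data") this.flushText();
  }

  private flushText(): void {
    if (this.buffer !== "") {
      this.tokens.push({ kind: "text", value: this.buffer });
      this.buffer = "";
    }
  }
}

// Usage: suspend after three characters, then resume with the rest.
const tokenizer = new ToyTokenizer();
const source = "a<b>c</b>";
const used = tokenizer.feed(source, 3);
tokenizer.feed(source.slice(used));
tokenizer.end();
// tokenizer.tokens: text "a", tag "b", text "c", tag "/b"
```

Because nothing about the tokenizer's progress is held in local variables between calls, a caller such as a document.write() implementation can pause it, inject more input, and resume.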
Upgrade Guide from 1.0.7 to 1.1.0
In all cases, you need to check that your application does not break when it receives SVG or MathML subtrees.
- If you use the parser through the SAX, DOM or XOM API and do not pass an explicit XmlViolationPolicy to the constructor of HtmlParser, HtmlDocumentBuilder or HtmlBuilder:
If you really wanted the old default behavior, you should now pass XmlViolationPolicy.FATAL to the constructor.
If you did not really want to have fatal errors by default, you do not need to do anything, since ALTER_INFOSET is now the default.
- If you use the parser through the SAX, DOM or XOM API and do pass an explicit XmlViolationPolicy to the constructor of HtmlParser, HtmlDocumentBuilder or HtmlBuilder:
You do not need to change your code to upgrade.
- If you have your own subclass of TreeBuilder:
The abstract methods on TreeBuilder now have additional arguments for passing the namespace URI. You should upgrade your subclass to deal with the namespace URIs. (The URI is always an interned string, so you can use == to compare.)
The entry point for passing in a SAX InputSource has moved from the Tokenizer class to the Driver class (in the io package), so you should change your references from Tokenizer to Driver.
- If you have your own implementation of TokenHandler:
Please refer to the JavaDocs of TokenHandler. Also note the new separation of Tokenizer and Driver mentioned above.
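To make the difference between FATAL and ALTER_INFOSET concrete, here is a TypeScript model of one coercion the HTML5 spec allows when mapping an HTML tree onto an XML infoset: XML forbids "--" inside a comment and a comment that ends in "-", so altering the infoset inserts a space, while a fatal policy gives up. The policy names mirror the Java constants above, but coerceComment is a hypothetical helper and not part of the parser's API.

```typescript
// Hypothetical TypeScript model of the policy choice; the string values
// mirror the Java constant names, but this is not the parser's API.
type XmlViolationPolicy = "FATAL" | "ALTER_INFOSET";

// XML forbids "--" inside a comment and a comment that ends with "-".
function coerceComment(text: string, policy: XmlViolationPolicy): string {
  const violates = text.includes("--") || text.endsWith("-");
  if (!violates) return text;
  if (policy === "FATAL") {
    throw new Error(`Comment cannot be represented in XML: ${JSON.stringify(text)}`);
  }
  // ALTER_INFOSET: put a space after every hyphen that is followed by another
  // hyphen, then pad a trailing hyphen with a space.
  let fixed = text.replace(/-(?=-)/g, "- ");
  if (fixed.endsWith("-")) fixed += " ";
  return fixed;
}

// Usage: coerceComment("a--b", "ALTER_INFOSET") returns "a- -b",
// coerceComment("x-", "ALTER_INFOSET") returns "x- ", and
// coerceComment("a--b", "FATAL") throws.
```

The trade-off is the one the upgrade guide describes: ALTER_INFOSET always produces a usable XML tree at the cost of silently changing content, while FATAL keeps content intact but refuses some real-world pages.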
Posted in Syntax | Comments Off on Validator.nu HTML Parser 1.1.0
Friday, August 22nd, 2008
Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.
The biggest news this week is the birth of the event loop.
To coordinate events, user interaction, scripts, rendering, networking, and so forth, user agents must use event loops as described in this section.
... An event loop has one or more task queues. A task queue is an ordered list of tasks, which can be:
- Events
Asynchronously dispatching an Event object at a particular EventTarget object is a task.
- Parsing
The HTML parser tokenising a single byte, and then processing any resulting tokens, is a task.
- Callbacks
Calling a callback asynchronously is a task.
- Using a resource
When an algorithm fetches a resource, if the fetching occurs asynchronously then the processing of the resource once some or all of the resource is available is a task.
- Reacting to DOM manipulation
Some elements have tasks that trigger in response to DOM manipulation, e.g. when that element is inserted into the document.
The purpose of defining an event loop is to unify the definition of things that happen asynchronously. (I want to avoid saying "events" since that term is already overloaded.) For example, if an image defines an onload callback function, exactly when does it get called? Questions like this are now answered in terms of adding tasks to a queue and processing them in an event loop.
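As a rough model (a simplification, not spec text), an event loop repeatedly takes the oldest task from a queue and runs it to completion, and anything that happens "later", such as an image's onload callback, is just another task appended to that queue. The sketch below uses a single task queue even though the spec allows several, and the EventLoop class and queueTask method are hypothetical names.

```typescript
// Hypothetical, simplified model: one task queue, each task runs to completion.
type Task = () => void;

class EventLoop {
  private queue: Task[] = [];

  // "Queue a task": append it; it will not run until the loop reaches it.
  queueTask(task: Task): void {
    this.queue.push(task);
  }

  // Run tasks in FIFO order until the queue is empty. A task may queue more tasks.
  run(): void {
    let task: Task | undefined;
    while ((task = this.queue.shift()) !== undefined) {
      task();
    }
  }
}

// Example: an image finishes loading while a script is still running.
const trace: string[] = [];
const loop = new EventLoop();
loop.queueTask(() => {
  trace.push("script starts");
  // The fetch completes now, but firing load is queued, not run inline.
  loop.queueTask(() => trace.push("image onload"));
  trace.push("script ends");
});
loop.run();
// trace: ["script starts", "script ends", "image onload"]
```

Because the load task is queued rather than called inline, the script always finishes before the onload callback runs, which is exactly the kind of ordering question the event loop is meant to settle.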
- Revision 2074 defines event loops and task queues (as quoted above).
- Revisions 2076, 2079, 2080, 2081, 2082, and 2083 define the behavior of media elements (like <audio> and <video>) in terms of the event loop.
- Revision 2084 defines the behavior of template and ref attributes, local database storage, and remote events in terms of the event loop.
- Revision 2085 defines the behavior of web sockets, postMessage, message ports, and setTimeout in terms of the event loop.
- Revision 2097 defines the behavior of an image's load event in terms of the event loop.
The other major news this week is the addition of the hashchange event, which occurs when the user clicks an in-page link that goes somewhere else on the same page, or when a script programmatically sets the location.hash property. This is primarily useful for AJAX applications that wish to maintain a history of user actions while remaining on the same page. As a concrete example, executing a search of your messages in GMail takes you to a list of search results, but does not change the base URL, just the hash; clicking the Back button takes you back to the previous view within GMail (such as your inbox), again without changing the base URL (just the hash). GMail employs some nasty hacks to make this work in all browsers; the hashchange event is designed to make those hacks slightly less nasty. Microsoft Internet Explorer 8 pioneered the hashchange event, and its definition in HTML 5 is designed to match Internet Explorer's behavior.
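A minimal sketch of the pattern the hashchange event is meant to support: keep the application's view in the fragment identifier and re-render whenever the hash changes, whether the change came from an in-page link, a script setting location.hash, or the Back button. The "#search=..." format and the parseHash and render helpers are hypothetical, and the listener is only attached when a browser environment is detected.

```typescript
// Hypothetical view state encoded in the fragment, e.g. "#search=html5" or "#inbox".
type View = { name: "inbox" } | { name: "search"; query: string };

function parseHash(hash: string): View {
  const fragment = hash.startsWith("#") ? hash.slice(1) : hash;
  if (fragment.startsWith("search=")) {
    return { name: "search", query: decodeURIComponent(fragment.slice("search=".length)) };
  }
  return { name: "inbox" }; // default view for an empty or unrecognized hash
}

function render(view: View): string {
  return view.name === "search" ? `Results for "${view.query}"` : "Inbox";
}

// In a browser, re-render whenever the hash changes (link, script, or Back button).
// globalThis is used so the file also runs outside a browser, where this is skipped.
const g = globalThis as any;
if (g.location && g.document && typeof g.addEventListener === "function") {
  g.addEventListener("hashchange", () => {
    g.document.body.textContent = render(parseHash(g.location.hash));
  });
}
```

Since the whole view is derived from the hash, the browser's own history of hash changes doubles as the application's history, which is what the hacks mentioned above try to emulate without the event.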
Other interesting changes this week:
- In last week's episode, I mentioned revision 2063, which allows HTML documents to contain both xml:lang and lang attributes as long as they are identical. Revision 2091 relaxes this restriction slightly to allow the xml:lang and lang attributes to differ by case (i.e. one could be uppercase and the other could be lowercase, and that is no longer an error). Discussion: xml:lang="" and lang=""
- Revision 2092 defines the parsing algorithm for empty table rows.
- Revision 2094 clarifies the meaning of whitespace by deferring to the Unicode definitions.
- Revision 2096 forbids content sniffing for SVG images. To use an SVG image in the src attribute of an <img> element, the web server must ensure that the SVG image is served with a Content-Type: image/svg+xml HTTP header.
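Two of the rules in this list are small enough to state as code. This sketch assumes the case comparison for lang and xml:lang is ASCII case-insensitive (only A-Z and a-z are folded) and that parameters such as charset after a semicolon do not affect the Content-Type check; both are assumptions rather than quotes from the revisions, and the helper names are hypothetical.

```typescript
// Hypothetical helpers for two rules summarized above.

// ASCII lowercase: fold only A-Z, leave every other character alone.
function asciiLowercase(s: string): string {
  return s.replace(/[A-Z]/g, (c) => c.toLowerCase());
}

// Revision 2091: when both xml:lang and lang are present, their values
// must match, but (assumed ASCII) case differences are no longer an error.
function langValuesMatch(lang: string, xmlLang: string): boolean {
  return asciiLowercase(lang) === asciiLowercase(xmlLang);
}

// Revision 2096: no sniffing, so only the declared Content-Type counts.
// Assumption: parameters after ";" are ignored and the type is case-insensitive.
function isSvgImageResponse(contentType: string | null): boolean {
  if (contentType === null) return false;
  const essence = contentType.split(";")[0].trim();
  return asciiLowercase(essence) === "image/svg+xml";
}
```

Note that isSvgImageResponse never looks at the response body: a file that is SVG but served as text/xml is simply not treated as an SVG image.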
Tune in next week for another exciting episode of "This Week in HTML 5."
Posted in Weekly Review | 4 Comments »
Thursday, August 14th, 2008
Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.
The biggest news this week is revision 2020, which standardizes the navigator object:
The navigator attribute of the Window interface must return an instance of the Navigator interface, which represents the identity and state of the user agent (the client), and allows Web pages to register themselves as potential protocol and content handlers.
Currently, HTML 5 defines four properties and two methods:
- Properties: appName, appVersion, platform, userAgent
- Methods: registerProtocolHandler, registerContentHandler
This is only a subset of navigator properties and methods that browsers already support. See Navigator Object on Google Doctype for complete browser compatibility information.
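The registration methods are the least familiar part of that list. With registerProtocolHandler(scheme, url, title), a page asks the browser to route links of a given scheme to a URL template in which %s is replaced by the escaped link. The sketch below feature-detects the method and models the substitution step that the browser performs. The /compose path and the expandHandlerUrl helper are hypothetical, encodeURIComponent stands in for the spec's escaping rules, and browsers restrict which schemes and handler URLs a page may register.

```typescript
// Hypothetical model of the browser-side step: replace the first "%s" in the
// registered template with the escaped URL of the link being followed.
// encodeURIComponent approximates the spec's escaping rules.
function expandHandlerUrl(template: string, target: string): string {
  return template.replace("%s", encodeURIComponent(target));
}

// Page-side registration, attempted only where the method exists.
const nav = (globalThis as any).navigator;
if (nav && typeof nav.registerProtocolHandler === "function") {
  try {
    nav.registerProtocolHandler("mailto", "/compose?to=%s", "Example Mail");
  } catch (e) {
    console.warn("Handler registration was refused:", e);
  }
}

// Usage: expandHandlerUrl("/compose?to=%s", "mailto:someone@example.com")
// returns "/compose?to=mailto%3Asomeone%40example.com".
```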
Next up: Content-Language. No, not the HTTP header, not even the <html lang> attribute, but the <meta> tag! As reported by Henri Sivonen,
It seems that some authoring tools and authors use <meta http-equiv='content-language' content='languagetag'> instead of <html lang='languagetag'>.
This led to revision 2057, which defines the <meta http-equiv="Content-Language"> directive and its relationship with lang, xml:lang, and the Content-Language HTTP header.
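One way to picture what revision 2057 sets up is as a fallback chain for working out an element's language. This sketch assumes the following order: the element's own xml:lang, then its lang, then the nearest ancestor carrying either attribute, then the language from the meta pragma, and finally the HTTP Content-Language header. The ElementLike shape and helper names are hypothetical, and the pragma parsing (ignore values containing a comma, otherwise take the first whitespace-separated token) is a simplified reading of the spec rather than its exact steps.

```typescript
// Hypothetical minimal element shape: attributes plus a link to the parent.
interface ElementLike {
  attributes: { [name: string]: string | undefined };
  parent: ElementLike | null;
}

// Simplified pragma parsing: ignore values containing a comma,
// otherwise use the first whitespace-separated token.
function pragmaLanguage(metaContent: string | null): string | null {
  if (metaContent === null || metaContent.includes(",")) return null;
  const token = metaContent.trim().split(/\s+/)[0];
  return token === "" ? null : token;
}

// Assumed fallback chain: xml:lang, then lang, on the element or its nearest
// ancestor; then the meta pragma; then the HTTP Content-Language header.
function languageOf(
  el: ElementLike,
  metaContent: string | null,
  httpContentLanguage: string | null,
): string | null {
  for (let node: ElementLike | null = el; node !== null; node = node.parent) {
    const value = node.attributes["xml:lang"] ?? node.attributes["lang"];
    if (value !== undefined) return value;
  }
  return pragmaLanguage(metaContent) ?? httpContentLanguage;
}

// Usage: a <p> with no lang attributes inside an <html> with none either
// falls back to the meta pragma, and only then to the HTTP header.
const root: ElementLike = { attributes: {}, parent: null };
const para: ElementLike = { attributes: {}, parent: root };
// languageOf(para, "de", "en") returns "de"; languageOf(para, null, "en") returns "en".
```

Under this reading, the meta pragma only matters when no element up the tree says anything about language, which is why authors who rely on it instead of <html lang> get a weaker signal than they might expect.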
In the continuing saga of the alt attribute, the new syntax for alternate text of auto-generated images (which I covered in last week's episode) has generated some followup discussion. Philip Taylor is concerned that it will increase complexity for authoring tools; others feel the complexity is worth the cost. James Graham suggested a no-text-equivalent attribute; similar proposals have been discussed before and rejected.
Switching to the new Web Workers specification (which I also covered last week), Aaron Boodman (one of the developers of Google Gears) posted his initial feedback. This kicked off a long discussion and led to the creation of the Worker object.
Other interesting changes this week:
Administrivia: "This Week in HTML 5" now has its own feed.
Tune in next week for another exciting episode of "This Week in HTML 5."
Posted in Weekly Review | 3 Comments »
Thursday, August 14th, 2008
Earlier, I blogged about running the Validator.nu HTML Parser inside Hixie’s Live DOM Viewer using the magic of the hosted mode of the Google Web Toolkit. Back then, a compiler bug in GWT 1.5 RC1 prevented the parser from running as JavaScript in the Web mode. Google has now released GWT 1.5 RC2, which contains a fix for the bug.
So without further ado, here’s Live DOM Viewer with an HTML5 parser running as JavaScript in your browser.
Try pasting in the SVG lion or some MathML in Firefox 3 and Opera 9.5.
Known problems:
- SVG use does not work in Firefox. Update: Fixed in Minefield nightlies.
- SVG does not render in Safari.
- IE does not support createElementNS and, thus, does not work at all.
A big thanks to the GWT team for making this work!
Posted in DOM, Syntax | Comments Off on HTML5 Live DOM Viewer—Now in Your Browser