The WHATWG Blog — thisweekinhtml5

This Week in HTML 5 – Episode 6

Tuesday, September 23rd, 2008

Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.

There is no big news this week. Work continued on last week's orgy of Web Forms-related check-ins. This week adds the <label> element and the jack-of-all-forms <input> element. [r2191, r2192, r2197, r2200, r2202, r2204, r2205, r2207, r2211, r2212, r2213, r2214, r2218, r2219, r2220, r2222, r2223]

Laura Carlson and others have begun to review the accessibility of multimedia on the web. Most accessibility discussions revolve around the needs of visually impaired users, but hearing impaired users are also important and too often ignored. There was a long discussion last month (and continuing into this month) about the accessibility implications of the <audio> and <video> elements for hearing impaired users. YouTube (owned by Google, my employer) recently announced support for captions on YouTube videos and published a tutorial on adding them to your own videos.

Ian Hickson (the HTML 5 editor) gave an interview about HTML 5 in which he reiterated his goal of having two independent, complete, interoperable implementations of HTML 5 by 2022. (By contrast, HTML 4.0 was "finalized" 11 years ago but still doesn't have two independent, complete, interoperable implementations.) This led to a mini-firestorm among bloggers who misunderstood "2022" as "the date when I can start using HTML 5 features." It bears repeating that the "2022" date has no significance at all for web developers. Most browser vendors are actively involved in HTML 5, several browsers are already shipping HTML 5 features, and developers who are holding their breath until 2022 are going to find themselves seriously behind the curve.

On that note, Brenton Strine asks a very good question: "Is there some place that documents the parts of HTML 5 that are already up and running? Can I use <canvas> or <video>? In which browsers? What other tags can I use? What other fancy HTML 5 stuff can I do today in 2008?" On the video front, Mozilla will be shipping Ogg Theora support in Firefox 3.1. (You can read more about why Ogg matters.) Last year, Opera released experimental builds with Ogg Theora support, and they now have video-enabled builds on 3 platforms. The Wikimedia Foundation has a few Theora-encoded videos you can watch.
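
Until such a compatibility table exists, one practical answer is to ask the browser itself at runtime. The sketch below is only an illustration: the function name detectFeatures is made up, and the canPlayType() return values ("", "maybe", "probably") are those of the specification as I understand it today, so early experimental builds may behave differently.

```javascript
// Runtime feature detection: ask the browser instead of guessing from its name.
// `doc` is any object with a createElement() method (normally `document`).
function detectFeatures(doc) {
  const canvas = doc.createElement("canvas");
  const video = doc.createElement("video");
  const hasVideo = typeof video.canPlayType === "function";
  return {
    // Browsers that implement <canvas> expose getContext() on the element.
    canvas: typeof canvas.getContext === "function",
    // Browsers that implement <video> expose canPlayType() on the element.
    video: hasVideo,
    // An empty string from canPlayType() means "cannot play this type".
    oggTheora: hasVideo && video.canPlayType('video/ogg; codecs="theora"') !== "",
  };
}

// Only touch the real DOM when one exists (for example, not under Node.js).
if (typeof document !== "undefined") {
  console.log(detectFeatures(document));
}
```

Passing the document in as a parameter keeps the function easy to try out against a stand-in object.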

Tune in next week for another exciting episode of "This Week in HTML 5."

Posted in Weekly Review | 6 Comments »

This Week in HTML 5 – Episode 5

Monday, September 15th, 2008

Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.

The big news this week is the merging of the Web Forms 2 specification into the HTML 5 specification, and updating it with the collected feedback of the last two years.

form [r2142]
fieldset [r2143]
input [r2144]
button [r2145]
label [r2146]
select [r2148]
datalist [r2150]
optgroup [r2151]
option [r2152]
textarea [r2153]
output [r2154]
The form attribute on various elements, which explicitly associates an element with a form, and the elements attribute of the form element, which lists the elements associated with that form. (Form-associated elements no longer need to be descendants of the <form> element within the DOM; for those that are not, explicit association is required. Form-associated elements that are DOM descendants of a <form> element are implicitly associated with it, so your existing markup will continue to work the way you think it does.) [r2157]
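
The association rule in that last item can be sketched with a toy model. Everything here is an assumption made for illustration: controls are plain objects rather than DOM nodes, formOwner and elementsOf are made-up names, and returning null for a form attribute that names no form reflects my reading of later drafts rather than r2157 itself.

```javascript
// Toy model of form association. Each control is a plain object:
//   { form: "id-of-a-form" | undefined, ancestors: [nearest, ..., farthest] }
// where `ancestors` lists the control's ancestor elements, nearest first.
function formOwner(control, formsById) {
  // Explicit association: the form attribute names a form by its ID.
  if (control.form !== undefined) {
    return formsById.get(control.form) ?? null;
  }
  // Implicit association: the nearest ancestor <form>, if any.
  return control.ancestors.find((el) => el.tagName === "form") ?? null;
}

// The reverse mapping, analogous to the `elements` attribute of a form.
function elementsOf(form, controls, formsById) {
  return controls.filter((c) => formOwner(c, formsById) === form);
}

const search = { tagName: "form", id: "search" };
const login = { tagName: "form", id: "login" };
const forms = new Map([["search", search], ["login", login]]);

// A control nested inside <form id="login">: implicitly associated.
const nested = { ancestors: [{ tagName: "p" }, login] };
// A control elsewhere in the page with form="search": explicitly associated.
const detached = { form: "search", ancestors: [{ tagName: "div" }] };
```

In this model formOwner(nested, forms) is the login form and formOwner(detached, forms) is the search form, even though the second control is nowhere near it in the tree.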

Meanwhile, revisions 2160, 2161, 2163, 2164, and 2165 begin the long, hard process of defining when and how a form is submitted. This is one of those things that "everybody knows" but nobody has actually, you know, documented. For example, do you submit a form when you toggle a checkbox? Of course not, "everybody knows" that. Is an unchecked checkbox included in the form data when it is submitted? No, "everybody knows" that too. How do you submit to an ftp:// URL? A mailto: URL? A data: URL? What are the three values of the enctype attribute, and how do they affect the form data when it is submitted to a data: URL with the PUT method?¹ Umm... How exactly do you construct the names of the X and Y coordinates to submit a server-side image map? (By the way, server-side image maps are inaccessible, so don't use them unless you provide an accessible fallback form with equivalent functionality.) Web Forms 2 (and now HTML 5) will tell you.
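
To make a couple of those "everybody knows" rules concrete, here is a rough sketch of building the submitted data for the default application/x-www-form-urlencoded case. It is a simplification, not the spec's algorithm: the control objects and function names are invented, and the name.x / name.y naming is the long-standing browser behavior for <input type="image"> as I know it.

```javascript
// Build the list of name/value pairs for a submission.
// `controls` is a made-up plain-object description of the form's controls;
// `submitter` is the button that triggered the submission, if any;
// `click` is where the user clicked on an image button.
function constructFormDataSet(controls, submitter, click = { x: 0, y: 0 }) {
  const pairs = [];
  for (const c of controls) {
    if (c.disabled) continue;
    const checkable = c.type === "checkbox" || c.type === "radio";
    // Unchecked checkboxes and radio buttons are left out entirely.
    if (checkable && !c.checked) continue;
    // Buttons contribute only when they are the one that was activated.
    const isButton = c.type === "submit" || c.type === "image";
    if (isButton && c !== submitter) continue;
    if (c.type === "image") {
      // Image buttons submit click coordinates as "name.x" and "name.y".
      const prefix = c.name ? c.name + "." : "";
      pairs.push([prefix + "x", String(click.x)], [prefix + "y", String(click.y)]);
      continue;
    }
    if (!c.name) continue; // unnamed controls are not submitted
    // A checked checkbox with no value attribute submits "on".
    pairs.push([c.name, c.value ?? (checkable ? "on" : "")]);
  }
  return pairs;
}

// Serialize the pairs as application/x-www-form-urlencoded.
function urlencode(pairs) {
  return new URLSearchParams(pairs).toString();
}
```

For a text field q containing "html 5", an unchecked subscribe checkbox, a checked tos checkbox, and an image button named map clicked at (12, 34), this yields q=html+5&tos=on&map.x=12&map.y=34.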

Another interesting set of changes revolves around character encoding. If you don't know anything about character encoding, I would strongly recommend Joel Spolsky's The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!), Tim Bray's three-part series, On the Goodness of Unicode, On Character Strings, and Characters vs. Bytes, and anything written by Martin Dürst.

Now then: r2125 warns against using EBCDIC on public-facing web pages. For those of you under 30, EBCDIC is a character encoding invented by IBM in the 1960s for their System/360 mainframe. On non-IBM hardware, EBCDIC lost the encoding war to ASCII, and later Unicode, and it is rarely seen on the public web. r2131 says that browsers should ignore an out-of-band encoding definition that they do not support. For example, if a web page is served with an HTTP Content-Type header with a charset parameter that defines a character encoding the browser does not support, the browser should ignore it and continue the process of determining the character encoding by other means. And finally, r2137 says that browsers should treat US-ASCII as Windows-1252 when determining character encoding. As the HTML 5 specification notes, "The requirement to treat certain encodings as other encodings according to the table above is a willful violation of the W3C Character Model specification."
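
The last two rules fit together neatly. What follows is a minimal sketch, not the spec's full encoding-detection algorithm: the list of supported encodings, the fallback, and the name pickEncoding are all assumptions, and only the US-ASCII row of the spec's table is shown.

```javascript
// Encodings this hypothetical browser can decode (labels lowercased).
const SUPPORTED = new Set(["utf-8", "windows-1252", "iso-8859-2"]);

// Encodings that are deliberately decoded as another encoding (r2137).
// Only the US-ASCII row mentioned above is listed here.
const TREAT_AS = new Map([["us-ascii", "windows-1252"]]);

// `candidates` are declared encodings in priority order, for example
// [charset from the HTTP header, charset from a <meta> element].
// Missing or unsupported declarations are skipped (r2131).
function pickEncoding(candidates, fallback = "windows-1252") {
  for (const label of candidates) {
    if (!label) continue;
    const name = label.trim().toLowerCase();
    const effective = TREAT_AS.get(name) ?? name;
    if (SUPPORTED.has(effective)) return effective;
  }
  // Nothing usable was declared; the default here is an assumption.
  return fallback;
}
```

With these tables, pickEncoding(["EBCDIC-US", "utf-8"]) skips the unsupported HTTP declaration and settles on utf-8, while pickEncoding(["US-ASCII"]) returns windows-1252.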

Other interesting changes this week:

r2122 makes it non-conforming to have empty unquoted attribute values like <input value=>
r2130 provides a special case for the definitionURL attribute for those embedding MathML in HTML.
r2140 and r2141 allow a legacy DOCTYPE if and only if the HTML page is the result of an XSLT transform. The exact DOCTYPE string is <!DOCTYPE HTML PUBLIC 'XSLT-compat'>

Tune in next week for another exciting episode of "This Week in HTML 5."

Footnotes:

¹ When submitting to a data: URL with the PUT method, the three values of enctype are application/x-www-form-urlencoded, multipart/form-data, and text/plain. Amaze your friends at the next tech conference!


This Week in HTML 5 – Episode 4

Thursday, August 28th, 2008

Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.

The big news this week is the birth of the W3C's experimental HTML 5 validator (announcement). It is based on Henri Sivonen's experimental HTML 5 validator, although there are still some integration bugs to shake out. Related discussion on Sam Ruby's blog.

SVG is back in the news. In a presentation to the Mozilla Corporation in December 2005, a Firefox developer asked me what I had against SVG. I replied, "I have nothing against SVG; make it work in HTML." Last week, Doug Schepers, on behalf of the SVG Working Group, reported that their SVG-in-HTML proposal was ready for another review, having incorporated the feedback from their first draft, released in July. Earlier today, Ian Hickson provided his review of the latest SVG-in-HTML proposal. You should read the whole thing, as it details the goals of the HTML Working Group and how they relate to the possible inclusion of SVG. Ian concluded with this:

In general, my conclusions are are [sic] somewhat negative:

There are a lot of goals that aren't met.

It seems to me that this proposal goes to great lengths to support some syntax (e.g. namespaces) despite evidence that doing so is not necessary, and it makes sacrifices regarding potential optimisations (like making the tokeniser case-insensitive, avoiding substring searches, avoiding attribute searches) despite evidence that browsers consider performance critical.

It leaves some aspects quite poorly defined, such as how encoding errors are handled, exactly where parse errors are to be established as occuring, and how the XML parser is expected to interact with document.write().

It rather poorly handles typical authoring mistakes such as copying and pasting half of an SVG or MathML fragment into an HTML page, or omitting namespace declarations altogether.

In other news, the image alt argument is finally over! Ha ha, just kidding. But Ian Hickson did summarize all of the proposed solutions to date:

We can't require that every image have non-empty alt, because there are images that do nothing to help image-free users (A).

We can't say that making a site like Flickr requires asking all users for alternative text, since users simply won't provide that data (B, B.1).

We can't just omit alt="" with nothing else, since then users of image navigation will get lost (B.2.i).

We can't use special syntax, since it hurts sites that care about accessibility more than anyone else, which just hurts the accessibility cause (B.2.ii.a, B.2.ii.b, B.2.ii.c).

We can't introduce a new attribute because this will legitimise omitting alt far too much, again hurting the accessibility cause, and any new attribute will likely be misused to the point of making the attribute useless, due to the copy-paste mentality of authors who don't understand the spec (B.2.iii.a, B.2.iii.b, B.2.iii.c.I, B.2.iii.c.II, B.2.iii.c.III).

We can't just use alt="" with captions instead of replacement text, as that would both give a mixed message for authors, reducing the quality of alternative text in general, and would make it harder to understand pages with a lot of images even if they used alt="" correctly, if they sometimes had to use this technique (B.2.iv).

We can't require that all such images be links or be in a <figure>, since both of these over-constrain the author and will likely just be requirements that are ignored (B.2.v, B.2.vi).

We don't want to have multiple levels of conformance because authors seem happy to aim for the lower level (as seen with HTML4 Transitional), and because just doing this still doesn't address the problem (we have to pick one of the other solutions for the "lesser" conformance class), and because this isn't necessarily something that is fixable (we want full conformance to be something that authors can always aim for) (B.3).

We don't want to just say authors can punt on alternative text altogether, as that doesn't help accessibility (C).

We don't want to not require alternative text at all, since in most cases alternative text is quite easy to add and massively helps non-image users (D).

We don't want to ban alternative text as there is simply no other alternative for handling images these days (E).

As you might expect, this generated much followup discussion. Some accessibility experts liked it, others didn't. John Foliot still felt that alt should be required. I'd bet good money that this won't be the last word on the subject. See revisions 2106, 2110, 2113, and 2115.

Other interesting changes this week:

Revision 2099 attempts to make the <style scoped> content model clearer. Reported by Henri Sivonen.
Revision 2106 mentions how to provide alternate text for a Rorschach inkblot test. Seriously. Reported by Steven Faulkner.
Revision 2117 changes postMessage() to return void instead of boolean, since "even if the worker is currently up and running when postMessage is called, there is no guarantee that the worker will run long enough to actually get to process the message." Reported by Jonas Sicking.
Revision 2119 renames the irrelevant attribute to hidden. Reported by Mihai Sucan.

I will be on vacation next week, so tune in in two weeks for a special double feature of "This Week in HTML 5." Try not to break the web while I'm gone.


This Week in HTML 5 – Episode 3

Friday, August 22nd, 2008

Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.

The biggest news this week is the birth of the event loop.

To coordinate events, user interaction, scripts, rendering, networking, and so forth, user agents must use event loops as described in this section.

... An event loop has one or more task queues. A task queue is an ordered list of tasks, which can be:

Events

Asynchronously dispatching an Event object at a particular EventTarget object is a task.

Parsing

The HTML parser tokenising a single byte, and then processing any resulting tokens, is a task.

Callbacks

Calling a callback asynchronously is a task.

Using a resource

When an algorithm fetches a resource, if the fetching occurs asynchronously then the processing of the resource once some or all of the resource is available is a task.

Reacting to DOM manipulation

Some elements have tasks that trigger in response to DOM manipulation, e.g. when that element is inserted into the document.

The purpose of defining an event loop is to unify the definition of things that happen asynchronously. (I want to avoid saying "events" since that term is already overloaded.) For example, if an image defines an onload callback function, exactly when does it get called? Questions like this are now answered in terms of adding tasks to a queue and processing them in an event loop.
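
The model quoted above is small enough to sketch in a few lines. This is a toy, not how any browser is implemented: the EventLoop class and the task source names are invented, and always draining the first non-empty queue is my simplification of how a queue gets chosen.

```javascript
// A toy event loop with one FIFO task queue per task source.
class EventLoop {
  constructor() {
    this.queues = new Map(); // task source name -> array of pending tasks
  }

  // Add a task (a function) to the end of the queue for `source`.
  queueTask(source, task) {
    if (!this.queues.has(source)) this.queues.set(source, []);
    this.queues.get(source).push(task);
  }

  // Run the oldest task from the first non-empty queue.
  // Returns false when every queue is empty.
  runOne() {
    for (const queue of this.queues.values()) {
      if (queue.length > 0) {
        queue.shift()();
        return true;
      }
    }
    return false;
  }

  // Keep running tasks until none are left.
  run() {
    while (this.runOne()) {}
  }
}

const loop = new EventLoop();
const log = [];
// The image finishes downloading: processing it is a queued task...
loop.queueTask("networking", () => {
  log.push("image data processed");
  // ...and calling onload is queued as a further task, not run inline.
  loop.queueTask("events", () => log.push("onload called"));
});
loop.queueTask("parsing", () => log.push("parser continues"));
loop.run();
// log is now ["image data processed", "parser continues", "onload called"]
```

The point of the example is the ordering: the onload callback never interrupts whatever is running, it simply waits its turn in a queue.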

Revision 2074 defines event loops and task queues (as quoted above).
Revisions 2076, 2079, 2080, 2081, 2082, and 2083 define the behavior of media elements (like <audio> and <video>) in terms of the event loop.
Revision 2084 defines the behavior of template and ref attributes, local database storage, and remote events in terms of the event loop.
Revision 2085 defines the behavior of web sockets, postMessage, message ports, and setTimeout in terms of the event loop.
Revision 2097 defines the behavior of an image's load event in terms of the event loop.

The other major news this week is the addition of the hashchange event, which occurs when the user clicks an in-page link that goes somewhere else on the same page, or when a script programmatically sets the location.hash property. This is primarily useful for AJAX applications that wish to maintain a history of user actions while remaining on the same page. As a concrete example, executing a search of your messages in Gmail takes you to a list of search results, but does not change the base URL, just the hash; clicking the Back button takes you back to the previous view within Gmail (such as your inbox), again without changing the base URL (just the hash). Gmail employs some nasty hacks to make this work in all browsers; the hashchange event is designed to make those hacks slightly less nasty. Microsoft Internet Explorer 8 pioneered the hashchange event, and its definition in HTML 5 is designed to match Internet Explorer's behavior.
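
Here is roughly what using the event looks like. Treat it as a sketch: the "view/argument" layout of the hash and the viewFromHash helper are invented for this example and have nothing to do with how Gmail actually encodes its state.

```javascript
// Map a fragment such as "#search/html5" to an application view.
// An empty hash falls back to a default view named "inbox".
function viewFromHash(hash) {
  const [view = "", ...rest] = hash.replace(/^#/, "").split("/");
  return {
    view: view || "inbox",
    arg: rest.length ? decodeURIComponent(rest.join("/")) : null,
  };
}

// In a browser, re-render whenever the hash changes, whether through a
// link click, the Back button, or a script assigning location.hash.
if (typeof window !== "undefined") {
  window.addEventListener("hashchange", () => {
    const state = viewFromHash(window.location.hash);
    console.log("now showing", state.view, state.arg);
  });
}
```

Keeping the parsing in a plain function means the same code can restore the right view on initial page load, not just in response to the event.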

Other interesting changes this week:

In last week's episode, I mentioned revision 2063, which allows HTML documents to contain both xml:lang and lang attributes as long as they are identical. Revision 2091 relaxes this restriction slightly to allow the xml:lang and lang attributes to differ by case (i.e. one could be uppercase and the other could be lowercase, and that is no longer an error). Discussion: xml:lang="" and lang=""
Revision 2092 defines the parsing algorithm for empty table rows.
Revision 2094 clarifies the meaning of whitespace by deferring to the Unicode definitions.
Revision 2096 forbids content sniffing for SVG images. In order to use an SVG image as the src of an <img> element, the web server must serve the SVG image with a Content-Type: image/svg+xml HTTP header.
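
The xml:lang rule from revisions 2063 and 2091 is easy to express as a check. This is a sketch of what a conformance checker might do, assuming the comparison is ASCII case-insensitive (my reading of the change); the function names are made up.

```javascript
// Lowercase only A-Z, so the comparison is ASCII case-insensitive.
function asciiLower(s) {
  return s.replace(/[A-Z]/g, (ch) => ch.toLowerCase());
}

// `lang` and `xmlLang` are the attribute values, or null when absent.
function langAttributesConform(lang, xmlLang) {
  if (xmlLang === null) return true; // no xml:lang, nothing to cross-check
  if (lang === null) return false; // xml:lang is only allowed alongside lang
  return asciiLower(lang) === asciiLower(xmlLang); // may differ only by case
}
```

So lang="en-US" with xml:lang="en-us" passes, lang="en" with xml:lang="fr" fails, and xml:lang on its own fails.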

Tune in next week for another exciting episode of "This Week in HTML 5."


This Week in HTML 5 – Episode 2

Thursday, August 14th, 2008

Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.

The biggest news this week is revision 2020, which standardizes the navigator object:

The navigator attribute of the Window interface must return an instance of the Navigator interface, which represents the identity and state of the user agent (the client), and allows Web pages to register themselves as potential protocol and content handlers.

Currently, HTML 5 defines four properties and two methods:

appName
appVersion
platform
userAgent
registerProtocolHandler
registerContentHandler

This is only a subset of navigator properties and methods that browsers already support. See Navigator Object on Google Doctype for complete browser compatibility information.
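
For a feel of what that standardized subset looks like from a script, here is a small sketch. The helper names, the handler URL, and the "Example Mail" title are placeholders; registerProtocolHandler is called with the scheme, URL template, and title arguments as I understand them, and real browsers may reject the call (for instance, for a URL on another origin).

```javascript
// Collect the four properties named above from a Navigator-like object.
function describeUserAgent(nav) {
  const { appName, appVersion, platform, userAgent } = nav;
  return { appName, appVersion, platform, userAgent };
}

// Offer this site as a handler for mailto: links, if the method exists.
// In `handlerUrl`, "%s" is replaced with the URL of the link being handled.
function offerMailHandler(nav, handlerUrl) {
  if (typeof nav.registerProtocolHandler !== "function") return false;
  nav.registerProtocolHandler("mailto", handlerUrl, "Example Mail");
  return true;
}

// Only read the real navigator object when running in a browser.
if (typeof window !== "undefined") {
  console.log(describeUserAgent(window.navigator));
}
```

Checking for the method before calling it matters here, since not every browser that has a navigator object supports handler registration.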

Next up: Content-Language. No, not the HTTP header, not even the <html lang> attribute, but the <meta> tag! As reported by Henri Sivonen,

It seems that some authoring tools and authors use <meta http-equiv='content-language' content='languagetag'> instead of <html lang='languagetag'>.

This led to revision 2057, which defines the <meta http-equiv="Content-Language"> directive and its relationship with lang, xml:lang, and the Content-Language HTTP header.

In the continuing saga of the alt attribute, the new syntax for alternate text of auto-generated images (which I covered in last week's episode) has generated some followup discussion. Philip Taylor is concerned that it will increase complexity for authoring tools; others feel the complexity is worth the cost. James Graham suggested a no-text-equivalent attribute; similar proposals have been discussed before and rejected.

Switching to the new Web Workers specification (which I also covered last week), Aaron Boodman (one of the developers of Google Gears) posted his initial feedback. This kicked off a long discussion and led to the creation of the Worker object.

Other interesting changes this week:

Revision 2034 and revision 2035 define the outerHTML property, and revision 2040 defines the insertAdjacentHTML method. Both originally appeared in Microsoft Internet Explorer 5 (outerHTML on MSDN, insertAdjacentHTML on MSDN).
Revision 2044 disallows scripts executing while an alert is displayed.
Revision 2046 requires that <script src="javascript:"> not execute script (reported by Simon Pieters)
Revision 2063 allows an HTML document to declare xml:lang if and only if it also declares lang, to ease migration between HTML and XHTML. The language values must be identical. (Reported by Simon Pieters.)
Revision 2064 defines the behavior when calling document.open("text/plain"). The message "Re: type parameter of Document.open() (detailed review of the DOM)" documents the incompatibilities in existing browsers.
Revision 2066 defines the order for getElementsByName() (reported by Maciej Stachowiak)
Revision 2068 defines window.frameElement
Revision 2069 no longer requires Document.location to do anything when the Document isn't in a Window (reported by Anne van Kesteren)

Administrivia: "This Week in HTML 5" now has its own feed.

Tune in next week for another exciting episode of "This Week in HTML 5."
