The WHATWG Blog — 2008

Archive for November, 2008

This Week in HTML 5 – Episode 14

Tuesday, November 25th, 2008

Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.

The big news this week is a radical proposal for integrating HTTP authentication with HTML forms. r2432 defines a new token for the WWW-Authenticate header: "HTML".

A common use for forms is user authentication. To indicate that an HTTP URL requires authentication through such a form before use, the HTTP 401 response code with a WWW-Authenticate challenge "HTML" may be used.

For this authentication scheme, the framework defined in RFC2617 is used as follows. [RFC2617]
challenge = "HTML" [ form ]
form      = "form" "=" form-name 
form-name = quoted-string
The form parameter, if present, indicates that the first form element in the entity body whose name is the specified string, in tree order, if any, is the login form. If the parameter is omitted, then the first form element in the entity body, in tree order, if any, is the login form.

There is no credentials production for this scheme because the login information is to be sent as a normal form submission and not using the Authorization HTTP header.

This idea has been kicked around for more than a decade. Microsoft wrote User Agent Authentication Forms in 1999. Mark Nottingham asked the WHATWG to investigate the idea in 2004. Better late than never, Ian Hickson summarizes the feedback to date. No doubt this new proposal will generate further discussion. No browsers currently support this proposal.

Other interesting tidbits this week:

r2429 adds the <input type=search> form field. [<input type=search> discussion]
r2440 allows the multiple attribute to appear on <input type=email> and <input type=file>. [<input type=email multiple> discussion]
r2423 specifies how <object> elements are submitted in forms. Unbeknownst to me, this feature was present in HTML 4 and is supported across multiple browsers. If a plugin exposes a value getter, the name of the <object> element is submitted with the value exposed by the plugin. [<object> form submission example, Mozilla bug 188938]
r2434 seriously revamps the concept of "vaguer moments in time." r2433 notes, correctly, that there is no year zero in the Gregorian calendar. r2437 further refines the calculation of dates before 1582. [date and time discussion]
r2426 clarifies the fallback behavior of the <object> element.
r2427 documents existing browser behavior in sending all attributes and attribute values to a plugin invoked from an <object> element. Previously, HTML 5 has specified that only specific parameters were sent, but browsers consistently send all attributes, so there it is.
r2424 explains the intended audience of the HTML 5 specification itself.

Tune in next week for another exciting episode of "This Week in HTML 5."

Posted in Weekly Review | 7 Comments »

This Week in HTML 5 – Episode 13

Tuesday, November 18th, 2008

Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.

The big news this week is a major revamping of how browsers should process multimedia in the <audio> and <video> elements.

r2404 makes a number of important changes. First, the canPlayType() method has moved from the navigator object to HTMLMediaElement (i.e. a specific <audio> or <video> element), and it now returns a string rather than an integer. [canPlayType() discussion]

The canPlayType(type) method must return the string "no" if type is a type that the user agent knows it cannot render; it must return "probably" if the user agent is confident that the type represents a media resource that it can render if used in with this audio or video element; and it must return "maybe" otherwise. Implementors are encouraged to return "maybe" unless the type can be confidently established as being supported or not. Generally, a user agent should never return "probably" if the type doesn't have a codecs parameter.

Wait, what codecs parameter? That's the second major change: the <source type> attribute (which previously could only contain a MIME type like "video/mp4", which is insufficient to determine playability) can now contain a MIME type and a codecs parameter. As specified in RFC 4281, the codecs parameter specifies the specific codecs used by the individual streams within the audio/video container. The section on the type attribute contains several examples of using the codecs parameter.

Third, the <source type> attribute is now optional. If you aren't sure what kind of video you're serving, you can just throw one or more <source> elements into a <video> element and the browser will try each of them in the order specified [r2403] until it finds something it can play. [load() algorithm discussion] Of course, if you include a type attribute (and codecs parameter within it), the browser may use it to determine playability without loading multiple resources, but this is no longer required.

The final change (this week) to multimedia elements is the elimination of the start, end, loopstart, loopend, and playcount attributes. They are all replaced by a single attribute, loop, which takes a boolean. To handle initially seeking to a specific timecode (like the now-eliminated start attribute), the HTML 5 spec vaguely declares, "For example, a fragment identifier could be used to indicate a start position." This obviously needs further specification.

One multimedia-related issue that did not change in the spec this week is same-origin checking for media elements. Robert O'Callahan asked whether video should be allowed to load from another domain, noting (correctly) that it could lead to information leakage about files posted on private intranets. Chris Double outlines the issues and some proposed solutions. However, contrary to Chris' expectation, HTML 5 will not (yet) mandate cross-site restrictions for audio/video files. This is an ongoing discussion. [WHATWG discussion thread, Theora discussion thread]

In other news, Ian Hickson summarized the discussion around the <input placeholder> attribute (which I first mentioned in This Week in HTML 5 Episode 8) and committed r2409 that defines the new attribute:

The placeholder attribute represents a short hint (a word or short phrase) intended to aid the user with data entry. A hint could be a sample value or a brief description of the expected format.

For a longer hint or other advisory text, the title attribute is more appropriate.

The placeholder attribute should not be used as an alternative to a label.

User agents should present this hint to the user only when the element's value is the empty string and the control is not focused (e.g. by displaying it inside a blank unfocused control).

Read the section on the placeholder attribute for an example of its proper use.

Other interesting tidbits this week:

Ian Hickson explains why the <link rev> attribute was dropped.
Lots of feedback about the Web Workers draft.
Lots of feedback about the WebSocket interface.
r2413 adds a resolveURL() method to the window.location object. [resolveURL() discussion]
r2406 defines negative pixelratio.
r2407 defines how often browsers should fire the timeupdate event while playing audio or video.
r2416 simplifies the content model of the <menu> element. [<menu> discussion]

Around the web:

The W3C published an editor's draft of Lachlan Hunt's Web Developer's Guide to HTML 5.
Austin Chau posted a demo of HTML 5 cross-document messaging. Further discussion: Using HTML5 postMessage, postMessage API changes, and the unfortunately-named xssinterface library which implements a postMessage-like API in browsers that do not yet support it.
Ryan Tomayko posted an excellent summary of things caches do, specifically HTTP caches like Squid and rack-cache.
Joe Clark posted This is How the Web Gets Regulated, a call to action on video accessibility.
mv_embed is a GPL-licensed Javascript shim that takes <video> elements that point to Ogg Theora video files and replaces them with plugin-specific markup to play the video through oggplay, vlc-plugin, Java cortado, mplayer, Totem, or Apple Quicktime (if Xiph's Ogg Theora Quicktime component is installed). A demo page demonstrates the technique.
Everyone should go admire my new dog Beauregard, then scroll down to read "dave"'s non-Beau-related but extremely interesting comment on an experimental Ogg Theora video encoder. From there, I learned about this Ogg Vorbis audio decoder written in pure ActionScript (Flash), leading to the tantalizing but as-yet-unrealized possibility of a Javascript shim like mv_embed that could take <audio> elements that point to Ogg Vorbis audio files and replace them with a Flash wrapper that could play the audio file, even in browsers that do not support the <audio> element or the Ogg Vorbis audio codec.

Tune in next week for another exciting episode of "This Week in HTML 5."

Posted in Weekly Review | 4 Comments »

The Road to HTML 5: getElementsByClassName()

Tuesday, November 11th, 2008

Welcome back to my semi-regular column, "The Road to HTML 5," where I'll try to explain some of the new elements, attributes, and other features in the upcoming HTML 5 specification.

The feature of the day is getElementsByClassName(). Long desired by web developers and implemented in Javascript libraries like Prototype, this function does exactly what it says on the tin: it returns a list of elements in the DOM that define one or more classnames in the class attribute. getElementsByClassName() exists as a method of the document object (for searching the entire DOM), as well as on each HTMLElement object (for searching the children of an element).

The HTML 5 specification defines getElementsByClassName():

The getElementsByClassName(classNames) method takes a string that contains an unordered set of unique space-separated tokens representing classes. When called, the method must return a live NodeList object containing all the elements in the document, in tree order, that have all the classes specified in that argument, having obtained the classes by splitting a string on spaces. If there are no tokens specified in the argument, then the method must return an empty NodeList. If the document is in quirks mode, then the comparisons for the classes must be done in an ASCII case-insensitive manner, otherwise, the comparisons must be done in a case-sensitive manner.

A Brief History of getElementsByClassName()

September 2005: getElementsByClassName() Implementation Questions and followups discuss whether getElementsByClassName() should take a string with multiple classnames, and if so, what that would mean. Despite early protests, it was eventually decided that getElementsByClassName() should take multiple classnames in a single string, separated by spaces, and that the function should return elements that define all of given classnames.
February 2006: "I see this is still an open issue."
October 2006: "This omnibus edition of your WHATWG mail includes replies to 50 or so separate e-mails about getElementsByClassName()."
February 2007: "I landed support for this on the Mozilla trunk, and it will appear in upcoming Firefox alphas and betas. The array argument turned out to be quite a pain to implement, so we cut it." Ian Hickson responds: "I've changed the spec to use a space-separated list. ... I've dropped the array form altogether."
March 2007: John Resig puts Firefox 3's implementation to the test in getElementsByClassName Speed Comparison.
July 2007: "I suggest that participants in this thread reacquaint themselves with the discussion in the previous one before having it again." Good luck with that.
September 2007: "A popular and useful feature [in Opera 9.5] will be the addition of native support for getElementByClassName from HTML5."
December 2007: "Last week WebKit joined upcoming versions of Firefox and Opera in supporting [getElementsByClassName]."
December 2007: "Currently getElementsByClassName is specified as always being case sensitive. This would be fine, except that the primary other use of html class names (CSS selector matching) is only case sensitive in standards mode. This leads to situations ... in which CSS and getElementsByClassName match different results."
July 2008: Ian Hickson responds: "I've made it consistent with how classes work in CSS (case-insensitive for quirks and case-sensitive otherwise)."

Can We Use It?

Yes We Can! As you can tell from the timeline, getElementsByClassName() is supported natively in Firefox 3, Opera 9.5, Safari 3.1, and all versions of Google Chrome. It is not available in any version of Microsoft Internet Explorer. (IE 8 beta 2 is the latest version as of this writing.) To use it in browsers that do not support it natively, you will need a wrapper script. There are many such scripts; I myself am partial to Robert Nyman's Ultimate GetElementsByClassName. It uses the native getElementsByClassName() method in modern browsers that support it, then falls back to the little-known document.evaluate() method, which is supported by older versions of Firefox (since at least 1.5) and Opera (since at least 9.27). If all else fails, Robert's script falls back to recursively traversing the DOM and collecting elements that match the given classnames.

And in conclusion

getElementsByClassName() is well-supported across all modern browsers except IE, and a performance-optimized open source wrapper script can cover IE and older browsers.

Posted in Tutorials, WHATWG | 7 Comments »

This Week in HTML 5 – Episode 12

Tuesday, November 11th, 2008

Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group. The primary editor was traveling this week, so there are very few spec changes to discuss. Instead, I'd like to try something a little different.

It has been suggested (1, 2, 3, &c.) that HTML 5 is trying to bite off more than it can metaphorically chew. It is true that it is a large specification, and it might benefit from being split into several pieces. But it is not true that it includes everything but the kitchen sink.

For example, HTML 5 will not

Daniel Schattenkirchner asked whether Almost-Standards mode is still needed. Almost-Standards mode is a form of DOCTYPE sniffing invented by Mozilla in 2002 to address line heights in table cells containing images. Bug 153032 implemented the new mode, which Mozilla called "Almost Standards mode" and HTML 5 calls "limited quirks mode." Henri Sivonen made the point that it would probably be too expensive to get rid of the mode. Like it or not, we're probably stuck with it.

And finally, a gem that I missed when it was first discussed: back in July, "Lars" provided the best documentation of the <keygen> element, ever.

Tune in next week for another exciting episode of "This Week in HTML 5."

Posted in Weekly Review | 3 Comments »

The Road to HTML 5 – Episode 1: the section element

Wednesday, November 5th, 2008

Welcome to a new semi-regular column, "The Road to HTML 5," where I'll try to explain some of the new elements, attributes, and other features in the upcoming HTML 5 specification.

The element of the day is the <section> element.

The section element represents a generic document or application section. A section, in this context, is a thematic grouping of content, typically with a header, possibly with a footer. Examples of sections would be chapters, the various tabbed pages in a tabbed dialog box, or the numbered sections of a thesis. A Web site's home page could be split into sections for an introduction, news items, contact information.

Discussion of sections and headers dates back several years. In November 2004, Ian Hickson wrote:

Basically I want three things:

It has to be possible to take existing markup (which correctly uses <h1>-<h6>) and wrap the sections up with <section> (and the other new section elements) and have it be correct markup. Basically, allowing authors to replace <div class="section"> with <section>, <div class="post"> with <article>, etc.

It has to be possible to write new documents that use the section elements and have the headers be automatically styled to the right depth (and maybe automatically numbered, with appropriate CSS), and yet still be readable in legacy UAs, without having to think about old UAs. Basically, the header element has to be header-like in old browsers.

It shouldn't be too easy to end up with meaningless markup when doing either of the above. So a random <h4> in the middle of an <h2> and an <h3> has to be defined as meaning _something_.

At the moment what I'm thinking of doing is this (most of these ideas are in the draft at the moment, but mostly in contradictory ways):

The section elements would be:

<body> <section> <article> <navigation> <sidebar>

The header elements would be:

<header> <h1> <h2> <h3> <h4> <h5> <h6>

<h1> gives the heading of the current section.

<header> wraps block-level content to mark the whole thing as a header, so that you can have, e.g., subtitles, or "Welcome to" paragraphs before a header, or "Presented by" kind of information. <header> is equivalent to an <h1>. The first highest-level header in the <header> is the "title" of the section for outlining purposes.

<h2> to <h6> are subsection headings when used in <body>, and equivalent to <h1> when used in one of the section elements.

<h1> automatically sizes to fit the current nesting depth. This could be a problem in CSS since CSS can't handle this kind of thing well -- it has no "or" operator at the simple selector level.

<h2>-<h6> keep their legacy renderings for compatibility.

Further discussion:

August 2004: <section> and headings
September 2004: <section> and headings
November 2004: The Section Header Problem
March 2005: <h1> to <h6> in <body>
April 2005: <section>and headings and other threads

Fast-forward to modern times. Using the <section> element instead of, say, <div class="section">, seems like a no-brainer. Unfortunately, there's a catch. (Hey, it's the web; there's always a catch.) Not all modern browsers recognize the <section> element, which means that they fall back to their default handling of unknown elements.

A long digression into browsers' handling of unknown elements

Every browser has a master list of HTML elements that it supports. For example, Mozilla Firefox's list is stored in nsElementTable.cpp. Elements not in this list are treated as "unknown elements." There are two fundamental problems with unknown elements:

How should the element be styled? By default, <p> has spacing on the top and bottom, <blockquote> is indented with a left margin, and <h1> is displayed in a larger font.
What should the element's DOM look like? Mozilla's nsElementTable.cpp includes information about what kinds of other elements each element can contain. If you include markup like <p><p>, the second paragraph element implicitly closes the first one, so the elements end up as siblings, not parent-and-child. But if you write <p><span>, the span does not close the paragraph, because Firefox knows that <p> is a block element that can contain the inline element <span>. So the <span> ends up as a child of the <p> in the DOM.

Different browsers answer these questions in different ways. (Shocking, I know.) Of the major browsers, Microsoft Internet Explorer's answer to both questions is the most problematic.

The first question should be relatively simple to answer: don't give any special styling to unknown elements. Just let them inherit whatever CSS properties are in effect wherever they appear on the page, and let the page author specify all styling with CSS. Unfortunately, Internet Explorer does not allow styling on unknown elements. For example, if you had this markup:

<style type="text/css">
  section { border: 1px solid red }
</style>
...
<section>
<h1>Welcome to Initech</h1>
<p>This is our <span>home page</span>.</p>
</section>

Internet Explorer (up to and including IE8 beta 2) will not put a red border around the section.

The second problem is the DOM that browsers create when they encounter unknown elements. Again, the most problematic browser is Internet Explorer. If IE doesn't explicitly recognize the element name, it will insert the element into the DOM as an empty node with no children. All the elements that you would expect to be direct children of the unknown element will actually be inserted as siblings instead. I've posted an ASCII graph that illustrates this mismatch.

Sjoerd Visscher discovered a workaround for this problem: after you create a dummy element with that name, IE will recognize the element enough to let you style it with CSS. You can put the script in the <head> of your page, and there is no need to ever insert it into the DOM. Simply creating the element once (per page) is enough to teach IE to style the element it doesn't recognize. Sample code and markup:

<html>
<head>
<style type="text/css">
  section { display: block; border: 1px solid red }
</style>
<script type="text/javascript">
  document.createElement("section");
</script>
</head>
<body>
<section>
<h1>Welcome to Initech</h1>
<p>This is our <span>home page</span>.</p>
</section>
</body>
</html>

This hack works in IE 6, IE 7, and IE 8 beta 1, but it doesn't work in IE 8 beta 2. (bug report, test case) The purpose of this illustration is not to blame IE; there's no specification that says what the DOM ought to look like in this case, so IE's handling of the "unknown element" problem is not any more or less correct than any other browser. With the createElement workaround, you can use the <section> element (or any other new HTML 5 element) in all browsers except IE 8 beta 2. I am not aware of any workaround for this problem.

And in conclusion

The <section> element is a very straightforward HTML 5 feature that you can't actually use yet.

Posted in Tutorials, Weekly Review | 8 Comments »