The WHATWG Blog — thisweekinhtml5

This Week in HTML 5 – Episode 11

Monday, November 3rd, 2008

Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group. Last Friday was Halloween for some of you; in the United States, it involves dressing up in slutty costumes, begging your neighbors for handouts, and getting diabetes. Yesterday, many of you set your clocks back one hour for the end of Daylight Saving Time. And for those of you on the Gregorian calendar, it is now November.

Dates and times loom large in this week's updates. "What is today's date?" is a deceptively simple question, matched in complexity only by the related question, "What time is it?" Sources for Time Zone and Daylight Saving Time Data gives a good overview of the current state of the art for answering both questions. In the movie Crocodile Dundee, Mick says he once asked an Aboriginal elder when he was born; the elder replied, "in the summertime."

r2381 defines global dates and times:

A global date and time consists of a specific Gregorian date, consisting of a year, a month, and a day, and a time, consisting of an hour, a minute, a second, and a fraction of a second, expressed with a time zone, consisting of a number of hours and minutes.

r2382 defines local dates and times:

A local date and time consists of a specific Gregorian date, consisting of a year, a month, and a day, and a time, consisting of an hour, a minute, a second, and a fraction of a second, but expressed without a time zone.
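The difference between the two definitions is easy to see in code. This is a minimal sketch using Python's standard `datetime` module as a stand-in; the sample timestamps and the `is_global` helper are illustrative assumptions, not anything defined by the spec.

```python
from datetime import datetime, timedelta, timezone

# A "global" date and time carries a time zone offset in hours and minutes.
global_dt = datetime.fromisoformat("2008-11-03T09:30:00-08:00")

# A "local" date and time has the same fields but no time zone.
local_dt = datetime.fromisoformat("2008-11-03T09:30:00")


def is_global(dt: datetime) -> bool:
    """True if the value has a UTC offset and so names a single instant."""
    return dt.utcoffset() is not None


# Only the global value can be converted to UTC without guessing.
as_utc = global_dt.astimezone(timezone.utc)
```

The local value is still useful (think "the meeting starts at 9:30, wherever you are"), but it cannot be placed on a shared timeline without extra information.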

r2383 defines a month:

A month consists of a specific Gregorian date with no timezone information and no date information beyond a year and a month.

r2384 and r2385 define a week:

A week consists of a week-year number and a week number representing a seven day period. Each week-year in this calendaring system has either 52 weeks or 53 weeks, as defined below. A week is a seven-day period. The week starting on the Gregorian date Monday December 29th 1969 (1969-12-29) is defined as week number 1 in week-year 1970. Consecutive weeks are numbered sequentially. The week before the number 1 week in a week-year is the last week in the previous week-year, and vice versa.

A week-year with a number year that corresponds to a year year in the Gregorian calendar that has a Thursday as its first day (January 1st), and a week-year year where year is a number divisible by 400, or a number divisible by 4 but not by 100, has 53 weeks. All other week-years have 52 weeks.

The week number of the last day of a week-year with 53 weeks is 53; the week number of the last day of a week-year with 52 weeks is 52.

Note: The week-year number of a particular day can be different than the number of the year that contains that day in the Gregorian calendar. The first week in a week-year year is the week that contains the first Thursday of the Gregorian year year.
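The anchor week (Monday 1969-12-29 is week 1 of week-year 1970) and the "first Thursday" note both line up with ISO 8601 week numbering. Assuming the spec's weeks coincide with ISO weeks, Python's `date.isocalendar()` can be used to explore them. The helper names below are hypothetical.

```python
from datetime import date
from typing import Tuple


def week_of(day: date) -> Tuple[int, int]:
    """Return (week-year, week number) for a Gregorian date, per ISO 8601."""
    iso = day.isocalendar()
    return (iso[0], iso[1])


def weeks_in_week_year(year: int) -> int:
    """Return 52 or 53.

    December 28th always falls in the last ISO week of its week-year,
    so its week number is the number of weeks in that week-year.
    """
    return date(year, 12, 28).isocalendar()[1]
```

For example, `week_of(date(2008, 12, 29))` reports week-year 2009, which is the situation the note above describes: the week-year number differs from the Gregorian year that contains the day.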

<input> form elements can be declared to take a local date and time, a global date and time, a date, a time, a month, or a week. You can also declare a global date and time in a <time> element or in the datetime attribute of <ins> and <del>.

HTML 5 does not define weekends or holidays, and therefore does not define business days. Interstellar datekeeping has been pushed back to HTML 6.

In other news, Chris Wilson suggested a different strategy for the much-maligned <q> element, which kicked off a long discussion, which in turn spawned several tangential discussions: <q> and commas, <q> vs <p>, UA style sheet for <q>, <q addmarks=true>, and the overly-optimistically-titled Final thoughts on <q>. The basic problem is that, while HTML 4 clearly states that user agents should render with delimiting quotation marks, Microsoft Internet Explorer (prior to IE8b2) did not do so. IE8b2 does do so, but it falls back to client-side regional settings to display quotation marks in pages where the author has not specified the language (which is the vast majority of pages). Also, in some languages, convention dictates alternating single and double quotes for nested quotations, but HTML 4 did not specify how to handle this, and different browsers handle nested quotation marks in different ways.
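To make the nested-quotation problem concrete, here is a small sketch of one possible convention: alternate double and single curly quotes by nesting depth. The nested-list representation and the choice of marks are assumptions for illustration; HTML 4 did not specify any of this, which is exactly the problem.

```python
# Quotation marks by nesting depth: double first, then single, then repeat.
MARKS = [("\u201c", "\u201d"), ("\u2018", "\u2019")]


def render(content, depth=0):
    """Render text where each nested list stands for one <q> element."""
    out = []
    for item in content:
        if isinstance(item, str):
            out.append(item)
        else:
            open_mark, close_mark = MARKS[depth % len(MARKS)]
            out.append(open_mark + render(item, depth + 1) + close_mark)
    return "".join(out)


sentence = render(["She said, ", ["he yelled ", ["stop"], " twice"], "."])
```

A browser that picks its marks from regional settings instead of the page language would need a different `MARKS` table per locale, which is where the IE8b2 behavior gets complicated.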

Other interesting tidbits this week:

r2361 clarifies that disabled form controls don't see click events.
r2362 suggests how to inform users about the acceptable patterns available in an <input pattern="..."> element.
r2363 changes the serialization algorithm for HTML fragments. Less-than (<) and greater-than (>) characters are no longer escaped in attribute values. This seems important.
r2371 defines the optional 4th argument, selected, which can be passed to Option() to create a new <option> element. Hat tip: Anne van Kesteren.
r2379 defines implicit form submission.
r2368 clarifies that <input value> attributes may not contain carriage returns or line feeds.
A controversial feature of the original Web Forms 2 was that form controls could be associated with multiple forms. HTML 5 has now dropped this feature; form controls can only be associated with (at most) one form. Hat tip: Anne van Kesteren.
Ian Hickson suggests a simple script to test for Web Forms 2 support.
Aaron Leventhal provides feedback on the algorithm to associate table headers with table cells, from the perspective of assistive technologies, which are the primary target audience for such an algorithm.
Ian Hickson is asking for co-editors for 10 parts of HTML 5 that could reasonably be split off into separate specifications.
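The serialization change in r2363 is easier to picture with a sketch. The only thing taken from the revision is that less-than and greater-than characters are left alone in attribute values; the set of characters that are still escaped (ampersand, no-break space, double quote) is my understanding of the fragment serialization algorithm and should be treated as an assumption. The function names are hypothetical.

```python
def escape_attribute_value(value: str) -> str:
    """Escape a string for use inside a double-quoted attribute value."""
    # Ampersands go first so that later replacements are not double-escaped.
    value = value.replace("&", "&amp;")
    value = value.replace("\u00a0", "&nbsp;")
    value = value.replace('"', "&quot;")
    # "<" and ">" are intentionally left as-is in attribute values.
    return value


def serialize_attribute(name: str, value: str) -> str:
    """Serialize one attribute as name="value"."""
    return f'{name}="{escape_attribute_value(value)}"'


example = serialize_attribute("title", 'a < b & "c" > d')
```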

Tune in next week for another exciting episode of "This Week in HTML 5."

Posted in Weekly Review | 1 Comment »

This Week in HTML 5 – Episode 10

Monday, October 20th, 2008

Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.

The big news this week is offline caching. This has been in HTML 5 for a while, but this week Ian Hickson caught up with his email and integrated all outstanding feedback. He summarizes the changes:

Made the online whitelist be prefix-based instead of exact match. [r2337]

Removed opportunistic caching, leaving only the fallback behavior part. [r2338]

Made fallback URLs be prefix-based instead of only path-prefix based (we no longer ignore the query component). [r2343]

Made application caches scoped to their browsing context, and allowed iframes to start new scopes. By default the contents of an iframe are part of the appcache of the parent, but if you declare a manifest, you get your own cache. [r2344]

Made fallback pages have to be same-origin (security fix). [r2342]

Made the whole model treat redirects as errors to be more resilient in the face of captive portals when offline (it's unclear what else would actually be useful and safe behavior anyway). [r2339]

Fixed a bunch of race conditions by redefining how application caches are created in the first place. [r2346]

Made 404 and 410 responses for application caches blow away the application cache. [r2348]

Made checking and downloading events fire on ApplicationCache objects that join an update process midway. [r2353]

Made the update algorithm check the manifest at the start and at the end and fail if the manifest changed in any way. [r2350]

Made errors on master and dynamic entries in the cache get handled in a non-fatal manner (and made 404 and 410 remove the entry). [r2348]

Changed the API from .length and .item() to .items and .hasItem(). [r2352]
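Here is a rough sketch of how prefix-based fallback and whitelist matching might play out when a page asks for a resource. It is not the spec's algorithm: the order of the checks, the longest-prefix tie-break, the string labels, and all of the URLs are assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional, Set


def longest_prefix(url: str, prefixes: Iterable[str]) -> Optional[str]:
    """Return the longest prefix that `url` starts with, or None."""
    matches = [p for p in prefixes if url.startswith(p)]
    return max(matches, key=len) if matches else None


def resolve(url: str, cached: Set[str], fallbacks: Dict[str, str],
            whitelist: Set[str]) -> str:
    """Decide how a request is handled (hypothetical labels)."""
    if url in cached:
        return "cache"
    namespace = longest_prefix(url, fallbacks)
    if namespace is not None:
        # Prefix matching covers the whole URL, query component included.
        return "fallback:" + fallbacks[namespace]
    if longest_prefix(url, whitelist) is not None:
        return "network"
    return "fail"


cached = {"http://example.com/index.html"}
fallbacks = {
    "http://example.com/photos/": "http://example.com/offline.html",
    "http://example.com/photos/?album=": "http://example.com/album-offline.html",
}
whitelist = {"http://example.com/api/"}
```

Because the query component is no longer ignored, `http://example.com/photos/?album=7` picks up the more specific album fallback rather than the generic one.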

And now, a short digression into video formats...

You may think of video files as "AVI files" or "MP4 files". In reality, "AVI" and "MP4" are just container formats. Just like a ZIP file can contain any sort of file within it, video container formats only define how to store things within them, not what kinds of data are stored. (It's a little more complicated than that, because container formats do limit what codecs you can store in them, but never mind.) A video file usually contains multiple tracks -- a video track (without audio), one or more audio tracks (without video), one or more subtitle/caption tracks, and so forth. Tracks are usually inter-related; an audio track contains markers within it to help synchronize the audio with the video, and a subtitle track contains time codes marking when each phrase should be displayed. Individual tracks can have metadata, such as the aspect ratio of a video track, or the language of an audio or subtitle track. Containers can also have metadata, such as the title of the video itself, cover art for the video, episode numbers (for television shows), and so on.

Individual video tracks are encoded with a certain video codec, which is the algorithm by which the video was authored and compressed. Modern video codecs include H.264, DivX, and VC-1, but there are many, many others. Audio tracks are also encoded in a specific codec, such as MP3, AAC, or Ogg Vorbis. Common video containers are ASF, MP4, and AVI. Thus, saying that you have sent someone an "MP4 file" is not specific enough for the recipient to determine if they can play it. The recipient needs to know the container format (such as MP4 or AVI), but also the video codec (such as H.264 or Ogg Theora) and the audio codec (such as MP3 or Ogg Vorbis). Furthermore, video codecs (and some audio codecs) are broad standards with multiple profiles, so saying that you have sent someone an "MP4 file with H.264 video and AAC audio" is still not specific enough. An iPhone can play MP4 files with "baseline profile" H.264 video and "low complexity" AAC audio. (These are well-defined technical terms, not layman's terms.) Desktop Macs can play MP4 files with "main profile" H.264 video and "main profile" AAC audio. Adobe Flash can play MP4 files with "high profile" H.264 video and "HE" AAC audio. Of course, it's a little more complicated than that.

Thus...

r2332 adds a navigator.canPlayType() method. This is intended for scripts to query whether the client can play a certain type of video. There are two major problems with this: first, MIME types are not specific enough, as they will only describe the video container. Learning that the client "can play" MP4 files is useless without knowing what video codecs it supports inside the container, not to mention what profiles of that video codec it supports. The second problem is that, unless the browser itself ships with support for specific video and audio codecs (as Firefox 3.1 will do with Ogg Theora and Ogg Vorbis), it will need to rely on some multimedia library provided by the underlying operating system. Windows has DirectShow, Mac OS X has QuickTime, but neither of these libraries can actually tell you whether a codec is supported. The best you can do is try to play the video and notice if it fails. [WHATWG thread]
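The first problem can be shown with a toy model. The classes below are hypothetical, and the capability table lists only the one combination this article mentions for the iPhone; real devices support more than this, and real profile negotiation is messier.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Track = Tuple[str, str]  # (codec, profile)


@dataclass(frozen=True)
class MediaFile:
    container: str
    video: Track
    audio: Track


@dataclass(frozen=True)
class Player:
    name: str
    containers: FrozenSet[str]
    video: FrozenSet[Track]
    audio: FrozenSet[Track]

    def can_play_container(self, container: str) -> bool:
        """What a container-only MIME type check can tell you."""
        return container in self.containers

    def can_play(self, media: MediaFile) -> bool:
        """What you actually need to know: container, codecs, and profiles."""
        return (media.container in self.containers
                and media.video in self.video
                and media.audio in self.audio)


iphone = Player(
    name="iPhone",
    containers=frozenset({"mp4"}),
    video=frozenset({("h264", "baseline")}),
    audio=frozenset({("aac", "low complexity")}),
)
flash_style_file = MediaFile("mp4", ("h264", "high"), ("aac", "he"))
iphone_style_file = MediaFile("mp4", ("h264", "baseline"), ("aac", "low complexity"))
```

Both files are "MP4 files," so the container-level answer is yes for both, but only one of them would actually play.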

Other interesting changes and discussions this week:

r2333 changes the data type of the width and height attributes on <embed>, <object>, and <iframe> elements to match current browser behavior. These attributes reflect strings, not integers. No one knows why.
Ian Hickson kicked off another round of video accessibility discussion, with this philosophical statement:

Fundamentally, I consider <video> and <audio> to be simply windows onto pre-existing content, much like <iframe>, but for media data instead of for "pages" or document data. Just as with <iframe>s, the principle I had in mind is that it should make sense for the user to take the content of the element and view it independent of its hosting page. You should be able to save the remote file locally and open it in a media player and you should be able to write a new page with a different media player interface, without losing any key aspect of the media. In particular, any accessibility features must not be lost when doing this. For example, if the video has subtitles or PiP hand language signing, or multiple audio tracks, or a transcript, or lyrics, or metadata, _all_ of this data should survive even if the video file is saved locally without the embedding page.

In other words, video accessibility should be handled within the video container, not in the surrounding HTML markup. On the plus side, all modern video containers can handle subtitle tracks, secondary audio tracks, and so forth. Unfortunately, authors may be hesitant to add to their bandwidth costs by including tracks that must be downloaded by everyone but appreciated (or even noticed) by very few.

[W3C discussion thread on video accessibility]
Sander van Zoest noticed the pixelaspectratio attribute of the <video> element, and he asked why it was a float instead of a ratio of two rationals (as is standard practice in the video authoring world). Ultimately, he agreed with Eric Carlson that pixelaspectratio should be dropped from HTML 5 because it doesn't really give enough information about how to scale the video properly. As with so many other things in the video world, the problem is much more complicated than it first appears.

Around the web:

Chris Double posted an update on Firefox 3.1's Ogg Theora video support and a list of demo sites.
WGBH outlines the regulations surrounding closed captioning and video description and the state of the art for captioning and description for the Web.
Joe Clark, in his book Building Accessible Websites, states "Video is still a bit of a boondoggle. And making video accessible is so difficult you had best leave the job to the experts. And at present, there is no way for you the Web developer to become an expert." The book was published in 2002, but the point remains true today.
A list of use cases for video accessibility
Multimedia accessibility proposals

Tune in next week for another exciting episode of "This Week in HTML 5."


This Week in HTML 5 – Episode 9

Tuesday, October 14th, 2008

Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.

Most of the changes in the spec this week revolve around the <textarea> element.

r2305 covers editing a <textarea>.
r2309 defines the rows and cols attributes.
r2310 defines the wrap attribute. (Long supported by Netscape and still supported in Internet Explorer, the wrap attribute has never been standardized until now.)
r2311 defines the maxlength attribute.
r2312 defines the required attribute, also new in HTML 5.
r2313 removes support for the accept attribute, which has always been problematic and whose (limited) potential has never been realized. This only affects the <textarea> element; <input type=file> elements still have an accept attribute that controls what types of files may be uploaded.

Shelley Powers pointed out that I haven't mentioned the issue of distributed extensibility yet. (The clearest description of the issue is Sam Ruby's message from last year, which spawned a long discussion.) The short version: XHTML (served with the proper MIME type, application/xhtml+xml) supports embedding foreign data in arbitrary namespaces, including SVG and MathML. None of these technologies (XHTML, SVG, or MathML) have had much success on the public web. Despite Chris Wilson's assertion that "we cannot definitively say why XHTML has not been successful on the Web," I think it's pretty clear that Internet Explorer's complete lack of support for the application/xhtml+xml MIME type has something to do with it. (Chris is the project lead on Internet Explorer 8.)

Still, it is true that XHTML does support distributed extensibility, and many people believe that the web would be richer if SVG and MathML (and other as-yet-unknown technologies) could be embedded and rendered in HTML pages. The key phrase here is "as-yet-unknown technologies." In that light, the recent SVG-in-HTML proposal (which I mentioned several weeks ago) is beside the point. The point of distributed extensibility is that it does not require approval from a standards body. "Let a thousand flowers bloom" and all that, where by "flowers," I mean "namespaces." This is an unresolved issue.

Other interesting changes this week:

r2314 ensures that the required attribute only applies to form controls whose value can change.
r2316 defines the name attribute for form controls.
r2317 defines the disabled attribute for form controls.
r2320 defines all the different ways that a form control can fail to satisfy its constraints. For example, an <input maxlength=20> element with a 21-character value.
r2322 defines exactly how form data should be encoded before being submitted to the server. I've previously mentioned character encoding in this series; this revision marks the first time that an HTML specification has acknowledged the existence of the <input type=hidden name=_charset_> method of specifying the character encoding of submitted form data.
r2319 removes support for data templates and repetition templates. These were inventions in the original Web Forms 2 specification, but they were never picked up by any major browser.
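The constraint checks from r2320, such as the 21-character value in a maxlength=20 field, can be sketched as a function that lists everything wrong with a value. The function name and the problem labels are hypothetical, and only two constraints are modeled.

```python
from typing import List, Optional


def validity_problems(value: str, required: bool = False,
                      maxlength: Optional[int] = None) -> List[str]:
    """Return the constraints a form control value fails to satisfy."""
    problems = []
    if required and value == "":
        problems.append("value missing")
    # len() counts Unicode code points here; how the spec counts
    # characters is not covered by this sketch.
    if maxlength is not None and len(value) > maxlength:
        problems.append("too long")
    return problems
```

An empty list means the control satisfies the constraints modeled here; a control can also fail more than one constraint at once.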

Around the web:

Anne van Kesteren gave an interview on the state of several bleeding edge web standards.
Simon Pieters ponders what to do about nested <h1> elements.
In response to Olivier Gendrin, Anne van Kesteren points out that the CSSOM standard defines a window.media attribute. Unlike CSS media types, window.media would be directly queryable from Javascript.

Tune in next week for another exciting episode of "This Week in HTML 5."


This Week in HTML 5 – Episode 8

Wednesday, October 8th, 2008

Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.

It's time to catch up on the myriad of changes to the HTML 5 spec. The big news this week is the continued merging of Web Forms 2 into HTML 5.

<button> [r2280]
<select> [r2285, r2287, r2288, r2290]
<input type="submit"> [r2269]
<input type="reset"> [r2270]
<input type="button"> [r2271]
<input type="image"> [r2276]
<input type="file"> [r2274]
<input type="checkbox"> [r2257, r2258]
<input type="radio"> [r2259]
<input type="hidden"> [r2268]
<input type="email"> [r2227]
<input type="url"> [r2228, r2231, r2235]
<input type="number"> [r2254]
<input type="range"> [r2255]
<input type="date"> [r2252]
<input type="time"> [r2253]
<input type="datetime"> [r2229, r2230, r2231, r2239, r2243, r2247, r2251]
<input type="week"> [r2252]
<input type="month"> [r2252]
<input type="datetime-local"> [r2249]

In other news, Andy Lyttle wants to standardize one particular feature of <input type="search"> (which is already supported by Safari, but not standardized): placeholder text for input fields. The text would initially display in the input field (possibly in a stylized form, smaller font, or lighter color), then disappear when the field receives focus. Lots of sites use Javascript to achieve this effect, but it is surprisingly difficult to get right, in part because no one can quite agree on exactly how it should work. Mozilla Firefox displays the name of your current search engine in its dedicated search box until you focus the search box, at which point it blanks out and allows you to type. Safari's search box is initially blank (at least on Windows), and only displays the name of your default search engine after it has received focus and lost it again. Google Chrome's "omnibox" displays "Type to search", right-justified, even when the omnibox has focus, then removes it after you've typed a single character. Adding an <input placeholder> attribute would allow each browser on each platform to match their users' expectations (and possibly even allow end-user customization) of how placeholder text should work for web forms. Discussion threads: 1, 2, 3. So far, there is no consensus on whether this should be added to HTML 5, or what the markup would look like.

Other interesting changes this week:

r2273 defines the <input required> attribute.
r2272 defines what it means to "activate" a form field, so that "clicking a button" and "setting focus to the button and pressing space" result in the same click event being triggered.
r2277 defines the <input size> attribute, which controls the displayed size of the field (but not the length of the field's value, that's <input maxlength> [r2233]).
r2278 defines the <input pattern> attribute, which is an arbitrary regular expression against which the field's value should be matched.
r2282 defines the input and change events. The input event occurs during typing in a form field (and therefore may trigger multiple times as the user types); the change event triggers when a change is committed, even if typing was not involved (such as choosing files to upload with an <input type="file"> field).
r2242 tweaks the definition of floating point numbers to allow specifying an exponent.
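Two of the items above lend themselves to a quick sketch: matching a value against an <input pattern>, and a floating point number with an optional exponent. Several things here are assumptions: Python's `re` syntax stands in for the regular expression dialect a browser would use, the pattern is taken to apply to the whole value, empty values are left to the required attribute, and the number grammar is my approximation rather than the spec's exact definition.

```python
import re

# Approximate grammar: optional minus, digits, optional fraction,
# optional exponent. This is an assumption, not the spec's definition.
FLOAT = re.compile(r"-?\d+(\.\d+)?([eE][-+]?\d+)?")


def matches_pattern(value: str, pattern: str) -> bool:
    """True if the whole value matches the pattern (empty values pass)."""
    if value == "":
        return True
    return re.fullmatch(pattern, value) is not None


def is_float(text: str) -> bool:
    """True if the text looks like a floating point number."""
    return FLOAT.fullmatch(text) is not None
```

Matching the whole value matters: with a pattern like `[A-Z]{2}[0-9]{2}`, the value `xAB12` contains a match but should not pass.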

Around the web:

Following up on last week's article on clickjacking, the security researcher who discovered (and named) it has posted details of his discovery. Short version: it's even worse than we thought, but vendors are working on it. Here's a proof-of-concept against Adobe Flash that, quite literally, spies on you (via your webcam) without the usual warning dialogs; here's Adobe's response. NoScript now offers enhanced protection against some clickjacking attack vectors.
Anne van Kesteren gives an update on IE 8's support for HTML 5 and other emerging standards.
Matt Ryall has a good article on HTML 5, headings and sections, which documents the differences between HTML 4 and 5's header elements. My personal opinion: I once wrote a 500-page book in Docbook, a non-HTML markup language for technical writers. Docbook 3 had separate elements for <sect1>, <sect2>, <sect3>, &c, and it was a massive pain in the ass to cut-and-paste sections, or try to reuse them in different documents. Docbook 4 added a generic <section> element which can be nested indefinitely, and all those problems went away. Lots of web authors copy-and-paste HTML markup; anything that helps that "just work" is a good thing.

Tune in next week for another exciting episode of "This Week in HTML 5."


This Week in HTML 5 – Episode 7

Monday, September 29th, 2008

Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.

Work continued this week on Web Forms 2, but I'm going to hold off on that until next week. And in case you missed it, Ian Hickson gave a tech talk on HTML 5, including live demos of some features recently implemented in nightly browser builds.

The big news this week is the disclosure of a vulnerability that researchers have dubbed "clickjacking." To understand it, start with Giorgio Maone's post, Clickjacking and NoScript. Giorgio is the author of the popular NoScript extension for Firefox. In its default configuration, NoScript protects against this vulnerability on most sites in most situations; you can configure it to defeat the attack entirely, but only at the cost of usability and functionality.

Of course, most web users do not run Firefox, and fewer still run NoScript, so web developers still need to be aware of it. Michal Zalewski's post, Dealing with UI redress vulnerabilities inherent to the current web, addresses some possible workarounds:

Using Javascript hacks to detect that window.top != window to inhibit rendering, or override window.top.location. These mechanisms work only if Javascript is enabled, however, and are not guaranteed to be reliable or future-safe. If the check is carried on every UI click, performance penalties apply, too. Not to mention, the extra complexity is just counterintuitive and weird.

Requiring non-trivial reauthentication (captcha, password reentry) on all UI actions with any potential for abuse. Although this is acceptable for certain critical operations, doing so every time a person adds Bob as a friend on a social networking site, or deletes a single mail in a webmail system, is very impractical.

Worried yet? Now let's turn to the question of what browser vendors can do to mitigate the vulnerability. Michal offers several proposals. It is important to realize that none of these proposals have been implemented yet, so don't go rushing off to your text editor and expect them to do something useful.

Create a HTTP-level (or HTTP-EQUIV) mechanism along the lines of "X-I-Do-Not-Want-To-Be-Framed-Across-Domains: yes" that permits a web page to inhibit frame rendering in potentially dangerous situations.

Add a document-level mechanism to make "if nested <show this> else <show that>" conditionals possible without Javascript. One proposal is to do this on the level of CSS (by using either the media-dependency features of CSS or special classes); another is to introduce new HTML tags. This would make it possible for pages to defend themselves even in environments where Javascript is disabled or limited.

Add an on-by-default mechanism that prevents UI actions to be taken when a document tries to obstruct portions of a non-same-origin frame. By carefully designing the mechanism, we can prevent legitimate uses (such as dynamic menus that overlap with advertisements, gadgets, etc) from being affected, yet achieve a high reliability in stopping attacks.

Enforce a click-to-work mechanism (resembling the Eolas patent workaround) for all cross-domain IFRAMEs.

Rework everything we know about HTML / browser security models to make it possible for domains and pages to specify very specific opt-in / opt-out policies for all types of linking, referencing, such that countering UI redress attacks would be just one of the cases controlled by this mechanism.

To this list, Collin Jackson added two more suggestions:

New cookie attribute: The "httpOnly" cookie flag allows sites to put restrictions on how a cookie can be accessed. We could allow a new flag to be specified in the Set-Cookie header that is designed to prevent CSRF and "UI redress" attacks. If a cookie is set with a "sameOrigin" flag, we could prevent that cookie from being sent on HTTP requests that are initiated by other origins, or were made by frames with ancestors of other origins. In a CSRF or "UI redress" attack scenario, it will appear as though the user is not logged in, and thus the HTTP request will be unable to affect the user's account.

New HTTP request header: Browser vendors seem to be moving away from "same origin restrictions" towards "verifiable origin labels" that let the site decide whether two security origins trust each other. ... [I]nstead of making it an "X-I-Do-Not-Want-To-Be-Framed-Across-Domains: yes" HTTP response header, make it an "X-Ancestor-Frame-Origin: http://www.evil.com" HTTP request header. This header could be a list of all the origins that are ancestors of the frame that triggered the request. If the site decides it does not like the ancestor frame origin it could reject the request. This could be added as a property of MessageEvent as well to detect client-side-only UI redress attacks.

This last approach moves us down a slippery slope towards site security policies for IFRAMEs and embedded content, similar to the Flash security model that allows trusted sites to access cross-domain resources. In practice, Flash crossdomain.xml files have a number of problems, and such an approach would still only cover a fraction of the possible use cases.
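The two suggestions quoted above can be sketched from both sides. Neither mechanism exists, so everything here is hypothetical: the function names, the space-separated header format, and the exact rule for when a "sameOrigin" cookie is withheld are my assumptions about how such proposals could work.

```python
from typing import Iterable, List, Set


def parse_ancestor_origins(header_value: str) -> List[str]:
    """Split a hypothetical X-Ancestor-Frame-Origin value into origins."""
    return header_value.split()


def should_reject(header_value: str, trusted_origins: Set[str]) -> bool:
    """Server side: reject if any ancestor frame origin is not trusted."""
    ancestors = parse_ancestor_origins(header_value)
    return any(origin not in trusted_origins for origin in ancestors)


def should_send_cookie(same_origin_flag: bool, cookie_origin: str,
                       initiator_origin: str,
                       ancestor_origins: Iterable[str]) -> bool:
    """Browser side: withhold a 'sameOrigin' cookie on foreign requests."""
    if not same_origin_flag:
        return True
    return (initiator_origin == cookie_origin
            and all(origin == cookie_origin for origin in ancestor_origins))


trusted = {"http://www.example.com"}
```

In the cookie case the request still goes out, but without the cookie the user appears logged out, so the forged click cannot touch the account.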

You can read the full thread for all the gory details and back-and-forth among browser vendors (Maciej Stachowiak works on WebKit, Robert O'Callahan works on Firefox) and other interested parties. As Maciej notes, user experience may suffer: "[Under proposal #3] iGoogle widgets would become disabled if scrolled partially off the top of the page under your proposal. And even if scrolled back into view, would remain disabled for a second. With possibly a jarring visual effect, or alternately, no visual indication that they are disabled. Hard to decide which is worse." As Rob notes, any solution will also need to deal with IFRAMEs styled with opacity:0, related attacks using some little-known (but widely supported) capabilities of SVG, and possibly other vectors that the world collectively hasn't figured out yet. If you're getting a mental image of the game "Whack-a-Mole," you're not alone.

Ironically, the best example of "clickjacking" is the download page for the NoScript extension, which uses it for good rather than evil. Thanks to some fancy JavaScript (search for "installer"), Giorgio embeds the addons.mozilla.org download page for NoScript in an IFRAME on his own page on noscript.net, sets the IFRAME to "opacity:0" (an attack vector that Robert O'Callahan specifically warned about), scrolls the embedded addons.mozilla.org page to the top corner of its "Add to Firefox" button, and sets the z-index of the IFRAME to 100. Thus, the IFRAME is floating (due to "z-index:100") invisibly (due to "opacity:0") over Giorgio's own "Install Now" button (due to the positioning of the IFRAME element itself). When you think you're clicking the button on noscript.net you are actually clicking the button on addons.mozilla.org. What's the difference? By default, Firefox treats addons.mozilla.org as a trusted download site, so it immediately pops up the extension installation dialog instead of blocking the installation with an infobar saying "Firefox prevented this site (noscript.net) from installing software on your computer." From a user experience standpoint, this is great -- one less click to download and install an extension. From a security standpoint, this is incredibly scary -- the end user has no idea they're interacting with a third-party site.

Ian Hickson, the editor of HTML 5, weighed in with his opinion:

I would like feedback from browser vendors on this topic, ideally in the form of experimental implementations. Personally I think the idea of disabling the contents of a cross-origin iframe that has been partially obscured or rendered partially off-screen is the best idea, but whether we can adopt it depends somewhat on whether browser vendors are willing to adopt it and implement it. It requires no standards changes to implement.
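The idea of disabling a partially obscured or partially off-screen cross-origin iframe boils down to rectangle geometry. This sketch assumes simple axis-aligned rectangles in viewport coordinates and ignores opacity, SVG tricks, and the time delay mentioned in the thread; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Rect:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, other: "Rect") -> bool:
        """True if `other` lies entirely inside this rectangle."""
        return (self.left <= other.left and self.top <= other.top
                and other.right <= self.right and other.bottom <= self.bottom)

    def intersects(self, other: "Rect") -> bool:
        """True if the two rectangles overlap by a non-zero area."""
        return (self.left < other.right and other.left < self.right
                and self.top < other.bottom and other.top < self.bottom)


def frame_is_interactive(frame: Rect, viewport: Rect,
                         overlays: Iterable[Rect]) -> bool:
    """Allow UI actions only if the frame is fully visible and uncovered."""
    if not viewport.contains(frame):
        return False
    return not any(frame.intersects(overlay) for overlay in overlays)


viewport = Rect(0, 0, 1024, 768)
widget = Rect(100, 100, 400, 300)
scrolled_widget = Rect(100, -50, 400, 150)
menu = Rect(350, 250, 500, 400)
```

The `scrolled_widget` case is the iGoogle complaint in miniature: a widget scrolled partially off the top of the page fails the check even though nothing malicious is going on.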

Tune in next week for another exciting episode of "This Week in HTML 5."
