The WHATWG Blog — security

This Week in HTML5 – Episode 36

Sunday, September 27th, 2009

Since I started publishing these weekly summaries over a year ago, I've watched the HTML5 specification grow up. In episode 1, the big news of the week was the birth of an entirely new specification (Web Workers). Slowly, steadily, and sometimes painstakingly, the HTML5 specification has matured to the point where the hottest topic last week was the removal of a little-used element (<dialog>) and the struggle to find a suitable replacement for marking up conversations.

This week's changes are mundane, and I expect (and hope!) that future summaries will be even more mundane. That's a good thing; it tells me that, as implementors continue implementing and authors continue authoring, there are no show-stoppers and fewer and fewer "gotchas" to trip them up. Thus, the overarching theme this week -- and I use the term "theme" very loosely -- is "the never-ending struggle to get the details right."

Parsing

HTML5 is full of algorithms. Most of them are small parts of one mega-algorithm, called Parsing HTML Documents. Contrary to popular belief, the HTML parsing algorithm is deterministic: for any sequence of bytes, there is one (and only one) "correct" way to interpret it as HTML. Notice I said "any sequence of bytes," not just "any sequence of bytes that conforms to a specific DTD or schema." This is intentional; HTML5 not only defines what constitutes "valid" HTML markup (for the benefit of conformance checkers), it also defines how to parse "invalid" markup (for the benefit of browsers and other HTML consumers that take existing web content as input). And sweet honey on a stick, there sure is a lot of invalid markup out there.

r3896 tells parsers to ignore almost all end tags that appear before the <html> start tag. A few special end tags are not ignored; instead, the parser implicitly creates the <html> element and starts constructing the document: </html>, </head>, </body>, and oddly enough, </br>. [Related: Bug 7672]
r3909 clarifies how user agents should parse the type attribute of a <script> tag. The type attribute is optional; authors can simply omit it if they're embedding JavaScript.
r3923 tweaks the algorithm for parsing the DOCTYPE declaration. This affects DOCTYPE sniffing.
r3967 clarifies the algorithm for ignoring the first newline or carriage return character at the beginning of a <pre> block. [Background: [whatwg] Initial carriage return in <pre> and <textarea>]
r3968 explains why the <embed> element can have any number of attributes, with any names. (Answer: because they are passed directly to the third-party plugin that handles the embedded content, and there are no restrictions on what kind of plugins you can have or what attributes they can take as input.)
r3991 adds to the already-long list of legacy, non-conforming attributes that user agents may encounter in existing web content.
r3871 and r3982 tweak the handling of Unicode surrogates. [Background: [whatwg] Surrogate pairs and character references]
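For readers who think in code, here is a minimal TypeScript sketch of three of the parsing rules above: which end tags survive the "before html" insertion mode, the single leading newline dropped from a <pre> block, and numeric character references that name a surrogate. It follows the rules as they read in today's HTML standard, which may differ in detail from the September 2009 revisions; the token type and all function names are hypothetical, not part of any browser's API.

```typescript
// Hypothetical token shape for this sketch; real tokenizers carry more state.
interface EndTagToken {
  kind: "endTag";
  name: string;
}

type BeforeHtmlAction = "ignore" | "create-html-and-reprocess";

// In the "before html" insertion mode, only these end tags are not dropped.
const END_TAGS_NOT_IGNORED = new Set(["head", "body", "html", "br"]);

function beforeHtmlEndTag(token: EndTagToken): BeforeHtmlAction {
  // Any other end tag is a parse error and the token is simply discarded.
  return END_TAGS_NOT_IGNORED.has(token.name)
    ? "create-html-and-reprocess"
    : "ignore";
}

// Input preprocessing turns CR LF pairs and lone CRs into LF, so the
// <pre> rule only ever has to look for one leading LF.
function normalizeNewlines(input: string): string {
  return input.replace(/\r\n?/g, "\n");
}

function preTextContent(raw: string): string {
  const text = normalizeNewlines(raw);
  // Only the first newline is dropped; a second one is preserved.
  return text.charAt(0) === "\n" ? text.slice(1) : text;
}

// Numeric character references whose value cannot be emitted as a character
// (NUL, a lone surrogate, or anything beyond U+10FFFF) become U+FFFD.
function resolveNumericReference(codePoint: number): string {
  const isSurrogate = codePoint >= 0xd800 && codePoint <= 0xdfff;
  if (codePoint === 0 || isSurrogate || codePoint > 0x10ffff) {
    return "\uFFFD";
  }
  return String.fromCodePoint(codePoint);
}
```

So &#xD800; yields U+FFFD, while &#x1F600; yields a proper surrogate pair in the resulting UTF-16 string.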

Accessibility

As with so many things in the accessibility world, all of this week's changes revolve around the thorny problem of focus. I previously explained why focus is so important in episode 24.

r3887 specifies that each <area> in a client-side image map should be focusable.
r3919 encourages browser vendors to expose tooltips to keyboard-only users. For example, in Firefox 3.5, if you hover your cursor over a hyperlink that defines a title attribute, you will see the title attribute as a tooltip. But if you tab to the same link with the keyboard, no tooltip appears. Now imagine that you're physically unable to use a mouse, and you begin to see the problem. [Background: Bug 7362 and Issue 80]
r3928 defines an intriguing proposal about canvas accessibility, which probably deserves its own article. Here's the short version: you can already define "fallback content" within a <canvas> element that is shown to browsers that don't support the canvas API. This change dictates that the "fallback content" should remain keyboard-focusable even in browsers that do support the canvas API. To quote the spec: "This allows authors to make an interactive canvas keyboard-focusable: authors should have a one-to-one mapping of interactive regions to focusable elements in the fallback content." This is a draft proposal; as far as I know, no browser actually supports it yet, and it may get reverted in the future. [Background: Bug 7404]
r3969 clarifies that browsers must do nothing when the user activates a label whose corresponding input control is hidden (in any manner, including a display: none CSS rule). [Background: Bug 7583]
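The canvas proposal in r3928 comes down to bookkeeping that the page has to do itself: keep a one-to-one mapping between regions drawn on the canvas and focusable elements in the fallback content. The sketch below is one hypothetical way to hold that mapping; the types and function names are illustrative only, and actually moving focus or painting a focus ring is left to the page.

```typescript
// Hypothetical bookkeeping: one entry per interactive region on the canvas,
// each paired with exactly one focusable element in the fallback content.
interface CanvasRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface InteractiveRegion {
  rect: CanvasRect;
  fallbackId: string; // id of the focusable fallback element, e.g. a <button>
}

function containsPoint(r: CanvasRect, px: number, py: number): boolean {
  return px >= r.x && px < r.x + r.width && py >= r.y && py < r.y + r.height;
}

// Mouse path: a click at (px, py) maps to the fallback element to focus.
function fallbackAt(regions: InteractiveRegion[], px: number, py: number): string | null {
  const hit = regions.find((r) => containsPoint(r.rect, px, py));
  return hit ? hit.fallbackId : null;
}

// Keyboard path: when a fallback element gains focus, this says which
// region the page should redraw with a focus ring.
function regionFor(regions: InteractiveRegion[], focusedId: string): CanvasRect | null {
  const match = regions.find((r) => r.fallbackId === focusedId);
  return match ? match.rect : null;
}

// One-to-one: no fallback element may back two different regions.
function isOneToOne(regions: InteractiveRegion[]): boolean {
  return new Set(regions.map((r) => r.fallbackId)).size === regions.length;
}
```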

Security

All of this week's security-related changes revolve around document.domain. As you might expect from its name, this property returns the domain name of the current document. Unfortunately (for security), the property is not read-only: a script can also set document.domain, within limits, to a parent domain of the current one (for example, from www.example.com to example.com). This can cause all sorts of horrible side effects, since so many things (cookies, local storage, same-origin restrictions on XMLHttpRequest) rely on the domain of the document. This set of changes attempts to reduce the nasty side effects (and the possible attack surface) in case you absolutely must set document.domain to something other than its default calculated value.

r3875 states that setting document.domain should release the storage mutex. (The storage mutex is a global lock that is acquired when setting cookies and released immediately afterwards. Since cookies are domain-specific, changing the domain dynamically like this needs to release the lock in case the page wants to update the cookies on the new domain.) [Background: [whatwg] Storage mutex and cookies can lead to browser deadlock, [whatwg] RFC: Alternatives to storage mutex for cookies and localStorage, [whatwg] Application defined "locks"]
r3878 states that setting document.domain makes Web Storage unusable, to avoid deadlocks with Web Storage's own locking mechanism. [Background: [whatwg] localStorage, the storage mutex, document.domain, and workers]
r3879 warns against setting document.domain on web applications that are hosted on shared servers. The spec explains the problem: "If an untrusted third party is able to host an HTTP server at the same IP address but on a different port, then the same-origin protection that normally protects two different sites on the same host will fail, as the ports are ignored when comparing origins after the document.domain attribute has been used."
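Two pieces of the document.domain story fit in a few lines of TypeScript: which values a script may assign, and why ports drop out of the comparison once both documents have assigned it. This is a simplified model of the corresponding checks in today's HTML standard ("is a registrable domain suffix of or is equal to" and "same origin-domain"), not the 2009 text. The small PUBLIC_SUFFIXES set is a hypothetical stand-in for the real Public Suffix List, and the IP-address test only recognizes dotted IPv4.

```typescript
// Hypothetical stand-in for the Public Suffix List.
const PUBLIC_SUFFIXES = new Set(["com", "net", "org", "co.uk"]);

// May a document whose host is currentHost set document.domain to value?
function canSetDocumentDomain(currentHost: string, value: string): boolean {
  const host = currentHost.toLowerCase();
  const candidate = value.toLowerCase();
  if (candidate === host) return true; // re-setting the same value is allowed
  if (/^\d{1,3}(\.\d{1,3}){3}$/.test(host)) return false; // IPv4 hosts cannot be shortened
  if (!host.endsWith("." + candidate)) return false; // must be a parent domain
  return !PUBLIC_SUFFIXES.has(candidate); // e.g. "com" is never allowed
}

interface OriginTuple {
  scheme: string;
  host: string;
  port: number;
  domain: string | null; // non-null only after a script sets document.domain
}

// Access check between two documents, e.g. a page and an iframe.
function sameOriginDomain(a: OriginTuple, b: OriginTuple): boolean {
  if (a.scheme !== b.scheme) return false;
  if (a.domain !== null && b.domain !== null) {
    // Both opted in: ports are no longer compared, which is why r3879 warns
    // about untrusted servers on other ports of the same host.
    return a.domain === b.domain;
  }
  if (a.domain === null && b.domain === null) {
    return a.host === b.host && a.port === b.port;
  }
  return false; // only one side opted in
}
```

For example, pages at http://example.com:80 and http://example.com:8080 that both run document.domain = "example.com" pass this check even though their ports differ.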

Semantics

r3905, r3948, and r3966 clarify that the profile attribute (used by various microformats) takes a space-separated list of addresses, not just a single address. This has been the subject of heated debate for over 12 years, because HTML 4 claims that "the value of the profile attribute is a URI; user agents may use this URI in two ways..." while simultaneously claiming that "this attribute specifies the location of one or more meta data profiles, separated by white space." [Background: let's keep metadata profiles (head/@profile) in HTML for use in GRDDL etc., [whatwg] HTML4's profile="" attribute's absence in HTML5, Bug 7413, Bug 7484, Bug 7512, and Issue 55.]
r3869 tweaks the definitions of <section>, <article>, and <details> based on an informal study by Jeremy Keith. r3979 further tweaks the definition of <article>, and r3978 mentions that the <article> element is semantically similar to the <entry> element in RFC 4287 (Atom Syndication Format). [Background: [whatwg] article/section/details naming/definition problems. Related: Bug 7551]
r3954 further clarifies the definition of <footer>. [Background: Bug 7502]
r3907 clarifies the workings of the registries for the enumerated values of <link rel>, <meta name>, and <meta http-equiv> attributes.
r3904 tweaks the semantics of <link rel="up">.
r3962 modifies the outline algorithm (used to generate a kind of "table of contents" of an HTML document based on sections and headers) to handle an obscure edge case. [Background: IRC discussion of edge case, Bug 7527, and in particular this comment on Bug 7527]
r3987 gives an example to clarify that the <nav> element does not always need to be a child of a <header> element.
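Under the space-separated reading that r3905, r3948, and r3966 settle on, a consumer such as a GRDDL processor has to split the attribute before using it. A minimal sketch, assuming HTML's definition of ASCII whitespace (tab, line feed, form feed, carriage return, space); resolving each token against the document's base URL is left out, and both function names are hypothetical.

```typescript
// ASCII whitespace as HTML defines it: tab, LF, FF, CR, and space.
function splitOnAsciiWhitespace(value: string): string[] {
  return value.split(/[\t\n\f\r ]+/).filter((token) => token.length > 0);
}

// <head profile="..."> may hold several profile addresses.
function profileAddresses(profileAttr: string | null): string[] {
  return profileAttr === null ? [] : splitOnAsciiWhitespace(profileAttr);
}
```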

Video

As regular readers of this column are aware, one of the big new user-visible features of HTML5 is native video support without plugins. As video is incredibly complicated, so too is the video support in HTML5. (Although not related to this week's changes, you may be interested to read my series, A gentle introduction to video encoding.)

r3867 modifies the algorithm for sizing anamorphic video within a <video> element, and r3913 defines how to display a frame of anamorphic video in a canvas pattern. [Background: Re: video size when aspect ratio is not 1, What The Heck Is Anamorphic?]
r3924 defines what happens when you dynamically insert a <source> element as a child of a <video> element that also has a src attribute, and r3925 defines what happens when you dynamically remove a <video src> attribute. [Background: Bug 7631, Bug 7632]
r3927 gives advice on how browsers could render an <audio> element with a controls attribute.
r3992 makes further refinements to the play() and pause() algorithms.
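Two of the video rules above lend themselves to small sketches. The first converts stored (anamorphic) pixel dimensions to display dimensions and then letterboxes the result inside the playback area; it assumes the pixel aspect ratio stretches the width, which is one common convention but not necessarily the exact rule r3867 adopted. The second shows the precedence that, as far as I can tell from today's standard, underlies r3924: when a media element has a src attribute, its <source> children are not consulted. The canPlay callback is a hypothetical stand-in for checking that canPlayType() returns a non-empty string.

```typescript
interface Size {
  width: number;
  height: number;
}

interface Placement {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Pixel aspect ratio as a fraction, e.g. 4/3 for 1440x1080 HDV.
interface PixelAspect {
  num: number;
  den: number;
}

// Assumption for this sketch: non-square pixels stretch the horizontal axis.
function displaySize(stored: Size, par: PixelAspect): Size {
  return { width: (stored.width * par.num) / par.den, height: stored.height };
}

// Largest centered rectangle inside area with the video's aspect ratio.
function letterbox(area: Size, video: Size): Placement {
  const scale = Math.min(area.width / video.width, area.height / video.height);
  const width = video.width * scale;
  const height = video.height * scale;
  return { x: (area.width - width) / 2, y: (area.height - height) / 2, width, height };
}

interface SourceCandidate {
  src: string;
  type?: string; // MIME type from <source type>, if given
}

// A src attribute on the media element wins; <source> children are only
// tried, in document order, when there is no src attribute at all.
function pickResource(
  srcAttr: string | null,
  sources: SourceCandidate[],
  canPlay: (type: string) => boolean,
): string | null {
  if (srcAttr !== null) return srcAttr;
  for (const source of sources) {
    if (source.type === undefined || canPlay(source.type)) return source.src;
  }
  return null;
}
```

With a 1440x1080 frame and a 4:3 pixel aspect ratio, displaySize gives 1920x1080, and letterbox fits that into a 640x480 element as 640x360 with 60-pixel bars above and below.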

Web Forms

Forms continue to be difficult.

r3874 allows scripts to clear the list of selected files of an <input type="file"> element by setting its value property to the empty string. [Background: [whatwg] Setting .value on <input type=file>]
r3922 clarifies that setting the disabled attribute of a <fieldset> element should not disable the children of the fieldset's <legend> element. [Background: Bug 7591]
r3934 defines that the maxLength property should return -1 on a <textarea> or <input> element that does not include a maxlength attribute. [Background: Bug 7427]
r3957 clarifies that implicit form submission should validate the form first. [Background: Bug 7511]
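Three of the form rules above, modeled in TypeScript as a sketch rather than real DOM code. FileInputModel is a hypothetical class whose value getter and setter follow today's HTML standard, which may differ from the 2009 text; reflectMaxLength is a simplified take on HTML's rules for parsing non-negative integers; and disabledByFieldset walks a toy tree to show why controls inside a disabled fieldset's first <legend> stay enabled. All names are illustrative, not browser APIs.

```typescript
// Hypothetical model of <input type="file">'s value property (not a DOM type).
class FileInputModel {
  private selected: string[] = [];

  // Stand-in for the user picking files in the browser's file chooser.
  choose(fileNames: string[]): void {
    this.selected = [...fileNames];
  }

  get value(): string {
    // Today's standard exposes a fake path, never the real one.
    return this.selected.length > 0 ? "C:\\fakepath\\" + this.selected[0] : "";
  }

  set value(newValue: string) {
    // Scripts may only clear the selection; a browser would throw an
    // InvalidStateError DOMException, modeled here as a plain Error.
    if (newValue !== "") throw new Error("InvalidStateError");
    this.selected = [];
  }
}

// maxLength: the attribute's value if it parses as a non-negative integer,
// otherwise -1 (including when the attribute is absent).
function reflectMaxLength(attr: string | null): number {
  if (attr === null) return -1;
  // Simplified integer parsing: leading ASCII whitespace, optional "+",
  // digits; characters after the digits are ignored.
  const match = /^[\t\n\f\r ]*\+?(\d+)/.exec(attr);
  return match ? parseInt(match[1], 10) : -1;
}

// Toy element tree for the fieldset rule.
interface TreeNode {
  tag: string;
  disabled: boolean;
  parent: TreeNode | null;
  children: TreeNode[];
}

function makeNode(tag: string, children: TreeNode[] = [], disabled = false): TreeNode {
  const node: TreeNode = { tag, disabled, parent: null, children };
  for (const child of children) child.parent = node;
  return node;
}

function isDescendantOf(node: TreeNode, ancestor: TreeNode): boolean {
  for (let p = node.parent; p !== null; p = p.parent) {
    if (p === ancestor) return true;
  }
  return false;
}

// A control is disabled by any disabled <fieldset> ancestor, unless the
// control sits inside that fieldset's first <legend> child.
function disabledByFieldset(control: TreeNode): boolean {
  for (let p = control.parent; p !== null; p = p.parent) {
    if (p.tag !== "fieldset" || !p.disabled) continue;
    const legend = p.children.find((c) => c.tag === "legend");
    if (legend === undefined || !isDescendantOf(control, legend)) return true;
  }
  return false;
}
```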

Interesting Discussion Threads This Week

I like this proposal for adding a document.head property. It would presumably be faster than document.getElementsByTagName('head')[0], and more reliable than document.documentElement.firstChild.
Character encoding on the web is even worse than you think.
Re: On testing HTML

Around the Web

Brad Neuberg: A video introduction to HTML5
Google (my employer) released an intriguing project called Google Chrome Frame. It's a plugin for Internet Explorer that enables a number of new HTML5 capabilities on an opt-in basis. Here are a few technical details. Reaction has ranged from impressed to unimpressed to positively ironic. Google Wave is one of the first web applications to opt in and suggest that Internet Explorer users download the plugin.
Steve Faulkner: HTML5 & WAI-ARIA: Happy Families, a slide deck about current HTML5 accessibility features, misfeatures, and support in browsers and assistive technologies.
Burst Engine is "an OpenSource vector animation engine for the HTML5 Canvas Element."
Peter-Paul Koch: The HTML5 drag and drop disaster

Tune in next week for another exciting edition of "This Week in HTML5."

Posted in Weekly Review

This Week in HTML 5 – Episode 8

Wednesday, October 8th, 2008

Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.

It's time to catch up on the myriad of changes to the HTML 5 spec. The big news this week is the continued merging of Web Forms 2 into HTML 5.

<button> [r2280]
<select> [r2285, r2287, r2288, r2290]
<input type="submit"> [r2269]
<input type="reset"> [r2270]
<input type="button"> [r2271]
<input type="image"> [r2276]
<input type="file"> [r2274]
<input type="checkbox"> [r2257, r2258]
<input type="radio"> [r2259]
<input type="hidden"> [r2268]
<input type="email"> [r2227]
<input type="url"> [r2228, r2231, r2235]
<input type="number"> [r2254]
<input type="range"> [r2255]
<input type="date"> [r2252]
<input type="time"> [r2253]
<input type="datetime"> [r2229, r2230, r2231, r2239, r2243, r2247, r2251]
<input type="week"> [r2252]
<input type="month"> [r2252]
<input type="datetime-local"> [r2249]

In other news, Andy Lyttle wants to standardize one particular feature of <input type="search"> (which is already supported by Safari, but not standardized): placeholder text for input fields. The text would initially display in the input field (possibly in a stylized form, smaller font, or lighter color), then disappear when the field receives focus. Lots of sites use Javascript to achieve this effect, but it is surprisingly difficult to get right, in part because no one can quite agree on exactly how it should work. Mozilla Firefox displays the name of your current search engine in its dedicated search box until you focus the search box, at which point it blanks out and allows you to type. Safari's search box is initially blank (at least on Windows), and only displays the name of your default search engine after it has received focus and lost it again. Google Chrome's "omnibox" displays "Type to search", right-justified, even when the omnibox has focus, then removes it after you've typed a single character. Adding an <input placeholder> attribute would allow each browser on each platform to match their users' expectations (and possibly even allow end-user customization) of how placeholder text should work for web forms. Discussion threads: 1, 2, 3. So far, there is no consensus on whether this should be added to HTML 5, or what the markup would look like.

Other interesting changes this week:

r2273 defines the <input required> attribute.
r2272 defines what it means to "activate" a form field, so that "clicking a button" and "setting focus to the button and pressing space" result in the same click event being triggered.
r2277 defines the <input size> attribute, which controls the displayed size of the field (but not the length of the field's value; that is controlled by <input maxlength> [r2233]).
r2278 defines the <input pattern> attribute, which is an arbitrary regular expression against which the field's value should be matched.
r2282 defines the input and change events. The input event fires during typing in a form field (and therefore may fire many times as the user types); the change event fires when a change is committed, even if typing was not involved (such as choosing files to upload with an <input type="file"> field).
r2242 tweaks the definition of floating point numbers to allow specifying an exponent.
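Two of these definitions are easy to get subtly wrong in script. The pattern attribute must match the entire value, not just a substring, and an empty value is left to the required attribute; the valid floating-point number syntax, including the exponent that r2242 allows, fits in one regular expression. A sketch in TypeScript, following today's HTML standard rather than the 2008 draft (which may differ in detail); both function names are hypothetical.

```typescript
// True when a non-empty value fails the pattern attribute. The pattern is
// wrapped as ^(?:...)$ so it must match the whole value.
function patternMismatch(value: string, pattern: string): boolean {
  if (value === "") return false; // emptiness is the required attribute's job
  let re: RegExp;
  try {
    re = new RegExp("^(?:" + pattern + ")$");
  } catch {
    return false; // an invalid pattern is ignored rather than failing the field
  }
  return !re.test(value);
}

// Valid floating-point number: optional "-", integer digits and/or a
// fraction, then an optional exponent such as e10, E-3 or e+7.
const VALID_FLOAT = /^-?(?:\d+|\d*\.\d+)(?:[eE][-+]?\d+)?$/;

function isValidFloat(text: string): boolean {
  return VALID_FLOAT.test(text);
}
```

The (?:...) group matters: without it, a pattern like ab|cd would become ^ab|cd$ and accept "abx".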

Around the web:

Following up on last week's article on clickjacking, the security researcher who discovered (and named) it has posted details of his discovery. Short version: it's even worse than we thought, but vendors are working on it. Here's a proof-of-concept against Adobe Flash that, quite literally, spies on you (via your webcam) without the usual warning dialogs; here's Adobe's response. NoScript now offers enhanced protection against some clickjacking attack vectors.
Anne van Kesteren gives an update on IE 8's support for HTML 5 and other emerging standards.
Matt Ryall has a good article on HTML 5, headings and sections, which documents the differences between the heading elements of HTML 4 and HTML 5. My personal opinion: I once wrote a 500-page book in DocBook, a non-HTML markup language for technical writers. DocBook 3 had separate elements for <sect1>, <sect2>, <sect3>, &c., and it was a massive pain in the ass to cut and paste sections, or to reuse them in different documents. DocBook 4 added a generic <section> element that can be nested indefinitely, and all those problems went away. Lots of web authors copy and paste HTML markup; anything that helps that "just work" is a good thing.

Tune in next week for another exciting episode of "This Week in HTML 5."

Posted in Weekly Review

This Week in HTML 5 – Episode 7

Monday, September 29th, 2008

Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.

Work continued this week on Web Forms 2, but I'm going to hold off on that until next week. And in case you missed it, Ian Hickson gave a tech talk on HTML 5, including live demos of some features recently implemented in nightly browser builds.

The big news this week is the disclosure of a vulnerability that researchers have dubbed "clickjacking." To understand it, start with Giorgio Maone's post, Clickjacking and NoScript. Giorgio is the author of the popular NoScript extension for Firefox. In its default configuration, NoScript protects against this vulnerability on most sites in most situations; you can configure it to defeat the attack entirely, but only at the cost of usability and functionality.

Of course, most web users do not run Firefox, and fewer still run NoScript, so web developers still need to be aware of it. Michal Zalewski's post, Dealing with UI redress vulnerabilities inherent to the current web, addresses some possible workarounds:

Using Javascript hacks to detect that window.top != window to inhibit rendering, or override window.top.location. These mechanisms work only if Javascript is enabled, however, and are not guaranteed to be reliable or future-safe. If the check is carried on every UI click, performance penalties apply, too. Not to mention, the extra complexity is just counterintuitive and weird.

Requiring non-trivial reauthentication (captcha, password reentry) on all UI actions with any potential for abuse. Although this is acceptable for certain critical operations, doing so every time a person adds Bob as a friend on a social networking site, or deletes a single mail in a webmail system, is very impractical.
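The first workaround usually boils down to the check below. This is a minimal TypeScript sketch; FrameWindow is a hypothetical interface covering only the two Window properties involved, so the logic can run outside a browser. As Michal points out, it does nothing when scripting is disabled, and it is not a reliable defense.

```typescript
// Only the two Window properties this check needs.
interface FrameWindow {
  self: FrameWindow;
  top: FrameWindow | null;
}

// A document is framed when the top-level window is not its own window.
function isFramed(win: FrameWindow): boolean {
  return win.top !== win.self;
}

// In a page, the "inhibit rendering" variant would look something like:
//   if (isFramed(window)) document.documentElement.style.display = "none";
```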

Worried yet? Now let's turn to the question of what browser vendors can do to mitigate the vulnerability. Michal offers several proposals. It is important to realize that none of these proposals have been implemented yet, so don't go rushing off to your text editor expecting them to do anything useful.

Create a HTTP-level (or HTTP-EQUIV) mechanism along the lines of "X-I-Do-Not-Want-To-Be-Framed-Across-Domains: yes" that permits a web page to inhibit frame rendering in potentially dangerous situations.

Add a document-level mechanism to make "if nested <show this> else <show that>" conditionals possible without Javascript. One proposal is to do this on the level of CSS (by using either the media-dependency features of CSS or special classes); another is to introduce new HTML tags. This would make it possible for pages to defend themselves even in environments where Javascript is disabled or limited.

Add an on-by-default mechanism that prevents UI actions to be taken when a document tries to obstruct portions of a non-same-origin frame. By carefully designing the mechanism, we can prevent legitimate uses (such as dynamic menus that overlap with advertisements, gadgets, etc) from being affected, yet achieve a high reliability in stopping attacks.

Enforce a click-to-work mechanism (resembling the Eolas patent workaround) for all cross-domain IFRAMEs.

Rework everything we know about HTML / browser security models to make it possible for domains and pages to specify very specific opt-in / opt-out policies for all types of linking, referencing, such that countering UI redress attacks would be just one of the cases controlled by this mechanism.
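To make proposal #3 (the option Ian Hickson later endorses) concrete, here is a hypothetical TypeScript sketch of the decision a browser would make before delivering a click to a cross-origin frame: the frame must lie entirely inside the viewport, be fully opaque, not be overlapped by anything from the embedding page, and have been in that state for a short settling period. None of this is a real browser API; an actual implementation would get boxes and opacities from its layout and compositing engine, and the one-second default simply echoes the delay mentioned in the discussion.

```typescript
interface Box {
  left: number;
  top: number;
  right: number;
  bottom: number;
}

interface FrameState {
  box: Box; // the cross-origin frame's on-screen box
  opacity: number; // effective opacity, including ancestors
  unobstructedSinceMs: number; // when it last became fully visible
}

function overlaps(a: Box, b: Box): boolean {
  return a.left < b.right && b.left < a.right && a.top < b.bottom && b.top < a.bottom;
}

function inside(inner: Box, outer: Box): boolean {
  return (
    inner.left >= outer.left &&
    inner.top >= outer.top &&
    inner.right <= outer.right &&
    inner.bottom <= outer.bottom
  );
}

// overlays: boxes of embedding-page content painted above the frame.
function frameAcceptsClicks(
  frame: FrameState,
  viewport: Box,
  overlays: Box[],
  nowMs: number,
  settleMs = 1000,
): boolean {
  if (frame.opacity < 1) return false; // an invisible frame, e.g. opacity:0
  if (!inside(frame.box, viewport)) return false; // partially off-screen
  if (overlays.some((o) => overlaps(o, frame.box))) return false; // obscured
  return nowMs - frame.unobstructedSinceMs >= settleMs; // settling delay
}
```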

To this list, Collin Jackson added two more suggestions:

New cookie attribute: The "httpOnly" cookie flag allows sites to put restrictions on how a cookie can be accessed. We could allow a new flag to be specified in the Set-Cookie header that is designed to prevent CSRF and "UI redress" attacks. If a cookie is set with a "sameOrigin" flag, we could prevent that cookie from being sent on HTTP requests that are initiated by other origins, or were made by frames with ancestors of other origins. In a CSRF or "UI redress" attack scenario, it will appear as though the user is not logged in, and thus the HTTP request will be unable to affect the user's account.

New HTTP request header: Browser vendors seem to be moving away from "same origin restrictions" towards "verifiable origin labels" that let the site decide whether two security origins trust each other. ... [I]nstead of making it an "X-I-Do-Not-Want-To-Be-Framed-Across-Domains: yes" HTTP response header, make it an "X-Ancestor-Frame-Origin: http://www.evil.com" HTTP request header. This header could be a list of all the origins that are ancestors of the frame that triggered the request. If the site decides it does not like the ancestor frame origin it could reject the request. This could be added as a property of MessageEvent as well to detect client-side-only UI redress attacks.

This last approach moves us down a slippery slope towards site security policies for IFRAMEs and embedded content, similar to the Flash security model that allows trusted sites to access cross-domain resources. In practice, Flash crossdomain.xml files have a number of problems, and such an approach would still only cover a fraction of the possible use cases.
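Neither of the two suggestions above exists in any browser, so the sketch below is purely hypothetical: the sameOrigin cookie flag and the X-Ancestor-Frame-Origin header are modeled as the quotes describe them, and the comma separator for the header's list of origins is an assumption of this sketch, since the proposal does not specify one. The first function is the browser's side (which cookies to attach); the other two are the server's side (whether to honor a framed request).

```typescript
// Browser side: a cookie with the proposed (hypothetical) sameOrigin flag.
interface StoredCookie {
  name: string;
  value: string;
  sameOrigin: boolean;
}

interface RequestContext {
  targetOrigin: string; // origin of the URL being requested
  initiatorOrigin: string; // origin of the document that started the request
  ancestorOrigins: string[]; // origins of every frame above the requesting one
}

function cookiesToAttach(jar: StoredCookie[], req: RequestContext): StoredCookie[] {
  const involvesOtherOrigin =
    req.initiatorOrigin !== req.targetOrigin ||
    req.ancestorOrigins.some((o) => o !== req.targetOrigin);
  // sameOrigin cookies are withheld, so the site sees a logged-out user.
  return jar.filter((cookie) => !cookie.sameOrigin || !involvesOtherOrigin);
}

// Server side: value of the proposed X-Ancestor-Frame-Origin request header,
// assumed here to be a comma-separated list of origins.
function parseAncestorOrigins(headerValue: string | undefined): string[] {
  if (headerValue === undefined) return [];
  return headerValue
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

// Reject the request if any ancestor frame belongs to an untrusted origin.
function acceptFramedRequest(
  headerValue: string | undefined,
  trusted: ReadonlySet<string>,
): boolean {
  return parseAncestorOrigins(headerValue).every((o) => trusted.has(o));
}
```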

You can read the full thread for all the gory details and back-and-forth among browser vendors (Maciej Stachowiak works on WebKit, Robert O'Callahan works on Firefox) and other interested parties. As Maciej notes, user experience may suffer: "[Under proposal #3] iGoogle widgets would become disabled if scrolled partially off the top of the page under your proposal. And even if scrolled back into view, would remain disabled for a second. With possibly a jarring visual effect, or alternately, no visual indication that they are disabled. Hard to decide which is worse." As Rob notes, any solution will also need to deal with IFRAMEs styled with opacity:0, related attacks using some little-known (but widely supported) capabilities of SVG, and possibly other vectors that the world collectively hasn't figured out yet. If you're getting a mental image of the game "Whack-a-Mole," you're not alone.

Ironically, the best example of "clickjacking" is the download page for the NoScript extension, which uses it for good rather than evil. Thanks to some fancy JavaScript (search for "installer"), Giorgio embeds the addons.mozilla.org download page for NoScript in an IFRAME on his own page on noscript.net, sets the IFRAME to "opacity:0" (an attack vector that Robert O'Callahan specifically warned about), scrolls the embedded addons.mozilla.org page to the top corner of its "Add to Firefox" button, and sets the z-index of the IFRAME to 100. Thus, the IFRAME is floating (due to "z-index:100") invisibly (due to "opacity:0") over Giorgio's own "Install Now" button (due to the positioning of the IFRAME element itself). When you think you're clicking the button on noscript.net you are actually clicking the button on addons.mozilla.org. What's the difference? By default, Firefox treats addons.mozilla.org as a trusted download site, so it immediately pops up the extension installation dialog instead of blocking the installation with an infobar saying "Firefox prevented this site (noscript.net) from installing software on your computer." From a user experience standpoint, this is great -- one less click to download and install an extension. From a security standpoint, this is incredibly scary -- the end user has no idea they're interacting with a third-party site.

Ian Hickson, the editor of HTML 5, weighed in with his opinion:

I would like feedback from browser vendors on this topic, ideally in the form of experimental implementations. Personally I think the idea of disabling the contents of a cross-origin iframe that has been partially obscured or rendered partially off-screen is the best idea, but whether we can adopt it depends somewhat on whether browser vendors are willing to adopt it and implement it. It requires no standards changes to implement.

Tune in next week for another exciting episode of "This Week in HTML 5."

Posted in Weekly Review