The WHATWG Blog — Mark Pilgrim, Google

Author Archive

This Week in HTML 5 – Episode 21

Tuesday, February 10th, 2009

Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.

The big news this week is more major work on the non-normative section on rendering HTML documents, including a lot of reverse-engineered documentation of legacy (invalid) attributes that users expect browsers to support.

r2749: marginwidth and marginheight attributes on the <body> element
r2750: hspace and vspace attributes on <table>
r2751: the bgcolor attribute
r2752: the <font> element
r2753: the frames and rules attributes of <table>
r2757: embedded content such as <audio>, <video>, <embed>, <iframe>, and <canvas>
r2759: laying out a group of <frame>s within a <frameset>
r2760: the <br> element
r2761: default margins on <h1>, <h2>, <h3>, <h4>, <h5>, <h6>, and <figure>
r2762: <bb>, <button>, and <details> elements
r2763: the <hr> element (this change in particular has some WHATWG members very excited)
r2764: the <fieldset> element
r2765: <input type=text>
r2766: <input type=date>, <input type=range>, and <input type=color>
r2767: <input type=checkbox>, <input type=radio>, <input type=file>, <input type=submit>, <input type=reset>, and <input type=button>
r2768: <select>, <progress>, and <meter>
r2769: <textarea>
r2770: <mark>
r2772: printing HTML documents
r2773: <link> elements

In addition, one major section was dropped from HTML 5 this week: an algorithm for determining what object is under the cursor (presuming, of course, that the cursor is within the region of the screen which contains an HTML document, and the current context has a screen, and the current context has a cursor). Ian Hickson has announced on www-style that, in accordance with that group's consensus, the algorithm would be better maintained in a future CSS specification.

Around the web:

On the subject of clickjacking, Microsoft announces IE8 Security Part VII: ClickJacking Defenses, which relies on web authors to include a Microsoft-proprietary HTTP header. RSnake responds, as does Giorgio Maone (who, by the way, has already integrated Microsoft's proprietary header into his NoScript extension for Firefox).
Mihai Sucan: HTML 5 canvas - the basics
Remy Sharp's HTML5 enabling script allows web authors to use HTML 5 elements that Internet Explorer doesn't know about and still have them show up properly in the DOM.
Michael Smith: Examine HTML5 localStorage and sessionStorage data with Web Inspector, which is precisely as exciting as it sounds.
Steve Smith: Structural Tags in HTML5

Tune in next week for another exciting episode of "This Week in HTML 5."

Posted in Weekly Review | 4 Comments »

This Week in HTML 5 – Episode 20

Tuesday, February 3rd, 2009

Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.

The big news this week is the beginning of the non-normative section on rendering HTML documents. For those of you not up on spec-writing lingo, "non-normative" means "you can ignore this and still claim to be in compliance with the specification." It's advice, not commands. On the other hand, it's generally useful advice, so ignoring it completely is probably not in your best interests.

Currently, the rendering section includes advice on

Hidden elements. Things like <script> should always be hidden (in the sense that they should be executed, not have their source displayed in the page). Likewise, <meta>, <link>, <style>, and so on.
Display types. Which elements should be rendered as block-level elements, which as tables, which as list items, and so on.
Margins and padding. Default values for different elements, and also for the same element in different contexts (nested within other elements).
Alignment. Table headers and captions are centered by default; <table align=left> is treated like float:left; etc.
Fonts and colors. By default, links are blue, visited links are purple, and <code> is rendered in a monospace font.
Punctuation and decorations. Links are underlined by default, acronyms are dotted-underlined, and <blink>, well, blinks.
Resetting rules for inherited properties. Tables reset certain text properties; in quirks mode, they reset even more.

Scrolling through the rest of the (mostly empty) rendering section shows lots of potential for future advice on form controls, data grids, favicons, and even the <marquee> element.

Rendering-related revisions: r2734, r2735, r2736, r2737, r2738.

Switching back to the normative parts of the spec, we have r2720, which makes the outerHTML property and the insertAdjacentHTML() method work in XHTML. For the purposes of this discussion — indeed, for the purposes of the entire HTML 5 specification — "XHTML" means "content served with a Content-Type: application/xhtml+xml". In addition, the section The XHTML Syntax has been entirely reorganized and rewritten to consolidate the rules for parsing and serializing XHTML documents and fragments. [Background: Re: outerHTML/insertAdjacentHTML in XML mode]

Other interesting tidbits this week:

r2712 mandates that browsers ignore any extraneous text on the first line of an application cache manifest file (after the file signature "CACHE MANIFEST"), to accommodate hard-core web authors who edit their manifest files manually in Emacs and want to include mode lines on the first line of the file.
r2719 specifies that browsers should not allow scripts to set document.domain to anything on the Public Suffix List, such as "com" or "co.jp". Essential background reading on why this is dangerous: Untraceable XSS Attacks. Most browsers already block this attack, e.g. Firefox since 3.0. [Background: Re: Setting document.domain]
r2711 addresses some security issues surrounding scripts that open windows with an address of about:blank.
r2731 requires that floats be serialized using exponential notation, e.g. 1e+0. [Background: Floating point number feedback]
r2725 is another in a long and mostly boring saga surrounding the concept of a "legacy DOCTYPE." The official DOCTYPE of HTML 5 is simply <!DOCTYPE HTML> -- so simple, in fact, that some tools cannot generate it. Bug 54 tracks the issue to the point of obsession; I won't go into details here, but the issue has been bounced around since at least June 2008. I doubt this will be the last we hear about legacy DOCTYPEs. [More background: ISSUE-54: <!DOCTYPE HTML SYSTEM "about:legacy-compat">]
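
To make the manifest change in r2712 a little more concrete, here is a minimal sketch of a first-line check that tolerates trailing text such as an Emacs mode line. The function name hasManifestSignature is made up for illustration, and the rule that the signature must be followed by a space, a tab, or the end of the line is my assumption about the details, not a quote from the spec.

```javascript
// Hypothetical helper, not spec text: reports whether the first line of a
// manifest starts with the "CACHE MANIFEST" signature, ignoring anything
// that follows the signature on that line.
function hasManifestSignature(manifestText) {
  // Only the first line matters for the signature check.
  const firstLine = manifestText.split(/\r\n|\r|\n/, 1)[0];
  const signature = "CACHE MANIFEST";
  if (!firstLine.startsWith(signature)) {
    return false;
  }
  // Assumption: the signature must end the line or be followed by a space
  // or tab, so that "CACHE MANIFESTO" is rejected.
  const next = firstLine.charAt(signature.length);
  return next === "" || next === " " || next === "\t";
}
```

With this sketch, a first line of "CACHE MANIFEST # -*- mode: conf -*-" passes, while an HTML error page served in place of the manifest does not.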

Tune in next week for another exciting episode of "This Week in HTML 5."

Posted in Weekly Review | 3 Comments »

This Week in HTML 5 – Episode 19

Friday, January 30th, 2009

There are 3 pieces of big news for the week of January 19th. Big news #1: r2692, a major revamp of the way application caches are defined. Application caches are the heart of the offline web model which can be used to allow script-heavy web applications like Gmail to work even after you disconnect from the internet. Here is the new definition of how application caches work:

Each application cache has a completeness flag, which is either complete or incomplete.

An application cache group is a group of application caches, identified by the absolute URL of a resource manifest which is used to populate the caches in the group.

An application cache is newer than another if it was created after the other (in other words, application caches in an application cache group have a chronological order).

Only the newest application cache in an application cache group can have its completeness flag set to incomplete; the others are always all complete.

Each application cache group has an update status, which is one of the following: idle, checking, downloading.

A relevant application cache is an application cache that is the newest in its group to be complete.

Each application cache group has a list of pending master entries. Each entry in this list consists of a resource and a corresponding Document object. It is used during the update process to ensure that new master entries are cached.

An application cache group can be marked as obsolete, meaning that it must be ignored when looking at what application cache groups exist.

A Document initially is not associated with an application cache, but steps in the parser and in the navigation sections cause cache selection to occur early in the page load process.

Multiple application caches in different application cache groups can contain the same resource, e.g. if the manifests all reference that resource.

The end result of this major work is actually pretty similar to how application caches worked before, but there were some edge cases (such as handling 404 errors when fetching the application manifest) which are now handled in a sane fashion. It also paved the way for r2693, which makes it possible for application caches to become "obsolete" (meaning they must be ignored when deciding which caches exist).
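
If the definitions above feel abstract, a rough data model may help. This is only a sketch: the field names (manifestURL, caches, complete, obsolete) and both function names are my own assumptions, chosen to mirror the quoted terms rather than any real browser API.

```javascript
// Illustrative model only. A group is assumed to look like:
//   { manifestURL: string, obsolete: boolean, caches: [{ id, complete }] }
// with caches ordered from oldest to newest.

// The "relevant application cache" is the newest cache in its group that
// is complete.
function relevantCache(group) {
  for (let i = group.caches.length - 1; i >= 0; i--) {
    if (group.caches[i].complete) {
      return group.caches[i];
    }
  }
  return null; // no complete cache yet
}

// Obsolete groups are ignored when looking at which groups exist.
function findGroup(groups, manifestURL) {
  const found = groups.find(
    (group) => group.manifestURL === manifestURL && !group.obsolete
  );
  return found || null;
}
```

Note that only the last entry in caches may be incomplete, which is why walking backwards finds the relevant cache within at most two steps.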

Big news #2: r2684, which redefines the on* attributes in a way that doesn't suck quite as much. Also, it defines the widely used (but poorly understood) onerror attribute in a way that matches what browsers actually do with it. Here is the meat of it:

All event handler attributes on an element, whether set to null or to a Function object, must be registered as event listeners on the element, as if the addEventListenerNS() method on the Element object's EventTarget interface had been invoked when the event handler attribute's element or object was created, with the event type (type argument) equal to the type described for the event handler attribute in the list above, the namespace (namespaceURI argument) set to null, the listener set to be a target and bubbling phase listener (useCapture argument set to false), the event group set to the default group (evtGroup argument set to null), and the event listener itself (listener argument) set to do nothing while the event handler attribute's value is not a Function object, and set to invoke the call() callback of the Function object associated with the event handler attribute otherwise.

The listener argument is emphatically not the event handler attribute itself.

When an event handler attribute's Function object is invoked, its call() callback must be invoked with one argument, set to the Event object of the event in question.

The handler's return value must then be processed as follows:

If the event type is mouseover

If the return value is a boolean with the value true, then the event must be canceled.

If the event object is a BeforeUnloadEvent object

If the return value is a string, and the event object's returnValue attribute's value is the empty string, then set the returnValue attribute's value to the return value.

Otherwise

If the return value is a boolean with the value false, then the event must be canceled.

The Function interface represents a function in the scripting language being used. It is represented in IDL as follows:

[Callback=FunctionOnly, NoInterfaceObject]
interface Function {
  any call([Variadic] in any arguments);
};

The call(...) method is the object's callback.

In JavaScript, any Function object implements this interface.
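
The return-value rules quoted above are easy to misread, so here is a small sketch of them as plain JavaScript. It uses ordinary objects as stand-ins for Event objects; the canceled and isBeforeUnload fields and the function name are assumptions made for illustration, not real DOM properties.

```javascript
// Sketch of the quoted return-value processing, using plain objects in
// place of real Event objects. `canceled` and `isBeforeUnload` are
// hypothetical fields.
function processHandlerReturnValue(event, returnValue) {
  if (event.type === "mouseover") {
    // The odd one out: returning true cancels a mouseover event.
    if (returnValue === true) {
      event.canceled = true;
    }
  } else if (event.isBeforeUnload) {
    // A string return value fills in returnValue only if it is still empty.
    if (typeof returnValue === "string" && event.returnValue === "") {
      event.returnValue = returnValue;
    }
  } else {
    // Everything else: returning false cancels the event.
    if (returnValue === false) {
      event.canceled = true;
    }
  }
  return event;
}
```

The strict comparisons matter here: the quoted text talks about a boolean with the value true or false, so a handler that returns 0 or undefined cancels nothing.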

Big news #3: r2685 and r2686 define a whole slew of important events that are fired on the Window object, including onbeforeunload, onerror, and onload. Previously, some of these were defined on the <body> element, which didn't actually match current browser behavior.

The following are the event handler attributes that must be supported by Window objects, as DOM attributes on the Window object, and with corresponding content attributes and DOM attributes exposed on the body element:

onbeforeunload

Must be invoked whenever a beforeunload event is targeted at or bubbles through the element or object.

onerror

Must be invoked whenever an error event is targeted at or bubbles through the object.

Unlike other event handler attributes, the onerror event handler attribute can have any value. The initial value of onerror must be undefined.

The onerror handler is also used for reporting script errors.

onhashchange

Must be invoked whenever a hashchange event is targeted at or bubbles through the object.

onload

Must be invoked whenever a load event is targeted at or bubbles through the object.

onmessage

Must be invoked whenever a message event is targeted at or bubbles through the object.

onoffline

Must be invoked whenever an offline event is targeted at or bubbles through the object.

ononline

Must be invoked whenever an online event is targeted at or bubbles through the object.

onresize

Must be invoked whenever a resize event is targeted at or bubbles through the object.

onstorage

Must be invoked whenever a storage event is targeted at or bubbles through the object.

onunload

Must be invoked whenever an unload event is targeted at or bubbles through the object.

Other interesting tidbits from the week of January 19th:

r2683 defines the concept of an override URL in order to prevent javascript: URLs (which you should never, ever use) from breaking through the cross-domain origin security policy.
r2697 provides an algorithm for determining the character encoding of an external script referenced by a <script> element.
r2698 clarifies that rel attributes are case-insensitive.
r2703 tweaks the parsing algorithm for misplaced <frameset> elements to be more compatible with Internet Explorer.
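
As a quick illustration of what r2698 means in practice, here is a sketch of a case-insensitive check against a rel value. The helper name hasLinkType is hypothetical, and using toLowerCase() is a simplification on my part; it is good enough for the ASCII keywords that link types normally use.

```javascript
// Hypothetical helper: does a rel attribute value contain the given link
// type, compared without regard to case?
function hasLinkType(relValue, linkType) {
  const wanted = linkType.toLowerCase();
  return relValue
    .split(/[ \t\n\f\r]+/) // rel is a space-separated list of tokens
    .filter((token) => token !== "")
    .some((token) => token.toLowerCase() === wanted);
}
```

So rel="Stylesheet ALTERNATE" and rel="alternate stylesheet" would be treated the same way by this check.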

Tune in next week for another exciting episode of "This Week in HTML 5."

Posted in Weekly Review | 3 Comments »

This Week in HTML 5 – Episode 18

Friday, January 30th, 2009

Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group. Despite it being almost February already, this episode will focus on changes and discussion from the week of January 12th. (Very little constructive progress was made between the 1st and the 12th.) I'll follow up with another summary for the week of January 19th, then I'll resume normal weekly updates on Monday.

The big news for the week of January 12th is r2633, which defines the indeterminate attribute on form controls.

The input element represents a two-state control that represents the element's checkedness state. If the element's checkedness state is true, the control represents a positive selection, and if it is false, a negative selection. If the element's indeterminate DOM attribute is set to true, then the control's selection should be obscured as if the control was in a third, indeterminate, state.

The control is never a true tri-state control, even if the element's indeterminate DOM attribute is set to true. The indeterminate DOM attribute only gives the appearance of a third state.

Internet Explorer and Safari already support the indeterminate attribute.
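
The point that indeterminate only changes the appearance is worth a tiny sketch. The object below is a plain stand-in for a checkbox, not a real DOM element, and both function names are invented for illustration.

```javascript
// Plain-object stand-in for a checkbox: { checked: boolean, indeterminate: boolean }.

// What the user sees: the indeterminate flag obscures the real selection.
function checkboxAppearance(input) {
  if (input.indeterminate) {
    return "indeterminate";
  }
  return input.checked ? "checked" : "unchecked";
}

// What the control actually represents: still only two states.
function checkboxState(input) {
  return input.checked ? "checked" : "unchecked";
}
```

In a browser that supports it, the equivalent would be setting the indeterminate DOM attribute on the input element from script; the checkedness underneath stays whatever it was.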

The other news I want to highlight this week -- just because this sort of weirdness tickles me -- is r2616, which tries to tackle the following difficult situation:

A user fills out a form and presses the submit button.
The browser POSTs the form and begins parsing the response.
In the course of parsing the response document, it encounters a <meta charset> declaration naming an encoding that differs from the one defined in or inferred from the HTTP headers.

If this were the response from a GET request, the browser might re-request the page so it could restart parsing with the new character encoding. But doing that after a POST would be like double-submitting the form, which could have serious consequences on the back end. (Technically, the difference is between idempotent operations like GET and non-idempotent operations like pretty much everything else.) So browsers should never re-request the page after a POST, and they'll just have to muddle through as best they can with the character encoding they have. [bug 6258]
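
Boiled down, the decision looks something like the sketch below. The function name and the two result strings are made up, and treating GET as the only method that is safe to repeat is a simplifying assumption on my part.

```javascript
// Hypothetical decision helper: what to do when a late <meta charset>
// disagrees with the encoding the parser started with.
function actionOnEncodingChange(requestMethod) {
  // Assumption: only GET is treated as safe to repeat.
  if (requestMethod.toUpperCase() === "GET") {
    return "refetch-and-reparse";
  }
  // Re-requesting after POST would amount to double-submitting the form.
  return "keep-current-encoding";
}
```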

Other interesting tidbits that week:

r2672 states that relative URLs in CSS in HTML documents are not re-resolved when the base URL of the HTML document changes. This is just one of those incredibly weird things that web authors do that just makes you want to strangle them en masse and scream "DON'T DO THAT!" But they do do it -- God knows why -- and it has been a source of pain for browser vendors because no standard has ever defined what they should do in this situation, so naturally they all do something different. [Background: Issues concerning the <base> element]
r2674 defines a way for web authors to include documentation inside <script src> elements -- that is, inline documentation for external scripts.
r2679 adds a number of browser interface elements to the HTML 5 spec, including window.locationbar, window.menubar, window.personalbar, window.scrollbars, window.statusbar, and window.toolbar. These previously undefined properties are already supported by all major browsers except Internet Explorer.

Tune in... er, in a few hours... for another exciting episode of "This Week in HTML 5."

Posted in Weekly Review | 2 Comments »

This Week in HTML 5 – Episode 17

Tuesday, December 30th, 2008

Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.

The big news this week is a major revamp of table headers, following up on the last major edits from last March. Ian summarizes the most recent round of changes:

Header cells can now themselves have headers.

I have reversed the way the algorithm is presented, such that it starts from a cell and reports the headers rather than generating the list of headers for each cell on a header-by-header basis.

If headers="" points to a <td> element, the association is set up, but I have left this non-conforming to help authors catch mistakes.

Header cells that are automatically associating do not stop associating when they hit equivalent cells unless they have also hit a <td> first.

The "col" and "row" scope values now act like the implied auto value except that they force the direction.

Empty header cells don't get automatically associated.

I have removed the wide header cell heuristic.

I have made headers="" use the same ID discovery mechanism as getElementById(), to avoid implementations having to support multiple such mechanisms.

Finally, I have made the spec define if a header is a column header or a row header in the case where scope="" is omitted.

I haven't added summary="" on table; nothing particularly new has been raised on the topic since the last times I looked at this.

Accessibility advocates are disappointed by the continued non-inclusion of the summary attribute. Their reasoning is that "the summary attribute is a very, very practical and useful attribute," despite their own user testing that shows otherwise. As Ian put it, "I am hesitant to include a feature like summary="" when all evidence seems to point to it being widely misused by authors and ignored by the users it intends to help." As with all issues, this is not the final word on the matter, but it's where we stand today.

In other news, r2566 addresses a very subtle issue with fetching images. The problem stems from the following (arguably pointless) markup: <img src=""> A fair number of web pages actually try to declare an image with an empty src attribute. According to the HTTP and URL specifications, this markup means that there is an image at the same address as the HTML document -- a theoretically possible but highly unlikely scenario. Internet Explorer apparently catches this mistake and just silently drops the image. Other browsers do not; they will actually try to fetch the image, which results in a "duplicate" request for the page (once to successfully retrieve the page, and again to unsuccessfully retrieve the image).

Boris Zbarsky, a leading Mozilla developer, states:

We (Gecko) have had 28 independent bug reports filed (with people bothering to create an account in the bug database, etc) about the behavior difference from IE here. That's a much larger number of bug reports than we usually get about a given issue. I can't tell you why this pattern is so common (e.g. whether some authoring frameworks produce it in some cases), but it seems that a number of web developers not only produce markup like this but notice the requests in their HTTP logs and file bugs about it.

r2566 addresses the issue by special-casing <img src> to allow browsers to ignore an image if its fetch request would result in fetching exactly the same URL as its HTML document:

When an img is created with a src attribute, and whenever the src attribute is set subsequently, the user agent must fetch the resource specified by the src attribute's value, unless the user agent cannot support images, or its support for images has been disabled, or the user agent only fetches elements on demand, or the element's src attribute has a value that is an ignored self-reference.

The src attribute's value is an ignored self-reference if its value is the empty string, and the base URI of the element is the same as the document's address.
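
The definition is short enough to write down directly. This is a sketch with a made-up function name; the three arguments are plain strings standing in for the src value, the element's base URI, and the document's address.

```javascript
// Sketch of the quoted definition: an empty src is an ignored
// self-reference only when the element's base URI is the document's own
// address.
function isIgnoredSelfReference(src, elementBaseURI, documentAddress) {
  return src === "" && elementBaseURI === documentAddress;
}
```

The second condition is what keeps this narrow: if the base URI differs from the document's address (for example because of a <base> element), an empty src resolves to a different URL and is not treated as a self-reference.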

Other interesting tidbits this week:

r2568 adds a storageArea attribute to StorageEvent object. [StorageEvent deficiency]
r2556 changes the processing model of the <meta charset> attribute by requiring that it appear in the first 512 bytes of the document. For those of you playing along at home, <meta charset="..."> is the new <meta http-equiv="Content-Type" content="text/html; charset=...">. Both forms are fully supported in all major browsers. [Comparing conformance requirements against real-world docs]
r2557, r2559, r2560, r2562, r2563, and r2604 add a variety of common markup errors to the list of errors that HTML validators may treat as minor. [Re: comparing conformance requirements against real-world docs]
r2561 allows the height and width attributes on <input type="image">, a construct that is already supported by all major browsers. [Re: comparing conformance requirements against real-world docs]
r2601 adds an example of something that all browsers do anyway -- killing scripts that run too long.
r2597 removes the notification API, which was kicked around in 2006 but never saw significant interest from either authors or browser vendors. [Notifications API removed]
r2596 defines window.close(), window.focus(), and window.blur(). The focus() and blur() methods have historically been used to produce "pop-up" and "pop-under" windows containing advertisements. Most modern browsers now control how and whether scripts can do this, and the HTML 5 specification goes so far as to recommend that "[u]ser agents are encouraged to ignore calls to this blur() method entirely."
r2552 gives an example of embedding RDF metadata in XHTML. As the spec notes, this is not possible in HTML, although you could always use RDFa.
r2595 gives an example of marking up a tag cloud.
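
To illustrate the 512-byte rule from r2556, here is a deliberately naive sketch that looks for a charset declaration only in the first 512 bytes of a document. The function name and the regular expression are my own; the actual algorithm in the spec is considerably more careful than a regex, so treat this as an approximation.

```javascript
// Naive approximation, not the spec's algorithm: look for a charset
// declaration inside a <meta> tag, but only within the first 512 bytes.
function sniffMetaCharset(bytes) {
  const head = bytes.subarray(0, 512);
  // Map each byte to one character so byte offsets and string offsets agree.
  let text = "";
  for (const byte of head) {
    text += String.fromCharCode(byte);
  }
  // Matches both <meta charset="..."> and the older http-equiv form.
  const match = /<meta[^>]*?charset\s*=\s*["']?([A-Za-z0-9._:-]+)/i.exec(text);
  return match ? match[1].toLowerCase() : null;
}
```

A declaration that starts after byte 512 is simply not seen by this sketch, which is the behavior the revision is after.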

Tune in next week for another exciting episode of "This Week in HTML 5."

Posted in Weekly Review | 12 Comments »