The WHATWG Blog

Please leave your sense of logic at the door, thanks!

This Week in HTML 5 – Episode 28

Thursday, April 2nd, 2009

Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.

The big news for the week of March 23rd is that SVG can once again be included directly in HTML 5 documents served as text/html:

I've made the following changes to HTML5:

  • Uncommented out the XXXSVG bits, reintroducing the ability to have SVG content in text/html.
  • Defined <script> processing for SVG <script> in text/html by deferring to the SVG Tiny 1.2 spec and blocking synchronous document.write(). The alternative to this is to integrate the SVG script processing model with the (pretty complicated) HTML script processing model, which would require changes to SVG and might result in a dependency from SVG to HTML5. Anne would like to do this, but I'm not convinced it's wise, and it certainly would be more complex than what we have now. If we ever want to add async="" or defer="" to SVG scripts, then this would probably be a necessary part of that process, though.
  • Added a paragraph suggesting: "To enable authors to use SVG tools that only accept SVG in its XML form, interactive HTML user agents are encouraged to provide a way to export any SVG fragment as a namespace-well-formed XML fragment."
  • Added a paragraph defining the allowed content model for SVG <title> elements in text/html documents.

r2904 (and, briefly, r2910) give all the details of this solution. There are still a number of differences between the text in HTML 5 and the proposal brought by the SVG working group. Some of these are addressed further down in the announcement:

Doug Schepers, who has been the SVG working group's HTML 5 liason, does not like this solution:

To be honest, I think it's not a good use of the SVG WG's time to provide feedback when Ian already has his mind made up, even if I don't believe that he is citing real evidence to back up his decision. What I see is this: one set of implementers and authors (the SVG WG) and the majority of the author and user community (in public comments) asking for some sort of preservation of SVG as an XML format, even if it's looser and error-corrected in practice, and a few implementers (Jonas and Lachy, most notably) disagreeing, and Ian giving preference to the minority opinion. Maybe there is sound technical rationale for doing so, but I haven't been satisfied on that score.

Turning to technical matters, one of the features of web forms in HTML 5 is allowing the attributes for form submission on either the <form> element (as in HTML 4) or on the submit button (new in HTML 5). Originally, the attributes for submit buttons were named action, enctype, method, novalidate, and target, which exactly mirrored the attribute names that could be declared on the <form> element.

However, in January 2008, Hallvord R. M. Steen (Opera developer) noted that "INPUT action [attribute] breaks web applications frequently. Both GMail and Yahoo mail (the new Oddpost-based version) use input/button.action and were seriously broken by WF2's action attribute."

Following up in November 2008, Ian Hickson replied, "I notice that Opera still supports 'action' and doesn't seem to have problems in GMail; is this still a problem?" to which Hallvord replied, "GMail fixed it on their side a while ago. It is still a problem with Yahoo mail, breaking most buttons in their UI for a browser that supports 'action'. We work around this with a browser.js hack. ('Still a problem' means 'I tested this again a couple of weeks ago and things were still broken without this patch'.)"

Ian replied, "This is certainly problematic. It's unclear what we should do. It's hard to use another attribute name, since the whole point is reusing existing ones... can we trigger this based on quirks mode, maybe? Though I hate to add new quirks." Hallvord did not like that idea: "In my personal opinion, I don't see why re-using attribute names is considered so important if we can find an alternative that feels memorable and usable. How does this look? <input type="submit" formaction="http://www.example.com/">"

Finally, in March 2009, Ian replied:

That seems reasonable. I've changed "action", "method", "target", "enctype" and "novalidate" attributes on <input> and <button> to start with "form" instead: "formaction", "formmethod", "formtarget", "formenctype" and "formnovalidate".

And thus we have r2890: Rename attributes for form submission to avoid clashes with existing usage.

Other interesting changes this week:

Tune in next week for another exciting episode of "This Week in HTML 5."

Posted in Weekly Review | 3 Comments »

This Week in HTML 5 – Episode 25

Tuesday, March 31st, 2009

Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.

The big news for the week of February 23rd (yes, I'm that far behind) is a collection of changes about how video is processed. The changes revolve around the resource selection algorithm to handle cases where the src attribute of a <video> element is set dynamically from script.

Background reading on the resource selection algorithm: Re: play() sometimes doesn't do anything now that load() is async, r2849: "Change the way resources are loaded for media elements to make it actually work", r2873: "make all invokations of the resource selection algorithm asynchronous."

Other video-related changes this week:

Another big change this week is the combination of r2859, 2860, and r2861: you can now declare the character encoding of XHTML documents served with the application/xhtml+xml MIME type by using the <meta charset> attribute, but only if the value is "UTF-8". Also, the charset attribute must appear in the first 512 bytes of the document. Previously, the only ways to control the character encoding of an application/xhtml+xml document were setting the charset parameter on the HTTP Content-Type header, or to use an encoding attribute in the XML prolog.

In practice, this will make no difference to encoding detection algorithms; other UTF-* encodings are detected earlier (with a Byte Order Mark), and any other encoding would require an XML prolog. This is mainly to address the desire of a few overly vocal authors to be able to serve the same markup in both text/html and application/xhtml+xml modes. Background reading: Bug 6613: Allow <meta charset="UTF-8"/> in XHTML.

Other interesting changes this week:

Tune in next week for another exciting episode of "This Week in HTML 5."

Posted in Weekly Review | 1 Comment »

This Week in HTML 5 – Episode 17

Tuesday, December 30th, 2008

Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.

The big news this week is a major revamp of table headers, following up from the last major edits last March. Ian summarizes the most recent round of changes:

  • Header cells can now themselves have headers.
  • I have reversed the way the algorithm is presented, such that it starts from a cell and reports the headers rather than generating the list of headers for each cell on a header-by-header basis.
  • If headers="" points to a <td> element, the association is set up, but I have left this non-conforming to help authors catch mistakes.
  • Header cells that are automatically associating do not stop associating when they hit equivalent cells unless they have also hit a <td> first.
  • The "col" and "row" scope values now act like the implied auto value except that they force the direction.
  • Empty header cells don't get automatically associated.
  • I have removed the wide header cell heuristic.
  • I have made headers="" use the same ID discovery mechanism as getElementById(), to avoid implementations having to support multiple such mechanisms.
  • Finally, I have made the spec define if a header is a column header or a row header in the case where scope="" is omitted.
  • I haven't added summary="" on table; nothing particularly new has been raised on the topic since the last times I looked at this.

Accessibility advocates are disappointed by the continued non-inclusion of the summary attribute. Their reasoning is that "the summary attribute is a very, very practical and useful attribute," despite their own user testing that shows otherwise. As Ian put it, "I am hesitant to include a feature like summary="" when all evidence seems to point to it being widely misused by authors and ignored by the users it intends to help." As with all issues, this is not the final word on the matter, but it's where we stand today.

In other news, r2566 addresses a very subtle issue with fetching images. The problem stems from the following (arguably pointless) markup: <img src=""> A fair number of web pages actually try to declare an image with an empty src attribute. According to the HTTP and URL specifications, this markup means that there is an image at the same address as the HTML document -- a theoretically possible but highly unlikely scenario. Internet Explorer apparently catches this mistake and just silently drops the image. Other browsers do not; they will actually try to fetch the image, which results in a "duplicate" request for the page (once to successfully retrieve the page, and again to unsuccessfully retrieve the image).

Boris Zbarsky, a leading Mozilla developer, states

We (Gecko) have had 28 independent bug reports filed (with people bothering to create an account in the bug database, etc) about the behavior difference from IE here. That's a much larger number of bug reports than we usually get about a given issue. I can't tell you why this pattern is so common (e.g. whether some authoring frameworks produce it in some cases), but it seems that a number of web developers not only produce markup like this but notice the requests in their HTTP logs and file bugs about it.

r2566 addresses the issue by special-casing <img src> to allow browsers to ignore an image if its fetch request would result in fetching exactly the same URL as its HTML document:

When an img is created with a src attribute, and whenever the src attribute is set subsequently, the user agent must fetch the resource specifed by the src attribute's value, unless the user agent cannot support images, or its support for images has been disabled, or the user agent only fetches elements on demand, or the element's src attribute has a value that is an ignored self-reference.

The src attribute's value is an ignored self-reference if its value is the empty string, and the base URI of the element is the same as the document's address.

Other interesting tidbits this week:

Tune in next week for another exciting episode of "This Week in HTML 5."

Posted in Weekly Review | 12 Comments »

This Week in HTML 5 – Episode 5

Monday, September 15th, 2008

Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.

The big news this week is the merging of the Web Forms 2 specification into the HTML 5 specification, and updating it with the collected feedback of the last two years.

Meanwhile, revisions 2160, 2161, 2163, 2164, and 2165 begin the long, hard process of defining when and how a form is submitted. This is one of those things that "everybody knows" but nobody has actually, you know, documented. For example, do you submit a form when you toggle a checkbox? Of course not, "everybody knows" that. Is an unchecked checkbox included in the form data when it is submitted? No, "everybody knows" that too. How do you submit to an ftp:// URL? A mailto:// URL? A data:// URL? What are the three values of the enctype attribute, and how do they affect the form data when it is submitted to a data:// URL with the PUT method?1 Umm... How exactly do you construct the names of the X and Y coordinates to submit a server-side image map? (By the way, server-side image maps are inaccessible, so don't use them unless you provide an accessible fallback form with equivalent functionality.) Web Forms 2 (and now HTML 5) will tell you.

Another interesting set of changes revolves around character encoding. If you don't know anything about character encoding, I would strongly recommend Joel Spolsky's The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) Tim Bray's three-part series, On the Goodness of Unicode, On Character Strings, and Characters vs. Bytes, and anything written by Martin Dürst.

Now then: r2125 warns against using EBCDIC on public-facing web pages. For those of you under 30, EBCDIC is a character encoding invented by IBM in the 1960s for their System/360 mainframe. On non-IBM hardware, EBCDIC lost the encoding war to ASCII, and later Unicode, and it is rarely seen on the public web. r2131 says that browsers should ignore an out-of-band encoding definition that they do not support. For example, if a web page is served with an HTTP Content-Type header with a charset parameter that defines a character encoding the browser does not support, the browser should ignore it and continue the process of determining the character encoding by other means. And finally, r2137 says that browsers should treat US-ASCII as Windows-1252 when determining character encoding. As the HTML 5 specification notes, "The requirement to treat certain encodings as other encodings according to the table above is a willful violation of the W3C Character Model specification."

Other interesting changes this week:

Tune in next week for another exciting episode of "This Week in HTML 5."


Footnotes:

  1. When submitting to a data:// URL with the PUT method, the three values of enctype are application/x-www-form-urlencoded, multipart/form-data, and text/plain. Amaze your friends at the next tech conference!

Posted in Weekly Review | 15 Comments »