This Week in HTML 5 – Episode 5
Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.
The big news this week is the merging of the Web Forms 2 specification into the HTML 5 specification, and updating it with the collected feedback of the last two years.
This includes the form attribute on various elements, which associates an element with its parent form, and the elements attribute of the form element, which associates the form with its elements. (Form-associated elements no longer need to be children of the <form> element itself within the DOM, so explicit association is required for those that are not. Form-associated elements that are DOM children of the <form> element are implicitly associated, so your existing markup will continue to work the way you think it does.) [r2157]
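To make the two kinds of association concrete, here is a minimal Python sketch of how a control's form might be resolved. The names Form, Control, form_owner, and elements_of are hypothetical, and the precedence rule (an explicit form attribute wins over the enclosing <form>) is an assumption for illustration, not a quotation from the specification.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Form:
    id: str


@dataclass
class Control:
    name: str
    form_attr: Optional[str] = None        # value of the control's form attribute, if any
    ancestor_form: Optional[Form] = None   # nearest enclosing <form> in the DOM, if any


def form_owner(control: Control, forms_by_id: Dict[str, Form]) -> Optional[Form]:
    """Resolve which form a control belongs to (assumed rule: explicit beats implicit)."""
    if control.form_attr is not None:
        # Explicit association: look the form up by ID; no match means no owner.
        return forms_by_id.get(control.form_attr)
    # Implicit association: fall back to the enclosing <form>, as in existing markup.
    return control.ancestor_form


def elements_of(form: Form, controls: List[Control],
                forms_by_id: Dict[str, Form]) -> List[Control]:
    """Rough analogue of the form's elements attribute: every control it owns."""
    return [c for c in controls if form_owner(c, forms_by_id) is form]
```

A control nested inside a form resolves to that form without any extra markup, while a control elsewhere in the page only joins a form if its form attribute names it.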
Meanwhile, revisions 2160, 2161, 2163, 2164, and 2165 begin the long, hard process of defining when and how a form is submitted. This is one of those things that "everybody knows" but nobody has actually, you know, documented. For example, do you submit a form when you toggle a checkbox? Of course not, "everybody knows" that. Is an unchecked checkbox included in the form data when it is submitted? No, "everybody knows" that too. How do you submit to an ftp:// URL? A mailto: URL? A data: URL? What are the three values of the enctype attribute, and how do they affect the form data when it is submitted to a data: URL with the PUT method? [1] Umm... How exactly do you construct the names of the X and Y coordinates to submit a server-side image map? (By the way, server-side image maps are inaccessible, so don't use them unless you provide an accessible fallback form with equivalent functionality.) Web Forms 2 (and now HTML 5) will tell you.
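Two of the "everybody knows" rules above can be sketched in a few lines of Python: unchecked checkboxes are left out of the form data, and a clicked image button contributes its coordinates as name.x and name.y. The function build_form_data and the dict-based controls are hypothetical stand-ins for illustration; the real algorithm in the specification covers many more control types and edge cases.

```python
from urllib.parse import urlencode


def build_form_data(controls, submitter=None, click=(0, 0)):
    """Collect (name, value) pairs roughly the way a browser would on submit."""
    pairs = []
    for control in controls:
        kind = control["type"]
        if kind == "checkbox":
            # Unchecked checkboxes contribute nothing; checked ones default to "on".
            if control.get("checked"):
                pairs.append((control["name"], control.get("value", "on")))
        elif kind == "image":
            # Only the image button that was actually clicked sends coordinates.
            if control is submitter:
                x, y = click
                prefix = control["name"] + "." if control.get("name") else ""
                pairs.append((prefix + "x", str(x)))
                pairs.append((prefix + "y", str(y)))
        else:
            pairs.append((control["name"], control.get("value", "")))
    return pairs


image = {"type": "image", "name": "map"}
controls = [
    {"type": "text", "name": "q", "value": "html5"},
    {"type": "checkbox", "name": "subscribe", "checked": False},
    {"type": "checkbox", "name": "terms", "checked": True},
    image,
]
query = urlencode(build_form_data(controls, submitter=image, click=(12, 34)))
# query == "q=html5&terms=on&map.x=12&map.y=34"
```

Note that the unchecked subscribe checkbox never appears in the encoded string, which is exactly why servers cannot distinguish "unchecked" from "not on the page at all."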
Another interesting set of changes revolves around character encoding. If you don't know anything about character encoding, I would strongly recommend Joel Spolsky's The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!), Tim Bray's three-part series (On the Goodness of Unicode, On Character Strings, and Characters vs. Bytes), and anything written by Martin Dürst.
Now then: r2125 warns against using EBCDIC on public-facing web pages. For those of you under 30, EBCDIC is a character encoding invented by IBM in the 1960s for their System/360 mainframe. On non-IBM hardware, EBCDIC lost the encoding war to ASCII, and later Unicode, and it is rarely seen on the public web. r2131 says that browsers should ignore an out-of-band encoding definition that they do not support. For example, if a web page is served with an HTTP Content-Type header with a charset parameter that defines a character encoding the browser does not support, the browser should ignore it and continue the process of determining the character encoding by other means. And finally, r2137 says that browsers should treat US-ASCII as Windows-1252 when determining character encoding. As the HTML 5 specification notes, "The requirement to treat certain encodings as other encodings according to the table above is a willful violation of the W3C Character Model specification."
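The last two rules are easy to picture in code. Below is a rough Python sketch, where choose_encoding and OVERRIDES are hypothetical names, Python's own codec registry stands in for "encodings the browser supports," and the default of windows-1252 is an assumption for illustration. The candidate list represents encoding labels in order of precedence (the HTTP charset parameter first, then whatever other means are available).

```python
import codecs

# Encodings that are deliberately treated as another encoding.
# Only the US-ASCII case mentioned above is listed here.
OVERRIDES = {"us-ascii": "windows-1252"}


def choose_encoding(candidates, default="windows-1252"):
    """Return the first usable encoding label, skipping unsupported ones."""
    for label in candidates:
        if not label:
            continue
        name = label.strip().lower()
        name = OVERRIDES.get(name, name)
        try:
            codecs.lookup(name)  # raises LookupError for unknown encodings
        except LookupError:
            continue  # unsupported declaration: ignore it and keep looking
        return name
    return default
```

So a page served with an unrecognized charset parameter simply falls through to the next source of encoding information, and a page that declares US-ASCII is decoded as Windows-1252.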
Other interesting changes this week:
- r2122 makes it non-conforming to have empty unquoted attribute values.
- r2130 provides a special case for the definitionURL attribute for those embedding MathML in HTML.
- r2140 and r2141 allow a legacy DOCTYPE if and only if the HTML page is the result of an XSLT transform. The exact DOCTYPE string is <!DOCTYPE HTML PUBLIC 'XSLT-compat'>
Tune in next week for another exciting episode of "This Week in HTML 5."
[1] When submitting to a data: URL with the PUT method, the three values of enctype are application/x-www-form-urlencoded, multipart/form-data, and text/plain. Amaze your friends at the next tech conference!