This Week in HTML 5 – Episode 5
Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.
The big news this week is the merging of the Web Forms 2 specification into the HTML 5 specification, and updating it with the collected feedback of the last two years.
- form [r2142]
- fieldset [r2143]
- input [r2144]
- button [r2145]
- label [r2146]
- select [r2148]
- datalist [r2150]
- optgroup [r2151]
- option [r2152]
- textarea [r2153]
- output [r2154]
- The form attribute on various elements, which associates an element with its parent form, and the elements attribute of the form element, which associates the form with its elements. (Form-associated elements no longer need to be children of the <form> element itself within the DOM, so explicit association is required. Form-associated elements that are DOM children of the <form> element are implicitly associated, so your existing markup will continue to work the way you think it does.) [r2157]
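To make the association rule concrete, here is a rough sketch in Python. It is not the algorithm from the specification: the FormOwnerResolver class, the resolve helper, and the short FORM_ASSOCIATED list are names made up for this illustration, and the sketch does not check that a form attribute actually points at an existing form.

```python
from html.parser import HTMLParser

# Illustrative subset of form-associated elements (an assumption of this sketch).
FORM_ASSOCIATED = {"input", "button", "select", "textarea", "output", "fieldset"}


class FormOwnerResolver(HTMLParser):
    """Record, for each named control, the id of the form that owns it."""

    def __init__(self):
        super().__init__()
        self.open_forms = []  # ids of the <form> elements currently open
        self.owners = {}      # control name -> owning form id, or None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.open_forms.append(attrs.get("id"))
        elif tag in FORM_ASSOCIATED and "name" in attrs:
            if "form" in attrs:
                # Explicit association through the form attribute.
                owner = attrs["form"]
            elif self.open_forms:
                # Implicit association with the enclosing <form>.
                owner = self.open_forms[-1]
            else:
                owner = None
            self.owners[attrs["name"]] = owner

    def handle_endtag(self, tag):
        if tag == "form" and self.open_forms:
            self.open_forms.pop()


def resolve(markup):
    """Return a dict mapping control names to their owning form id."""
    resolver = FormOwnerResolver()
    resolver.feed(markup)
    resolver.close()
    return resolver.owners


markup = (
    '<form id="search"><input name="q"></form>'
    '<input name="lang" form="search">'
    '<input name="orphan">'
)
print(resolve(markup))
```

The q control is owned by the form it sits inside, lang is owned by the same form through its form attribute even though it lives outside the <form> element, and orphan has no owner at all.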
Meanwhile, revisions 2160, 2161, 2163, 2164, and 2165 begin the long, hard process of defining when and how a form is submitted. This is one of those things that "everybody knows" but nobody has actually, you know, documented. For example, do you submit a form when you toggle a checkbox? Of course not, "everybody knows" that. Is an unchecked checkbox included in the form data when it is submitted? No, "everybody knows" that too. How do you submit to an ftp:// URL? A mailto: URL? A data: URL? What are the three values of the enctype attribute, and how do they affect the form data when it is submitted to a data: URL with the PUT method? [1] Umm... How exactly do you construct the names of the X and Y coordinates to submit a server-side image map? (By the way, server-side image maps are inaccessible, so don't use them unless you provide an accessible fallback form with equivalent functionality.) Web Forms 2 (and now HTML 5) will tell you.
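A few of those "everybody knows" rules can be written down in a handful of lines. The following Python sketch is only an illustration under simplifying assumptions: controls are plain dicts, every control has a name, and the function names (form_data_set, encode_urlencoded, encode_text_plain, encode_multipart) are invented here rather than taken from the specification. It skips unchecked checkboxes, builds name.x and name.y for a clicked image button, and serializes the result for each of the three enctype values.

```python
from urllib.parse import urlencode


def form_data_set(controls, clicked_image=None):
    """Build (name, value) pairs from a list of control dicts.

    clicked_image is an optional (name, x, y) tuple describing the image
    button that triggered the submission.
    """
    pairs = []
    for control in controls:
        kind = control.get("type", "text")
        name = control["name"]
        if kind == "checkbox":
            # Unchecked checkboxes contribute nothing to the form data.
            if control.get("checked"):
                pairs.append((name, control.get("value", "on")))
        elif kind == "image":
            # Only the image button that was clicked sends its coordinates.
            if clicked_image and clicked_image[0] == name:
                _, x, y = clicked_image
                pairs.append((name + ".x", str(x)))
                pairs.append((name + ".y", str(y)))
        else:
            pairs.append((name, control.get("value", "")))
    return pairs


def encode_urlencoded(pairs):
    """application/x-www-form-urlencoded"""
    return urlencode(pairs)


def encode_text_plain(pairs):
    """text/plain: one name=value line per pair, no escaping."""
    return "".join(f"{name}={value}\r\n" for name, value in pairs)


def encode_multipart(pairs, boundary):
    """multipart/form-data, text fields only; the caller picks the boundary."""
    lines = []
    for name, value in pairs:
        lines += [
            f"--{boundary}",
            f'Content-Disposition: form-data; name="{name}"',
            "",
            value,
        ]
    lines += [f"--{boundary}--", ""]
    return "\r\n".join(lines)


controls = [
    {"type": "text", "name": "q", "value": "html 5"},
    {"type": "checkbox", "name": "safe", "checked": False},
    {"type": "image", "name": "map"},
]
pairs = form_data_set(controls, clicked_image=("map", 10, 20))
print(encode_urlencoded(pairs))
```

The sketch leaves out file uploads, character encoding of the values, and everything about how the encoded body is delivered to each URL scheme, which is exactly the part the revisions above set out to define.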
Another interesting set of changes revolves around character encoding. If you don't know anything about character encoding, I would strongly recommend Joel Spolsky's The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!), Tim Bray's three-part series (On the Goodness of Unicode, On Character Strings, and Characters vs. Bytes), and anything written by Martin Dürst.
Now then: r2125 warns against using EBCDIC on public-facing web pages. For those of you under 30, EBCDIC is a character encoding invented by IBM in the 1960s for their System/360 mainframe. On non-IBM hardware, EBCDIC lost the encoding war to ASCII, and later Unicode, and it is rarely seen on the public web. r2131 says that browsers should ignore an out-of-band encoding definition that they do not support. For example, if a web page is served with an HTTP Content-Type header with a charset parameter that defines a character encoding the browser does not support, the browser should ignore it and continue the process of determining the character encoding by other means. And finally, r2137 says that browsers should treat US-ASCII as Windows-1252 when determining character encoding. As the HTML 5 specification notes, "The requirement to treat certain encodings as other encodings according to the table above is a willful violation of the W3C Character Model specification."
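The last two rules can be sketched roughly as follows. This is an illustration in Python, not the specification's algorithm: the pick_encoding function, the OVERRIDES table (which holds only the one mapping mentioned above), the fallbacks parameter standing in for "other means", and the final windows-1252 default are all assumptions of this sketch. Python's own codec registry plays the role of "encodings the browser supports".

```python
import codecs

# Encodings treated as another encoding. Only the mapping mentioned in the
# post is listed; the table in the specification is longer.
OVERRIDES = {"us-ascii": "windows-1252"}


def is_supported(label):
    """Return True if Python's codec registry knows this encoding label."""
    try:
        codecs.lookup(label)
        return True
    except LookupError:
        return False


def pick_encoding(http_charset, fallbacks=()):
    """Pick an encoding label, skipping any declaration that is unsupported.

    http_charset stands for the charset parameter of the Content-Type header.
    fallbacks stands for whatever later steps would find (hypothetical here).
    """
    for label in [http_charset, *fallbacks]:
        if not label:
            continue
        label = label.strip().lower()
        label = OVERRIDES.get(label, label)
        if is_supported(label):
            return label
    # Last-resort default, chosen for this sketch only.
    return "windows-1252"


print(pick_encoding("US-ASCII"))
print(pick_encoding("x-no-such-charset", fallbacks=["utf-8"]))
```

The practical effect of the US-ASCII rule shows up with bytes above 0x7F: decoding b"\x93hi\x94" with the label returned for "us-ascii" yields curly quotes around "hi" instead of a decoding error.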
Other interesting changes this week:
- r2122 makes it non-conforming to have empty unquoted attribute values like <input value=>
- r2130 provides a special case for the definitionURL attribute for those embedding MathML in HTML.
- r2140 and r2141 allow a legacy DOCTYPE if and only if the HTML page is the result of an XSLT transform. The exact DOCTYPE string is <!DOCTYPE HTML PUBLIC 'XSLT-compat'>
Tune in next week for another exciting episode of "This Week in HTML 5."
Footnotes:
- When submitting to a data: URL with the PUT method, the three values of enctype are application/x-www-form-urlencoded, multipart/form-data, and text/plain. Amaze your friends at the next tech conference!
Re: treating US-ASCII and ISO-8859-1 as Windows-1252, do any browsers not do this already?
Hi,
in the footnote, you misspelled multipart/form-data, missed the ‘i’.
Thanks for a great service,
Wow, I’ve never seen a standard ‘in the making’ like this, complete with changesets. What a nice way to give people insight into the process!
Keep doing this, it’s great.
Now for the feedback. Why single out EBCDIC and why discourage any encoding?
Steve, the point of specifying is that when new browsers come to the market, they know exactly what they have to implement in order to render Web pages rather than having to reverse engineer what other browsers are doing.
mike, some encodings are discouraged because they have led to security issues in the past. Also, keeping the number of encodings to a limited set is a good thing. We don’t want to keep adding new encoding converters just for the sake of it.
Pedro, thanks, fixed!
Anne, I didn’t mean to imply it shouldn’t be specified, I was just curious about when this practice started/if any UAs never followed it.
Steve, ah ok. I don’t really know when this started exactly. Most likely a long time ago and probably Internet Explorer was first given the encoding.
What about repetition? Will it also be included in HTML 5 spec?
Repetition templates are most likely dropped as they only address a small subset of the use cases. If someone provides more research we might include another mechanism that addresses more use cases.
Mark – I appreciate these weekly HTML5 episodes. Please KUTGW!
Anyhow, it’s good to see Web Forms 2 is moving forward.
Steve Clay: IE7 does not treat US-ASCII as Windows-1252.