Monday, November 3rd, 2008
Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group. Last Friday was Halloween for some of you; in the United States, it involves dressing up in slutty costumes, begging your neighbors for handouts, and getting diabetes. Yesterday, many of you set your clocks back one hour as Daylight Saving Time ended. And for those of you on the Gregorian calendar, it is now November.
Dates and times loom large in this week's updates. "What is today's date?" is a deceptively simple question, matched in complexity only by the related question, "What time is it?" Sources for Time Zone and Daylight Saving Time Data gives a good overview of the current state of the art for answering both questions. In the movie Crocodile Dundee, Mick says he once asked an Aboriginal elder when he was born; the elder replied, "in the summertime."
r2381 defines global dates and times:
A global date and time consists of a specific Gregorian date, consisting of a year, a month, and a day, and a time, consisting of an hour, a minute, a second, and a fraction of a second, expressed with a time zone, consisting of a number of hours and minutes.
r2382 defines local dates and times:
A local date and time consists of a specific Gregorian date, consisting of a year, a month, and a day, and a time, consisting of an hour, a minute, a second, and a fraction of a second, but expressed without a time zone.
r2383 defines a month:
A month consists of a specific Gregorian date with no timezone information and no date information beyond a year and a month.
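To make the distinction between these three definitions concrete, here is a rough Python sketch. It uses Python's own `datetime` types as stand-ins; the sample date, the -08:00 offset, and the `describe` helper are illustrative assumptions, and the printed strings are Python's ISO-style output, not necessarily the exact syntax the spec requires.

```python
from datetime import datetime, timedelta, timezone

# Global date and time: a date, a time, and a time zone offset (hours and minutes).
pacific = timezone(timedelta(hours=-8))  # illustrative offset, not taken from the spec
global_dt = datetime(2008, 11, 3, 9, 30, 0, tzinfo=pacific)

# Local date and time: the same fields, but no time zone.
local_dt = datetime(2008, 11, 3, 9, 30, 0)

# Month: a year and a month, nothing else.
month = (2008, 11)


def describe(dt):
    """Say whether a datetime is 'global' (has an offset) or 'local' (has none)."""
    return "global" if dt.utcoffset() is not None else "local"


print(describe(global_dt), global_dt.isoformat())
print(describe(local_dt), local_dt.isoformat())
print("month", "%04d-%02d" % month)

# Only the global value names a single instant, so only it converts to UTC.
print(global_dt.astimezone(timezone.utc).isoformat())
```

The local value cannot be placed on a universal timeline without extra information, which is exactly what "expressed without a time zone" implies.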
r2384 and r2385 define a week:
A week consists of a week-year number and a week number representing a seven day period. Each week-year in this calendaring system has either 52 weeks or 53 weeks, as defined below. A week is a seven-day period. The week starting on the Gregorian date Monday December 29th 1969 (1969-12-29) is defined as week number 1 in week-year 1970. Consecutive weeks are numbered sequentially. The week before the number 1 week in a week-year is the last week in the previous week-year, and vice versa.
A week-year with a number year that corresponds to a year year in the Gregorian calendar that has a Thursday as its first day (January 1st), and a week-year year where year is a number divisible by 400, or a number divisible by 4 but not by 100, has 53 weeks. All other week-years have 52 weeks.
The week number of the last day of a week-year with 53 weeks is 53; the week number of the last day of a week-year with 52 weeks is 52.
Note: The week-year number of a particular day can be different than the number of the year that contains that day in the Gregorian calendar. The first week in a week-year year is the week that contains the first Thursday of the Gregorian year year.
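The week arithmetic is easier to follow with a few dates plugged in. The sketch below assumes that the week-year described above coincides with ISO 8601 week numbering, which is what Python's `date.isocalendar()` implements. The function names are hypothetical, and the "leap year starting on a Wednesday" branch is my reading of the ISO rule, not something the quoted text spells out clearly.

```python
from datetime import date


def week_of(day):
    """Return (week-year, week number) for a Gregorian date."""
    iso = day.isocalendar()
    return iso[0], iso[1]


def weeks_in_week_year(year):
    """Return 53 or 52 based on the weekday of January 1st.

    Assumed rule (ISO 8601): 53 weeks if January 1st is a Thursday,
    or if the year is a leap year and January 1st is a Wednesday.
    """
    jan1 = date(year, 1, 1).isoweekday()  # Monday=1 ... Sunday=7
    leap = year % 400 == 0 or (year % 4 == 0 and year % 100 != 0)
    return 53 if jan1 == 4 or (leap and jan1 == 3) else 52


# The anchor date from the definition: Monday 1969-12-29 starts week 1 of 1970.
print(week_of(date(1969, 12, 29)))  # (1970, 1)

# A day whose week-year differs from its Gregorian year.
print(week_of(date(2008, 12, 29)))  # (2009, 1)

print(weeks_in_week_year(1970), weeks_in_week_year(2008))
```

As a sanity check, December 28th always falls in the last week of its week-year, so its week number should equal `weeks_in_week_year` for that year.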
<input> form elements can be declared to take a local date and time, a global date and time, a date, a time, a month, or a week. You can also declare a global date and time in a <time> element or in the datetime attribute of <ins> and <del>.
HTML 5 does not define weekends or holidays, and therefore does not define business days. Interstellar datekeeping has been pushed back to HTML 6.
In other news, Chris Wilson suggested a different strategy for the much-maligned <q> element, which kicked off a long discussion, which in turn spawned several tangential discussions: <q> and commas, <q> vs <p>, UA style sheet for <q>, <q addmarks=true>, and the overly-optimistically-titled Final thoughts on <q>. The basic problem is that, while HTML 4 clearly states that user agents should render with delimiting quotation marks, Microsoft Internet Explorer (prior to IE8b2) did not do so. IE8b2 does do so, but it falls back to client-side regional settings to display quotation marks in pages where the author has not specified the language (which is the vast majority of pages). Also, in some languages, convention dictates alternating single and double quotes for nested quotations, but HTML 4 did not specify how to handle this, and different browsers handle nested quotation marks in different ways.
Other interesting tidbits this week:
Tune in next week for another exciting episode of "This Week in HTML 5."
Posted in Weekly Review | 1 Comment »
Tuesday, October 14th, 2008
Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.
Most of the changes in the spec this week revolve around the <textarea> element.
Shelley Powers pointed out that I haven't mentioned the issue of distributed extensibility yet. (The clearest description of the issue is Sam Ruby's message from last year, which spawned a long discussion.) The short version: XHTML (served with the proper MIME type, application/xhtml+xml) supports embedding foreign data in arbitrary namespaces, including SVG and MathML. None of these technologies (XHTML, SVG, or MathML) have had much success on the public web. Despite Chris Wilson's assertion that "we cannot definitively say why XHTML has not been successful on the Web," I think it's pretty clear that Internet Explorer's complete lack of support for the application/xhtml+xml MIME type has something to do with it. (Chris is the project lead on Internet Explorer 8.)
Still, it is true that XHTML does support distributed extensibility, and many people believe that the web would be richer if SVG and MathML (and other as-yet-unknown technologies) could be embedded and rendered in HTML pages. The key phrase here is "as-yet-unknown technologies." In that light, the recent SVG-in-HTML proposal (which I mentioned several weeks ago) is beside the point. The point of distributed extensibility is that it does not require approval from a standards body. "Let a thousand flowers bloom" and all that, where by "flowers," I mean "namespaces." This is an unresolved issue.
Other interesting changes this week:
- r2314 ensures that the required attribute only applies to form controls whose value can change.
- r2316 defines the name attribute for form controls.
- r2317 defines the disabled attribute for form controls.
- r2320 defines all the different ways that a form control can fail to satisfy its constraints. For example, an <input maxlength=20> element with a 21-character value.
- r2322 defines exactly how form data should be encoded before being submitted to the server. I've previously mentioned character encoding in this series; this revision marks the first time that an HTML specification has acknowledged the existence of the <input type=hidden name=_charset_> method of specifying the character encoding of submitted form data.
- r2319 removes support for data templates and repetition templates. These were inventions in the original Web Forms 2 specification, but they were never picked up by any major browser.
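For readers who have not met the `_charset_` trick mentioned above, here is a minimal Python sketch of the idea as I understand it: a hidden field named `_charset_` has its value filled in with the name of the encoding used to serialize the form, so the server can tell how to decode the rest. The `encode_form` helper and the sample field values are hypothetical, and `urllib.parse.urlencode` only approximates what a browser does.

```python
from urllib.parse import urlencode


def encode_form(fields, encoding="utf-8"):
    """URL-encode (name, value) pairs, filling in any _charset_ field.

    Assumption: a field named _charset_ takes the submission encoding's
    name as its value, whatever value it started with.
    """
    filled = [
        (name, encoding if name == "_charset_" else value)
        for name, value in fields
    ]
    return urlencode(filled, encoding=encoding)


fields = [("q", "caf\u00e9"), ("_charset_", "")]

print(encode_form(fields, "utf-8"))         # q=caf%C3%A9&_charset_=utf-8
print(encode_form(fields, "windows-1252"))  # q=caf%E9&_charset_=windows-1252
```

The same character produces different bytes under the two encodings, which is why the server needs the hint.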
Around the web:
Tune in next week for another exciting episode of "This Week in HTML 5."
Wednesday, October 8th, 2008
Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.
It's time to catch up on the myriad of changes to the HTML 5 spec. The big news this week is the continued merging of Web Forms 2 into HTML 5.
- <button> [r2280]
- <select> [r2285, r2287, r2288, r2290]
- <input type="submit"> [r2269]
- <input type="reset"> [r2270]
- <input type="button"> [r2271]
- <input type="image"> [r2276]
- <input type="file"> [r2274]
- <input type="checkbox"> [r2257, r2258]
- <input type="radio"> [r2259]
- <input type="hidden"> [r2268]
- <input type="email"> [r2227]
- <input type="url"> [r2228, r2231, r2235]
- <input type="number"> [r2254]
- <input type="range"> [r2255]
- <input type="date"> [r2252]
- <input type="time"> [r2253]
- <input type="datetime"> [r2229, r2230, r2231, r2239, r2243, r2247, r2251]
- <input type="week"> [r2252]
- <input type="month"> [r2252]
- <input type="datetime-local"> [r2249]
In other news, Andy Lyttle wants to standardize one particular feature of <input type="search"> (which is already supported by Safari, but not standardized): placeholder text for input fields. The text would initially display in the input field (possibly in a stylized form, smaller font, or lighter color), then disappear when the field receives focus. Lots of sites use JavaScript to achieve this effect, but it is surprisingly difficult to get right, in part because no one can quite agree on exactly how it should work. Mozilla Firefox displays the name of your current search engine in its dedicated search box until you focus the search box, at which point it blanks out and allows you to type. Safari's search box is initially blank (at least on Windows), and only displays the name of your default search engine after it has received focus and lost it again. Google Chrome's "omnibox" displays "Type to search", right-justified, even when the omnibox has focus, then removes it after you've typed a single character. Adding an <input placeholder> attribute would allow each browser on each platform to match its users' expectations (and possibly even allow end-user customization) of how placeholder text should work for web forms. Discussion threads: 1, 2, 3. So far, there is no consensus on whether this should be added to HTML 5, or what the markup would look like.
Other interesting changes this week:
- r2273 defines the <input required> attribute.
- r2272 defines what it means to "activate" a form field, so that "clicking a button" and "setting focus to the button and pressing space" result in the same click event being triggered.
- r2277 defines the <input size> attribute, which controls the displayed size of the field (but not the length of the field's value; that's <input maxlength> [r2233]).
- r2278 defines the <input pattern> attribute, which is an arbitrary regular expression against which the field's value should be matched.
- r2282 defines the input and change events. The input event occurs during typing in a form field (and therefore may trigger multiple times as the user types); the change event triggers when a change is committed, even if typing was not involved (such as choosing files to upload with an <input type="file"> field).
- r2242 tweaks the definition of floating point numbers to allow specifying an exponent.
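The required, maxlength, and pattern attributes above all boil down to checks against the field's current value. Here is a small Python sketch of that idea. The `validate` function and its return format are hypothetical; it also assumes that a pattern must match the whole value and that an empty value is not tested against the pattern, and it uses Python's `re` syntax, which differs in places from the regular expressions browsers use.

```python
import re


def validate(value, required=False, maxlength=None, pattern=None):
    """Return the names of the constraints that `value` fails to satisfy."""
    problems = []
    if required and value == "":
        problems.append("required")
    if maxlength is not None and len(value) > maxlength:
        problems.append("maxlength")
    # Assumption: the pattern must match the entire value, and an
    # empty value is left to the `required` check.
    if pattern is not None and value != "":
        if re.fullmatch(pattern, value) is None:
            problems.append("pattern")
    return problems


print(validate("x" * 21, maxlength=20))                # ['maxlength']
print(validate("", required=True))                     # ['required']
print(validate("AB-123", pattern=r"[A-Z]{2}-\d{3}"))   # []
print(validate("AB-1234", pattern=r"[A-Z]{2}-\d{3}"))  # ['pattern']
```

Note that size plays no part here: it only affects how wide the field is drawn.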
Around the web:
- Following up on last week's article on clickjacking, the security researcher who discovered (and named) it has posted details of his discovery. Short version: it's even worse than we thought, but vendors are working on it. Here's a proof-of-concept against Adobe Flash that, quite literally, spies on you (via your webcam) without the usual warning dialogs; here's Adobe's response. NoScript now offers enhanced protection against some clickjacking attack vectors.
- Anne van Kesteren gives an update on IE 8's support for HTML 5 and other emerging standards.
- Matt Ryall has a good article on HTML 5, headings and sections, which documents the differences between HTML 4 and 5's header elements. My personal opinion: I once wrote a 500-page book in DocBook, a non-HTML markup language for technical writers. DocBook 3 had separate elements for <sect1>, <sect2>, <sect3>, &c, and it was a massive pain in the ass to cut-and-paste sections, or try to reuse them in different documents. DocBook 4 added a generic <section> element which can be nested indefinitely, and all those problems went away. Lots of web authors copy-and-paste HTML markup; anything that helps that "just work" is a good thing.
Tune in next week for another exciting episode of "This Week in HTML 5."
Tuesday, September 23rd, 2008
Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.
There is no big news this week. Work continued on last week's orgy of Web Forms-related check-ins. This week adds the <label> element and the jack-of-all-forms <input> element. [r2191, r2192, r2197, r2200, r2202, r2204, r2205, r2207, r2211, r2212, r2213, r2214, r2218, r2219, r2220, r2222, r2223]
Laura Carlson and others have begun to review the accessibility of multimedia on the web. Most accessibility discussions revolve around the needs of visually impaired users, but hearing impaired users are also important and too often ignored. There was a long discussion last month (and continuing into this month) about the accessibility implications of the <audio> and <video> elements for hearing impaired users. YouTube (owned by Google, my employer) recently announced support for captions on YouTube videos and published a tutorial on adding them to your own videos.
Ian Hickson (the HTML 5 editor) gave an interview about HTML 5 in which he reiterated his goal of having two independent, complete, interoperable implementations of HTML 5 by 2022. (By contrast, HTML 4.0 was "finalized" 11 years ago but still doesn't have two independent, complete, interoperable implementations.) This led to a mini-firestorm among bloggers who misunderstood "2022" as "the date when I can start using HTML 5 features." It bears repeating that the "2022" date has no significance at all for web developers. Most browser vendors are actively involved in HTML 5, several browsers are already shipping HTML 5 features, and developers who are holding their breath until 2022 are going to find themselves seriously behind the curve.
On that note, Brenton Strine asks a very good question: "Is there some place that documents the parts of HTML 5 that are already up and running? Can I use <canvas> or <video>? In which browsers? What other tags can I use? What other fancy HTML 5 stuff can I do today in 2008?" On the video front, Mozilla will be shipping Ogg Theora support in Firefox 3.1. (You can read more about why Ogg matters.) Last year, Opera released experimental builds with Ogg Theora support, and they now have video-enabled builds on 3 platforms. The Wikimedia Foundation has a few Theora-encoded videos you can watch.
Tune in next week for another exciting episode of "This Week in HTML 5."
Monday, September 15th, 2008
Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.
The big news this week is the merging of the Web Forms 2 specification into the HTML 5 specification, and updating it with the collected feedback of the last two years.
Meanwhile, revisions 2160, 2161, 2163, 2164, and 2165 begin the long, hard process of defining when and how a form is submitted. This is one of those things that "everybody knows" but nobody has actually, you know, documented. For example, do you submit a form when you toggle a checkbox? Of course not, "everybody knows" that. Is an unchecked checkbox included in the form data when it is submitted? No, "everybody knows" that too. How do you submit to an ftp:// URL? A mailto: URL? A data: URL? What are the three values of the enctype attribute, and how do they affect the form data when it is submitted to a data: URL with the PUT method?¹ Umm... How exactly do you construct the names of the X and Y coordinates to submit a server-side image map? (By the way, server-side image maps are inaccessible, so don't use them unless you provide an accessible fallback form with equivalent functionality.) Web Forms 2 (and now HTML 5) will tell you.
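A couple of the "everybody knows" rules above can be written down in a few lines. The Python sketch below is a toy model, not the spec's algorithm: the `build_form_data` function and the dictionary layout for controls are hypothetical, the default checkbox value of "on" reflects common browser behavior, and the `name.x` / `name.y` naming for image coordinates is my understanding of long-standing practice.

```python
def build_form_data(controls, clicked=None, click_xy=(0, 0)):
    """Collect (name, value) pairs from a list of control dictionaries."""
    data = []
    for control in controls:
        kind = control["type"]
        name = control.get("name", "")
        if kind == "checkbox":
            # Unchecked checkboxes contribute nothing at all.
            if control.get("checked"):
                data.append((name, control.get("value", "on")))
        elif kind == "image":
            # Only the image button that was actually clicked is included,
            # as two entries holding the click coordinates.
            if control is clicked:
                prefix = name + "." if name else ""
                data.append((prefix + "x", str(click_xy[0])))
                data.append((prefix + "y", str(click_xy[1])))
        else:
            data.append((name, control.get("value", "")))
    return data


image_map = {"type": "image", "name": "map"}
controls = [
    {"type": "text", "name": "q", "value": "html5"},
    {"type": "checkbox", "name": "spam", "checked": False},
    {"type": "checkbox", "name": "news", "checked": True},
    image_map,
]

print(build_form_data(controls, clicked=image_map, click_xy=(12, 34)))
```

Toggling the `spam` checkbox changes what a later submission contains, but nothing in this model submits the form by itself.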
Another interesting set of changes revolves around character encoding. If you don't know anything about character encoding, I would strongly recommend Joel Spolsky's The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!), Tim Bray's three-part series, On the Goodness of Unicode, On Character Strings, and Characters vs. Bytes, and anything written by Martin Dürst.
Now then: r2125 warns against using EBCDIC on public-facing web pages. For those of you under 30, EBCDIC is a character encoding invented by IBM in the 1960s for their System/360 mainframe. On non-IBM hardware, EBCDIC lost the encoding war to ASCII, and later Unicode, and it is rarely seen on the public web. r2131 says that browsers should ignore an out-of-band encoding definition that they do not support. For example, if a web page is served with an HTTP Content-Type header with a charset parameter that defines a character encoding the browser does not support, the browser should ignore it and continue the process of determining the character encoding by other means. And finally, r2137 says that browsers should treat US-ASCII as Windows-1252 when determining character encoding. As the HTML 5 specification notes, "The requirement to treat certain encodings as other encodings according to the table above is a willful violation of the W3C Character Model specification."
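The two encoding rules (r2131 and r2137) combine naturally into one lookup step. The Python sketch below is only an illustration: the `resolve_encoding` function and its `fallback` parameter are hypothetical, "supported" here means "known to Python's codec registry" as a stand-in for a browser's list, the override table contains only the one mapping mentioned above, and the fallback stands in for whatever "other means" a real browser would try next.

```python
import codecs

# Only the mapping mentioned in the text; the spec's table may list more.
OVERRIDES = {"us-ascii": "windows-1252"}


def resolve_encoding(declared, fallback="windows-1252"):
    """Pick a codec for a declared charset label, or fall back.

    Unsupported labels are ignored instead of being treated as errors.
    """
    if declared:
        label = declared.strip().lower()
        label = OVERRIDES.get(label, label)
        try:
            return codecs.lookup(label).name
        except LookupError:
            pass  # unknown label: ignore it and keep looking
    return codecs.lookup(fallback).name


print(resolve_encoding("UTF-8"))              # utf-8
print(resolve_encoding("US-ASCII"))           # cp1252
print(resolve_encoding("x-no-such-charset"))  # cp1252

# Why the override helps: these bytes are not valid ASCII, but they
# decode to curly quotation marks under Windows-1252.
print(b"\x93quoted\x94".decode(resolve_encoding("US-ASCII")))
```

Python reports Windows-1252 under its own codec name, cp1252; the two labels refer to the same encoding.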
Other interesting changes this week:
- r2122 makes it non-conforming to have empty unquoted attribute values like <input value=>.
- r2130 provides a special case for the definitionURL attribute for those embedding MathML in HTML.
- r2140 and r2141 allow a legacy DOCTYPE if and only if the HTML page is the result of an XSLT transform. The exact DOCTYPE string is <!DOCTYPE HTML PUBLIC 'XSLT-compat'>
Tune in next week for another exciting episode of "This Week in HTML 5."
Footnotes: