The WHATWG Blog — microdata

This Week in HTML5 – Episode 38

Tuesday, October 20th, 2009

Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.

This week, there were some more refinements to microdata. r4139 changes the names of the DOM properties that reflect microdata markup. r4140 renames the content property to itemValue. Since no browser has actually implemented this API yet, these changes shouldn't make any difference. Standards are like sex; one mistake, and you're stuck supporting it forever! r4141 and r4147 fix up some microdata examples, in particular this example from Gavin Carothers about marking up O'Reilly's book catalog. Hooray for real-world examples!

There were also some noteworthy changes to the <video> and <audio> API. r4131 says that setting the src attribute on one of those elements should call its load() method. r4132 removes the load event for multimedia elements, and r4133 removes the "in progress" events (loadstart, loadend, and progress) that used to be fired while the video/audio file was downloading.

Other noteworthy changes this week:

r4097 defines fallback content for the obsolete <applet> element.
r4098 "dramatically simplifies <script defer> and <script async> handling." [Background: bug 7792]
r4106 makes the step argument to the <input> element's stepUp() and stepDown() methods optional.
r4111 removes <link rel=feed>. As I documented earlier this year, rel=feed was a reasonable idea that never took off. Only one browser ever implemented it, and in a survey of 3 billion pages I could only find a single page that used it.
r4126 lists suggested default encodings for different locales. [Background: RE: HTML5 Issue 11 (encoding detection): I18N WG response...]
r4138 drops support for non-UTF-8 encodings in Web Workers. [Background: [whatwg] Please always use utf-8 for Web Workers]
r4099 marks the creation of Web Applications 1.0, a super-spec that contains HTML5, pre-defined microdata vocabularies, Web Workers, Web Storage, Web Database, Server-sent Events, and Web Sockets. This marks the first time that some of those specs have been published by the WHATWG, rather than the W3C, and therefore the first time that said specs have been published under a Free-Software-compatible license. (The W3C is still deciding whether to use such a license.)

Around the web:

An Introduction to HTML5 covers a lot of ground
Video on the Web is the latest chapter from my upcoming book on HTML5.

Tune in next week for another exciting edition of "This Week in HTML5."


This Week in HTML5 – Episode 37

Friday, October 9th, 2009

The big news this week is microdata. Google sponsored a usability study on microdata syntax, which resulted in significant changes to the spec [r4066]. Also related: r4067 fixes a microdata example, r4068 updates the algorithm for extracting RDF triples from microdata, r4069 does some spec cleanup, and r4070 splits out the predefined microdata syntaxes into their own specs.

There was also work on events this week. r4032 defines what events are involved in copy and paste, closing bug 7668. r4037 defines when the reset event fires, closing bug 7699. r4039 defines when the abort event fires, closing bug 7700.

This week brings another milestone, one which went mostly unremarked in mailing lists, blogs, and IRC chatter. As with any large project, Ian Hickson has maintained an informal wishlist of things he would like to clarify, define, or otherwise include in HTML5. The list has grown and shrunk over the years. The list was stored in HTML comments, so it has never been visible unless you viewed the source of the HTML5 specification itself. And as with any large project, there comes a time when you realize you're not going to get to everything on your wishlist.

This week, the wishlist shrank a lot, as Ian finally "punted" on several issues. Some of them may be tackled in HTML6. (Of course, if someone feels strongly enough, they can certainly argue that an issue still needs to be tackled in HTML5.) r4023 shows the deletions from the wishlist, including: "ability for a web app to save a file to the local disk," proposals for new attributes on the <title> element, partial form validation, multi-column select widgets, auto-formatting of number fields (like many spreadsheet programs do), relative dates, input controls for repeating dates (like anniversaries or other repeating events), and input controls for currency.

Other noteworthy changes this week:

r4011 syncs with the latest Origin spec, closing bug 7599.
r4031 allows user agents to explicitly disable <canvas> support.
r4042 limits PUT and DELETE actions on web forms to the same origin as the page. This is similar to the restriction on XMLHttpRequest.
r4057 defines <applet>.
r4076 disallows the backtick (`) character in unquoted attribute values, because Internet Explorer will treat it as an attribute value delimiter.
r4082 adds the document.head property, which makes me very happy.
r4083 states that an <audio> element without controls should always be hidden. (You can still make a visible <audio> element; just give it a controls attribute.)
r4086 tries to clarify the ever-elusive WindowProxy object.
r4091 registers the various HTTP headers that are used in the new features of HTML5, including Ping-From and Ping-To.
r4092 and r4094 add a non-normative index of HTML elements and attributes. Think of it as an "HTML5 cheat sheet." Various third parties have attempted such a list, but none have been able to keep up with the maintenance required as HTML5 evolved.
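
The backtick rule from r4076 is easy to check mechanically. Here is a minimal sketch, assuming two hypothetical helper names (canBeUnquoted and serializeAttribute) that do not come from the spec or from any library. The forbidden characters are the ones the HTML5 syntax rules list for unquoted attribute values: whitespace, both quote characters, =, <, >, and now the backtick.

```javascript
// Hypothetical helper: reports whether an attribute value may be written
// without quotes. Unquoted values must be non-empty and must not contain
// whitespace, ", ', =, <, >, or the backtick character. The \s class is
// slightly broader than HTML whitespace, which only makes this more cautious.
function canBeUnquoted(value) {
  return value.length > 0 && !/[\s"'=<>`]/.test(value);
}

// Hypothetical helper: serializes one attribute, quoting only when required.
function serializeAttribute(name, value) {
  // Always escape ampersands so the value round-trips through a parser.
  const escaped = value.replace(/&/g, "&amp;");
  if (canBeUnquoted(value)) {
    return name + "=" + escaped;
  }
  // Escape double quotes so the double-quoted form stays well-formed.
  return name + '="' + escaped.replace(/"/g, "&quot;") + '"';
}

console.log(serializeAttribute("class", "note")); // class=note
console.log(serializeAttribute("title", "a`b"));  // title="a`b"
```

Quoting every attribute value sidesteps the question entirely, which is probably the simpler habit for hand-written markup.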

Around the web:

Sniffing for RSS 1.0 feeds served as text/html, my original research into how browsers treat mis-labeled RSS feeds. My proposal was accepted and incorporated into the latest draft of the Content Sniffing spec.
mimesniff, my implementation of the Content Sniffing draft spec. Requires Python 3.1 or later.
SVG at Google and in Internet Explorer, by my friend and colleague Brad Neuberg (the mastermind behind SVGWeb).
A cute animated cartoon about HTML5 and <canvas>, using HTML5 and <canvas>.
I will be speaking on HTML5 at two upcoming Google Developer Days. The first is in Prague on November 6; the second is in Moscow on November 10.

Tune in next week for another exciting edition of "This Week in HTML5."


This Summer in HTML 5 – Episode 33

Wednesday, August 26th, 2009

I hope you enjoyed your summer. My oldest son started kindergarten today. Let's talk about HTML 5.

When last we checked, HTML 5 was humming along towards Last Call in October. Much has been made of this date; I won't bore you with the details, except to say that HTML 5 is very close to entering the next phase of its existence. Regular readers of this blog already know that parts of HTML 5 are already shipping in major browsers. The recently-released Firefox 3.5 supports <audio> and <video>, offline web applications, the drag-and-drop API, and the <canvas> text API. (Technically Firefox 3.0 supported the <canvas> text API too, properly cordoned off in its own vendor-specific functions because the API was not finalized at the time. You can paper over the differences fairly easily.)
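
As a rough illustration of papering over the differences, here is a minimal sketch. The wrapper name drawText is hypothetical. It assumes the Firefox 3.0 vendor-specific functions behave as I remember them: mozTextStyle holds the font string, and mozDrawText() takes only the text and draws at the current origin.

```javascript
// Draws text at (x, y) with the standard canvas text API when it exists,
// falling back to the Firefox 3.0 vendor-specific functions otherwise.
// Returns the name of the code path used, or null if neither is available.
function drawText(ctx, text, x, y, font) {
  if (typeof ctx.fillText === "function") {
    ctx.font = font;
    ctx.fillText(text, x, y);
    return "standard";
  }
  if (typeof ctx.mozDrawText === "function") {
    // mozDrawText() takes no coordinates, so move the origin to (x, y)
    // first and restore the previous transform afterwards.
    ctx.save();
    ctx.translate(x, y);
    ctx.mozTextStyle = font;
    ctx.mozDrawText(text);
    ctx.restore();
    return "vendor";
  }
  return null;
}
```

In a page you would pass it the 2D context of a canvas, for example drawText(canvas.getContext("2d"), "Hello", 10, 50, "20px sans-serif").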

So what new and exciting stuff has been added to HTML 5 this summer?

Microdata

At the table in the kitchen, there were three bowls of porridge. Goldilocks was hungry. She tasted the porridge from the first bowl. "This porridge is too hot!" she exclaimed.

So, she tasted the porridge from the second bowl. "This porridge is too cold," she said.

So, she tasted the last bowl of porridge. "Ahhh, this porridge is just right," she said happily and she ate it all up.

— The Story of Goldilocks and the Three Bears

r3074 introduces the concept of microdata. Microdata is designed to allow authors to include additional semantics in their pages for which there is no appropriate HTML element or attribute. For example, HTML is not expressive enough to mark up a contact in an address book (complete with individual fields for name, street address, email, and phone number) or an event on a calendar (complete with start date, end date, and location). Instead of creating new elements and attributes for every possible vocabulary, you can use the microdata attributes to enhance existing elements.
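
To make the idea concrete, here is a toy sketch of what a generic microdata consumer does: walk an element tree and collect name/value pairs from elements annotated with itemprop. Everything here is an assumption for illustration. The tree is a plain JavaScript object standing in for parsed HTML, the function names are hypothetical, and the contact details are made up. This is not the extraction algorithm from the spec, which handles nesting, URLs, and dates that this sketch ignores.

```javascript
// A toy element tree standing in for parsed HTML. Only `itemprop` is modeled;
// the attribute that marks the root of an item is deliberately left out.
const contact = {
  tag: "div",
  attrs: {},
  children: [
    { tag: "span", attrs: { itemprop: "name" }, children: ["Jane Example"] },
    { tag: "span", attrs: { itemprop: "email" }, children: ["jane@example.com"] },
    { tag: "span", attrs: {}, children: ["(not a property)"] },
  ],
};

// Concatenates the text of a node and all of its descendants.
function textOf(node) {
  if (typeof node === "string") return node;
  return node.children.map(textOf).join("");
}

// Collects name/value pairs from every descendant that carries `itemprop`.
// Elements without it are searched recursively for annotated descendants.
function extractProperties(root) {
  const props = {};
  for (const child of root.children) {
    if (typeof child === "string") continue;
    if (child.attrs.itemprop) {
      props[child.attrs.itemprop] = textOf(child);
    } else {
      Object.assign(props, extractProperties(child));
    }
  }
  return props;
}

console.log(extractProperties(contact));
```

The point is that the extractor knows nothing about contacts; the vocabulary lives entirely in the property names the author chose.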

There are a number of other technologies with goals similar to microdata, including microformats and RDFa. As Ian Hickson explained in the message "Annotating structured data that HTML has no semantics for" that introduced microdata, microformats are fine for specific formats but are not flexible enough to be parseable by a generic parser, while RDFa relies on CURIEs and XML namespaces in a way that would require changes to HTML parsing algorithms to work interoperably between text/html and application/xhtml+xml. (Forgive me if I didn't explain that very well. There was a lot of yelling and very little explaining once it became clear that RDFa was not going to be included in HTML 5, so I probably missed some of the nuances.) Work is ongoing to create an RDFa-in-HTML specification.

ARIA

ARIA stands for "Accessible Rich Internet Applications." It is an emerging standard for making web applications more accessible to people using assistive technologies (including, but not limited to, blind people who browse the web with the help of screenreaders). The basic technique is for authors to define "roles" and "states" on individual elements to indicate what sort of control the element represents. For example, HTML has no "treeview" control, but JavaScript libraries like Dojo let you include a treeview in your web-based application with a combination of generic HTML elements, a few images, and a whole lotta JavaScript. ARIA gives you a way to say that the "treeview" HTML element (which is probably just a <div>) is acting as a treeview (that's its "role"). Each item in the treeview can be in the "expanded" or "collapsed" state, and the state changes as the user interacts with the control. Major browsers, including Microsoft Internet Explorer (8) and Firefox (2+), will notice the custom role on the element and announce to assistive technologies that this <div> element is acting as a treeview. (In fact, Dojo already supports these roles and states, due to work funded by IBM.)
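
Here is a minimal sketch of the state half of that story, assuming the role and aria-expanded attribute names from the ARIA drafts. The function names are hypothetical, and makeElement is only a stand-in for a real DOM element so that the sketch runs outside a browser.

```javascript
// Flips a tree item between expanded and collapsed by updating its
// aria-expanded state. `item` is any object with getAttribute/setAttribute,
// such as a DOM element carrying role="treeitem".
// Returns true if the item is expanded after the call.
function toggleExpanded(item) {
  const expanded = item.getAttribute("aria-expanded") === "true";
  item.setAttribute("aria-expanded", expanded ? "false" : "true");
  return !expanded;
}

// Minimal stand-in for a DOM element (an assumption for this sketch only).
function makeElement(attrs) {
  const store = Object.assign({}, attrs);
  return {
    getAttribute: function (name) { return name in store ? store[name] : null; },
    setAttribute: function (name, value) { store[name] = String(value); },
  };
}

const node = makeElement({ role: "treeitem", "aria-expanded": "false" });
toggleExpanded(node);
console.log(node.getAttribute("aria-expanded")); // true
```

A real widget would call something like this from its click and keyboard handlers, so that the state assistive technologies see always matches what is drawn on screen.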

r3657 adds the section Annotations for assistive technology products to HTML 5. There are still a number of unanswered questions about how the custom semantics defined by ARIA interact with the native semantics defined by HTML 5.

Everything Old is New Again

As regular readers of this blog already know, HTML 5 goes to great lengths to specify existing browser behavior, even to the point of "willfully violating" other specifications. Vast stretches of the HTML 5 specification are devoted to elements, attributes, and scripting features that nobody likes but everyone is required to support. To that end, r3502 defines the <listing>, <plaintext>, <acronym>, <xmp>, and <dir> elements; r3133 and r3141 define the <marquee> element; r3155, r3403, r3409, and r3410 define document.all.

Other important changes include the location.reload() method (r3220), the textarea.textLength property (r3177), a new rollback() method for synchronous SQL transactions (r3210), and the ability to upload multiple files at a time from a web form (r3544 and r3545).

Features Removed

"The food here is terrible!"

"I know, and such small portions!"

(variously attributed)

Everyone complains that HTML 5 is too big, but nobody has any reasonable solution for making it smaller. (Splitting it into multiple specifications to make it "smaller" is like cutting a pie into slices to give it fewer calories.) However, based on implementor feedback, HTML 5 has shed a few ~~pounds~~features this summer. To wit:

r3555 removes the <datagrid> element and its associated APIs. Originally envisioned as a two-dimensional editable "spreadsheet-lite," it was never implemented in any browser.
r3621 removes the <bb> element, which was originally designed to support "installing" web applications as standalone programs. There were a number of security-related concerns, and browser vendors flatly refused to implement it.
r3342 removes any mention of what an optimal video codec would look like. Contrary to popular belief, this revision does not remove the <video> element itself; the <video> element is alive and well and implemented in Safari, Firefox, Google Chrome, and an experimental build of Opera. However, it is true that there is no single video codec that is supported out-of-the-box by all browsers. Firefox and Opera only support Ogg Theora, Google Chrome supports H.264 and Theora, and Safari supports whatever QuickTime supports (which doesn't include Ogg Theora unless you install a third-party plugin).
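
Since no single codec works everywhere, pages that use <video> typically offer more than one encoding and let the browser pick. Here is a minimal sketch of that selection using the canPlayType() method of media elements. The helper name pickSource, the file names, and the stub element are assumptions for illustration; the check accepts only "maybe" and "probably" so that any other answer counts as unplayable.

```javascript
// Returns the first source the media element reports it may be able to play,
// or null if it rejects all of them.
function pickSource(video, sources) {
  for (const source of sources) {
    const answer = video.canPlayType(source.type);
    if (answer === "maybe" || answer === "probably") {
      return source;
    }
  }
  return null;
}

// Hypothetical encodings of the same movie.
const sources = [
  { url: "movie.mp4", type: 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"' },
  { url: "movie.ogv", type: 'video/ogg; codecs="theora, vorbis"' },
];

// Stand-in for a <video> element in a browser that only plays Ogg Theora.
const theoraOnly = {
  canPlayType: function (type) {
    return type.indexOf("video/ogg") === 0 ? "probably" : "";
  },
};

console.log(pickSource(theoraOnly, sources).url); // movie.ogv
```

In a real page you would pass an actual <video> element instead of the stub.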

Administrative Stuff

"The man didn't have the right form."

"What man?"

"The man from the cat detector van."

"The loony detector van, you mean."

"Look, it's people like you what cause unrest."

— Monty Python's "Fish License"

When web servers send you HTML, they are supposed to label it as such with the HTTP Content-Type header. Each content type (an HTML page, a JPEG image, an MPEG-4 video) has its own "MIME type." MIME types must be registered with the IANA.
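
A server usually decides which label to send by looking at the file it is about to serve. Here is a minimal sketch of that lookup, assuming a hypothetical extension table that covers the three examples above. The function name and the application/octet-stream fallback for unknown files are my choices for illustration, not anything required by HTML 5.

```javascript
// Hypothetical extension-to-MIME-type table covering the examples above.
const MIME_TYPES = {
  ".html": "text/html",
  ".jpg": "image/jpeg",
  ".mp4": "video/mp4",
};

// Returns the value to send in the HTTP Content-Type header for a file name.
// Unknown or missing extensions fall back to a generic binary type.
function contentTypeFor(filename) {
  const dot = filename.lastIndexOf(".");
  const ext = dot === -1 ? "" : filename.slice(dot).toLowerCase();
  return MIME_TYPES[ext] || "application/octet-stream";
}

console.log("Content-Type: " + contentTypeFor("index.html")); // Content-Type: text/html
```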

r3552 adds the registration information for text/html, application/xhtml+xml, text/event-stream, text/cache-manifest, and application/microdata+json. r3582 adds the registration information for text/ping.

Standards frequently include references to other standards. References can be "normative" or "informative." To quote RFC 3967 (a standard about creating standards), "a normative reference specifies a document that must be read to fully understand or implement the subject matter in the new [standard], or whose contents are effectively part of the new [standard], as its omission would leave the new [standard] incompletely specified. An informative reference is not normative; rather, it provides only additional background information." r3580 adds a list of references to HTML 5.

Tune in next week as we return to our regular weekly schedule of "This Week in HTML 5."
