One of the features we've added in HTML5 is a way to include
machine-readable annotations that people can scrape in a simple and
well-defined way. This means that if a site wants to make the
information available, you don't have to rely on brittle
screen-scraping to get the information out.
This is easiest to understand with an example.
Suppose that you had an issue tracking database like Bugzilla, and that you wanted
other tools to be able to pull information about issues in that database.
Today, Bugzilla exposes an XML file for each bug, but this means
maintaining two parallel formats for the bug page. Instead of
providing such a separate interface, you can use microdata, the
new attributes in HTML5. That way, even as your issue tracker changes
its interface from version to version, the underlying data can still
be reliably readable from the same HTML page.
Imagine the markup today looks like this:
<h1>Issue 12941: Too many pies in the pie factory</h1>
To annotate this with microdata, we just mint some names, and then
label each field with those names. The names are in "reverse-DNS"
form; if the bug system was at "example.net", then the names would be
"net.example.bug", "net.example.number", and so on. Thus we get:
<h1>Issue <span itemprop="net.example.number">12941</span>:
<span itemprop="net.example.title">Too many pies in the pie factory</span></h1>
<dd itemprop="net.example.reporter">[email protected]</dd>
The item="net.example.bug" attribute says "here is a bug". The various
itemprop attributes provide name/value pairs for the bug. The snippet
above would result in the following tree of data:
net.example.number = "12941"
net.example.title = "Too many pies in the pie factory"
net.example.reporter = "[email protected]"
net.example.priority = "AAA"
Now it doesn't matter if the page is dramatically changed; the same
data can still be made unambiguously available:
<h1>Example.Net Bugs Database</h1>
<h1 itemprop="net.example.title">Too many pies in the pie factory</h1>
<p>#<span itemprop="net.example.number">12941</span>; reported
by <span itemprop="net.example.reporter">[email protected]</span>.</p>
<p>PRIORITY: <strong itemprop="net.example.priority">AAA</strong>.</p>
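To make the point concrete, here is a rough sketch of what a consuming tool might do. It is not the microdata algorithm from the spec, just a simplified extractor built on Python's standard html.parser module that collects the text of every element carrying an itemprop attribute. The class and function names are made up for this illustration, the two markup strings are reconstructions of the layouts above (the wrapping section and dl elements are assumptions), and the reporter address is a placeholder.

```python
from html.parser import HTMLParser


class ItempropExtractor(HTMLParser):
    """Collect itemprop name -> text pairs (simplified: no nesting of items,
    no URL/date handling, and void elements such as <br> are not supported)."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self._stack = []  # one entry per open element: its itemprop name or None

    def handle_starttag(self, tag, attrs):
        name = dict(attrs).get("itemprop")
        self._stack.append(name)
        if name is not None:
            self.props[name] = ""

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Text counts toward every enclosing element that has an itemprop.
        for name in self._stack:
            if name is not None:
                self.props[name] += data


def extract_props(html):
    """Return a dict of itemprop names to whitespace-trimmed text values."""
    parser = ItempropExtractor()
    parser.feed(html)
    parser.close()
    return {name: value.strip() for name, value in parser.props.items()}


# Assumed reconstruction of the first layout; the address is a placeholder.
ORIGINAL_LAYOUT = """
<section item="net.example.bug">
 <h1>Issue <span itemprop="net.example.number">12941</span>:
 <span itemprop="net.example.title">Too many pies in the pie factory</span></h1>
 <dl>
  <dt>Reporter</dt> <dd itemprop="net.example.reporter">reporter@example.net</dd>
  <dt>Priority</dt> <dd itemprop="net.example.priority">AAA</dd>
 </dl>
</section>
"""

# Assumed reconstruction of the redesigned layout.
REDESIGNED_LAYOUT = """
<h1>Example.Net Bugs Database</h1>
<section item="net.example.bug">
 <h1 itemprop="net.example.title">Too many pies in the pie factory</h1>
 <p>#<span itemprop="net.example.number">12941</span>; reported
 by <span itemprop="net.example.reporter">reporter@example.net</span>.</p>
 <p>PRIORITY: <strong itemprop="net.example.priority">AAA</strong>.</p>
</section>
"""

if __name__ == "__main__":
    before = extract_props(ORIGINAL_LAYOUT)
    after = extract_props(REDESIGNED_LAYOUT)
    for name in sorted(before):
        print(name, "=", repr(before[name]))
    print("same data after redesign:", before == after)
```

The extractor never looks at tag names or document order, only at the itemprop labels, which is why the redesign doesn't affect it.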
This concludes this brief introduction to microdata! Some future blog posts will introduce a few aspects of microdata that I didn't discuss here:
- How to annotate URIs, dates and times, and hidden data using microdata.
- How to nest items within each other.
- How to annotate an item with more than one type, or how to give a single value multiple names.
- The predefined vocabularies.
- How to add annotations outside of an
Are you interested in reviewing HTML5 for errors?
- Jump in! All feedback is welcome, from anyone.
- Open the specification: the one-page version, the multipage version, or the PDF copy (A4, Letter).
- Start reading! See below for ideas of what to look for.
If you find a problem, either send an e-mail to the WHATWG list ([email protected], subscription required), file a bug (registration required), send an e-mail to the [email protected] list (no subscription required), or send an e-mail directly to [email protected]
If everything goes according to plan, all issues will get a response from the editor before October. You can track how many issues remain to be responded to on our graph.
What to look for
The plan is to see whether we can shake down the spec and get rid of all the problems that have so far been overlooked: typos, confusion, and cross-reference errors, as well as mistakes in examples, errors in the definitions, and major errors like security bugs or contradictions.
Anyone who helps find problems in the spec — however minor — will get their name in the acknowledgements section.
You don't really need any experience to find the simplest class of problems: things that are confusing! If you don't understand something, then that's a problem. Not all the introduction sections and examples are yet written, but if there is a section with an introduction section that isn't clear, then you've found an issue: let us know!
Something else that would now be good to search for is typos, spelling errors, grammar errors, and the like. Don't hesitate to send e-mails even for minor typos; all feedback, even on such small issues, is very welcome.
If you have a specific need as a Web designer, then try to see if the need is met. If it isn't, and you haven't discussed this need before, then send an e-mail to the list. (So for example, if you want HTML to support date picker widgets, you'd look in the spec to see if it was covered. As it turns out, that one is!)
If you have some specific expertise that lets you review a particular part of the spec for correctness, then that's another thing to look for. For example, if you know about graphics, then reviewing the 2D Canvas API section would be a good use of your time. If you know about scripting, then looking at the "Web browsers" section would be a good use of your time.
Staying in touch
You are encouraged to join our IRC channel #whatwg on Freenode to stay in touch with what other people are doing, but this is by no means required. You are also encouraged to post in the Discussion section on the wiki page for this review project, or in the blog comments below, to let people know what you are reviewing. You can get news updates by following @WHATWG on Twitter.
I gave a talk at Google on Monday demonstrating the various features of HTML5 that are implemented in browsers today. The video is now on YouTube, so now you too can watch and laugh at my lame presentation skills!
The segments of this talk are as follows. Some of the demos are available online for you to play with and are linked to from the following list:
- Drag and Drop API (29:05)
- Form Controls (40:50)
- Validation (1:07:20)
- Questions and Answers (1:09:35)
If you're very interested in watching my typos, the high quality version of the video on the YouTube site is clear enough to see the text being typed. More details about the demos can be found on the corresponding demo page.
The four hottest topics in the WHATWG Issues List are:
The video codec issue is being actively worked on, but we're not close to a good solution yet (it's mostly an economic and political issue, not a technical one, which is why we don't have any transparency on this issue, sadly). I recently responded to most of the table-related feedback. Web Forms 2 work is waiting for a decision from the W3C's forms task force on whether WF2 will be integrated as-is into HTML5 or whether it will be changed before being merged. The namespace issue is the one I'm working on now.
The first thing I have to do is work out what the problem is! There has been a lot of discussion, but not much of it is focussed on a problem; most of it is focussed on possible solutions. One can't evaluate a solution without knowing what it's trying to solve, though. To this end, I have created a wiki page where I will note down any problem descriptions I can find as I read all 367 of the e-mails in this folder.
Feel free to help! If you want to coordinate, I'm Hixie in #whatwg on Freenode IRC.
The W3C is having its technical plenary day today, and a number of WHATWG contributors are there. It's hard to participate remotely in this event, but you can watch and listen: the W3C is publishing an audio stream (in Ogg; a Java applet alternative is available too), and has commissioned realtime captioning for the event. There's also a W3C IRC channel on the topic on irc.w3.org, port 6665, channel #tp, password beantown (it's not clear why there's a password, just go with it). You can also chat with WHATWG contributors who are present at the event on our own IRC channel.
The agenda for the day is available from the W3C site. Don't forget to adjust the times from the Boston timezone to your timezone if you want to listen to a particular session.