Microdata (part 1)
One of the features we've added in HTML5 is a way to include machine-readable annotations that people can scrape in a simple and well-defined way. This means that if a site wants to make the information available, you don't have to rely on brittle screen-scraping to get the information out.
This is easiest to understand with an example.
Suppose that you had an issue tracking database like Bugzilla, and that you wanted other tools to be able to pull information about issues in that database.
Today, Bugzilla exposes an XML file for each bug, but this means maintaining two parallel formats for the bug page. Instead of providing such a separate interface, you can use microdata, the new attributes in HTML5. That way, even as your issue tracker changes its interface from version to version, the underlying data can still be reliably readable from the same HTML page.
Imagine the markup today looks like this:
<body>
 <h1>Issue 12941: Too many pies in the pie factory</h1>
 <dl>
  <dt>Reporter</dt> <dd>firstname.lastname@example.org</dd>
  <dt>Priority</dt> <dd>AAA</dd>
  ...
To annotate this with microdata, we just mint some names, and then label each field with those names. The names are in "reverse-DNS" form; if the bug system was at "example.net", then the names would be "net.example.bug", "net.example.number", and so on. Thus we get:
<body item="net.example.bug">
 <h1>Issue <span itemprop="net.example.number">12941</span>:
     <span itemprop="net.example.title">Too many pies in the pie factory</span></h1>
 <dl>
  <dt>Reporter</dt> <dd itemprop="net.example.reporter">firstname.lastname@example.org</dd>
  <dt>Priority</dt> <dd itemprop="net.example.priority">AAA</dd>
  ...
The item="net.example.bug" attribute says "here is a bug". The various itemprop attributes provide name/value pairs for the bug. The snippet above would result in the following tree of data:
net.example.bug:
  net.example.number = "12941"
  net.example.title = "Too many pies in the pie factory"
  net.example.reporter = "firstname.lastname@example.org"
  net.example.priority = "AAA"
Now it doesn't matter if the page changes dramatically; the same data can still be made unambiguously available:
<body>
 <h1>Example.Net Bugs Database</h1>
 <section item="net.example.bug">
  <h1 itemprop="net.example.title">Too many pies in the pie factory</h1>
  <p>#<span itemprop="net.example.number">12941</span>; reported by
     <span itemprop="net.example.reporter">firstname.lastname@example.org</span>.</p>
  <p>PRIORITY: <strong itemprop="net.example.priority">AAA</strong>.</p>
  ...
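To make the point concrete, here is a rough sketch of how a consuming tool might read these annotations, written in Python with the standard library's html.parser. It is not the extraction algorithm defined by the specification. It assumes the item and itemprop attributes exactly as used in this post, takes each property's value from the element's text, and does not handle nested items. The names ItemExtractor, extract_items, ORIGINAL_PAGE, and REDESIGNED_PAGE are made up for this illustration.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not be tracked as open.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ItemExtractor(HTMLParser):
    """Collects itemprop name/value pairs inside elements that carry item="..."."""

    def __init__(self):
        super().__init__()
        self.items = []  # [(item_type, {property_name: text_value})]
        self._open = []  # stack of currently open elements

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        entry = {"tag": tag, "props": None, "name": None, "owner": None, "text": []}
        if "itemprop" in attrs:
            # Attach the property to the nearest enclosing item, if there is one.
            for outer in reversed(self._open):
                if outer["props"] is not None:
                    entry["name"] = attrs["itemprop"]
                    entry["owner"] = outer["props"]
                    break
        if "item" in attrs:
            # Record the item right away: the snippets never close <body>.
            entry["props"] = {}
            self.items.append((attrs["item"], entry["props"]))
        self._open.append(entry)

    def handle_data(self, data):
        # Text belongs to every property element that is currently open.
        for entry in self._open:
            if entry["name"] is not None:
                entry["text"].append(data)

    def handle_endtag(self, tag):
        # Ignore stray end tags that match nothing currently open.
        if not any(entry["tag"] == tag for entry in self._open):
            return
        while self._open:
            entry = self._open.pop()
            if entry["name"] is not None:
                entry["owner"][entry["name"]] = "".join(entry["text"]).strip()
            if entry["tag"] == tag:
                break


def extract_items(html):
    parser = ItemExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


# The two page layouts from this post, trimmed where the post shows "...".
ORIGINAL_PAGE = """
<body item="net.example.bug">
 <h1>Issue <span itemprop="net.example.number">12941</span>:
     <span itemprop="net.example.title">Too many pies in the pie factory</span></h1>
 <dl>
  <dt>Reporter</dt> <dd itemprop="net.example.reporter">firstname.lastname@example.org</dd>
  <dt>Priority</dt> <dd itemprop="net.example.priority">AAA</dd>
"""

REDESIGNED_PAGE = """
<body>
 <h1>Example.Net Bugs Database</h1>
 <section item="net.example.bug">
  <h1 itemprop="net.example.title">Too many pies in the pie factory</h1>
  <p>#<span itemprop="net.example.number">12941</span>; reported by
     <span itemprop="net.example.reporter">firstname.lastname@example.org</span>.</p>
  <p>PRIORITY: <strong itemprop="net.example.priority">AAA</strong>.</p>
"""

for item_type, props in extract_items(REDESIGNED_PAGE):
    print(item_type)
    for name, value in sorted(props.items()):
        print(f"  {name} = {value!r}")
```

Running extract_items on either layout should give the same single net.example.bug item with the same four properties, which is the property a scraper cares about: the reader depends only on the annotations, not on which elements happen to carry them.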
This concludes this brief introduction to microdata! Some future blog posts will introduce a few aspects of microdata that I didn't discuss here:
- How to annotate URIs, dates and times, and hidden data using microdata.
- How to nest items within each other.
- How to annotate an item with more than one type, or how to give a single value multiple names.
- The predefined vocabularies.
- How to add annotations outside of an