HTML 5 introduces new elements like <section>, <article> and <footer> for structuring the content in your webpages. They can be employed in many situations where <div> is used today and should help you write more readable, more maintainable HTML source. But if you just go through your document and blindly replace all the <div>s with <section>s, you are doing it wrong.
This is not just semantic nit-picking; there is a practical reason to use these elements correctly.
In HTML 5, there is an algorithm for constructing an outline view of documents. This can be used, for example, by assistive technology (AT) to help users navigate through a document, and <section> and friends are an important part of this algorithm: each time you nest a <section>, you increase the outline depth by 1. (In case you are wondering what the advantage of this model is over the traditional <h1>-<h6> model, consider a web-based feed reader that wants to integrate the document structure of the syndicated content with that of the surrounding site. In HTML 4 this means parsing all the content and renumbering all the headings; in HTML5 the headings end up at the right depth for free.) So a document like the following:
<h1>This is the main header</h1>
<section>
  <h1>This is a subheader</h1>
  <section>
    <h1>This is a subsubheader</h1>
  </section>
</section>
<section>
  <h1>This is a second subheader</h1>
</section>
has an outline like:
This is the main header
+--This is a subheader
|  +--This is a subsubheader
+--This is a second subheader
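The nesting rule above can be sketched with Python's standard-library html.parser. This is a deliberately simplified model, not the full HTML5 outline algorithm: it assumes a heading's depth is just the number of enclosing sectioning elements (section, article, aside, nav) and ignores heading ranks, implied sections and sectioning roots. The names OutlineBuilder, outline and EXAMPLE are illustrative only.

```python
from html.parser import HTMLParser

# Simplification (assumption): a heading's depth is just the number of
# sectioning elements around it. The real HTML5 outline algorithm also
# handles heading ranks, implied sections and sectioning roots.
SECTIONING = {"section", "article", "aside", "nav"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class OutlineBuilder(HTMLParser):
    """Collects (depth, heading text) pairs while parsing."""

    def __init__(self):
        super().__init__()
        self.depth = 0          # current sectioning nesting level
        self.in_heading = False
        self.text = []
        self.outline = []

    def handle_starttag(self, tag, attrs):
        if tag in SECTIONING:
            self.depth += 1
        elif tag in HEADINGS:
            self.in_heading = True
            self.text = []

    def handle_endtag(self, tag):
        if tag in SECTIONING:
            self.depth -= 1
        elif tag in HEADINGS and self.in_heading:
            self.in_heading = False
            self.outline.append((self.depth, "".join(self.text).strip()))

    def handle_data(self, data):
        # Only text inside a heading contributes to the outline.
        if self.in_heading:
            self.text.append(data)


def outline(html):
    """Return a list of (depth, heading text) pairs for an HTML string."""
    builder = OutlineBuilder()
    builder.feed(html)
    builder.close()
    return builder.outline


EXAMPLE = """
<h1>This is the main header</h1>
<section>
  <h1>This is a subheader</h1>
  <section>
    <h1>This is a subsubheader</h1>
  </section>
</section>
<section>
  <h1>This is a second subheader</h1>
</section>
"""

if __name__ == "__main__":
    # Indent each heading by its depth to mimic an outline view.
    for depth, title in outline(EXAMPLE):
        print("  " * depth + title)
```

The same function makes the div-to-section pitfall concrete: <div><h1>Site</h1><div><h1>Post</h1></div></div> puts both headings at depth 0, but replacing each <div> with <section> puts "Site" at depth 1 and "Post" at depth 2, because every wrapper now counts as a level.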
If you just blindly convert all the <div>s on your pages to <section>s, it's pretty unlikely your page will have the outline you expected. And, apart from being a semantic faux pas, this will confuse the hell out of people who rely on headings for navigation.
Hopefully, in time, we will get tools that make this kind of mistake obvious, and CSS support for selecting headings based on their depth. Until then, remember: <section> is not just a semantic <div>.
There has been a certain amount of controversy over the supposed date of 2022 for HTML 5 to be "finished". It is somewhat important to realise how much significance should be attached to this date:
None at all
OK, strictly speaking that's not quite true, but it's a pretty good approximation. What really matters is when browsers ship HTML5 features, and since that is already happening, there is really no cause for alarm. By 2022 we hope to have a full test suite and two complete implementations, but by then we also expect to see products shipping with features from HTML 6.
Lachlan Hunt and I recently gave a presentation entitled "Getting Your Hands Dirty with HTML5" at the @media 2008 conference in London. The audience was mainly front-end developers, the kind of people who use HTML to make a living, so it was a great chance to get the message out about some of the new features that have been under development.
The talk covered the Design Principles under which HTML5 is being developed, how some of the features of HTML5 can be used to enhance common web sites, and how people can get involved with the development of HTML5.
The presentation seemed to go reasonably well, especially given that we had not met until the morning of the talk, although we had fewer demos than I would have liked, due both to technical problems during the talk and to a lack of preparation time. So, for those who were at the talk (as well as those who were not), here is a somewhat random collection of demos of the HTML5 features we mentioned:
If anyone who saw the presentation is reading this and would like to offer constructive criticism of the talk, I would really appreciate it; giving talks is fun, so it would be nice to get better at it 🙂
html5lib 0.10 is now available for your HTML-parsing pleasure.
html5lib is an implementation of the HTML 5 parsing algorithm, available in both Python and Ruby flavours. The HTML 5 algorithm is based on reverse engineering the behaviour of popular web browsers and so is compatible with the myriad varieties of broken HTML encountered on the web.
Features in 0.10:
- Parse HTML to a variety of common tree formats including minidom, ElementTree and BeautifulSoup (Python), and Hpricot and REXML (Ruby), as well as a custom simpletree format
- Automatic detection of character encoding from <meta> elements and, if chardet is available, by frequency analysis
- Sanitization of markup and CSS using a whitelist approach
- Liberal XML parsing
- Conversion of trees to event streams and Genshi-inspired filters for those streams
- Flexible serializers for writing out streams in HTML and XHTML syntax
- A prototype HTML 5 validator
- A large test suite
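The stream-and-filter design in the feature list can be illustrated with a small sketch that doubles as a whitelist sanitizer. It is not html5lib's API: the tuple-based event format and the names walk, whitelist_filter and serialize are hypothetical, and xml.etree.ElementTree.fromstring stands in for the HTML parser, so the sample input must be well-formed XML. The whitelists are arbitrary examples.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical example whitelists; a real sanitizer needs far larger ones.
ALLOWED_ELEMENTS = {"div", "p", "a", "em", "strong", "ul", "li"}
ALLOWED_ATTRIBUTES = {"a": {"href", "title"}}
DROP_CONTENT = {"script", "style"}  # discard these elements and their text


def walk(element):
    """Turn an ElementTree element into a flat stream of (kind, name, data) events."""
    yield ("start", element.tag, dict(element.attrib))
    if element.text:
        yield ("text", None, element.text)
    for child in element:
        yield from walk(child)
        if child.tail:
            yield ("text", None, child.tail)
    yield ("end", element.tag, None)


def whitelist_filter(events):
    """Drop non-whitelisted tags (keeping their text) and attributes."""
    skip_depth = 0  # > 0 while inside a DROP_CONTENT element
    for kind, name, data in events:
        if skip_depth:
            if kind == "start":
                skip_depth += 1
            elif kind == "end":
                skip_depth -= 1
            continue
        if kind == "start" and name in DROP_CONTENT:
            skip_depth = 1
            continue
        if kind in ("start", "end") and name not in ALLOWED_ELEMENTS:
            continue  # drop the tag itself, but its text still flows through
        if kind == "start":
            allowed = ALLOWED_ATTRIBUTES.get(name, set())
            data = {k: v for k, v in data.items() if k in allowed}
        yield (kind, name, data)


def serialize(events):
    """Write a stream of events back out as HTML text."""
    out = []
    for kind, name, data in events:
        if kind == "start":
            attrs = "".join(f' {k}="{escape(v)}"' for k, v in data.items())
            out.append(f"<{name}{attrs}>")
        elif kind == "end":
            out.append(f"</{name}>")
        else:
            out.append(escape(data, quote=False))
    return "".join(out)


if __name__ == "__main__":
    tree = ET.fromstring(
        '<div><p onclick="steal()">Hi <a href="/a" style="x">there</a></p>'
        "<script>alert(1)</script>bye</div>"
    )
    print(serialize(whitelist_filter(walk(tree))))
```

Keeping each stage a generator means filters compose like pipes: a tree walker feeds any number of filters, and the serializer only ever sees the final stream.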
html5lib 0.9 is now available for your parsing pleasure.
html5lib is an implementation of the WHATWG HTML parsing algorithm in Python, released under an MIT license. It enables malformed HTML to be parsed into standard minidom and ElementTree structures, in a way that is highly compatible with the behavior of major desktop web browsers. As well as parsing to trees, html5lib contains a DOM-to-SAX converter; it is hoped that by supporting these standard APIs, toolchains based on draconian XML parsers can be repurposed to process HTML content with minimal effort.
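A DOM-to-SAX conversion amounts to walking the tree and replaying each node as the matching SAX ContentHandler call. The rough sketch below uses only the standard library; dom_to_sax and EventRecorder are hypothetical illustrations, not html5lib's own converter, and minidom.parseString stands in for the HTML parser, so the sample input is well-formed XML. Namespaces, comments and processing instructions are ignored.

```python
from xml.dom import minidom
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesImpl


def dom_to_sax(node, handler):
    """Replay a DOM tree as SAX events (simplified: no namespaces,
    comments or processing instructions)."""
    if node.nodeType == node.DOCUMENT_NODE:
        handler.startDocument()
        for child in node.childNodes:
            dom_to_sax(child, handler)
        handler.endDocument()
    elif node.nodeType == node.ELEMENT_NODE:
        attrs = AttributesImpl(dict(node.attributes.items()))
        handler.startElement(node.tagName, attrs)
        for child in node.childNodes:
            dom_to_sax(child, handler)
        handler.endElement(node.tagName)
    elif node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        handler.characters(node.data)


class EventRecorder(ContentHandler):
    """Stand-in for an existing SAX-based tool: it only ever sees events."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        self.events.append(("text", content))


if __name__ == "__main__":
    doc = minidom.parseString('<p class="x">Hello <b>world</b></p>')
    recorder = EventRecorder()
    dom_to_sax(doc, recorder)
    print(recorder.events)
```

Because the consumer is an ordinary ContentHandler, the same handler could equally be passed to xml.sax.parse for XML input, which is what lets an existing XML toolchain be reused unchanged.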
In addition to the HTML parsing capability, html5lib 0.9 contains an experimental liberal XML parser based on the WHATWG algorithm without the HTML-specific error handling. This is suitable for parsing XML from sources that cannot guarantee well-formedness, e.g. web feeds.
The 0.9 release is expected to be the last major release before 1.0, and no new features will be added before 1.0 is released. Instead we will work on any remaining correctness issues, other bugs, and on improving the messages reported when parse errors are encountered. Bug reports are very much appreciated. Users and anyone looking to get involved are encouraged to join the mailing list or visit the #WHATWG channel on freenode.net.