The WHATWG Blog — Lachlan Hunt

Author Archive

Plans for HTML6

Sunday, April 1st, 2007

The WHATTF have today decided to reveal our plans for HTML6. The objective is to rewrite HTML5 based off the new ISO standard OOXML specification as the serialization. The architectural model will adhere to the principle of separation of semantics from presentation using RDF for semantics and XSL-FO for presentation. An XML Schema will be provided for semantic validation.

Advantages:

Over 90% market penetration (almost all, if not all, users own an Office suite).
Backwards compatibility. Since the new HTML6 will have OOXML as the serialization, all OOXML implementations of today will be able to accurately render HTML6.
Easy migration. Using a fairly simplistic XSLT style sheet, all developers will be able to convert their HTML4 documents into HTML6, and vice-versa.
Ready for the enterprise. Thanks to XSLT and GRDDL, mapping the over-the-wire OOXML data to the RDF/XSL-FO model will be trivial. This approach gives the best of both worlds: While OOXML provides compatibility for entry-level applications, the RDF/XSL-FO-based architectural model integrates with the enterprise-strength backplane for rich applications.
WS-* integration. XML Schema provides for binding with SOAP-based intermediation solutions.

We firmly believe that a new HTML version should maintain backwards compatibility and that specification writers must not reinvent the wheel. We've come to the conclusion that the OOXML specification adheres to these goals, making life easier for everyone: users, developers, implementors and spec writers.

More details will follow soon.

Posted in WHATWG | 7 Comments »

The Future of HTML Presentation Slides

Monday, February 26th, 2007

Last month, I did a presentation on the future of HTML at the WSG meeting in Sydney. For those of you who couldn't make it, or those who wish to hear it again, I have finally got around to publishing the slides, audio recording and transcript.

Posted in WHATWG | Comments Off on The Future of HTML Presentation Slides

Declaring the Character Encoding

Thursday, February 22nd, 2007

HTML requires that authors declare the character encoding of the file either using HTTP headers (when served over HTTP) or metadata in the file. In previous versions of HTML, authors could specify the character encoding using a relatively complex meta element like this:

<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">

The idea of the http-equiv attribute was that it would act as a substitute for real HTTP headers. However, in practice, that is not entirely true. Only a few headers actually have any effect in browsers. In fact, HTML4 even suggested that servers use this attribute to gather information for HTTP response message headers; but in reality, no known server ever did this.

Although the MIME type is included in the value for the Content-Type header above, it has no effect in browsers. The only useful and practical piece of information in that element is: charset=UTF-8.

In order to simplify the meta element and remove unnecessary markup, HTML5 has changed it slightly. The new way to declare the character encoding in the file will be to use the following:

<meta charset="UTF-8">

Obviously, that is much shorter and easier to remember. Luckily, due to the way encoding detection has been implemented by browsers, it is backwards compatible and believed to be supported by all known browsers.
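To illustrate why the shorter form can be backwards compatible, here is a minimal Python sketch of a naive byte-level scan that looks for a charset label without parsing any HTML. The pattern and the function name find_charset are assumptions made for illustration; this is not the detection algorithm defined in the spec, only a simplified stand-in for the kind of scan that would match both syntaxes.

```python
import re
from typing import Optional

# Simplified pattern: the word "charset", optional whitespace, "=",
# an optional quote, then the encoding label itself.
CHARSET_RE = re.compile(
    rb"""charset\s*=\s*["']?\s*([A-Za-z0-9._:-]+)""", re.IGNORECASE
)

def find_charset(head: bytes) -> Optional[str]:
    """Return the first charset label found in the raw bytes, lowercased."""
    match = CHARSET_RE.search(head)
    return match.group(1).decode("ascii").lower() if match else None

old_style = b'<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">'
new_style = b'<meta charset="UTF-8">'

# Both declarations yield the same label under this simple scan.
print(find_charset(old_style), find_charset(new_style))
```

Because the scan only cares about the text "charset=" followed by a label, it does not matter whether that text sits inside a content attribute or is an attribute in its own right.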

Along with this, the spec has recently defined how encoding detection must be implemented by browsers and imposed a few additional restrictions for documents to be considered conforming.

When serialised, the charset attribute and its value must be contained completely in the first 512 bytes of the file.
The attribute value must be serialised without the use of character entity references of any kind; e.g., you cannot use <meta charset="&#x55;TF-8"> to declare UTF-8. This is because encoding detection occurs before the actual parsing begins, so the detection algorithm does not decode character references.
The character encoding used must be a rough superset of US-ASCII; e.g., you can’t use this for EBCDIC-encoded files.
User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU encodings.
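Some of these restrictions can be checked mechanically. The following Python sketch is a rough, hypothetical checker (the names check_declaration, PRESCAN_LIMIT and FORBIDDEN are invented for this example, and the regular expression is a simplification rather than a real HTML tokenizer). It covers the 512-byte limit, the ban on character references and the list of encodings user agents must not support; it does not attempt to verify the US-ASCII superset requirement.

```python
import re

PRESCAN_LIMIT = 512  # bytes, per the first restriction in the list
FORBIDDEN = {"cesu-8", "utf-7", "bocu-1", "scsu"}

# Simplified match for <meta charset=...>; captures the raw attribute value.
META_CHARSET_RE = re.compile(
    rb"""<meta\s+charset\s*=\s*["']?([^"'\s>]*)""", re.IGNORECASE
)

def check_declaration(data: bytes) -> list:
    """Return a list of problems found (empty if none were detected)."""
    match = META_CHARSET_RE.search(data)
    if match is None:
        return ["no <meta charset> declaration found"]
    problems = []
    # The attribute and its value must end within the first 512 bytes.
    if match.end(1) > PRESCAN_LIMIT:
        problems.append("declaration is not within the first 512 bytes")
    value = match.group(1)
    if b"&" in value:
        # Detection runs before parsing, so references are never decoded.
        problems.append("value contains a character reference")
    elif value.decode("ascii", "replace").lower() in FORBIDDEN:
        problems.append("encoding must not be supported by user agents")
    return problems

print(check_declaration(b'<meta charset="&#x55;TF-8">'))
```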

If the encoding is either UTF-8, UTF-16 or UTF-32, then authors can use a BOM at the start of the file to indicate the character encoding.
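As a rough sketch of what BOM detection involves, the Python snippet below compares the start of a file against the BOM constants from the standard codecs module. The function name encoding_from_bom and the returned labels are assumptions for this example, not anything taken from the spec.

```python
import codecs
from typing import Optional

# Longer BOMs are listed first: the UTF-32 LE BOM (FF FE 00 00) begins with
# the UTF-16 LE BOM (FF FE), so testing UTF-16 first would misidentify it.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def encoding_from_bom(data: bytes) -> Optional[str]:
    """Return the encoding indicated by a leading BOM, or None if absent."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None

print(encoding_from_bom(codecs.BOM_UTF8 + b"<!DOCTYPE html>"))
```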

Posted in WHATWG | 12 Comments »

Web Standards Group: HTML 5 Presentation

Sunday, January 21st, 2007

On Thursday evening (2007-01-25), at the Web Standards Group meeting in Sydney, I'll be doing a presentation about the future of HTML. It will be a 20 minute presentation giving an overview of the new features in HTML5. So, if you're in Sydney and interested in learning more about HTML5, or maybe the other 3 topics being presented, RSVP now and come along!

For those of you who can't make it on the night, the slides, and hopefully a podcast of the event, will be made available afterwards.

Posted in Events, WHATWG | Comments Off on Web Standards Group: HTML 5 Presentation

Feed Autodiscovery

Sunday, December 3rd, 2006

We’ve recently added link types to HTML 5. In particular, we defined the mechanism for syndication feed autodiscovery. Autodiscovery has been widely deployed and implemented since its inception in 2002, using the link element with the alternate relationship and a type attribute indicating the format of the feed.

<link rel="alternate" type="application/atom+xml"
      href="/feed.atom" title="Atom Feed">
<link rel="alternate" type="application/rss+xml"
      href="/feed.rss" title="RSS Feed">

For backwards compatibility, we must retain support for, and explicitly define, that method. However, there are two main issues with using the alternate relationship:

Syndication feeds are not necessarily alternate representations of the page.
The MIME type is not always a good indication that a resource is a feed. For example, hAtom uses regular HTML with the MIME type text/html, yet may still be used as a syndication feed format.

To address these issues, we have introduced a new feed relationship, which indicates that the referenced document is a syndication feed. This now allows you to link to several different feeds containing different content, which are not necessarily alternate versions of the page.

<link rel="feed" type="application/atom+xml"
      href="/feed/comments" title="All comments">
<link rel="feed" type="application/atom+xml"
      href="/feed/summaries" title="Article Summaries">

It also means that you do not need to specify the type attribute to have the link recognised as a syndication feed and browsers can still show it in the subscription list.

<link rel="feed" href="/feed" title="Articles">

Another benefit of this is that if there is ever a new syndication feed format, you don’t have to wait for browsers to be updated with the new MIME type to recognise it as a feed. For instance, if your feed reader supports the hAtom microformat, you could subscribe to an HTML document that has been linked to as a feed.

<link rel="feed" type="text/html"
      href="/feed.html" title="All comments">

In order to retain backwards compatibility, the definition for alternate says that when it is used in combination with a type attribute with the value of either application/rss+xml or application/atom+xml, then it implies the feed keyword as well.

The feed keyword can also be used in combination with alternate to say that it is specifically the feed for the current document.

<link rel="feed alternate" type="application/atom+xml"
      href="/feed.atom" title="Atom Feed">

However, it’s important not to confuse this with the way alternate stylesheets work. The behaviour of rel="alternate stylesheet" is a special case where the use of alternate doesn’t mean an alternate representation of the document itself. In fact, when alternate is used together with stylesheet, that is the one case where the type value cannot imply the feed value.

<link rel="alternate stylesheet" type="application/atom+xml"
      href="/feed.atom" title="This is not a feed!">
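The rules described above can be sketched in a few lines of Python using the standard html.parser module. This is only an illustration of the logic as described in this post; the names is_feed, FeedFinder and find_feeds are invented for the example and are not taken from the spec or from any browser.

```python
from html.parser import HTMLParser

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

def is_feed(rel: str, type_: str) -> bool:
    """Decide whether a link with these rel and type values is a feed."""
    keywords = set(rel.lower().split())
    if "feed" in keywords:
        return True  # explicit feed keyword; the type is irrelevant
    # alternate plus a known feed MIME type implies feed,
    # except when alternate is combined with stylesheet.
    return (
        "alternate" in keywords
        and "stylesheet" not in keywords
        and type_.strip().lower() in FEED_TYPES
    )

class FeedFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.feeds = []  # (href, title) pairs, in document order

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = {name: (value or "") for name, value in attrs}
        if "href" in a and is_feed(a.get("rel", ""), a.get("type", "")):
            self.feeds.append((a["href"], a.get("title", "")))

def find_feeds(html: str) -> list:
    finder = FeedFinder()
    finder.feed(html)  # HTMLParser.feed() supplies markup to the parser
    return finder.feeds

print(find_feeds('<link rel="feed" href="/feed" title="Articles">'))
```

Keeping the decision in a small function separate from the parser makes the three cases (explicit feed, implied feed, and the stylesheet exception) easy to test on their own.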

Mozilla already has bugs filed for implementing the new feed relationship and for fixing its bug with rel="alternate stylesheet", both of which are planned for inclusion in Firefox 3.0.

Posted in Browsers, Elements | 16 Comments »