The WHATWG Blog

Please leave your sense of logic at the door, thanks!

Archive for March, 2009

This Week in HTML 5 – Episode 25

Tuesday, March 31st, 2009

Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.

The big news for the week of February 23rd (yes, I'm that far behind) is a collection of changes about how video is processed. The changes revolve around the resource selection algorithm to handle cases where the src attribute of a <video> element is set dynamically from script.

Background reading on the resource selection algorithm: Re: play() sometimes doesn't do anything now that load() is async, r2849: "Change the way resources are loaded for media elements to make it actually work", r2873: "make all invokations of the resource selection algorithm asynchronous."

Other video-related changes this week:

Another big change this week is the combination of r2859, 2860, and r2861: you can now declare the character encoding of XHTML documents served with the application/xhtml+xml MIME type by using the <meta charset> attribute, but only if the value is "UTF-8". Also, the charset attribute must appear in the first 512 bytes of the document. Previously, the only ways to control the character encoding of an application/xhtml+xml document were setting the charset parameter on the HTTP Content-Type header, or to use an encoding attribute in the XML prolog.

In practice, this will make no difference to encoding detection algorithms; other UTF-* encodings are detected earlier (with a Byte Order Mark), and any other encoding would require an XML prolog. This is mainly to address the desire of a few overly vocal authors to be able to serve the same markup in both text/html and application/xhtml+xml modes. Background reading: Bug 6613: Allow <meta charset="UTF-8"/> in XHTML.

Other interesting changes this week:

Tune in next week for another exciting episode of "This Week in HTML 5."

Posted in Weekly Review | 1 Comment »

Validator.nu HTML Parser 1.2.0

Friday, March 27th, 2009

I put together a new release of the Validator.nu HTML Parser. This is a highly recommended update for everyone who is using a previous version the parser in an application.

Posted in Processing Model, Syntax | No Comments »

<section> is not just a “semantic <div>”

Thursday, March 19th, 2009

HTML 5 introduces new elements like <section>, <article> and <footer> for structuring the content in your webpages. They can be employed in many situations where <div> is used today and should help you make more readable, maintainable, HTML source. But if you just go through your document and blindly replace all the <div>s with <section>s you are doing it wrong.

This is not just semantic nit-picking, there is a practical reason to use these elements correctly.

In HTML 5, there is an algorithm for constructing an outline view of documents. This can be used, for example by AT, to help a user navigate through a document. And <section> and friends are an important part of this algorithm. Each time you nest a <section>, you increase the outline depth by 1 (in case you are wondering what the advantages of this model are compared to the traditional <h1>-<h6> model, consider a web based feedreader that wants to integrate the document structure of the syndicated content with that of the surrounding site. In HTML 4 this means parsing all the content and renumbering all the headings. In HTML5 the headings end up at the right depth for free). So a document like the following:


<body>
  <h1>This is the main header</h1>
  <section>
    <h1>This is a subheader</h1>
    <section>
      <h1>This is a subsubheader</h1>
    </section>
  </section>
  <section>
    <h1>This is a second subheader</h1>
  </section>
</body>

has an outline like:

This is the main header
+--This is a subheader
    +--This is a subsubheader
+--This is a second subheader

If you just blindly convert all the <div>s on your pages to <sections> it's pretty unlikely your page will have the outline you expected. And, apart from being a semantic faux-pas, this will confuse the hell out of people who rely on headings for navigation.

Hopefully, in time, we will get tools that make this kind of mistake obvious and CSS support for selecting headings based on depth. Until then remember <section> is not just a semantic <div>

Posted in Elements, Tutorials | 24 Comments »

Supporting New Elements in Firefox 2

Wednesday, March 18th, 2009

We have previously talked about how to get Internet Explorer to play ball when using the new HTML5 elements, but today I'm going to talk about Firefox 2.

Firefox 2 (or any other Gecko-based browser with a Gecko version pre 1.9b5) has a parsing bug where it will close an unknown element when it sees the start tag of a "block" element like p, h1, div, and so forth. So if you have:

<body>
 <header>
  <h1>Test</h1>
 </header>
 <article>
  <p>...</p>
  ...
 </article>
 <nav>
  <ul>...</ul>
 </nav>
 <footer>
  <p>...</p>
 </footer>
</body>

...then in Firefox 2 it will be parsed as if it were:

<body>
 <header>
  </header><h1>Test</h1>
 
 <article>
  </article><p>...</p>
  ...
 
 <nav>
  </nav><ul>...</ul>
 
 <footer>
  </footer><p>...</p>
 
</body>

So if you style the new elements with CSS it will probably look completely broken in Firefox 2.

If you care about Firefox 2 then there are some ways to fix this:

  1. Go back to using div elements
  2. Use content type negotiation between text/html and application/xhtml+xml
  3. Fix up the DOM with scripting

(1) is probably wise if your content structure changes between pages or over time. (2) also works but means that users will be exposed to the Yellow Screen of Death should a markup error slip through your system. Otherwise (3) can be worth to consider.

Fixing up Firefox 2's DOM is actually pretty simple if you have a consistent structure. Using the same markup as above it could look something like this:

<body>
 <header>
  <h1>Test</h1>
 </header>
 <article>
  <p>...</p>
  ...
 </article>
 <nav>
  <ul>...</ul>
 </nav>
 <footer>
  <p>...</p>
 </footer>
 <!--[if !IE]>--><script>
  // dom fixup for gecko pre 1.9b5
  var n = document.getElementsByTagName('header')[0];
  if (n.childNodes.length <= 1) { // the element was closed early
    var tags = ['ARTICLE', 'NAV', 'FOOTER', 'SCRIPT'];
    for (var i = 0; i < tags.length; ++i) {
      while (n.nextSibling && n.nextSibling.nodeName != tags[i]) {
        n.appendChild(n.nextSibling);
      }
      n = n.nextSibling;
    }
  }
 </script><!--<![endif]-->
</body>

You might think that this script would work for IE, too, when not using the createElement hack, but apparently IE throws an exception when trying to append a child to an unknown element. So you still have to use the createElement hack for IE.

If you want to move the script to head and run it on load and you don't have anything after the footer then you would replace 'SCRIPT' in the tags array with undefined to make it work.

(If you want to do content type negotiation and want to just serve XHTML to Gecko-based browsers with this bug then you should look for the substrings "Gecko/" and "rv:1.x" where x is less than 9, or "rv:1.9pre" or "rv:1.9a" or "rv:1.9bx" where x is less than 5.)

Posted in Browsers, DOM, Elements | 17 Comments »

This Week Day in HTML 5 – Episode 24

Tuesday, March 10th, 2009

Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group. The pace of HTML 5 changes has reached a fever pitch, so I'm going to split out these episodes into daily (!) rather than weekly summaries until things calm down.

The big news for February 13, 2009 is r2814/r2815, which adds a .value property on file upload controls:

On getting, [the .value property] must return the string "C:\fakepath\" followed by the filename of the first file in the list of selected files, if any, or the empty string if the list is empty. On setting, it must throw an INVALID_ACCESS_ERR exception.

According to bug 6529, Opera already implements something close to this, and is willing to modify their implementation to match the spec text.

The other big news of the day is r2804/r2805, which defines what happens when a focused element becomes hidden.

For example, this might happen because the element is removed from its Document, or has a hidden attribute added. It would also happen to an input element when the element gets disabled.

But let's back up a step. What does it mean for an element to be "focused" in the first place? The spec has a whole section on the concept of focus. The short answer is that focus is just what you think it is: it's the element that responds when you type. As you tab around a form, the focus moves between form fields, where you can type information; as you tab around a page, the focus moves between links, which you can follow; if you tab long enough, focus moves out of the page and back to the location bar, and so forth.

The not-so-short answer that everything in the previous paragraph is platform-specific and, depending on your browser, highly customizable. The HTML 5 spec respects these platform differences, and defers the definition of what elements should be focusable to the browser, which ultimately respects the conventions of the underlying operating system.

The long answer is that virtually anything can be focusable, because HTML 5 standardizes a crucial accessibility feature that most modern browsers now implement, namely that any element can have a tabindex attribute.

It may not sound like it, but this is a really, really important feature for building accessible web applications. HTML 4 restricted the tabindex attribute to links and form fields. For reasons unknown, Internet Explorer ignored that and respected a tabindex attribute on any element. Starting with Firefox 2, Mozilla co-opted this implementation and wrote up a rough "standard" about using the tabindex attribute on anything. Once it was implemented in the browser, IBM contracted with major screenreader vendors to support Firefox's (and IE's) behavior. This provided the foundation for pure-Javascript widgets to become keyboard-accessible (also implemented by IBM). Not just theoretically, but actually, in real shipping browsers and real shipping screenreaders. Under the covers, Dojo's complex widgets are marked up with semantically meaningless <div>s and <span>s, yet they are still focusable and keyboard-navigable. The controls are in the tab order, so you can focus with the keyboard, then you can use the keyboard to further manipulate them, change their state, and so forth.

In HTML 4, there was no way to put custom controls into the tab order without breaking markup validity (unless you futzed around with custom DTDs or scripting hacks), because the tabindex attribute could only be used on links and form fields. HTML 5 "paves the cowpaths" and standardizes this definition and behavior of tabindex-on-anything. This is a huge step forward for web accessibility. The concept of focus is central to accessibility, and HTML 5 gives it the attention it deserves. (There's more to making controls accessible than just keyboard navigability, but if you don't have keyboard navigability, nothing else really matters. If you're creating your own custom Javscript widgets, you must read the (non-vendor-specific) DHTML Style Guide for implementing keyboard accessibility in custom controls.)

Now then, why is r2804 important? Well, if the element that has focus suddenly can't have focus anymore -- because it was programmatically hidden or disabled, or it was removed from the DOM altogether -- then it is vitally important to specify where the focus goes. So the HTML 5 specification lays it all out:

When an element that is focused stops being a focusable element, or stops being focused without another element being explicitly focused in its stead, the user agent should run the focusing steps for the body element, if there is one; if there is not, then the user agent should run the unfocusing steps for the affected element only.

[Background reading: Re: Lose Focus When Hidden? (SVG ISSUE-2031)]

Other interesting changes of the day:

Tune in... well, sometime soon-ish for another exciting episode of "This Week Day In HTML 5."

Posted in Weekly Review | 1 Comment »