The WHATWG Blog

Please leave your sense of logic at the door, thanks!

The Road to HTML 5: getElementsByClassName()

Welcome back to my semi-regular column, "The Road to HTML 5," where I'll try to explain some of the new elements, attributes, and other features in the upcoming HTML 5 specification.

The feature of the day is getElementsByClassName(). Long desired by web developers and implemented in Javascript libraries like Prototype, this function does exactly what it says on the tin: it returns a list of elements in the DOM that define one or more classnames in the class attribute. getElementsByClassName() exists as a method of the document object (for searching the entire DOM), as well as on each HTMLElement object (for searching the children of an element).

The HTML 5 specification defines getElementsByClassName():

The getElementsByClassName(classNames) method takes a string that contains an unordered set of unique space-separated tokens representing classes. When called, the method must return a live NodeList object containing all the elements in the document, in tree order, that have all the classes specified in that argument, having obtained the classes by splitting a string on spaces. If there are no tokens specified in the argument, then the method must return an empty NodeList. If the document is in quirks mode, then the comparisons for the classes must be done in an ASCII case-insensitive manner, otherwise, the comparisons must be done in a case-sensitive manner.

A Brief History of getElementsByClassName()

Can We Use It?

Yes We Can! As you can tell from the timeline, getElementsByClassName() is supported natively in Firefox 3, Opera 9.5, Safari 3.1, and all versions of Google Chrome. It is not available in any version of Microsoft Internet Explorer. (IE 8 beta 2 is the latest version as of this writing.) To use it in browsers that do not support it natively, you will need a wrapper script. There are many such scripts; I myself am partial to Robert Nyman's Ultimate GetElementsByClassName. It uses the native getElementsByClassName() method in modern browsers that support it, then falls back to the little-known document.evaluate() method, which is supported by older versions of Firefox (since at least 1.5) and Opera (since at least 9.27). If all else fails, Robert's script falls back to recursively traversing the DOM and collecting elements that match the given classnames.

And in conclusion

getElementsByClassName() is well-supported across all modern browsers except IE, and a performance-optimized open source wrapper script can cover IE and older browsers.

Tags: ,
Posted in Tutorials, WHATWG | 7 Comments »

This Week in HTML 5 – Episode 12

Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group. The primary editor was traveling this week, so there are very few spec changes to discuss. Instead, I'd like to try something a little different.

It has been suggested (1, 2, 3, &c.) that HTML 5 is trying to bite off more than it can metaphorically chew. It is true that it is a large specification, and it might benefit from being split into several pieces. But it is not true that it includes everything but the kitchen sink.

For example, HTML 5 will not

Daniel Schattenkirchner asked whether Almost-Standards mode is still needed. Almost-Standards mode is a form of DOCTYPE sniffing invented by Mozilla in 2002 to address line heights in table cells containing images. Bug 153032 implemented the new mode, which Mozilla called "Almost Standards mode" and HTML 5 calls "limited quirks mode." Henri Sivonen made the point that it would probably be too expensive to get rid of the mode. Like it or not, we're probably stuck with it.

And finally, a gem that I missed when it was first discussed: back in July, "Lars" provided the best documentation of the <keygen> element, ever.

Tune in next week for another exciting episode of "This Week in HTML 5."

Tags: , , ,
Posted in Weekly Review | 3 Comments »

The Road to HTML 5 – Episode 1: the section element

Welcome to a new semi-regular column, "The Road to HTML 5," where I'll try to explain some of the new elements, attributes, and other features in the upcoming HTML 5 specification.

The element of the day is the <section> element.

The section element represents a generic document or application section. A section, in this context, is a thematic grouping of content, typically with a header, possibly with a footer. Examples of sections would be chapters, the various tabbed pages in a tabbed dialog box, or the numbered sections of a thesis. A Web site's home page could be split into sections for an introduction, news items, contact information.

Discussion of sections and headers dates back several years. In November 2004, Ian Hickson wrote:

Basically I want three things:

  1. It has to be possible to take existing markup (which correctly uses <h1>-<h6>) and wrap the sections up with <section> (and the other new section elements) and have it be correct markup. Basically, allowing authors to replace <div class="section"> with <section>, <div class="post"> with <article>, etc.
  2. It has to be possible to write new documents that use the section elements and have the headers be automatically styled to the right depth (and maybe automatically numbered, with appropriate CSS), and yet still be readable in legacy UAs, without having to think about old UAs. Basically, the header element has to be header-like in old browsers.
  3. It shouldn't be too easy to end up with meaningless markup when doing either of the above. So a random <h4> in the middle of an <h2> and an <h3> has to be defined as meaning _something_.

At the moment what I'm thinking of doing is this (most of these ideas are in the draft at the moment, but mostly in contradictory ways):

The section elements would be:

<body> <section> <article> <navigation> <sidebar>

The header elements would be:

<header> <h1> <h2> <h3> <h4> <h5> <h6>

<h1> gives the heading of the current section.

<header> wraps block-level content to mark the whole thing as a header, so that you can have, e.g., subtitles, or "Welcome to" paragraphs before a header, or "Presented by" kind of information. <header> is equivalent to an <h1>. The first highest-level header in the <header> is the "title" of the section for outlining purposes.

<h2> to <h6> are subsection headings when used in <body>, and equivalent to <h1> when used in one of the section elements.

<h1> automatically sizes to fit the current nesting depth. This could be a problem in CSS since CSS can't handle this kind of thing well -- it has no "or" operator at the simple selector level.

<h2>-<h6> keep their legacy renderings for compatibility.

Further discussion:

Fast-forward to modern times. Using the <section> element instead of, say, <div class="section">, seems like a no-brainer. Unfortunately, there's a catch. (Hey, it's the web; there's always a catch.) Not all modern browsers recognize the <section> element, which means that they fall back to their default handling of unknown elements.

A long digression into browsers' handling of unknown elements

Every browser has a master list of HTML elements that it supports. For example, Mozilla Firefox's list is stored in nsElementTable.cpp. Elements not in this list are treated as "unknown elements." There are two fundamental problems with unknown elements:

  1. How should the element be styled? By default, <p> has spacing on the top and bottom, <blockquote> is indented with a left margin, and <h1> is displayed in a larger font.
  2. What should the element's DOM look like? Mozilla's nsElementTable.cpp includes information about what kinds of other elements each element can contain. If you include markup like <p><p>, the second paragraph element implicitly closes the first one, so the elements end up as siblings, not parent-and-child. But if you write <p><span>, the span does not close the paragraph, because Firefox knows that <p> is a block element that can contain the inline element <span>. So the <span> ends up as a child of the <p> in the DOM.

Different browsers answer these questions in different ways. (Shocking, I know.) Of the major browsers, Microsoft Internet Explorer's answer to both questions is the most problematic.

The first question should be relatively simple to answer: don't give any special styling to unknown elements. Just let them inherit whatever CSS properties are in effect wherever they appear on the page, and let the page author specify all styling with CSS. Unfortunately, Internet Explorer does not allow styling on unknown elements. For example, if you had this markup:

<style type="text/css">
  section { border: 1px solid red }
</style>
...
<section>
<h1>Welcome to Initech</h1>
<p>This is our <span>home page</span>.</p>
</section>

Internet Explorer (up to and including IE8 beta 2) will not put a red border around the section.

The second problem is the DOM that browsers create when they encounter unknown elements. Again, the most problematic browser is Internet Explorer. If IE doesn't explicitly recognize the element name, it will insert the element into the DOM as an empty node with no children. All the elements that you would expect to be direct children of the unknown element will actually be inserted as siblings instead. I've posted an ASCII graph that illustrates this mismatch.

Sjoerd Visscher discovered a workaround for this problem: after you create a dummy element with that name, IE will recognize the element enough to let you style it with CSS. You can put the script in the <head> of your page, and there is no need to ever insert it into the DOM. Simply creating the element once (per page) is enough to teach IE to style the element it doesn't recognize. Sample code and markup:

<html>
<head>
<style type="text/css">
  section { display: block; border: 1px solid red }
</style>
<script type="text/javascript">
  document.createElement("section");
</script>
</head>
<body>
<section>
<h1>Welcome to Initech</h1>
<p>This is our <span>home page</span>.</p>
</section>
</body>
</html>

This hack works in IE 6, IE 7, and IE 8 beta 1, but it doesn't work in IE 8 beta 2. (bug report, test case) The purpose of this illustration is not to blame IE; there's no specification that says what the DOM ought to look like in this case, so IE's handling of the "unknown element" problem is not any more or less correct than any other browser. With the createElement workaround, you can use the <section> element (or any other new HTML 5 element) in all browsers except IE 8 beta 2. I am not aware of any workaround for this problem.

And in conclusion

The <section> element is a very straightforward HTML 5 feature that you can't actually use yet.

Tags: , , , , ,
Posted in Tutorials, Weekly Review | 8 Comments »

This Week in HTML 5 – Episode 11

Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group. Last Friday was Halloween for some of you; in the United States, it involves dressing up in slutty costumes, begging your neighbors for handouts, and getting diabetes. Yesterday, many of you set your clocks back one hour for Daylight Savings Time. And for those of you on the Gregorian calendar, it is now November.

Dates and times loom large in this week's updates. "What is today's date?" is a deceptively simple question, matched in complexity only by the related question, "What time is it?" Sources for Time Zone and Daylight Saving Time Data gives a good overview of the current state of the art for answering both questions. In the movie Crocodile Dundee, Mick says he once asked an Aboriginal elder when he was born; the elder replied, "in the summertime."

r2381 defines global dates and times:

A global date and time consists of a specific Gregorian date, consisting of a year, a month, and a day, and a time, consisting of an hour, a minute, a second, and a fraction of a second, expressed with a time zone, consisting of a number of hours and minutes.

r2382 defines local dates and times:

A local date and time consists of a specific Gregorian date, consisting of a year, a month, and a day, and a time, consisting of an hour, a minute, a second, and a fraction of a second, but expressed without a time zone.

r2383 defines a month:

A month consists of a specific Gregorian date with no timezone information and no date information beyond a year and a month.

r2384 and r2385 define a week:

A week consists of a week-year number and a week number representing a seven day period. Each week-year in this calendaring system has either 52 weeks or 53 weeks, as defined below. A week is a seven-day period. The week starting on the Gregorian date Monday December 29th 1969 (1969-12-29) is defined as week number 1 in week-year 1970. Consecutive weeks are numbered sequentially. The week before the number 1 week in a week-year is the last week in the previous week-year, and vice versa.

A week-year with a number year that corresponds to a year year in the Gregorian calendar that has a Thursday as its first day (January 1st), and a week-year year where year is a number divisible by 400, or a number divisible by 4 but not by 100, has 53 weeks. All other week-years have 52 weeks.

The week number of the last day of a week-year with 53 weeks is 53; the week number of the last day of a week-year with 52 weeks is 52.

Note: The week-year number of a particular day can be different than the number of the year that contains that day in the Gregorian calendar. The first week in a week-year year is the week that contains the first Thursday of the Gregorian year year.

<input> form elements can be declared to take a local date and time, a global date and time, a date, a time, a month, or a week. You can also declare a global date and time in a <time> element or in the datetime attribute of <ins> and <del>.

HTML 5 does not define weekends or holidays, and therefore does not define business days. Interstellar datekeeping has been pushed back to HTML 6.

In other news, Chris Wilson suggested a different strategy for the much-maligned <q> element, which kicked off a long discussion, which in turn spawned several tangential discussions: <q> and commas, <q> vs <p>, UA style sheet for <q>, <q addmarks=true>, and the overly-optimistically-titled Final thoughts on <q>. The basic problem is that, while HTML 4 clearly states that user agents should render with delimiting quotation marks, Microsoft Internet Explorer (prior to IE8b2) did not do so. IE8b2 does do so, but it falls back to client-side regional settings to display quotation marks in pages where the author has not specified the language (which is the vast majority of pages). Also, in some languages, convention dictates alternating single and double quotes for nested quotations, but HTML 4 did not specify how to handle this, and different browsers handle nested quotation marks in different ways.

Other interesting tidbits this week:

Tune in next week for another exciting episode of "This Week in HTML 5."

Tags: , , , ,
Posted in Weekly Review | 1 Comment »

Bay area meetup details

The bay area meetup will be Tuesday 7PM (October 28) at St Stephen's Green in Mountain View. If you have any further questions feel free to ask them in a comment here or on IRC.

To be perfectly clear, the meeting is open to everyone who wants to come. All that is required is that you actually show up at the pub. See you there!

Posted in WHATWG | Comments Off on Bay area meetup details