The WHATWG Blog

Please leave your sense of logic at the door, thanks!

Archive for the ‘Processing Model’ Category

Why the Alt Attribute May Be Omitted

Thursday, August 23rd, 2007

The specification of the alt attribute was recently worked on to thoroughly improve its definition, including an in depth explanation of how to provide appropriate alternate text, with clear authoring requirements.

The requirements describe situations where alternate text must be provided, where an empty alt attribute must be used and, most controversially, where the alt attribute may be omitted entirely. This is controversial because at first glance, it seems like an attempt to endorse the bad and inaccessible practice of omitting the alt attribute, and thus yet another slap in the face for accessibility. That is an unfortunate misconception that needs to be carefully examined to settle any concerns people have. Although it may seem backwards, the situation is actually much more positive.

There are many observed cases where alternate text is simply unavailable and there’s little that can be done about it. For example, most users of photo sharing sites like Flickr wouldn’t have a clue how or why to provide alternate text, even if Flickr provided the ability. While everyone agrees that it would be wonderful if all users did – indeed, the spec strongly encourages that – most users simply won’t.

The problem being addressed is what should be done in those cases where no alt text has been provided and is virtually impossible to acquire. With the current requirement for including the alt attribute in HTML4, it has been observed that many systems will attempt to fulfil the requirement by generating alternate text from the images metadata.

Flickr, for example, repeats the images title; Photobucket appears to combine the image’s filename, title and the author’s username; and Wikipedia redundantly repeats the image caption. The problem with these approaches is that using such values does not provide any additional or useful information about the image and, in some cases, this is worse than providing no alternate text at all.

The benefit of requiring the alt attribute to be omitted, rather than simply requiring the empty value, is that it makes a clear distinction between an image that has no alternate text (such as an iconic or graphical representation of the surrounding text) and an image that is a critical part of the content, but for which not alt text is available. It has been claimed that Lynx and Opera already use this distinction. For images without alt attributes Lynx shows the filename and Opera displays "Image", but neither show anything for images with empty alt attributes. It is still somewhat questionable whether this distinction is actually useful and whether or not browsers can realistically make such a distinction with real world content, and that is certainly open to debate if you have further evidence to provide.

It has been suggested that taking away the unconditional requirement for the alt attribute will affect the ability of validators to notify authors of their mistakes and take away a useful tool for promoting accessibility. However, using validation errors as an accessibility evangelism tool is not necessarily the only, nor the best, way to address the issue.

While it is indeed very useful for authors to know when they have mistakenly omitted an alt attribute, attempting to unconditionally enforce their use, using a tool as blunt as a validator, is counter productive since it encourages the use of poor quality, automatically generated text. Besides, nothing will prevent conformance checkers and authoring tools from notifying authors, if they so desire.

No practical accessibility benefits are lost by conceding the fact that you cannot force everyone to provide alternate text and making the alt attribute optional for the purpose of document conformance. No-one is claiming that conformance to HTML5 equates to conformance with accessibility requirements. There are lots of things that are considered technically conforming in HTML, yet still inaccessible if used poorly. Making alt technically optional doesn't stand in the way of accessibility requirements, nor greatly impact upon accessibility evangelism. It just acknowledges the reality of the situation in the hope of reducing the prevalence of poor quality, automatically generated alt text.

Posted in Browsers, Elements, Processing Model | 46 Comments »

Implementation of the HTML5 parsing algorithm in Java

Friday, August 17th, 2007

There is now an open-source implementation of the HTML5 parsing algorithm in Java 5: the Validator.nu HTML Parser. The parser can be used as a drop-in replacement for the XML parser in applications that use SAX, DOM or XOM APIs to read XHTML 1.x content with an XML parser.

Posted in Conformance Checking, Processing Model, Syntax | Comments Off on Implementation of the HTML5 parsing algorithm in Java

Table Integrity Checker

Tuesday, November 14th, 2006

I am working on a conformance checking service for (X)HTML5. The service is grammar-based for the most part with RELAX NG as the schema language. Some extra-grammatical constraints are expressed as Schematron assertions. Currently, as a Mozilla Foundation grantee, I am working on writing checkers (in Java) for spec features that cannot (practically or at all) be checked using RELAX NG or Schematron.

In a Web two-point-ohey perpetual beta fashion, I am deploying the new prototype features early to allow testing.

The first non-schema checker prototype is a table integrity checker. Since the table model for (X)HTML5 is now being specified, the prototype is speculatively based on the HTML 4.01 table model and browser behavior. The differences from HTML 4.01 are that colspan='0' is treated as colspan='1' and that headers must refer to th cells. The top left corner of cells is placed in the first available slot on the row, which is browser-compatible but different from what the CSS2 spec says.

The checker emits both warnings and errors. Depending on how the spec turns out, errors may become warnings or vice versa.

Currently, the errors are:

Currently, the warnings are:

The table integrity checker only sees a projection of the document tree that contains nothing but table-significant elements and crazy subtrees of table-significant elements in wrong places are silently pruned. These are dealt with on the RELAX NG level. The table integrity checker assumes that it is being used together with a reasonable schema.

The table integrity checker is also enabled for the HTML 4.01 / XHTML 1.0 presets on the generic side of the service, so testing with today’s content is possible.

There’s a pseudo-schema called http://hsivonen.iki.fi/checkers/table/ which isn’t a schema but a magic URL that causes the system to instantiate the table integrity checker. There’s a pseudo-pseudo-schema called http://hsivonen.iki.fi/checkers/all/ which expands to all pseudo-schemas, but at the moment, there’s only one.

Please let me know if the table integrity checker does not work as advertised.

Posted in Conformance Checking, Processing Model | Comments Off on Table Integrity Checker