The WHATWG Blog

Archive for the ‘Syntax’ Category

New Image Report Feature in Validator.nu

Friday, April 18th, 2008

There has been a great deal of e-mail on the public-html mailing list about making the alt attribute syntactically required in HTML5. At the core of this debate is a tension: on one hand, using HTML5 validators to send a strong message about accessibility; on the other hand, avoiding a situation where a simplified and idealistic strong message leads to behavior that is counterproductive to the goal of making the Web accessible. As a policy debate, it is similar to abstinence-only sex education debates.

A validator is a computer program and cannot tell whether a textual alternative is appropriate for a given image in a given context. That's why accessibility checking needs to be done by a person. A person may use a software tool to make the checking easier, but trusting fully automated software to determine whether a page is accessible is misguided.

Given this basic problem, a policy that insists on the alt attribute always being present doesn’t necessarily lead to accessibility. In fact, considering that syntactic correctness and accessibility are different evaluation axes both in terms of computability and in terms of how HTML authors (other than accessibility advocates) tend to view things (judging from observations about the behavior of HTML authors who use validators), a policy that insists on the alt attribute being always present will likely cause people to put the attribute in there but with inappropriate content. In particular, putting an empty alt on images whose presence is important for understanding the context of other content is bad, because in that case the presence of those images is concealed from a non-graphical user. Also, a textual alternative that just says “image” is not an improvement over what, for example, Safari with VoiceOver says in the absence of alt, but would be worse than a smarter client-side heuristic.

Furthermore, there is a very real case where a textual alternative simply isn’t available to the HTML generator: a user uploads photos to a content management system and refuses to supply textual alternatives at the same moment. HTML 4 didn’t account for this case. In fact, requiring alt under all circumstances assumes that markup is written by a person who knows what the images are at the time of writing the markup. It doesn’t make sense to pretend that the case where the markup generator doesn’t have textual alternatives available doesn’t exist. The HTML 5 syntax needs to account for all use cases.

Expecting markup generators to knowingly emit markup that is not valid is not a winning proposition. Quoting me from 2006:

Authoring tools are judged by taking a page authored using the tool and running it through the W3C Validator or, presumably in the future, through an HTML5 conformance checker. Authoring tool makers who are capable of making their tool produce syntactically conforming documents will want to do so and minimize the chance that the users of their software tarnish the reputation of the tool in the eyes of people who use an automated test as a litmus test of authoring tool bogosity. (People who test tools that way will outnumber the people who make a more profound analysis due to the "validate, validate, validate" propaganda.)

To summarize: As a matter of principle, subjective checking or checking that is not applicable for all pages does not belong in the validation function. Practice is more important than principle, though. Baking the alt requirement into the validation function would be bad when the user of the validation function wants a clean report on syntax but isn’t as concerned with accessibility. It is bad for accessibility when authors put the simplest value that silences the validator into the attribute in order to make the validation report look clean, since doing so gives user agents like Safari with VoiceOver less information to work with. That's why I think the requirement to have an alt attribute present doesn’t belong in the validation function also as a practical matter.

It turns out, though, that some people think of validation as a first step toward accessibility, even though syntactic correctness and accessibility really are different evaluation axes. They expect a validator to help them flag images that are lacking a textual alternative. Moreover, the alt issue seems to be taken as the single most important web accessibility issue with the rest of issues somewhere in the long tail. When there is a demand for validators to flag images without alt, validators probably should meet the demand.

To this end, I have developed a new feature for Validator.nu: Image Report. This new feature is not part of the validation function. It also doesn’t do exactly what people are asking of the syntax definition in the long e-mail thread. (It is not a new idea for a validator user interface to offer tools that help a human perform an assessment about the page outside the validation function. For example, the W3C Validator has offered a “Show Document Outline” feature, which is also on file as a request for enhancement for Validator.nu.)

The new feature tries to address the issue of finding missing textual alternatives but it also seeks to address the issue of faulty textual alternatives. Furthermore, it seeks to address these in a way that doesn’t induce people to write bad textual alternatives in order to make the report look cleaner.

When you turn the feature on, it always lists all the images. There is no textual alternative you can fake to make the list look shorter. Instead, there are four categories and you can only change the category in which an image appears.

This has the benefit of removing the badge hunting problem: people trying to silence the validator without actually raising the quality of their page. It also has the benefit that the user can review the textual alternatives for appropriateness and can check that the right images have been marked as omitted from non-graphical presentation. Since this tool addresses more problems than simply making alt required on the syntax level, I believe this solution is much better than staying entrenched in the status quo of HTML 4 validation, so afraid of a step backwards as to be unwilling to explore steps forward.
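As a rough illustration of the idea (not the Validator.nu implementation), the sketch below lists every image on a page and sorts it into a bucket by the state of its alt attribute. The three bucket names are hypothetical and do not correspond to the four categories Image Report uses; the point is only that no alt value makes an image disappear from the list.

```python
from html.parser import HTMLParser


class ImageReport(HTMLParser):
    """Collect every img element and sort it into a bucket by its alt state."""

    def __init__(self):
        super().__init__()
        # Hypothetical bucket names; not the categories Validator.nu uses.
        self.buckets = {"no alt": [], "empty alt": [], "has alt": []}

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src", "")
        if "alt" not in attributes:
            self.buckets["no alt"].append(src)
        elif not attributes["alt"]:
            # Covers both alt="" and a bare alt attribute with no value.
            self.buckets["empty alt"].append(src)
        else:
            # Keep the text so a person can review it for appropriateness.
            self.buckets["has alt"].append((src, attributes["alt"]))


def image_report(html):
    """Return a dict mapping bucket name to the images found in that bucket."""
    parser = ImageReport()
    parser.feed(html)
    parser.close()
    return parser.buckets
```

Whatever the author writes in alt, the total number of listed images stays the same; only the bucket changes, and the text in the last bucket still has to be read by a person.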

Finally, it should be noted that this feature is, by necessity, itself inaccessible to people who cannot view bitmap images. Yet, I think it is legitimate for this feature to be implemented with an HTML user interface. Also, this feature itself is a case where the generator of the user interface markup has no knowledge of the content of the images it is presenting to the user. Hence, it is itself an example of omitting the alt attribute. It would be truly ironic if the syntax definition of HTML5 prevented Validator.nu from being self-validating.

Posted in Conformance Checking, Syntax

Validator.nu HTML Parser 1.0.7 Released

Saturday, April 5th, 2008

There is now a new release of the Validator.nu HTML Parser. Change highlights:

Adds optional support for heuristic encoding sniffing using the ICU4J sniffer, jchardet or both.
Adds support for rewinding and reparsing when the parser becomes confident about the character encoding and the tentative encoding turns out to have been wrong.
Performs encoding name matching per spec instead of using the JDK mechanism.
Implements spec changes up until just before SVG and MathML support. (Those will merit 1.1 or something.)
Warning: If you have your own token handler (unlikely), note that the semantics of the doctype token have changed.
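The rewind-and-reparse item can be pictured with a small sketch. This is not the parser's Java code; it is a simplified Python illustration that assumes the whole input is buffered as bytes, that the tentative encoding is windows-1252, and that the only source of confidence is an encoding declaration in a meta element. The function name and the regular expression are hypothetical.

```python
import re

# Simplified pattern for an encoding declaration inside a meta element.
META_CHARSET = re.compile(
    r'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def decode_with_rewind(data, tentative="windows-1252"):
    """Decode bytes tentatively, then start over if the guess was wrong.

    Returns (text, encoding_used, rewound).
    """
    # First pass: decode with the tentative encoding.
    text = data.decode(tentative, errors="replace")
    match = META_CHARSET.search(text)
    if match is None:
        return text, tentative, False
    declared = match.group(1).lower()
    if declared == tentative.lower():
        return text, tentative, False
    try:
        # The guess was wrong: rewind to the first byte and decode again.
        return data.decode(declared, errors="replace"), declared, True
    except LookupError:
        # Unknown encoding label: keep the first-pass result.
        return text, tentative, False
```

A real parser also has to discard whatever it built during the first pass; the sketch sidesteps that by returning only the decoded text.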

Posted in Processing Model, Syntax

Validator.nu now more useful when migrating existing designs

Saturday, February 2nd, 2008

Due to implementation details, the HTML5 facet of Validator.nu used to ignore the content of obsolete elements such as center, because obsolete elements were simply unknown. This wasn’t particularly useful when assessing the HTML5-upgradeability of an existing design that wrapped everything in center, for example.

The HTML5 facet of Validator.nu now knows about obsolete container elements that existed as deprecated in HTML 4.01. This means that center is still an error, but the contents are now checked as HTML5.
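The difference between the old and new behavior can be sketched with a toy checker. Everything here is an assumption made for illustration: the element sets are tiny stand-ins, the error messages are invented, and the check_inside_obsolete flag exists only to contrast the two behaviors. It does not reflect how Validator.nu is implemented.

```python
from html.parser import HTMLParser

# Illustrative subsets only; not the lists Validator.nu or the spec uses.
KNOWN = {"html", "head", "title", "body", "div", "p", "em", "img"}
OBSOLETE_CONTAINERS = {"center"}


class ToyChecker(HTMLParser):
    def __init__(self, check_inside_obsolete=True):
        super().__init__()
        self.check_inside_obsolete = check_inside_obsolete
        self.skip_depth = 0  # > 0 while inside an obsolete element being ignored
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            # Old behavior: content of the obsolete element is not examined.
            if tag in OBSOLETE_CONTAINERS:
                self.skip_depth += 1
            return
        if tag in OBSOLETE_CONTAINERS:
            self.errors.append(f"obsolete element: {tag}")
            if not self.check_inside_obsolete:
                self.skip_depth = 1
        elif tag not in KNOWN:
            self.errors.append(f"unknown element: {tag}")

    def handle_endtag(self, tag):
        if self.skip_depth and tag in OBSOLETE_CONTAINERS:
            self.skip_depth -= 1


def check(html, check_inside_obsolete=True):
    checker = ToyChecker(check_inside_obsolete)
    checker.feed(html)
    checker.close()
    return checker.errors
```

With the flag on, a page wrapped in center still gets one error for center itself, and problems inside the wrapper are reported as well. With the flag off, only the wrapper is reported and the rest of the page goes unchecked.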

Also, Validator.nu now allows legacy-style internal encoding declarations per the latest Editor’s Draft.

Posted in Conformance Checking, Syntax

Validator.nu HTML Parser 1.0.6 Released

Tuesday, January 22nd, 2008

Version 1.0.6 of the Validator.nu HTML Parser has been released. The new version fixes a crash bug in byte-to-character conversion, works around a crash when the ICU4J 3.8.1 UTF-7 decoder is in the classpath, improves error message wording and brings errors and warnings pertaining to legacy encodings up to date with the current HTML 5 draft.

This update is highly recommended for all applications that use the parser by giving it a URI or an InputStream. For applications that give the parser a Reader, the update is not necessary.

Posted in Processing Model, Syntax

Major content model changes in HTML5 (and Validator.nu)

Sunday, January 13th, 2008

The HTML5 content models became more lax in December in response to feedback from various people who found the stricter content model, especially the bimorphic content model of div, unhelpful. The notions “strictly inline content”, “structured inline content”, “block content” and “block or inline content but not both” (a.k.a. bimorphic) are now gone.

The elements formerly known as inline are now called phrasing elements, to distinguish the term from the display: inline; CSS declaration. Content models that previously accepted only block content, or either block or inline content but not both, now accept both. Phrasing content and content formerly known as block content are together called prose content. The practical effect is that the conformance requirements became closer to HTML 4.01 Transitional than to Strict; the former requirement for strictness turned out to be hard to justify in the face of actual authoring patterns and browser support for those authoring patterns.
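To make the change concrete, here is a toy comparison of the former bimorphic rule and the new, laxer rule, applied to the list of child node names of a div. The two element sets are small illustrative samples chosen for this sketch, not the classification from the draft, and the function names are hypothetical.

```python
# Illustrative classification; the real element lists live in the HTML5 draft.
BLOCK = {"p", "div", "ul", "table"}
PHRASING = {"em", "strong", "a", "span", "#text"}


def bimorphic_ok(children):
    """Old rule: block content or inline content, but not both."""
    has_block = any(child in BLOCK for child in children)
    has_phrasing = any(child in PHRASING for child in children)
    return not (has_block and has_phrasing)


def prose_ok(children):
    """New rule: any mix of phrasing content and former block content."""
    return all(child in BLOCK or child in PHRASING for child in children)


# A div containing loose text, then a paragraph, then emphasized text.
mixed = ["#text", "p", "em"]
```

Under the old rule the mixed list is rejected because it combines both kinds of content; under the new rule it is accepted, which matches the authoring pattern of putting loose text next to paragraphs inside a div.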

The content model changes are now also reflected in Validator.nu. There are some known differences from the spec, though:

<style scoped> support is broken due to spec ambiguity.
<font> is not supported. The draft says the element will likely be dropped.
Nested <menu> elements are not supported due to implementability issues with the current spec text.
Data templates are not supported. The draft is annotated with a note saying they are being considered for removal.
style='…' is supported on all elements.

Posted in Conformance Checking, Elements, Syntax