The WHATWG Blog

HTML5 conformance checking in Vim

May 18th, 2008 by MikeSmith

Kai Hendry has written an HTML filetype plugin for Vim that allows you to use Henri Sivonen’s Validator.nu conformance checking (validation) service remotely to check the contents of any HTML document you edit in Vim and determine if the document is HTML5-conformant (valid).

The filetype plugin is also demoed in a screencast tutorial on editing Web applications, which Kai has written about in the “VIM IDE for Web applications” post on his blog (see that post for a link to the video).

To install the Vim filetype plugin, download the plugin source and save it as ~/.vim/ftplugin/html.vim. To check a document, first run :make within Vim, then use :cope, :clist, :cnext, and related commands to locate the errors (for more details, read the section of the Vim docs that covers those commands).

How and why it works

Vim has a set of “quickfix” commands that provide something that many development IDEs also have these days: A way to run a compiler or lint checker or other external tool on the contents of a file you are editing, and then to have any errors returned — along with the line and column numbers of the places in your file where the errors occur — as a list that you can then easily step through or jump through one-by-one and fix. It’s a very powerful feature.

Kai’s HTML filetype plugin provides a way to use Vim’s “quickfix” commands to do conformance checking of HTML5 files. The plugin is dead simple; it’s just two lines:

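" :make posts the current file (%) to Validator.nu via curl
set makeprg=curl\ -s\ -F\ laxtype=yes\ -F\ parser=html5
  \\ -F\ level=error\ -F\ out=gnu\ -F\ doc=@%
  \\ http://validator.nu
" Validator.nu's GNU-style lines look like: "file":line.column-message
set errorformat=\"%f\":%l.%c-%m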

(Note that I've just wrapped the first line for the purpose of readability in this post.)

The makeprg option in the first line tells Vim what “make program” you want to use when checking HTML files. And the errorformat option in the second line tells Vim the expected format of error messages from that “make program” — so that it can parse the error messages to get the line and column numbers of the places in your file where the errors occur (the meanings of the various parts of the string used in that errorformat value are: %f, filename; %l, line number; %c, column number; %m, error message).
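As a rough illustration of what that errorformat value asks Vim to do, here is a minimal Python sketch that splits one line of output into the same four parts. The regular expression is my own approximation of the pattern, not Vim's actual matcher, and the sample line in the usage note is hypothetical rather than real Validator.nu output.

```python
import re

# Approximate regex equivalent of the Vim errorformat  \"%f\":%l.%c-%m
#   %f -> file, %l -> line, %c -> col, %m -> message (rest of the line)
GNU_LINE = re.compile(
    r'^"(?P<file>[^"]+)":(?P<line>\d+)\.(?P<col>\d+)-(?P<message>.*)$'
)


def parse_gnu_line(text):
    """Return a dict with file, line, col and message, or None if no match."""
    match = GNU_LINE.match(text)
    if match is None:
        return None
    return {
        "file": match["file"],
        "line": int(match["line"]),
        "col": int(match["col"]),
        "message": match["message"],
    }
```

For example, parse_gnu_line('"index.html":12.5-12.20: error: Hypothetical message.') yields file "index.html", line 12, column 5, and everything after the first hyphen as the message. Lines that do not fit the pattern return None, much as Vim treats non-matching lines as plain text in the quickfix list.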

Interaction with Validator.nu

What Kai’s HTML filetype plugin does is to use the curl command-line HTTP client as the “make program” and, in turn, to have curl send a POST request to Validator.nu. The contents of that POST request are set by the parameters and values specified by the -F options passed to curl. Essentially, this emulates what would happen if you used the form-based interface at the Validator.nu website and manually set the values of the various form fields in that interface. (Note that wget could probably be used here, with different options, to do the same thing.)
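To make the mapping from form fields to -F options concrete, here is a small Python sketch that assembles the same curl argument list the plugin uses. The function name build_curl_args is hypothetical, the field names and values are the ones from the makeprg line above, and nothing here contacts the network.

```python
def build_curl_args(path, endpoint="http://validator.nu"):
    """Build the curl argument list that mirrors the plugin's makeprg.

    Each (name, value) pair becomes one -F option, that is, one field of
    the multipart form POST. "@" + path tells curl to upload that file.
    """
    fields = [
        ("laxtype", "yes"),
        ("parser", "html5"),
        ("level", "error"),
        ("out", "gnu"),
        ("doc", "@" + path),
    ]
    args = ["curl", "-s"]
    for name, value in fields:
        args += ["-F", f"{name}={value}"]
    args.append(endpoint)
    return args
```

Passing the result to subprocess.run(build_curl_args("index.html"), capture_output=True, text=True) should run the request in the same way as :make, assuming curl is installed and the service still accepts requests at that address.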

What Validator.nu does in return is to send a response with the list of errors — in a format that allows the list of errors to be easily parsed by tools that have built-in support (like Vim’s “quickfix”) for reading error lists that are in a regular format and doing something with them.

GNU-formatted error output

In this case, since the out=gnu parameter and value were passed to Validator.nu, the particular format in which Validator.nu returns the error list is the standard GNU error format that’s used by many applications (including that other editor, Emacs). This use case (enabling remote validation and error-evaluation with editing applications) is actually one of the main cases for which Henri added the GNU-formatted error-reporting option to Validator.nu.

Validator.nu + Vim = easy HTML5 conformance checking

The end result is that you get the error information back into Vim in a way that lets you more easily locate and fix the errors.

So setting just two options is all it takes in an editing application like Vim to enable Validator.nu to be used remotely like this (that is, to do integrated HTML5 conformance-checking and error-reporting within the editor). This seems to me to be a pretty good testament (another in a long list) to the utility of the Validator.nu service and to the foresight that’s gone into its design.

I guess it also says a lot about the utility of Vim and the foresight that’s gone into its design, but we all already know how great Vim is, right? 🙂

Tags: tools, validator.nu, vim
Posted in Conformance Checking

Reverse Ordered Lists

April 23rd, 2008 by Lachlan Hunt

One of the newly introduced features in HTML 5 is the ability to mark up reverse ordered lists. These are the same as ordered lists, but instead of counting up from 1, they instead count down towards 1. This can be used, for example, to count down the top 10 movies, music, or LOLCats, or anything else you want to present as a countdown list.

In previous versions of HTML, the only way to achieve this was to place a value attribute on each li element, with successively decreasing values.

<h3>Top 5 TV Series</h3>
<ol>
  <li value="5">Friends
  <li value="4">24
  <li value="3">The Simpsons
  <li value="2">Stargate Atlantis
  <li value="1">Stargate SG-1
</ol>

The problem with that approach is that manually specifying each value can be time-consuming to write and maintain, and the value attribute was not allowed in the HTML 4.01 or XHTML 1.0 Strict DOCTYPEs (although HTML 5 fixes that problem and allows the value attribute).

The new markup is very simple: just add a reversed attribute to the ol element, and optionally provide a start value. If there’s no start value provided, the browser will count the number of list items, and count down from that number to 1.

<h3>Greatest Movie Sagas of All Time</h3>
<ol reversed>
  <li>Police Academy (Series)
  <li>Harry Potter (Series)
  <li>Back to the Future (Trilogy)
  <li>Star Wars (Saga)
  <li>The Lord of the Rings (Trilogy)
</ol>

Since there are 5 list items in that list, the list will count down from 5 to 1.

The reversed attribute is a boolean attribute. In HTML, the value may be omitted, but in XHTML, it needs to be written as: reversed="reversed".

The start attribute can be used to specify the starting number for the countdown, or the value attribute can be used on an li element. Subsequent list items will, by default, be numbered with the value of 1 less than the previous item.

The following example starts counting down from 100, but omits a few items from the middle of the list and resumes from 3.

<h3>Top 100 Logical Fallacies Used By Creationists</h3>
<ol reversed="reversed" start="100">
  <li>False Dichotomy</li>
  <li>Appeal to Ridicule</li>
  <li>Begging the Question (Circular Logic)</li>
  <!-- Items omitted here -->
  <li value="3">Strawman</li>
  <li>Bare Assertion Fallacy</li>
  <li>Argumentum ad Ignorantiam</li>
</ol>
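The numbering rules described above can be sketched in a few lines of Python. This is only an illustration of the behaviour as described in this post, not code taken from any browser; the function name reversed_ordinals and its list-of-values input are assumptions made for the example.

```python
def reversed_ordinals(values, start=None):
    """Compute the numbers shown for the items of a reversed list.

    values: one entry per <li>, either the integer from its value
            attribute or None when the attribute is absent.
    start:  the integer from the start attribute on <ol>, or None.
    """
    # Without a start attribute, counting begins at the number of items.
    current = len(values) if start is None else start
    ordinals = []
    for value in values:
        if value is not None:
            current = value  # an explicit value resets the count
        ordinals.append(current)
        current -= 1  # each following item is one less by default
    return ordinals
```

With five plain items, reversed_ordinals([None] * 5) gives [5, 4, 3, 2, 1]. For the last example above, reversed_ordinals([None, None, None, 3, None, None], start=100) gives [100, 99, 98, 3, 2, 1].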

Posted in Elements, Syntax

New Image Report Feature in Validator.nu

April 18th, 2008 by Henri Sivonen

There has been lots and lots of e-mail on the public-html mailing list about making the alt attribute syntactically required in HTML5. At the core of this debate is, on one hand, using HTML5 validators to send a strong message about accessibility and, on the other hand, avoiding a situation where a simplified and idealistic strong message leads to behavior that is counterproductive to the goal of making the Web accessible. As a policy debate, it is similar to abstinence-only sex education debates.

A validator is a computer program and cannot tell if a textual alternative is appropriate for a given image in a given context. That's why accessibility checking needs to be done by a person. A person may use a software tool to make the checking easier, but trusting fully automated software to determine whether a page is accessible is misguided.

Given this basic problem, a policy that insists on the alt attribute always being present doesn’t necessarily lead to accessibility. In fact, considering that syntactic correctness and accessibility are different evaluation axes both in terms of computability and in terms of how HTML authors (other than accessibility advocates) tend to view things (judging from observations about the behavior of HTML authors who use validators), a policy that insists on the alt attribute being always present will likely cause people to put the attribute in there but with inappropriate content. In particular, putting an empty alt on images whose presence is important for understanding the context of other content is bad, because in that case the presence of those images is concealed from a non-graphical user. Also, a textual alternative that just says “image” is not an improvement over what, for example, Safari with VoiceOver says in the absence of alt, but would be worse than a smarter client-side heuristic.

Furthermore, there is a very real case where a textual alternative simply isn’t available to the HTML generator: a user uploads photos to a content management system and refuses to supply textual alternatives at the same moment. HTML 4 didn’t account for this case. In fact, requiring alt under all circumstances assumes that markup is written by a person who knows what the images are at the time of writing markup. It doesn’t make sense to pretend that the case where the markup generator doesn’t have textual alternatives available doesn’t exist. The HTML 5 syntax needs to account for all use cases.

Expecting markup generators to knowingly emit markup that is not valid is not a winning proposition. Quoting me from 2006:

Authoring tools are judged by taking a page authored using the tool and running it through the W3C Validator or, presumably in the future, through an HTML5 conformance checker. Authoring tool makers who are capable of making their tool produce syntactically conforming documents will want to do so and minimize the chance that the users of their software tarnish the reputation of the tool in the eyes of people who use an automated test as a litmus test of authoring tool bogosity. (People who test tools that way will outnumber the people who make a more profound analysis due to the "validate, validate, validate" propaganda.)

To summarize: As a matter of principle, subjective checking or checking that is not applicable for all pages does not belong in the validation function. Practice is more important than principle, though. Baking the alt requirement into the validation function would be bad when the user of the validation function wants a clean report on syntax but isn’t as concerned with accessibility. It is bad for accessibility when authors put the simplest value that silences the validator into the attribute in order to make the validation report look clean, since doing so gives user agents like Safari with VoiceOver less information to work with. That's why I think the requirement to have an alt attribute present doesn’t belong in the validation function also as a practical matter.

It turns out, though, that some people think of validation as a first step toward accessibility, even though syntactic correctness and accessibility really are different evaluation axes. They expect a validator to help them flag images that are lacking a textual alternative. Moreover, the alt issue seems to be taken as the single most important web accessibility issue, with the rest of the issues somewhere in the long tail. When there is a demand for validators to flag images without alt, validators probably should meet the demand.

To this end, I have developed a new feature for Validator.nu: Image Report. This new feature is not part of the validation function. It also doesn’t do exactly what people are asking of the syntax definition in the long e-mail thread. (It is not a new idea for a validator user interface to offer tools that help a human perform an assessment about the page outside the validation function. For example, the W3C Validator has offered a “Show Document Outline” feature, which is also on file as a request for enhancement for Validator.nu.)

The new feature tries to address the issue of finding missing textual alternatives but it also seeks to address the issue of faulty textual alternatives. Furthermore, it seeks to address these in a way that doesn’t induce people to write bad textual alternatives in order to make the report look cleaner.

When you turn the feature on, it always lists all the images. There is no textual alternative you can fake to make the list look shorter. Instead, there are four categories and you can only change the category in which an image appears.
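The idea of listing every image rather than only the offending ones can be illustrated with a short Python sketch. This is not how Validator.nu implements Image Report, and the three buckets used here ("alt missing", "alt empty", "alt present") are purely illustrative; they are not the four categories of the actual feature.

```python
from html.parser import HTMLParser


class ImageLister(HTMLParser):
    """Collect every <img> with an illustrative bucket for its alt text."""

    def __init__(self):
        super().__init__()
        self.images = []  # list of (src, bucket) tuples, in document order

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        if "alt" not in attributes:
            bucket = "alt missing"
        elif not attributes["alt"]:
            bucket = "alt empty"
        else:
            bucket = "alt present"
        # Every image is recorded; no alt value can remove it from the list.
        self.images.append((attributes.get("src"), bucket))


def list_images(markup):
    lister = ImageLister()
    lister.feed(markup)
    lister.close()
    return lister.images
```

The point of the design is visible in the return value: changing an alt attribute moves an image to a different bucket but never shortens the list, so a person still has to review each entry.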

This has the benefit of removing the badge hunting problem: people trying to silence the validator without actually raising the quality of their page. However, it also has the benefit that the user can review the textual alternatives for appropriateness and can check that the right images have been marked as omitted from non-graphical presentation. Since this tool addresses more problems than simply making alt required on the syntax level, I believe this solution is much better than furiously staying entrenched in the status quo of HTML 4 validation, fearing a step backwards so much as to be too afraid to explore steps forward.

Finally, it should be noted that this feature is, by necessity, itself inaccessible to people who cannot view bitmap images. Yet, I think it is legitimate for this feature to be implemented with an HTML user interface. Also, this feature itself is a case where the generator of the user interface markup has no knowledge of the content of the images it is presenting to the user. Hence, it is itself an example of omitting the alt attribute. It would be truly ironic if the syntax definition of HTML5 prevented Validator.nu from being self-validating.

Posted in Conformance Checking, Syntax

Validator.nu HTML Parser 1.0.7 Released

April 5th, 2008 by Henri Sivonen

There is now a new release of the Validator.nu HTML Parser. Change highlights:

Adds optional support for heuristic encoding sniffing using the ICU4J sniffer, jchardet or both.
Adds support for rewinding and reparsing when becoming confident about the character encoding and the tentative encoding was wrong.
Performs encoding name matching per spec instead of using the JDK mechanism.
Implements spec changes up until just before SVG and MathML support. (Those will merit 1.1 or something.)
Warning: The semantics of the doctype token have changed in case you have your own token handler (unlikely).

Posted in Processing Model, Syntax

Exploring new vocabularies for HTML

March 24th, 2008 by Ian Hickson

The four hottest topics in the WHATWG Issues List are:

Finding a suitable common codec for the video element.
The accessibility of tabular data.
Web Forms 2.
Using markup from namespaces other than HTML in text/html.

The video codec issue is being actively worked on, but we're not close to a good solution yet (it's mostly an economic and political issue, not a technical one, which is why we don't have any transparency on this issue, sadly). I recently responded to most of the table-related feedback. Web Forms 2 work is waiting for a decision from the W3C's forms task force on whether WF2 will be integrated as-is into HTML5 or whether it will be changed before being merged. The namespace issue is the one I'm working on now.

The first thing I have to do is work out what the problem is! There has been a lot of discussion, but not much of it is focussed on a problem; most of it is focussed on possible solutions. One can't evaluate a solution without knowing what it's trying to solve, though. To this end, I have created a wiki page where I will note down any problem descriptions I can find as I read all 367 of the e-mails in this folder.

Feel free to help! If you want to coordinate, I'm Hixie in #whatwg on Freenode IRC.

Posted in WHATWG