The WHATWG Blog — Henri Sivonen

Author Archive

Major content model changes in HTML5 (and Validator.nu)

Sunday, January 13th, 2008

The HTML5 content models became more lax in December in response to feedback from various people who found the stricter content model—especially the bimorphic content model of div—unhelpful. The notions “strictly inline content”, “structured inline content”, “block content” and “block or inline content but not both” (aka. bimorphic) are now gone.

The elements formerly known as inline are now called phrasing elements, to distinguish them from the display: inline; CSS declaration. Content models that previously accepted only block content, or either block or inline content but not both, now accept both. Taken together, phrasing content and the content formerly known as block content are now prose content. The practical effect is that the conformance requirements are now closer to HTML 4.01 Transitional than to Strict; the former strictness turned out to be hard to justify in the face of actual authoring patterns and browser support for those patterns.
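
To make the difference concrete, here is a rough Python sketch of the former bimorphic rule next to the new, laxer one. The element sets are a small illustrative sample rather than the full lists from the draft, "#text" stands for non-whitespace text, and the function names are made up for this example.

```python
# Illustrative subsets only; the draft defines the real categories.
PHRASING = {"#text", "em", "strong", "a", "span", "code"}
BLOCK = {"p", "div", "ul", "table", "blockquote"}


def classify(name):
    """Return the category of a child node in this simplified model."""
    if name in BLOCK:
        return "block"
    if name in PHRASING:
        return "phrasing"
    raise ValueError(f"element not covered by this sketch: {name}")


def allowed_bimorphic(children):
    """Former rule: block content or inline content, but not both."""
    return len({classify(child) for child in children}) <= 1


def allowed_prose(children):
    """Current rule: any mix of phrasing and former block content."""
    for child in children:
        classify(child)  # only rejects names the sketch does not know
    return True
```

Under the former rule, the children ["#text", "p"] of a div would be rejected because they mix the two kinds of content; under the current rule they are accepted.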

The content model changes are now also reflected in Validator.nu. There are some known differences from the spec, though:

<style scoped> support is broken due to spec ambiguity.
<font> is not supported. The draft says the element will likely be dropped.
Nested <menu> elements are not supported due to implementability issues with the current spec text.
Data templates are not supported. The draft is annotated with a note saying they are being considered for removal.
style='…' is supported on all elements.

Posted in Conformance Checking, Elements, Syntax

Validator.nu Web service API

Tuesday, November 27th, 2007

Validator.nu has had a Web service API for a while. It has not had documentation, though. This has now changed: Validator.nu Web service API docs.

Posted in Conformance Checking

Validating attribute values

Monday, November 26th, 2007

Traditionally, SGML-based HTML validation has treated most attribute values as “anything goes” strings. This has meant that all kinds of bogus values have passed as valid. W3C XML Schema added a fixed set of datatypes. The spec is mostly useless for HTML5 validation, since the HTML5 microsyntaxes do not exactly match the XSD datatypes for the same concepts. XSD regular expressions were suitable for representing the syntax of a number of HTML5 microsyntaxes, though. XSD datatypes can be used from RELAX NG, and Validator.nu used to use XSD regular expressions for many HTML5 microsyntaxes.

The problem with using XSD regular expressions has been that they are not user-friendly. When an attribute value did not match the required regular expression, the UI told the user that the attribute value was “bad”. Nothing else.

Fortunately, unlike XSD, RELAX NG allows pluggable datatype libraries. A datatype library is a library written in a general-purpose programming language. The RELAX NG engine calls the library to check if a string conforms to a named datatype. Validator.nu has used this approach for a long time for the more complex microsyntaxes in HTML5.
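
The idea can be sketched in a few lines of Python. This is only an illustration of the pattern, not the interface Validator.nu actually uses: the class, the method names and the datatype name "integer-non-negative" are all hypothetical.

```python
from typing import Callable, Dict


class DatatypeLibrary:
    """Maps datatype names to checker functions (hypothetical design)."""

    def __init__(self) -> None:
        self._checkers: Dict[str, Callable[[str], bool]] = {}

    def register(self, name: str, checker: Callable[[str], bool]) -> None:
        self._checkers[name] = checker

    def is_valid(self, name: str, value: str) -> bool:
        # This is the call a schema engine would make for an attribute value.
        return self._checkers[name](value)


def is_non_negative_integer(value: str) -> bool:
    """One or more ASCII digits, written in ordinary code, not a regex."""
    return value != "" and all(ch in "0123456789" for ch in value)


library = DatatypeLibrary()
library.register("integer-non-negative", is_non_negative_integer)
```

A schema would then refer to the datatype by name, and the engine would call something like library.is_valid("integer-non-negative", "42") whenever it meets an attribute declared with that type.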

I have recently made an effort to move Validator.nu away from XSD regular expressions to a more comprehensive custom datatype library. Even though regular expressions are, as a formalism, sufficient for many syntaxes, writing the checking code by hand allows more useful error messages when a check fails. Moreover, giving the datatypes identifiers makes it possible to tell which datatype failed, rather than the UI only knowing that some regular expression failed under the hood. This allows the UI to pull in per-datatype advice from a wiki page.
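
A small Python sketch of the contrast, using the HTML5 integer microsyntax (an optional minus sign followed by one or more digits). The error wording, the DatatypeError class and the ADVICE table are invented for illustration and are not what Validator.nu emits.

```python
import re

INTEGER_RE = re.compile(r"-?[0-9]+")


def check_with_regex(value):
    """Regex approach: all we can say on failure is that the value is bad."""
    return None if INTEGER_RE.fullmatch(value) else "Bad value."


class DatatypeError(Exception):
    """Carries the datatype identifier along with a specific message."""

    def __init__(self, datatype, message):
        super().__init__(f"{datatype}: {message}")
        self.datatype = datatype


def check_integer(value):
    """Hand-written check: each failure mode gets its own message."""
    name = "integer"
    if value == "":
        raise DatatypeError(name, "The empty string is not a valid integer.")
    digits = value[1:] if value[0] == "-" else value
    if digits == "":
        raise DatatypeError(name, "A minus sign must be followed by a digit.")
    for ch in digits:
        if ch not in "0123456789":
            raise DatatypeError(name, f"Expected a digit but saw {ch!r} instead.")


# Hypothetical stand-in for advice text fetched per datatype from a wiki page.
ADVICE = {"integer": "An optional minus sign followed by one or more digits."}


def report(value):
    try:
        check_integer(value)
        return "OK"
    except DatatypeError as error:
        return f"{error} Advice: {ADVICE[error.datatype]}"
```

Because the exception carries the datatype name, the reporting code can look up advice for exactly that datatype, which is not possible when all that is known is that some pattern did not match.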

In addition to improving the user experience with previously supported microsyntaxes such as integers, I have implemented support for previously unsupported microsyntaxes such as MIME types and Media Queries.

There is still work to do. For example, the syntaxes for accept-charset and WF2 type=email are not done. data: and mailto: IRIs are not properly validated yet. The syntaxes for image map coordinates still use XSD regular expressions. The advice on the wiki page is far from complete. (You can help!)

For the parts already implemented, please try the new features out and let me know what needs improvement.

Posted in Conformance Checking, Syntax

Wiki collaboration to describe HTML5 microsyntaxes

Tuesday, November 20th, 2007

Currently, Validator.nu mines the HTML5 spec for UI text describing permissible content models, element contexts and element-specific attributes. The text is shown when an element or attribute is misplaced or missing.

Unfortunately, the spec does not contain similarly extractable text for microsyntax descriptions. Microsyntaxes are syntaxes that appear mostly in attribute values—for example, HTML5 integer, Web Forms 2.0 week, RFC 2616 media types (aka. MIME types) or CSS3 Media Queries.
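
To give a feel for what a microsyntax involves, here is a minimal Python sketch of a checker for week values such as 2007-W47. It assumes the format is a four-digit year, "-W" and a two-digit ISO week number that exists in that year; the function name is made up, and the spec's handling of longer years is left out.

```python
import re
from datetime import date

WEEK_RE = re.compile(r"([0-9]{4})-W([0-9]{2})")


def is_valid_week(value: str) -> bool:
    """Check the shape first, then whether the week exists in that year."""
    match = WEEK_RE.fullmatch(value)
    if match is None:
        return False
    year, week = int(match.group(1)), int(match.group(2))
    try:
        # Raises ValueError for week 00, for week 53 in a 52-week year,
        # and for years outside the range Python's date type supports.
        date.fromisocalendar(year, week, 1)
    except ValueError:
        return False
    return True
```

Note that the shape alone is not enough: 2007-W53 looks plausible but 2007 has only 52 ISO weeks, so a description of the microsyntax has to mention that constraint too.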

Based on IRC discussions, there is interest in producing the descriptions collaboratively. To that end, I have seeded the WHATWG wiki with a page for microsyntax descriptions. If you would like to help make validator messages better, please feel free to edit the wiki (under the MIT license).

Posted in Conformance Checking, Syntax

Not that 80

Tuesday, September 18th, 2007

In his post Parroting Pareto, Jeremy Keith says that HTML5 needs to cover cases that “fall far outside the 80%-90% curve”, in particular accessibility. “By their very nature, accessibility concerns are not going to affect the majority of users. That doesn’t mean they can be dismissed.”

My understanding of applying the 80/20 rule to the design of HTML5 is that the “80” isn’t about 80% of users. It is about (proverbial) 80% of authoring cases. That is, it doesn’t make sense to support (for accessibility or otherwise) things that people would only publish very rarely if engineering support for the rarity would complicate the implementation of the language significantly.

See Hixie’s email to the HTML WG on the topic.

Posted in WHATWG