Friday, February 27th, 2009
Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group. The pace of HTML 5 changes has reached a fever pitch, so I'm going to split out these episodes into daily (!) rather than weekly summaries until things calm down.
The big news for February 12, 2009 is the minting of the spellcheck attribute, which web authors can use to provide a hint about whether a particular form field expects the sort of input that would benefit from client-side spell checking. r2801 lays it out:
User agents can support the checking of spelling and grammar of
editable text, either in form controls (such as the value of
textarea elements), or in elements in an editing
host (using contenteditable).
For each element, user agents must establish a default behavior, either
through defaults or through preferences expressed by the user. There
are three possible default behaviors for each element:
- true-by-default
- The element will be checked for spelling and grammar if its
contents are editable.
- false-by-default
- The element will never be checked for spelling and grammar.
- inherit-by-default
- The element's default behavior is the same as its parent
element's. Elements that have no parent element cannot have this as
their default behavior.
The spellcheck attribute is an enumerated attribute whose keywords are
true and false. The true keyword maps to the true state. The false
keyword maps to the false state. In
addition, there is a third state, the inherit state, which is
the missing value default (and the invalid value
default).
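To make the three states concrete, here is a minimal Python sketch of how a user agent might decide whether an element gets spell-checked. The Element class and its field names are hypothetical, and the resolution rule is an assumed reading of the text above: an explicit true or false wins, the inherit state falls back to the element's default behavior, and inherit-by-default defers to the parent element's result. Whether the contents are actually editable is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    """Hypothetical element model; not a real DOM API."""
    spellcheck_attr: Optional[str]      # raw attribute value, None if absent
    default_behavior: str               # "true-by-default", "false-by-default", "inherit-by-default"
    parent: Optional["Element"] = None


def spellcheck_state(attr: Optional[str]) -> str:
    """Map the content attribute to the true, false, or inherit state."""
    if attr is None:
        return "inherit"                # missing value default
    keyword = attr.lower()              # keywords match case-insensitively
    if keyword == "true":
        return "true"
    if keyword == "false":
        return "false"
    return "inherit"                    # invalid value default (including "")


def is_checked(el: Element) -> bool:
    """Assumed resolution: explicit state wins; inherit uses the default behavior."""
    state = spellcheck_state(el.spellcheck_attr)
    if state != "inherit":
        return state == "true"
    if el.default_behavior == "true-by-default":
        return True                     # editability of the contents is not modeled
    if el.default_behavior == "false-by-default":
        return False
    # inherit-by-default: behave the same as the parent element
    if el.parent is None:
        raise ValueError("an element with no parent cannot be inherit-by-default")
    return is_checked(el.parent)
```

For example, a field written as spellcheck="FALSE" inside a true-by-default container is not checked, while a child with a bogus value such as spellcheck="maybe" lands in the inherit state and follows its parent.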
Starting with version 2, Mozilla Firefox has offered built-in spell checking of <textarea> elements (on by default) and <input type=text> elements (off by default). You can change the default behavior by setting the spellcheck attribute. (test case)
The other big news of the day is the addition of the <form autocomplete> attribute, which lets web authors provide a hint about whether they would like browsers to save the form's contents and pre-fill the form the next time the user encounters it. r2798:
When an input element's resulting
autocompletion state is on, the user agent
may store the value entered by the user so that if the user returns
to the page, the UA can prefill the form. Otherwise, the user agent
should not remember the control's value.
... A user agent may allow the user to override the resulting
autocompletion state and set it to always on (always allowing values to be remembered and prefilled) or always off (never remembering values). However, the ability to
override the resulting autocompletion state to on should not be trivially accessible, as there are
significant security implications for the user if all values are
always remembered, regardless of the site's preferences.
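Here is a rough Python sketch of how the "resulting autocompletion state" might be computed. The on/off handling on the input itself, the fallback to the form owner's autocomplete attribute (on when absent), and the default of on when there is no form owner are our reading of the HTML 5 draft, not something quoted above; the user_override values and all function names are hypothetical.

```python
from typing import Optional


def _keyword(value: Optional[str]) -> Optional[str]:
    """Lower-case an attribute value for case-insensitive keyword matching."""
    return value.lower() if value is not None else None


def form_autocomplete_state(form_attr: Optional[str]) -> str:
    """<form autocomplete>: "off" turns it off; anything else (or absent) means on."""
    return "off" if _keyword(form_attr) == "off" else "on"


def resulting_autocompletion_state(input_attr: Optional[str],
                                   form_attr: Optional[str] = None,
                                   has_form_owner: bool = True,
                                   user_override: Optional[str] = None) -> str:
    # Hypothetical user preference: "always on" or "always off" wins outright.
    if user_override == "always on":
        return "on"
    if user_override == "always off":
        return "off"
    keyword = _keyword(input_attr)
    if keyword in ("on", "off"):
        return keyword                    # explicit on/off on the <input> itself
    # Otherwise defer to the form owner, or assume on if there is none.
    return form_autocomplete_state(form_attr) if has_form_owner else "on"


def may_remember_value(state: str) -> bool:
    """Per r2798: values may be stored only when the resulting state is on."""
    return state == "on"
```

A login form marked autocomplete=off therefore turns off remembering for every field that does not say otherwise, unless the user has deliberately chosen "always on".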
<form autocomplete=off> is commonly used on sensitive login forms where the site does not want users to be able to store their password in their browser (which is generally done in an insecure way). Most browsers honor these hints by default, although there are ways to override them if you dislike the idea of web authors disabling useful bits of your browser's functionality.
Other interesting changes of the day:
- r2802 allows external JavaScript files to contain a BOM to facilitate identifying scripts in non-ASCII-compatible character encodings.
- r2796 adds some examples of using the unloved <small> element.
Discussion of the day: Gregory J. Rosmaita posts "details on report of PFWG HTML5 actions" ("PFWG" = Protocols and Formats Working Group). The original post was about accessibility issues, specifically a response to the <img alt> attribute becoming optional and the omission of the headers and summary attributes in the HTML 5 table model. But the thread was quickly hijacked by a discussion of the fact that the W3C published another working draft of HTML 5 on February 12.
Wait... what? Oh yes, in true "burying the lede" fashion, I suppose I should mention that the biggest news of February 12th is that the W3C published another working draft of HTML 5. Except that readers of this series will find it uninteresting, since it's just a snapshot of the progress-to-date. (The spec is "published" on whatwg.org every time it changes anyway.) Working drafts have no formal status; they are merely intended to encourage early and wide review. Still, the rest of the world might think it's important, so be sure to bring it up at this weekend's cocktail parties.
Tune in... well, sometime soon-ish for another exciting episode of "This Week Day In HTML 5."
Posted in Weekly Review
Wednesday, February 25th, 2009
Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group. The pace of HTML 5 changes has reached a fever pitch, so I'm going to split out these episodes into daily (!) rather than weekly summaries until things calm down.
The big news for February 11, 2009 is the addition of an algorithm to parse a color in an IE-compatible way. r2776 lays it all out:
Some obsolete legacy attributes parse colors in a more
complicated manner, using the rules for parsing a legacy color
value, which are given in the following algorithm. When
invoked, the steps must be followed in the order given, aborting at
the first step that returns a value. This algorithm will either
return a simple color or an error.
Let input be the string being
parsed.
If input is the empty string, then
return an error.
If input is an ASCII
case-insensitive match for the string "transparent", then return an error.
If input is an ASCII
case-insensitive match for one of the keywords listed in the
SVG color
keywords or CSS2 System
Colors sections of the CSS3 Color specification, then return
the simple color corresponding to that keyword. [CSS3COLOR]
If input is four characters long, and the
first character in input is a U+0023 NUMBER
SIGN (#) character, and the last three characters of input are all in the range U+0030 DIGIT ZERO (0)
.. U+0039 DIGIT NINE (9), U+0041 LATIN CAPITAL LETTER A .. U+0046
LATIN CAPITAL LETTER F, and U+0061 LATIN SMALL LETTER A .. U+0066
LATIN SMALL LETTER F, then run these substeps:
Let result be a simple
color.
Interpret the second character of input as a hexadecimal digit; let the red
component of result be the resulting number
multiplied by 17.
Interpret the third character of input
as a hexadecimal digit; let the green component of result be the resulting number multiplied by
17.
Interpret the fourth character of input as a hexadecimal digit; let the blue
component of result be the resulting number
multiplied by 17.
Return result.
Replace any characters in input that
have a Unicode codepoint greater than U+FFFF (i.e. any characters
that are not in the basic multilingual plane) with the
two-character string "00".
If input is longer than 128 characters,
truncate input, leaving only the first 128
characters.
If the first character in input is a
U+0023 NUMBER SIGN character (#), remove it.
Replace any character in input that is
not in the range U+0030 DIGIT ZERO (0) .. U+0039 DIGIT NINE (9),
U+0041 LATIN CAPITAL LETTER A .. U+0046 LATIN CAPITAL LETTER F, and
U+0061 LATIN SMALL LETTER A .. U+0066 LATIN SMALL LETTER F with the
character U+0030 DIGIT ZERO (0).
While input's length is zero or not a
multiple of three, append a U+0030 DIGIT ZERO (0) character to input.
Split input into three strings of equal
length, to obtain three components. Let length
be the length of those components (one third the length of input).
If length is greater than 8, then remove
the leading length-8 characters in
each component, and let length be 8.
While length is greater than two and the
first character in each component is a U+0030 DIGIT ZERO (0)
character, remove that character and reduce length by one.
If length is still greater than
two, truncate each component, leaving only the first two
characters in each.
Let result be a simple
color.
Interpret the first component as a hexadecimal number; let
the red component of result be the resulting
number.
Interpret the second component as a hexadecimal number; let
the green component of result be the resulting
number.
Interpret the third component as a hexadecimal number; let
the blue component of result be the resulting
number.
Return result.
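The steps above translate fairly directly into code. Here is a Python sketch of the algorithm as quoted from r2776, returning None where the spec returns an error. The NAMED_COLORS table is a hypothetical five-entry stand-in for the full SVG color keyword and CSS2 system color lists, and all function names are our own.

```python
from typing import Optional, Tuple

Color = Tuple[int, int, int]

# Hypothetical five-entry subset; the real algorithm consults every SVG color
# keyword and CSS2 system color listed in CSS3 Color.
NAMED_COLORS = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "red": (255, 0, 0),
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
}

HEX_DIGITS = set("0123456789abcdefABCDEF")


def ascii_lower(s: str) -> str:
    """Lower-case only A-Z, as ASCII case-insensitive matching requires."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def parse_legacy_color(value: str) -> Optional[Color]:
    """Return (red, green, blue), or None where the algorithm returns an error."""
    if value == "":
        return None
    key = ascii_lower(value)
    if key == "transparent":
        return None
    if key in NAMED_COLORS:
        return NAMED_COLORS[key]
    # "#rgb" shorthand: each hex digit is multiplied by 17 (0xf -> 0xff).
    if len(value) == 4 and value[0] == "#" and all(c in HEX_DIGITS for c in value[1:]):
        return tuple(int(c, 16) * 17 for c in value[1:])
    # Characters outside the BMP become "00"; then cap the length at 128.
    s = "".join("00" if ord(c) > 0xFFFF else c for c in value)[:128]
    if s.startswith("#"):
        s = s[1:]
    # Every non-hex character becomes "0"; pad to a non-zero multiple of three.
    s = "".join(c if c in HEX_DIGITS else "0" for c in s)
    while len(s) == 0 or len(s) % 3 != 0:
        s += "0"
    length = len(s) // 3
    parts = [s[i * length:(i + 1) * length] for i in range(3)]
    if length > 8:
        parts = [p[length - 8:] for p in parts]   # keep the last 8 characters
        length = 8
    while length > 2 and all(p[0] == "0" for p in parts):
        parts = [p[1:] for p in parts]            # strip a shared leading zero
        length -= 1
    if length > 2:
        parts = [p[:2] for p in parts]            # keep the first two characters
    return tuple(int(p, 16) for p in parts)
```

Tracing the infamous bgcolor="chucknorris" by hand: the non-hex letters become zeros ("c00c0000000"), padding gives "c00c00000000", the components are "c00c", "0000", "0000", and truncation to two characters yields (192, 0, 0), a dark red. Note also that a three-digit value without "#" such as "fff" yields (15, 15, 15), not white.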
Information on exactly which attributes are subject to this algorithm is scattered throughout the spec. Here is the complete list:
<font color>
<frame bordercolor>
<frameset bordercolor>
<hr color>
<table bgcolor>
<thead bgcolor>
<tfoot bgcolor>
<tbody bgcolor>
<tr bgcolor>
<td bgcolor>
<th bgcolor>
<body text>
<body link>
<body vlink>
<body alink>
<body bgcolor>
The other big news today is the addition of a section on matching HTML elements using selectors. Some of these (:link, :visited, :active) will be familiar to anyone who has written a CSS stylesheet, but there are a number of new selectors that correspond to concepts introduced in HTML 5.
- :link and :visited match hyperlinks (<a>, <area>, and <link> elements with an href attribute).
- :active matches certain elements while they are being activated, like a button between mousedown and mouseup (or keydown and keyup)
- :enabled and :disabled match hyperlinks and certain other elements that can be disabled, like form fields
- :checked matches checkboxes and radio buttons that are currently checked
- :indeterminate matches checkboxes in the indeterminate state
- :default matches default buttons in forms
- :valid and :invalid match form fields that are subject to constraints, depending on whether they currently satisfy them
- :in-range and :out-of-range match form fields that have range-based constraints (i.e. they can either overflow or underflow), depending on whether the current value is within the range
- :required and :optional match form fields that are, or are not, required
- :read-write matches editable form fields and other editable elements, and :read-only matches any element that is not read-write
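As an illustration of the range-based pair, here is a toy Python model of which of :in-range and :out-of-range would apply to a numeric field. The RangeField class and its fields are hypothetical, and real controls first parse their string values according to their type; the sketch only captures the rule that a field with no range limits matches neither, and a field with limits matches :out-of-range exactly when it underflows or overflows.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class RangeField:
    """Hypothetical numeric form field; real controls parse typed strings first."""
    value: float
    minimum: Optional[float] = None
    maximum: Optional[float] = None


def range_pseudo_classes(field: RangeField) -> Set[str]:
    if field.minimum is None and field.maximum is None:
        return set()                          # no range limitations at all
    underflow = field.minimum is not None and field.value < field.minimum
    overflow = field.maximum is not None and field.value > field.maximum
    return {":out-of-range"} if underflow or overflow else {":in-range"}
```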
Other interesting changes of the day:
Discussion of the day: What's the problem? "Reuse of 1998 XHTML namespace is potentially misleading/wrong". Take it away, Lachlan:
I believe the issue is that the XHTML2 WG think they have change
control over that namespace URI and that we shouldn't be using it.
Additionally, the latest XHTML 2 editor's draft is now using the
namespace.
This issue has been discussed in depth around mid 2007. The problem
is that XHTML5 and XHTML2 are completely incompatible with each other
and they cannot possibly use the same namespace as each other.
But XHTML2 also has several major incompatibilities with XHTML1, which
would effectively make it impossible to implement both XHTML 1.x and 2
in the same implementation, if they share the same namespace. XHTML
5, on the other hand, has not only been designed with compatibility in
mind, success is dependent upon continuing to use the same namespace.
Basically, the only solution to this issue that should be considered is
that we continue using the namespace and the XHTML2 WG use a different
namespace.
I'm sure that will go over well with the 12 people who are still working on XHTML 2.
Tune in... well, sometime soon-ish for another exciting episode of "This Week Day In HTML 5."
Posted in Weekly Review
Tuesday, February 10th, 2009
Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.
The big news this week is more major work on the non-normative section on rendering HTML documents, including a lot of reverse-engineered documentation of legacy (invalid) attributes that users expect browsers to support.
- r2749: marginwidth and marginheight attributes on the <body> element
- r2750: hspace and vspace attributes on <table>
- r2751: the bgcolor attribute
- r2752: the <font> element
- r2753: the frames and rules attributes of <table>
- r2757: embedded content such as <audio>, <video>, <embed>, <iframe>, and <canvas>
- r2759: laying out a group of <frame>s within a <frameset>
- r2760: the <br> element
- r2761: default margins on <h1>, <h2>, <h3>, <h4>, <h5>, <h6>, and <figure>
- r2762: <bb>, <button>, and <details> elements
- r2763: the <hr> element (this change in particular has some WHATWG members very excited)
- r2764: the <fieldset> element
- r2765: <input type=text>
- r2766: <input type=date>, <input type=range>, and <input type=color>
- r2767: <input type=checkbox>, <input type=radio>, <input type=file>, <input type=submit>, <input type=reset>, and <input type=button>
- r2768: <select>, <progress>, and <meter>
- r2769: <textarea>
- r2770: <mark>
- r2772: printing HTML documents
- r2773: <link> elements
In addition, one major section was dropped from HTML 5 this week: an algorithm for determining what object is under the cursor (presuming, of course, that the cursor is within the region of the screen which contains an HTML document, and the current context has a screen, and the current context has a cursor). Ian Hickson has announced on www-style that, in accordance with that group's consensus, the algorithm would be better maintained in a future CSS specification.
Around the web:
Tune in next week for another exciting episode of "This Week in HTML 5."
Posted in Weekly Review
Tuesday, December 30th, 2008
Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.
The big news this week is a major revamp of table headers, following up on the previous round of major edits last March. Ian summarizes the most recent round of changes:
- Header cells can now themselves have headers.
- I have reversed the way the algorithm is presented, such that it starts from a cell and reports the headers rather than generating the list of headers for each cell on a header-by-header basis.
- If headers="" points to a <td> element, the association is set up, but I have left this non-conforming to help authors catch mistakes.
- Header cells that are automatically associating do not stop associating when they hit equivalent cells unless they have also hit a <td> first.
- The "col" and "row" scope values now act like the implied auto value except that they force the direction.
- Empty header cells don't get automatically associated.
- I have removed the wide header cell heuristic.
- I have made headers="" use the same ID discovery mechanism as getElementById(), to avoid implementations having to support multiple such mechanisms.
- Finally, I have made the spec define if a header is a column header or a row header in the case where scope="" is omitted.
- I haven't added summary="" on table; nothing particularly new has been raised on the topic since the last times I looked at this.
Accessibility advocates are disappointed by the continued non-inclusion of the summary attribute. Their reasoning is that "the summary attribute is a very, very practical and useful attribute," despite their own user testing that shows otherwise. As Ian put it, "I am hesitant to include a feature like summary="" when all evidence seems to point to it being widely misused by authors and ignored by the users it intends to help." As with all issues, this is not the final word on the matter, but it's where we stand today.
In other news, r2566 addresses a very subtle issue with fetching images. The problem stems from the following (arguably pointless) markup: <img src="">
A fair number of web pages actually try to declare an image with an empty src attribute. According to the HTTP and URL specifications, this markup means that there is an image at the same address as the HTML document -- a theoretically possible but highly unlikely scenario. Internet Explorer apparently catches this mistake and just silently drops the image. Other browsers do not; they will actually try to fetch the image, which results in a "duplicate" request for the page (once to successfully retrieve the page, and again to unsuccessfully retrieve the image).
Boris Zbarsky, a leading Mozilla developer, states:
We (Gecko) have had 28 independent bug reports filed (with people bothering to create an account in the bug database, etc) about the behavior difference from IE here. That's a much larger number of bug reports than we usually get about a given issue. I can't tell you why this pattern is so common (e.g. whether some authoring frameworks produce it in some cases), but it seems that a number of web developers not only produce markup like this but notice the requests in their HTTP logs and file bugs about it.
r2566 addresses the issue by special-casing <img src> to allow browsers to ignore an image if its fetch request would result in fetching exactly the same URL as its HTML document:
When an img is created with a src attribute, and whenever the src attribute is set subsequently, the user agent must fetch the resource specified by the src attribute's value, unless the user agent cannot support images, or its support for images has been disabled, or the user agent only fetches elements on demand, or the element's src attribute has a value that is an ignored self-reference.
The src attribute's value is an ignored self-reference if its value is the empty string, and the base URI of the element is the same as the document's address.
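The rule boils down to a small predicate. Here is a Python sketch of the fetch decision described in the two paragraphs above; the function and parameter names are hypothetical, and comparing the base URL and document address as plain strings is a simplification (a real browser compares parsed URLs).

```python
def is_ignored_self_reference(src: str, element_base_url: str, document_address: str) -> bool:
    """r2566: an empty src whose base URL is the document's own address."""
    return src == "" and element_base_url == document_address


def should_fetch_image(src: str, element_base_url: str, document_address: str,
                       images_enabled: bool = True, fetch_on_demand: bool = False) -> bool:
    if not images_enabled or fetch_on_demand:
        return False
    return not is_ignored_self_reference(src, element_base_url, document_address)
```

Note that an empty src is only ignored when the base URL is the page itself; with a <base href> pointing elsewhere, the empty value resolves to that other address and is still fetched.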
Other interesting tidbits this week:
- r2568 adds a storageArea attribute to the StorageEvent object. [StorageEvent deficiency]
- r2556 changes the processing model of the <meta charset> attribute by requiring that it appear in the first 512 bytes of the document. For those of you playing along at home, <meta charset="..."> is the new <meta http-equiv="Content-Type" content="text/html; charset=...">. Both forms are fully supported in all major browsers. [Comparing conformance requirements against real-world docs]
- r2557, r2559, r2560, r2562, r2563, and r2604 add a variety of common markup errors to the list of errors that HTML validators may treat as minor. [Re: comparing conformance requirements against real-world docs]
- r2561 allows the height and width attributes on <input type="image">, a construct that is already supported by all major browsers. [Re: comparing conformance requirements against real-world docs]
- r2601 adds an example of something that all browsers do anyway -- killing scripts that run too long.
- r2597 removes the notification API, which was kicked around in 2006 but never saw significant interest from either authors or browser vendors. [Notifications API removed]
- r2596 defines window.close(), window.focus(), and window.blur(). The focus() and blur() methods have historically been used to produce "pop-up" and "pop-under" windows containing advertisements. Most modern browsers now control how and whether scripts can do this, and the HTML 5 specification goes so far as to recommend that "[u]ser agents are encouraged to ignore calls to this blur() method entirely."
- r2552 gives an example of embedding RDF metadata in XHTML. As the spec notes, this is not possible in HTML, although you could always use RDFa.
- r2595 gives an example of marking up a tag cloud.
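To see why the 512-byte rule from r2556 matters, here is a deliberately naive Python sketch that looks for a charset declaration only in the first 512 bytes. It is not the spec's prescan algorithm: the regular expression is our own, it also catches the charset= inside the older http-equiv form, and it will happily match inside comments, which a real parser would not.

```python
import re
from typing import Optional

# Naive pattern: finds "charset=" inside any <meta ...> tag, in either form.
META_CHARSET = re.compile(
    rb'<meta\s[^>]*?charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE)


def sniff_meta_charset(document: bytes, limit: int = 512) -> Optional[str]:
    """Look for a charset declaration in the first `limit` bytes only."""
    match = META_CHARSET.search(document[:limit])
    return match.group(1).decode("ascii") if match else None
```

A declaration pushed past the limit by, say, a long comment at the top of the page is simply not seen, which is exactly the situation authors are now told to avoid.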
Tune in next week for another exciting episode of "This Week in HTML 5."
Posted in Weekly Review
Thursday, December 18th, 2008
Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.
The big news this week is r2529, which makes so many changes that I had to ask Ian to explain it to me. This is what he said:
Someone asked for onbeforeunload, so I started fixing it. Then I found that there was some rot in the drywall. So I took down the drywall. Then I found a rat infestation. So I killed all the rats. Then I found that the reason for the rot was a slow leak in the plumbing. So I tried fixing the plumbing, but it turned out the whole building used lead pipes. So I had to redo all the plumbing. But then I found that the town's water system wasn't quite compatible with modern plumbing techniques, and I had to dig up the entire town. And that's basically it.
"Amusing, in a quiet way," said Eeyore, "but not really helpful."
Basically, the way that scripts are defined has changed dramatically. Not in a terribly incompatible way; it's just a clearer definition that paves the way for better specification of certain properties of <script> (and <noscript>). Let's start with the new definition of a script:
A script has:
- A script execution environment
-
The characteristics of the script execution environment depend
on the language, and are not defined by this specification.
In JavaScript, the script execution environment
consists of the interpreter, the stack of execution
contexts, the global code and function code and
the Function objects resulting, and so forth.
- A list of code entry-points
-
Each code entry-point represents a block of executable code
that the script exposes to other scripts and to the user
agent.
Each Function object in a JavaScript
script execution environment has a corresponding code
entry-point, for instance.
The main program code of the script, if any, is the
initial code entry-point. Typically, the code
corresponding to this entry-point is executed immediately after
the script is parsed.
In JavaScript, this corresponds to the
execution context of the global code.
- A relationship with the script's global object
-
An object that provides the APIs that the code can use.
This is typically a Window object. In JavaScript, this corresponds to the global
object.
When a script's global object is an
empty object, it can't do anything that interacts with the
environment.
- A relationship with the script's browsing context
-
A browsing context that is assigned responsibility
for actions taken by the script.
When a script creates and navigates a new top-level browsing
context, the opener attribute of the new browsing context's
Window object will be set to the script's
browsing context's Window object.
- A character encoding
-
A character encoding, set when the script is created, used to
encode URLs. If the character encoding is
set from another source, e.g. a document's character
encoding, then the script's character encoding
must follow the source, so that if the source's changes, so does
the script's.
- A base URL
-
A URL, set when the script is created, used to
resolve relative URLs. If the base URL is
set from another source, e.g. a document base URL,
then the script's base URL must follow the source, so
that if the source's changes, so does the script's.
- Membership in a script group
-
A group of one or more scripts that are loaded in the same
context, which are always disabled as a group. Scripts in a script
group all have the same global object and browsing context.
A script group can be frozen. When a script group is
frozen, any code defined in that script group will throw an
exception when invoked. A frozen script group can be
unfrozen, allowing scripts in that script group to run
normally again.
The most interesting part of this new definition is the script group, a new concept which now governs all scripts. When a Document is created, it gets a fresh script group, which contains all the scripts that are defined (or are later created somehow) in the document. When the user navigates away from the document, the entire script group is frozen, and browsers should not execute those scripts anymore. This sounds like an obvious statement if you think of documents as individual browser windows (or tabs), but consider the case of a document with multiple frames, or one with an embedded iframe. Suppose that the user clicks some link within the iframe that only navigates to a new URL within the iframe (i.e. the parent document stays the same). The parent document may have some reference to functions defined in the old iframe. Should it still be able to call these functions? IE says no; other browsers say yes. HTML 5 now says no, because when the iframe navigates to a new URL, the old iframe's script group is frozen -- even if there are active references to those scripts (say, from the parent document), browsers shouldn't allow the page to execute them.
The main benefit of this new concept of script groups is that it removes a number of complications faced by the non-IE browsers. For example, it prevents the problem of scripts suddenly discovering that their global object is no longer the object that they think of as the Window object. Script groups are also frozen when a script calls document.open(). Freezing script groups also defines the point at which timers and other callbacks are reset, which is something that previous versions of HTML had never defined.
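Here is a toy Python model of the freezing behavior, to make the iframe scenario concrete. Everything in it (ScriptGroup, define, FrozenScriptError) is hypothetical and only mimics the quoted rules: code defined in a group throws when invoked while the group is frozen, and runs normally again once it is unfrozen, even when someone outside the group still holds a reference to it.

```python
class FrozenScriptError(Exception):
    """Raised when code from a frozen script group is invoked."""


class ScriptGroup:
    def __init__(self) -> None:
        self.frozen = False

    def define(self, fn):
        """Wrap fn as a code entry-point that checks the group's frozen flag."""
        group = self

        def entry_point(*args, **kwargs):
            if group.frozen:
                raise FrozenScriptError(f"{fn.__name__} belongs to a frozen script group")
            return fn(*args, **kwargs)

        return entry_point

    def freeze(self) -> None:
        self.frozen = True

    def unfreeze(self) -> None:
        self.frozen = False
```

In this model, the parent document keeps a reference returned by old_iframe_group.define(...); when the iframe navigates, old_iframe_group.freeze() is called and that reference now throws instead of running stale code.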
And after all of this ripping up and redefining, HTML 5 now defines the onbeforeunload event, which is already supported by major browsers.
Other interesting tidbits this week:
- r2533 adds support for passing structured data between documents with postMessage(). [structured data discussion]
- r2536 defines the NameCreator, NameDeleter, NameGetter, NameSetter, IndexGetter, and IndexSetter anonymous methods, which are used by browsers internally to manage lists of named or indexed properties (e.g. form.elements, per-element custom data attributes, or the pixel data of a canvas).
- r2537 explains that you cannot click something while you're already in the process of clicking it. (Technically speaking, it makes the click() method non-reentrant.) [nested click() discussion]
- r2538 clarifies that non-interactive elements that are not usually focusable, but that do currently have focus (via the tabindex attribute), should simulate onclick events when the user presses ENTER. This may seem like a minor point, but it is important for building keyboard-accessible web applications. [onclick discussion]
- r2539 notes that buttons (and their values) are not submitted with other form data unless they were the button that submitted the form. [button submission discussion]
- Silvia Pfeiffer posts thoughts on video accessibility and links to this collection of video accessibility requirements on the Mozilla wiki.
- Nine years in the making, the second major edition of the Web Content Accessibility Guidelines is now officially a W3C Recommendation. The guidelines are supplemented by a comprehensive techniques document, for example Using alt attributes on img elements. HTML 5 also includes a section on using the alt attribute, but in general you should defer to WCAG 2.0 because it was written by experts.
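The non-reentrant click() from r2537 amounts to a guard flag. Here is a toy Python model; the ClickableElement class, its click_in_progress flag, and the listener list are hypothetical names for illustration. A listener that calls click() on the same element while that element's click is still being dispatched gets no second dispatch.

```python
from typing import Callable, List


class ClickableElement:
    """Toy model of an element whose click() method is non-reentrant."""

    def __init__(self) -> None:
        self.click_in_progress = False   # hypothetical flag name
        self.listeners: List[Callable[["ClickableElement"], None]] = []
        self.clicks_dispatched = 0

    def click(self) -> None:
        if self.click_in_progress:
            return                       # nested click() on the same element does nothing
        self.click_in_progress = True
        try:
            self.clicks_dispatched += 1
            for listener in list(self.listeners):
                listener(self)           # a listener may try to call click() again
        finally:
            self.click_in_progress = False
```

Without the flag, a click handler that calls this.click() would recurse until the script engine gave up; with it, each top-level click() dispatches exactly once.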
Tune in next week for another exciting episode of "This Week in HTML 5."
Posted in Weekly Review, WHATWG