Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.
The big news this week is the beginning of the non-normative section on rendering HTML documents. For those of you not up on spec-writing lingo, "non-normative" means "you can ignore this and still claim to be in compliance with the specification." It's advice, not commands. On the other hand, it's generally useful advice, so ignoring it completely is probably not in your best interests.
Currently, the rendering section includes advice on
- Hidden elements. Things like <script> should always be hidden (in the sense that they should be executed, not have their source displayed in the page). Likewise, <meta>, <link>, <style>, and so on.
- Display types. Which elements should be rendered as block-level elements, which as tables, which as list items, and so on.
- Margins and padding. Default values for different elements, and also for the same element in different contexts (nested within other elements).
- Alignment. Table headers and captions are centered by default; <table align=left> is treated like float:left; etc.
- Fonts and colors. By default, links are blue, visited links are purple, and <code> is rendered in a monospace font.
- Punctuation and decorations. Links are underlined by default, acronyms are dotted-underlined, and <blink>, well, blinks.
- Resetting rules for inherited properties. Tables reset certain text properties; in quirks mode, they reset even more.
Scrolling through the rest of the (mostly empty) rendering section shows lots of potential for future advice on form controls, data grids, favicons, and even the <marquee> element.
Rendering-related revisions: r2734, r2735, r2736, r2737, r2738.
Switching back to the normative parts of the spec, we have r2720, which makes the outerHTML property and the insertAdjacentHTML() method work in XHTML. For the purposes of this discussion (indeed, for the purposes of the entire HTML 5 specification), "XHTML" means "content served with a Content-Type: application/xhtml+xml". In addition, the section The XHTML Syntax has been entirely reorganized and rewritten to consolidate the rules for parsing and serializing XHTML documents and fragments. [Background: Re: outerHTML/insertAdjacentHTML in XML mode]
Other interesting tidbits this week:
- r2712 mandates that browsers ignore any extraneous text on the first line of an application cache manifest file (after the file signature "CACHE MANIFEST"), to accommodate hard-core web authors who edit their manifest files manually in Emacs and want to include mode lines on the first line of the file.
- r2719 specifies that browsers should not allow scripts to set document.domain to anything on the Public Suffix List, such as "com" or "co.jp". Essential background reading on why this is dangerous: Untraceable XSS Attacks. Most browsers already block this attack, e.g. Firefox since 3.0. [Background: Re: Setting document.domain]
- r2711 addresses some security issues surrounding scripts that open windows with an address of about:blank.
- r2731 requires that floats be serialized using exponential notation, e.g. 1e+0. [Background: Floating point number feedback]
- r2725 is another chapter in a long and mostly boring saga surrounding the concept of a "legacy DOCTYPE." The official DOCTYPE of HTML 5 is simply <!DOCTYPE HTML> -- so simple, in fact, that some tools cannot generate it. Bug 54 tracks the issue to the point of obsession; I won't go into details here, but the issue has been bounced around since at least June 2008. I doubt this will be the last we hear about legacy DOCTYPEs. [More background: ISSUE-54: <!DOCTYPE HTML SYSTEM "about:legacy-compat">]
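The r2712 change is easier to picture as a check on the first line of the manifest. What follows is a minimal sketch, not the spec's parsing algorithm. The function name hasManifestSignature is hypothetical, and the rule that extra text only counts as ignorable when a space or tab separates it from the signature is an assumption made for illustration.

```javascript
// Hypothetical helper: inspects only the first line of a manifest file.
function hasManifestSignature(text) {
  // Take everything up to the first line break (LF, CR, or CRLF).
  const firstLine = text.split(/\r\n|\r|\n/, 1)[0];
  const signature = "CACHE MANIFEST";
  if (!firstLine.startsWith(signature)) return false;

  const rest = firstLine.slice(signature.length);
  // Assumption: trailing text (e.g. an Emacs mode line) is ignored only
  // when it is separated from the signature by a space or a tab.
  return rest === "" || rest[0] === " " || rest[0] === "\t";
}
```

With this sketch, a first line such as "CACHE MANIFEST # -*- mode: conf -*-" is accepted, while "CACHE MANIFESTO" is not.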
Tune in next week for another exciting episode of "This Week in HTML 5."
Posted in Weekly Review | 3 Comments »
Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group. Despite it being almost February already, this episode will focus on changes and discussion from the week of January 19th. Normal weekly updates will resume on Monday.
There are 3 pieces of big news for the week of January 19th. Big news #1: r2692, a major revamp of the way application caches are defined. Application caches are the heart of the offline web model which can be used to allow script-heavy web applications like Gmail to work even after you disconnect from the internet. Here is the new definition of how application caches work:
Each application cache has a completeness flag, which is either complete or incomplete.
An application cache group is a group of application caches, identified by the absolute URL of a resource manifest which is used to populate the caches in the group.
An application cache is newer than another if it was created after the other (in other words, application caches in an application cache group have a chronological order).
Only the newest application cache in an application cache group can have its completeness flag set to incomplete, the others are always all complete.
Each application cache group has an update status, which is one of the following: idle, checking, downloading.
A relevant application cache is an application cache that is the newest in its group to be complete.
Each application cache group has a list of pending master entries. Each entry in this list consists of a resource and a corresponding Document object. It is used during the update process to ensure that new master entries are cached.
An application cache group can be marked as obsolete, meaning that it must be ignored when looking at what application cache groups exist.
A Document initially is not associated with an application cache, but steps in the parser and in the navigation sections cause cache selection to occur early in the page load process.
Multiple application caches in different application cache groups can contain the same resource, e.g. if the manifests all reference that resource.
The end result of this major work is actually pretty similar to how application caches worked before, but there were some edge cases (such as handling 404 errors when fetching the application manifest) which are now handled in a sane fashion. It also paved the way for r2693, which makes it possible for application caches to become "obsolete" (meaning they must be ignored when deciding which caches exist).
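The definitions quoted above read a bit like a data model, so here is a rough sketch of one way to hold them in code. Every name in it (createCacheGroup, addCache, relevantCache, and the field names) is hypothetical and chosen for illustration; the spec defines the concepts, not this shape.

```javascript
// Hypothetical data model; field names are illustrative, not from the spec.
function createCacheGroup(manifestUrl) {
  return {
    manifestUrl,              // absolute URL that identifies the group
    caches: [],               // chronological order: oldest first, newest last
    updateStatus: "idle",     // "idle" | "checking" | "downloading"
    pendingMasterEntries: [], // { resource, document } pairs
    obsolete: false,
  };
}

// Adds a new, incomplete cache. Only the newest cache in a group may be
// incomplete, so every existing cache must already be complete.
function addCache(group) {
  if (group.caches.some((cache) => !cache.complete)) {
    throw new Error("only the newest cache may be incomplete");
  }
  const cache = { complete: false, resources: new Set() };
  group.caches.push(cache);
  return cache;
}

// The relevant cache is the newest cache in the group that is complete.
function relevantCache(group) {
  for (let i = group.caches.length - 1; i >= 0; i--) {
    if (group.caches[i].complete) return group.caches[i];
  }
  return null;
}
```

The point of the sketch is the last function: while a new cache is still downloading, the previous complete cache stays the relevant one.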
Big news #2: r2684, which redefines the on* attributes in a way that doesn't suck quite as much. Also, it defines the widely used (but poorly understood) onerror attribute in a way that matches what browsers actually do with it. Here is the meat of it:
All event handler attributes on an element, whether set to null or to a Function object, must be registered as event listeners on the element, as if the addEventListenerNS() method on the Element object's EventTarget interface had been invoked when the event handler attribute's element or object was created, with the event type (type argument) equal to the type described for the event handler attribute in the list above, the namespace (namespaceURI argument) set to null, the listener set to be a target and bubbling phase listener (useCapture argument set to false), the event group set to the default group (evtGroup argument set to null), and the event listener itself (listener argument) set to do nothing while the event handler attribute's value is not a Function object, and set to invoke the call() callback of the Function object associated with the event handler attribute otherwise.
The listener argument is emphatically not the event handler attribute itself.
When an event handler attribute's Function object is invoked, its call() callback must be invoked with one argument, set to the Event object of the event in question.
The handler's return value must then be processed as follows:
- If the event type is mouseover
If the return value is a boolean with the value true, then the event must be canceled.
- If the event object is a BeforeUnloadEvent object
If the return value is a string, and the event object's returnValue attribute's value is the empty string, then set the returnValue attribute's value to the return value.
- Otherwise
If the return value is a boolean with the value false, then the event must be canceled.
The Function interface represents a function in the scripting language being used. It is represented in IDL as follows:
[Callback=FunctionOnly, NoInterfaceObject]
interface Function {
any call([Variadic] in any arguments);
};
The call(...) method is the object's callback.
In JavaScript, any Function object implements this interface.
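The three return-value cases quoted above are the part most likely to trip people up, since mouseover is inverted relative to everything else. Here is a small sketch of that branching, under some assumptions: the helper name processHandlerReturn is hypothetical, events are plain objects, and a flag stands in for "the event object is a BeforeUnloadEvent object".

```javascript
// Hypothetical helper mirroring the three cases quoted above.
// `event` is any object with a `type`; beforeunload events also carry
// a `returnValue` string. Returns true if the event must be canceled.
function processHandlerReturn(event, returnValue, isBeforeUnloadEvent = false) {
  if (event.type === "mouseover") {
    // Inverted case: returning true cancels the event.
    return returnValue === true;
  }
  if (isBeforeUnloadEvent) {
    // A string return value fills in returnValue only if it is still empty.
    if (typeof returnValue === "string" && event.returnValue === "") {
      event.returnValue = returnValue;
    }
    return false;
  }
  // Everything else: returning false cancels the event.
  return returnValue === false;
}
```

Note the strict comparisons: only the booleans true and false cancel anything, so a handler that returns undefined or 0 leaves the event alone.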
Big news #3: r2685 and r2686 define a whole slew of important events that are fired on the Window object, including onbeforeunload, onerror, and onload. Previously, some of these were defined on the <body> element, which didn't actually match current browser behavior.
The following are the event handler attributes that must be supported by Window objects, as DOM attributes on the Window object, and with corresponding content attributes and DOM attributes exposed on the body element:
onbeforeunload
Must be invoked whenever a beforeunload event is targeted at or bubbles through the element or object.
onerror
Must be invoked whenever an error event is targeted at or bubbles through the object.
Unlike other event handler attributes, the onerror event handler attribute can have any value. The initial value of onerror must be undefined.
The onerror handler is also used for reporting script errors.
onhashchange
Must be invoked whenever a hashchange event is targeted at or bubbles through the object.
onload
Must be invoked whenever a load event is targeted at or bubbles through the object.
onmessage
Must be invoked whenever a message event is targeted at or bubbles through the object.
onoffline
Must be invoked whenever an offline event is targeted at or bubbles through the object.
ononline
Must be invoked whenever an online event is targeted at or bubbles through the object.
onresize
Must be invoked whenever a resize event is targeted at or bubbles through the object.
onstorage
Must be invoked whenever a storage event is targeted at or bubbles through the object.
onunload
Must be invoked whenever an unload event is targeted at or bubbles through the object.
Other interesting tidbits from the week of January 19th:
- r2683 defines the concept of an override URL in order to prevent javascript: URLs (which you should never, ever use) from breaking through the cross-domain origin security policy.
- r2697 provides an algorithm for determining the character encoding of an external script referenced by a <script> element.
- r2698 clarifies that rel attributes are case-insensitive.
- r2703 tweaks the parsing algorithm for misplaced <frameset> elements to be more compatible with Internet Explorer.
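For the rel change in r2698, the practical upshot for anyone writing a tool that reads link relations is to compare keywords without regard to case. A minimal sketch, assuming the rel value is a whitespace-separated list of keywords; the helper name relIncludes is hypothetical, and toLowerCase() is used as a stand-in for an ASCII-only case fold.

```javascript
// Hypothetical helper: does a rel attribute value contain a given keyword?
function relIncludes(relValue, keyword) {
  const wanted = keyword.toLowerCase();
  return relValue
    .split(/\s+/)                       // keywords are whitespace-separated
    .filter((token) => token !== "")    // drop empties from leading/trailing space
    .some((token) => token.toLowerCase() === wanted);
}
```

So rel="Stylesheet ALTERNATE" and rel="alternate stylesheet" would be treated the same way by this check.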
Tune in next week for another exciting episode of "This Week in HTML 5."
Posted in Weekly Review | 3 Comments »
Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group. Despite it being almost February already, this episode will focus on changes and discussion from the week of January 12th. (Very little constructive progress was made between the 1st and the 12th.) I'll follow up with another summary for the week of January 19th, then I'll resume normal weekly updates on Monday.
The big news for the week of January 12th is r2633, which defines the indeterminate attribute on form controls.
The input element represents a two-state control that represents the element's checkedness state. If the element's checkedness state is true, the control represents a positive selection, and if it is false, a negative selection. If the element's indeterminate DOM attribute is set to true, then the control's selection should be obscured as if the control was in a third, indeterminate, state.
The control is never a true tri-state control, even if the element's indeterminate DOM attribute is set to true. The indeterminate DOM attribute only gives the appearance of a third state.
Internet Explorer and Safari already support the indeterminate attribute.
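The "appearance only" point is worth spelling out, because it is easy to assume indeterminate is a third value. Here is a tiny model of the idea, using a plain object instead of a real form control; the function describeCheckbox and its output fields are hypothetical.

```javascript
// Hypothetical model of a checkbox: `checked` is the real two-state value,
// `indeterminate` only changes how the control is drawn.
function describeCheckbox({ checked, indeterminate }) {
  let appearance;
  if (indeterminate) {
    appearance = "indeterminate";
  } else {
    appearance = checked ? "checked" : "unchecked";
  }
  return {
    appearance,            // what the user sees
    checkedness: checked,  // never affected by `indeterminate`
  };
}
```

In a browser that supports it, the equivalent would be setting the indeterminate DOM attribute from script on a checkbox; the underlying checkedness stays true or false either way.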
The other news I want to highlight this week -- just because this sort of weirdness tickles me -- is r2616, which tries to tackle the following difficult situation:
- A user fills out a form and presses the submit button.
- The browser POSTs a form and begins parsing the response.
- In the course of parsing the response document, it encounters a <meta charset> declaration that specifies a different encoding from the one defined or inferred from the HTTP headers.
If this were the response from a GET request, the browser might re-request the page so it could restart parsing with the new character encoding. But doing that after a POST would be like double-submitting the form, which could have serious consequences on the back end. (Technically, the difference is between idempotent operations like GET and non-idempotent operations like pretty much everything else.) So browsers should never re-request the page after a POST, and they'll just have to muddle through as best they can with the character encoding they have. [bug 6258]
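Boiled down, the rule described here is a one-line decision. The following sketch only restates the paragraph above as code; the function name and its parameters are hypothetical, and real browsers weigh more than these three inputs.

```javascript
// Hypothetical decision helper; it restates the post, not the spec text.
function mayReRequestForNewEncoding(method, declaredEncoding, currentEncoding) {
  // Same encoding (ignoring case): nothing to fix, keep parsing.
  if (declaredEncoding.toLowerCase() === currentEncoding.toLowerCase()) {
    return false;
  }
  // Re-fetching is only acceptable for GET; repeating a POST would be
  // like submitting the form twice.
  return method.toUpperCase() === "GET";
}
```

For a POST response the function always answers false, which is the "muddle through with the encoding they have" case.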
Other interesting tidbits that week:
- r2672 states that relative URLs in CSS in HTML documents are not reresolved when the base URL of the HTML document changes. This is just one of those incredibly weird things that web authors do that just makes you want to strangle them en-masse and scream "DON'T DO THAT!" But they do do it -- God knows why -- and it has been a source of pain for browser vendors because no standard has ever defined what they should do in this situation, so naturally they all do something different. [Background: Issues concerning the <base> element]
- r2674 defines a way for web authors to include documentation inside <script src> elements -- that is, inline documentation for external scripts.
- r2679 adds a number of browser interface elements to the HTML 5 spec, including window.locationbar, window.menubar, window.personalbar, window.scrollbars, window.statusbar, and window.toolbar. These previously undefined properties are already supported by all major browsers except Internet Explorer.
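Going back to r2672 in the list above, "not reresolved" can be shown with the standard URL constructor: resolve the relative URL once, against whatever the base URL is at that moment, and keep the absolute result. This is only a sketch of the behavior as the post describes it; resolveOnce and the example.com URLs are made up for illustration.

```javascript
// Hypothetical helper: resolve a style sheet URL once, against the base URL
// in effect at that moment, and keep the absolute result.
function resolveOnce(relativeUrl, baseUrlAtParseTime) {
  const absolute = new URL(relativeUrl, baseUrlAtParseTime).href;
  return () => absolute; // later base URL changes are never consulted
}

let documentBase = "http://example.com/a/page.html";
const backgroundUrl = resolveOnce("images/bg.png", documentBase);

// Later, the document's base URL changes (say, a <base> element is edited).
documentBase = "http://example.com/b/";
```

After the change, backgroundUrl() still returns the URL under /a/, even though resolving the same relative URL against the new base would now point under /b/.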
Tune in... er, in a few hours... for another exciting episode of "This Week in HTML 5."
Posted in Weekly Review | 2 Comments »
The previous post discussed how to enable styling of the new HTML5 elements in IE by using a simple script. However, if the user has scripting disabled, the layout would probably fall apart badly.
So that means if you care about IE users with scripting disabled, you can't use the new elements, right?
Not necessarily.
There are some tricks to work around the broken DOM and limited styling in IE:
- Know what the DOM looks like and target other elements than the new elements for styling.
- Use the universal selector (*) to target the right element.
- Use noscript.
What does this mean?
Target other elements for styling
Suppose you have the following markup:
<body>
<article>
...
</article>
<nav>
<ul>
...
</ul>
</nav>
</body>
Instead of doing this:
* { margin:0; padding:0 }
body { background:silver }
article { border:solid; background:white; margin-left:10em }
nav { position:absolute; top:0; left:0; width:10em }
...do this:
* { margin:0; padding:0 }
html { background:silver }
body { border:solid; background:white; margin-left:10em }
ul { position:absolute; top:0; left:0; width:10em }
Now of course you're going to have other ul elements besides the navigation, so how do we get more specific about which element to target? The obvious solution is to set a class or id on the ul element, but there's another solution which might be more convenient in some cases, which brings me to...
Using the universal selector
Depending on the situation, and whether you care about IE6 or not, you can use the universal selector to target the element you want.
Suppose you have the following markup:
<body>
<article>
<header>
<h1>...</h1>
<p>...</p>
</header>
...
...and you want to style the p that is in header, you would do this in the normal case:
article header p { font-weight:bold }
But in IE, the article, header, h1 and p elements are all siblings, so the selector wouldn't match.
So then one would expect this to match, but it doesn't (IE doesn't allow selecting unknown elements using type selectors):
article + header + h1 + p { font-weight:bold }
However, this matches:
body > * + * + h1 + p { font-weight:bold }
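To see why the last selector matches when the previous one doesn't, it can help to model IE's flattened child list and walk the adjacent-sibling chain by hand. This is a toy model, not how any CSS engine works: the KNOWN set, the matchesSiblingChain function, and the child list are all assumptions for illustration, and text nodes are left out.

```javascript
// Elements IE recognises without scripting; anything else is "unknown" here.
const KNOWN = new Set(["h1", "p", "ul", "div"]);

// Does the child at `index` match the last step of an adjacent-sibling chain
// such as ["*", "*", "h1", "p"] (read: * + * + h1 + p)?
function matchesSiblingChain(children, index, chain) {
  const start = index - (chain.length - 1);
  if (start < 0) return false; // not enough preceding siblings
  return chain.every((step, i) => {
    if (step === "*") return true; // universal selector matches any element
    // Assumption from the post: type selectors never match unknown elements.
    return KNOWN.has(step) && step === children[start + i];
  });
}

// Flat list of body's element children as IE builds it with scripting off.
const ieBodyChildren = ["article", "header", "h1", "p", "/header"];
```

In this model the p sits at index 3. The chain ["*", "*", "h1", "p"] matches it, while ["article", "header", "h1", "p"] fails at the first step because article is unknown.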
Using noscript
The above techniques shouldn't mess up other browsers (or IE when scripting is enabled). However, if you prefer (or if something does break), you can use a separate style sheet for IE when scripting is disabled by using the following markup:
<head>
<!--[if IE]>
<noscript><link rel="stylesheet" href="ie-noscript.css"></noscript>
<![endif]-->
...
Conclusion
The above techniques might not be very scalable and might well impact maintenance, but the point of this article is to show that it is possible to use the new elements while still supporting IE with scripting disabled.
Posted in Browsers, DOM, Elements | 11 Comments »
Internet Explorer poses a small challenge when it comes to making use of the new elements introduced in HTML5. Among others, these include elements like section, article, header and footer. The problem is that due to the way parsing works in IE, these elements are not recognised properly and result in an anomalous DOM representation.
To illustrate, consider this simple document fragment:
<body>
<section>
<p>This is an example</p>
</section>
</body>
Strangely, IE 6, 7 and 8 all fail to parse the section element properly and the resulting DOM looks like this.
BODY
  SECTION
  P
    #text: This is an example
  /SECTION
Notice how IE actually creates 2 empty elements. One named SECTION and the other named /SECTION. Yes, it really is parsing the end tag as a start tag for an unknown empty element.
There is a handy workaround available to address this problem, which was first revealed in a comment by Sjoerd Visscher.
The basic concept is that by using document.createElement(tagName) to create each of the unknown elements, the parser in IE then recognises those elements and parses them in a more reasonable and useful way. For example, with the following script:
document.createElement("section");
The resulting DOM for the fragment given above looks like this:
BODY
  section
    P
      #text: This is an example
This same technique works for all unknown elements in IE 6, 7 and 8. Note that there is a known bug that prevented this from working in IE 8 beta 2, but this has since been resolved in the latest non-public technical preview.
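Since the trick has to be repeated for every new element you use, a loop is the obvious next step. Here is a minimal sketch of that idea, written in old-style JavaScript so that IE 6 to 8 can run it. The names NEW_ELEMENTS and registerElements are hypothetical, and the list only covers the elements named in this post.

```javascript
// Element names mentioned in this post; extend the list as needed.
var NEW_ELEMENTS = ["section", "article", "header", "footer"];

// Calls createElement once per name so that IE's parser learns the element.
// `doc` is passed in so the function can be exercised outside a browser.
function registerElements(doc, names) {
  for (var i = 0; i < names.length; i++) {
    doc.createElement(names[i]); // the created element is simply discarded
  }
  return names.length;
}

// In a page, this needs to run before the parser reaches the new elements,
// for example from a script in the head.
if (typeof document !== "undefined") {
  registerElements(document, NEW_ELEMENTS);
}
```

The created elements are never inserted into the document; calling createElement is the whole effect.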
For convenience, Remy Sharp has written and published a simple script that provides this enhancement for all new elements in the current draft of HTML5, which you can download and use.
This script is not needed for other browsers. Opera 9, Firefox 3 and Safari 3 all parse unknown elements in a more reasonable way by default. Note, however, that Firefox 2 does suffer from some related problems, for which there is unfortunately no known solution; but it is hoped that given the faster upgrade cycle for users of Firefox, relatively speaking compared with IE, Firefox 2 won't pose too much of a problem in the future.
Posted in Browsers, DOM, Elements, Events, Syntax | 25 Comments »