Validator.nu HTML Parser 1.2.1
Version 1.2.1 of the Validator.nu HTML Parser is now available. It fixes an incompatibility with the DOM implementation of the latest Xerces.
Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.
In this article:
- <hgroup> element
- accesskey attribute
- window.setTimeout and window.setInterval functions
- <video> changes
<hgroup> element
Topping our list of changes this week is the new <hgroup> element:
The hgroup element represents the heading of a section. The element is used to group a set of h1–h6 elements when the heading has multiple levels, such as subheadings, alternative titles, or taglines.
Meanwhile, the <header> element has been redefined:
The header element represents a group of introductory or navigational aids. A header element typically contains the section's heading (an h1–h6 element or an hgroup element), but can also contain other content, such as a table of contents, a search form, or any relevant logos.
Here is an example of how these elements can work together in marking up a specification:
<header>
<hgroup>
<h1>Scalable Vector Graphics (SVG) 1.2</h1>
<h2>W3C Working Draft 27 October 2004</h2>
</hgroup>
<dl>
<dt>This version:</dt>
<dd><a href="http://www.w3.org/TR/2004/WD-SVG12-20041027/">http://www.w3.org/TR/2004/WD-SVG12-20041027/</a></dd>
...
</dl>
</header>
Relevant background reading:
- <header> to <hgroup> and restrict it just to supporting subheadings.
- <header> element.
- <header> does not introduce a new section
- <footer> in <header> since that's probably indicative of an error, so validators should probably report it.
- <header> in <address> and <footer>.
- <hgroup> element
- <header> element
accesskey attribute
Next up in this week's changes is the reintroduction and reformulation of the accesskey attribute. In HTML 4, the accesskey attribute allows the web designer to define keyboard shortcuts for frequently-used links or form fields. In HTML 5,
All elements may have the accesskey content attribute set. The accesskey attribute's value is used by the user agent as a guide for creating a keyboard shortcut that activates or focuses the element.
If the accesskey attribute is used on a non-link, non-form-field element, it defines a command, which has a specific meaning in HTML 5.
Also new in HTML 5: the accesskey attribute may contain a number of shortcuts, space-separated, and the new .accessKeyLabel DOM property contains the shortcut key that the browser ultimately chose.
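As a rough sketch of how a user agent might work through a space-separated accesskey value, the function below tries each candidate in order and returns the first one the platform can offer. The name chooseAccessKey and the isAvailable callback are hypothetical, not part of any spec or browser API, and the single-character filter is an assumption about how candidates are screened.

```javascript
// Hypothetical helper: pick a shortcut from a space-separated accesskey value.
// `isAvailable` stands in for whatever platform check a browser would perform.
function chooseAccessKey(attributeValue, isAvailable) {
  const candidates = attributeValue.split(/\s+/).filter(Boolean);
  for (const candidate of candidates) {
    // Assumption: only single-character candidates are usable as shortcuts.
    if ([...candidate].length !== 1) continue;
    if (isAvailable(candidate)) return candidate;
  }
  return null; // no usable shortcut was found among the candidates
}

// Usage: "s" is taken on this imaginary platform, so "0" is chosen.
const chosen = chooseAccessKey("s 0", (key) => key !== "s");
console.log(chosen);
```

In a real browser the chosen key would be what .accessKeyLabel reports; here the return value simply plays that role.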
One possible enhancement, not in HTML 5 but under consideration for HTML 6, is the use of more-than-1-character strings to define roles, such as accesskey="help". The browser could then choose the appropriate shortcut key based on the user's platform and preferences.
I plan to write up a more detailed history of the accesskey attribute in a separate article. Until then, here is some background reading:
- accesskey.
- accesskey attribute
- accesskey attribute to define a command
window.setTimeout and window.setInterval functions
The window.setTimeout and window.setInterval functions have been in a state of limbo in the HTML 5 spec, waiting for an editor to take them and split them out into a separate spec. No editor has come forward, so back into HTML 5 they go.
These timer functions are complicated by their unique history in browser-land. They can take basically anything as their first argument. If you pass a function, it will be executed after the specified interval. If you pass anything else, the browser will call toString() on the parameter and then evaluate it as a JavaScript expression in the context of the current window (or, if the timer function is called from a web worker, the current WorkerUtils object). There is also a little-known but widely supported third argument to setTimeout and setInterval, which passes arguments to the evaluated expression. Meanwhile, the second argument -- the timeout value -- can also be any datatype. Browsers must call toNumber(toString(timeout)) and round down to the nearest integer.
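To make the argument handling concrete, here is a minimal sketch of the two callable forms plus the timeout coercion described above. The helper name normalizeDelay is made up for illustration, and it only mirrors the coercion as this article describes it; real engines may differ in edge cases. The string form is left commented out because some non-browser runtimes reject it.

```javascript
// Coerce a timeout value the way the paragraph above describes:
// stringify, convert to a number, then round down. (Illustrative helper.)
function normalizeDelay(timeout) {
  return Math.floor(Number(String(timeout)));
}

// Function form: runs once after roughly 10 ms.
setTimeout(function () {
  console.log("function form fired");
}, 10);

// Third-argument form: the extra arguments are handed to the callback.
setTimeout(function (greeting, name) {
  console.log(greeting + ", " + name);
}, normalizeDelay("10.9"), "Hello", "timers");

// String form (browser-style, not executed here):
// setTimeout("console.log('evaluated in the window context')", 10);

console.log(normalizeDelay("10.9"));
console.log(normalizeDelay(250.5));
```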
- window.setTimeout and window.setInterval.
<video> changes
It seems that each week in HTML 5 brings more changes to the <video> element. While this is not strictly true, it is certainly true this week.
- error events at the <source> element when it fails to load. But never mind that, because it didn't solve the problem, so it was overridden by...
- <source> when using <source> elements, or once on <video> if the UA gave up trying to load the video. [Media elements may never load, may load the wrong source, etc]
- video.startTime, which returns the earliest possible position. [Start position of media resources and ensuing discussion]
- a_canvas_element.getContext('2d').createPattern(a_video_element). [Re: [canvas] Using HTMLVideoElement with createPattern()]
Speaking of events, there was a series of event-related checkins this week. The onundo and onredo events, usually triggered by the user selecting the Undo or Redo item from the Edit menu, have been moved from the Document to the Window. [3003] These events are important for all sorts of web applications (think Google Docs and then work your imagination outward).
r3004 adds support for the onbeforeprint and onafterprint events, which have been supported in Microsoft Internet Explorer since version 5.
r3005 updates the global list of event handlers to include these new events, some video-related events, some storage-related events, and several others that have slipped through the cracks during the thrashing of these features.
- document.cookie lock mentioned in episode 28).
- .protocol, .host, .hostname, .port, .pathname, .search, and .hash properties to <a> and <area> elements.
- <h1> elements depends on how deep it is nested within nested <section> elements.
- document.lastModified should return the current time.
- <time datetime="valid-datetime"></time> should display the datetime value in the user's timezone and locale.
- <input type=tel>
Tune in next week for another exciting episode of "This Week in HTML 5."
Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.
The big news this week is the <datagrid> element. This is a brand spanking new element introduced in r2962.
In the datagrid data model, data is structured as a set of rows representing a tree, each row being split into a number of columns. The columns are always present in the data model, although individual columns might be hidden in the presentation.
Each row can have child rows. Child rows may be hidden or shown, by closing or opening (respectively) the parent row.
Rows are referred to by the path along the tree that one would take to reach the row, using zero-based indices. Thus, the first row of a list is row "0", the second row is row "1"; the first child row of the first row is row "0,0", the second child row of the first row is row "0,1"; the fourth child of the seventh child of the third child of the tenth row is "9,2,6,3", etc.
The chains of numbers that give a row's path, or identifier, are represented by arrays of positions, represented in IDL by the RowID interface.
The root of the tree is represented by an empty array.
Each column has a string that is used to identify it in the API, a label that is shown to users interacting with the column, a type, and optionally an icon.
The possible types are as follows:
Keyword     Description
text        Simple text.
editable    Editable text.
checkable   Text with a check box.
list        A list of values that the user can switch between.
progress    A progress bar.
meter       A gauge.
custom      A canvas onto which arbitrary content can be drawn.
Each column can be flagged as sortable, in which case the user will be able to sort the view using that column.
Columns are not necessarily visible. A column can be created invisible by default. The user can select which columns are to be shown.
When no columns have been added to the datagrid, a column with no name, whose identifier is the empty string, whose type is text, and which is not sortable, is implied. This column is removed if any explicit columns are declared.
Each cell uses the type given for its column, so all cells in a column present the same type of information.
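The row-path scheme from the excerpt above (zero-based indices, with the empty array standing for the root) can be sketched in a few lines. This is not the RowID interface itself; parseRowPath, findRow, and the { label, children } node shape are all made up for illustration.

```javascript
// Convert a row identifier such as "9,2,6,3" into an array of positions.
// The empty string maps to the empty array, which denotes the root.
function parseRowPath(text) {
  if (text === "") return [];
  return text.split(",").map((part) => Number.parseInt(part, 10));
}

// Walk a nested structure of { label, children } nodes (a made-up shape).
function findRow(root, path) {
  let node = root;
  for (const index of path) {
    if (!node.children || index >= node.children.length) return undefined;
    node = node.children[index];
  }
  return node;
}

const tree = {
  label: "(root)",
  children: [
    { label: "first row", children: [{ label: "row 0,0" }, { label: "row 0,1" }] },
    { label: "second row", children: [] },
  ],
};

console.log(findRow(tree, parseRowPath("0,1")).label); // "row 0,1"
console.log(findRow(tree, parseRowPath("")).label);    // "(root)"
```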
The other major change to the spec this week is the <keygen> element. As I mentioned in episode 12, someone went to the trouble of documenting the <keygen> element, and there has been a surprising amount of discussion about it in the past six months. Simply put, the keygen element represents a key-pair generator control. You include it in a <form>. When your browser submits the form, the private key is stored in the local keystore, and the public key is packaged and sent to the server. [r2960]
Not much else went into the spec this week, but there's been a lot of interesting activity around the web.
Tune in next week for another exciting episode of "This Week in HTML 5."
Welcome back to my semi-regular column, "The Road to HTML 5," where I'll try to explain some of the new elements, attributes, and other features in the upcoming HTML 5 specification.
The feature of the day is link relations.
Regular links (<a href>) simply point to another page. Link relations are a way to explain why you're pointing to another page. They finish the sentence "I'm pointing to this other page because..."
And so on. HTML 5 breaks link relations into two categories:
Two categories of links can be created using the link element. Links to external resources are links to resources that are to be used to augment the current document, and hyperlink links are links to other documents. ...
The exact behavior for links to external resources depends on the exact relationship, as defined for the relevant link type.
Of the examples I just gave, only the first (rel=stylesheet) is a link to an external resource. The rest are hyperlinks to other documents. You may wish to follow those links, or you may not, but they're not required in order to view the current page.
Common link relations include <link rel=stylesheet> (for importing CSS rules) and <link rel=alternate type=application/atom+xml> (for Atom feed autodiscovery). HTML 4 defines several link relations; others have been defined by the microformats community. HTML 5 attempts to consolidate all the known link relations, clean up their definitions (if necessary), and then provide a central registry for future proposals.
Most often, link relations are seen on <link> elements within the <head> of a page. Some link relations can also be used on <a> elements, but this is uncommon even when allowed. HTML 5 also allows some relations on <area> elements, but this is even less common. (HTML 4 did not allow a rel attribute on <area> elements.)
See the full chart of link relations to check where you can use specific rel values.
Link relations were added to the HTML 5 spec in November 2006. (Back then the spec was still called "Web Applications 1.0.") r319 kicked off a flurry of rel-related activity. The original additions were primarily based on research of existing web content in December 2005, using Google's cache of the web at the time. Since then, other relations have been added, and a few have been dropped.
rel=alternate has always been a strange hybrid of use cases, even in HTML 4. In HTML 5, its definition has been clarified and extended to more accurately describe existing web content. For example, using rel=alternate in conjunction with the type attribute indicates the same content in another format. Using rel=alternate in conjunction with type=application/rss+xml or type=application/atom+xml indicates an RSS or Atom feed, respectively.
HTML 5 also puts to rest a long-standing confusion about how to link to translations of documents. HTML 4 says to use the lang attribute in conjunction with rel=alternate to specify the language of the linked document, but this is incorrect. The HTML 4 Errata lists four outright errors in the HTML 4 spec (along with several editorial nits); one of these outright errors is how to specify the language of a document linked with rel=alternate. (The correct way, described in the HTML 4 Errata and now in HTML 5, is to use the hreflang attribute.) Unfortunately, these errata were never re-integrated into the HTML 4 spec, because no one in the W3C HTML Working Group was working on HTML anymore.
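One way to picture the different jobs rel=alternate does is as a small decision function over a link's other attributes. The sketch below is only an illustration of the cases described above; classifyAlternate, the plain-object input, and the order in which the checks run are all assumptions, not anything defined by HTML 5.

```javascript
// Hypothetical classifier for a link whose rel includes "alternate".
// `link` is a plain object with optional `type` and `hreflang` strings.
const FEED_TYPES = ["application/rss+xml", "application/atom+xml"];

function classifyAlternate(link) {
  if (link.type && FEED_TYPES.includes(link.type.toLowerCase())) return "feed";
  if (link.hreflang) return "translation";
  if (link.type) return "other format";
  return "alternate";
}

console.log(classifyAlternate({ type: "application/atom+xml" }));
console.log(classifyAlternate({ hreflang: "fr" }));
console.log(classifyAlternate({ type: "application/pdf" }));
```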
- rel=alternate added to HTML 5
- media attribute in conjunction with rel=alternate
- title attribute required for rel="alternate stylesheet".
New in HTML 5
rel=archives "indicates that the referenced document describes a collection of records, documents, or other materials of historical interest. A blog's index page could link to an index of the blog's past posts with rel="archives"."
New in HTML 5
rel=author is used to link to information about the author of the page. This can be a mailto: address, though it doesn't have to be. It could simply link to a contact form or "about the author" page.
rel=author is equivalent to the rev=made link relation defined in HTML 3.2. Despite popular belief, HTML 4 does not include rev=made, effectively obsoleting it. (You can search the entire spec for the word "made" if you don't believe me.)
Given that rev=made was the only significant non-typo usage of the rev attribute, HTML 5 added rel=author to make up for the loss of rev=made in HTML 4, thus allowing the working group to obsolete the rev attribute altogether. Other than the un/semi/sortof-documented rev=made value, people typo the "rev" attribute more often than they intentionally use it, which suggests that the world would be better off if validators could flag it as non-conforming.
The decision to drop the rev attribute seems especially controversial. The same question flares up again and again on the working group's mailing list: "what happened to the rev attribute?" But in the face of almost-universal misunderstanding (among people who try to use it) and apathy (among everyone else), no one has ever made a convincing case for keeping it that didn't boil down to "I wish the world were different." Hey, so do I, man. So do I.
- rel=author added to HTML 5
New in HTML 5
rel=external "indicates that the link is leading to a document that is not part of the site that the current document forms a part of." I believe it was first popularized by WordPress, which uses it on links left by commenters. I could not find any discussion of it in the HTML working group mailing list archives. Both its existence and its definition appear to be entirely uncontroversial.
New in HTML 5, but may not be long for this world
rel=feed "indicates that the referenced document is a syndication feed." Right away, you're thinking, "Hey, I thought you were supposed to use rel=alternate type=application/atom+xml to indicate that the referenced document is a syndication feed." In fact, that's what everyone does, and that's what all browsers support. Firefox 3 is the only browser that supports rel=feed. (It also supports rel=alternate type=application/atom+xml.) The rel=feed variant was proposed in the Atom working group in 2005 and somehow found its way into HTML 5. Just yesterday, I was discussing whether HTML 5 should drop rel=feed due to lack of browser implementation and complete and utter lack of author awareness.
- rel=feed added to HTML 5
HTML 4 defined rel=start, rel=prev, and rel=next to define relations between pages that are part of a series (like chapters of a book, or even posts on a blog). The only one that was ever used correctly was rel=next. People used rel=previous instead of rel=prev; they used rel=begin and rel=first instead of rel=start; they used rel=end instead of rel=last. Oh, and -- all by themselves -- they made up rel=up to point to a "parent" page.
HTML 5 includes rel=first, which was the most common variation of the different ways to say "first page in a series." (rel=start is a non-conforming synonym, for backward compatibility.) Also rel=prev and rel=next, just like HTML 4 (but mentioning rel=previous for back-compat). It also adds rel=last (the last in a series, mirroring rel=first) and rel=up.
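A consumer that wants to treat the back-compat spellings the same as the HTML 5 keywords could normalize a rel value along these lines. This is a minimal sketch: normalizeRel and REL_SYNONYMS are hypothetical names, and the mapping covers only the two synonyms named in the paragraph above.

```javascript
// Synonyms named in the text above, mapped to their HTML 5 spellings.
const REL_SYNONYMS = new Map([
  ["start", "first"],
  ["previous", "prev"],
]);

// Split a rel attribute value into lowercase keywords and apply the mapping.
function normalizeRel(relValue) {
  return relValue
    .split(/\s+/)
    .filter(Boolean)
    .map((keyword) => keyword.toLowerCase())
    .map((keyword) => (REL_SYNONYMS.has(keyword) ? REL_SYNONYMS.get(keyword) : keyword));
}

console.log(normalizeRel("Start previous next"));
```

A Map is used rather than a plain object so that keywords which happen to match built-in property names pass through unchanged.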
The best way to think of rel=up is to look at your breadcrumb navigation (or at least imagine it). Your home page is probably the first page in your breadcrumbs, and the current page is at the tail end. rel=up points to the next-to-the-last page in the breadcrumbs.
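The breadcrumb analogy translates directly into code. In this small sketch, upTarget and the sample paths are invented for illustration.

```javascript
// Given breadcrumbs ordered from the home page to the current page,
// return the entry that rel=up would point to, or null if there is none.
function upTarget(breadcrumbs) {
  return breadcrumbs.length >= 2 ? breadcrumbs[breadcrumbs.length - 2] : null;
}

// Made-up paths: home page first, current page last.
const crumbs = ["/", "/docs/", "/docs/html5/", "/docs/html5/links.html"];
console.log(upTarget(crumbs)); // "/docs/html5/"
```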
- rel=first/prev/next/last added to HTML 5
- rel=up added to HTML 5
- rel=first/prev/next/last refer to any sequence of pages, not just a hierarchical structure.
- up keyword in a single rel attribute.
New in HTML 5
rel=icon is the second most popular link relation, after rel=stylesheet. It is usually found together with shortcut, like so:
<link rel="shortcut icon" href="/favicon.ico">
All major browsers support this usage to associate a small icon with the page (usually displayed in the browser's location bar next to the URL).
Also new in HTML 5: the sizes attribute can be used in conjunction with the icon relationship to indicate the size of the referenced icon. [sizes example]
- sizes)
- rel=icon added to HTML 5
- sizes attribute, and r1559 adds an example
New in HTML 5
rel=license was invented by the microformats community. It "indicates that the referenced document provides the copyright license terms under which the current document is provided."
- rel=license added to HTML 5
New in HTML 5
rel=nofollow "indicates that the link is not endorsed by the original author or publisher of the page, or that the link to the referenced document was included primarily because of a commercial relationship between people affiliated with the two pages." It was invented by Google and standardized within the microformats community. The thinking was that if "nofollow" links did not pass on PageRank, spammers would give up trying to post spam comments on weblogs. That didn't happen, but rel=nofollow persists. Many popular blogging systems default to adding rel=nofollow to links added by commenters.
New in HTML 5
rel=noreferrer "indicates that the no referrer information is to be leaked when following the link." No browser currently supports this. [rel=noreferrer test case]
- rel=noreferrer added to HTML 5
- rel=noreferrer to also blow away the 'opener' when used with target=_blank
New in HTML 5
rel=pingback specifies the address of a "pingback" server. As explained in the Pingback specification, "The pingback system is a way for a blog to be automatically notified when other Web sites link to it. ... It enables reverse linking -- a way of going back up a chain of links rather than merely drilling down."
Blogging systems, notably WordPress, implement the pingback mechanism to notify authors that you have linked to them when creating a new blog post.
- rel=pingback added to HTML 5
New in HTML 5
rel=prefetch "indicates that preemptively fetching and caching the specified resource is likely to be beneficial, as it is highly likely that the user will require this resource." Search engines sometimes add <link rel=prefetch href="URL of top search result"> to the search results page if they feel that the top result is wildly more popular than any other. For example: using Firefox, search Google for CNN; view source; search for the keyword "prefetch".
Mozilla Firefox is the only current browser that supports rel=prefetch.
- rel=prefetch added to HTML 5
- rel=prefetch
New in HTML 5
rel=search "indicates that the referenced document provides an interface specifically for searching the document and its related resources." Specifically, if you want rel=search to do anything useful, it should point to an OpenSearch document that describes how a browser could construct a URL to search the current site for a given keyword.
OpenSearch (and rel=search links that point to OpenSearch description documents) has been supported in Microsoft Internet Explorer since version 7 and Mozilla Firefox since version 2.
- rel=search added to HTML 5
New in HTML 5
rel=sidebar "indicates that the referenced document, if retrieved, is intended to be shown in a secondary browsing context (if possible), instead of in the current browsing context." What does that mean? In Opera and Mozilla Firefox, it means "when I click this link, prompt the user to create a bookmark that, when selected from the Bookmarks menu, opens the linked document in a browser sidebar." (Opera actually calls it the "panel" instead of the "sidebar.")
Internet Explorer, Safari, and Chrome ignore rel=sidebar and just treat it as a regular link. [rel=sidebar test case]
- rel=sidebar added to HTML 5
New in HTML 5
rel=tag "indicates that the tag that the referenced document represents applies to the current document." Marking up "tags" (category keywords) with the rel attribute was invented by Technorati to help them categorize blog posts. Early blogs and tutorials thus referred to them as "Technorati tags." (You read that right: a commercial company convinced the entire world to add metadata that made the company's job easier. Nice work if you can get it!) The syntax was later standardized within the microformats community, where it was simply called "rel=tag".
Most blogging systems that allow associating categories, keywords, or tags with individual posts will mark them up with rel=tag links. Browsers do not do anything special with them, but they're really designed for search engines to use as a signal of what the page is about.
- rel=tag added to HTML 5
rel=contact was briefly part of HTML 5, but r1711 removed it because it conflicted with the same-named XFN relationship.
There seems to be an infinite supply of ideas for new link relations. In an attempt to prevent people from just making shit up, the WHATWG maintains a registry of proposed rel values and defines the process for getting them accepted.
Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.
There has been very little spec-related activity this week, so I will briefly repeat Ian Hickson's request to Help us review HTML5 and then turn to a fascinating debate happening right now on the WHATWG mailing list.
The debate revolves around perceptions and expectations of privacy. Brady Eidson (Apple/WebKit) kicks off the discussion with Private browsing vs. Storage and Databases:
A commonly added feature in browsers these days is "private browsing mode" where the intention is that the user's browsing session leaves no footprint on their machine. Cookies, cache files, history, and other data that the browser would normally store to disk are not updated during these private browsing sessions.
This concept is at odds with allowing pages to store data on the user's machine as allowed by LocalStorage and Databases. Sur[e]ly persistent changes during a private browsing session shouldn't be written to the user's disk as that would violate the intention of a private browsing session. ...
- Disable LocalStorage completely when private browsing is on. Remove it from the DOM completely.
- Disable LocalStorage mostly when private browsing is on. It exists at window.localStorage, but is empty and has a 0-quota.
- Slide a "fake" LocalStorage object in when private browsing is enabled. It starts empty, changes to it are successful, but it is never written to disk. When private browsing is disabled, all changes to the private browsing proxy are thrown out.
- Cover the real LocalStorage object with a private browsing layer. It starts with all previously stored contents. Any changes to it are pretended to occur, but are never written to disk. When private browsing is disabled, all items revert to the state they were in when private browsing was enabled and writing changes to disk is re-enabled.
- Treat LocalStorage as read-only when private browsing is on. It exists, and all previously stored contents can be retrieved. Any attempt to setItem(), removeItem(), or clear() fail.
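To make the third option in the list above more tangible, here is a minimal sketch of a "fake" storage object that lives only in memory. It is not how any browser implements private browsing; the class name EphemeralStorage and its optional seed parameter are assumptions for illustration. Passing a copied snapshot as the seed approximates the fourth option instead.

```javascript
// In-memory stand-in offering the same basic methods as a storage object.
// Nothing is written to disk; discarding the instance discards the data.
class EphemeralStorage {
  constructor(seed = {}) {
    // An empty seed corresponds to option 3; a copied snapshot to option 4.
    this.items = new Map(Object.entries(seed));
  }
  get length() {
    return this.items.size;
  }
  key(index) {
    const keys = Array.from(this.items.keys());
    return index >= 0 && index < keys.length ? keys[index] : null;
  }
  getItem(name) {
    const key = String(name);
    return this.items.has(key) ? this.items.get(key) : null;
  }
  setItem(name, value) {
    // Keys and values are stored as strings.
    this.items.set(String(name), String(value));
  }
  removeItem(name) {
    this.items.delete(String(name));
  }
  clear() {
    this.items.clear();
  }
}

// Usage: writes succeed during the session, then everything is thrown away.
const privateStore = new EphemeralStorage();
privateStore.setItem("visits", 1);
console.log(privateStore.getItem("visits")); // "1"
```

Because writes succeed and reads return what was written, a page using this object has no obvious signal that it is in a private session, which is the property several participants in the thread care about.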
Ian Fette (Google/Chrome) explains how Google Chrome handles LocalStorage in "incognito" mode:
[W]hilst the [incognito] session is active, pages can still use a database / local storage / ... / and at the end of the session, when that [temporary] profile is deleted, things will go away. I personally like that approach, as there may be legitimate reasons to want to use a database even for just a single session.
Darin Fisher (Google/Chrome) follows up to clarify Google Chrome's behavior:
Chrome's "incognito mode" means -- is defined as -- starting from a clean slate (as if you started browsing for the first time on a new computer), and when you exit incognito mode, the accumulated data is discarded. That's all there is to it. The behavior of LocalStorage and Database in this mode is deduced easily from that definition.
Jonas Sicking (Mozilla/Firefox) explains his opposition to option 5:
My concern with this is the same as the reason we in firefox clear all cookies when entering private browsing mode. The concern is as follows:
- A search engine stores a user-id token in a cookie. They then use this token to server side store the users 10 last searches.
- A user uses this search engine to search for various items. Doing this causes the user-id token to be stored in a cookie.
- The user then switches to private browsing mode.
- The user makes a search for a present for his wife.
- The user switches back into normal browsing mode.
At this point it is still possible to see the search for the wifes present in the websites store of recent searches.
Something very similar could happen for localStorage I would imagine, where the user-identifing information is stored in the localStorage rather than a cookie.
Josh "timeless" Soref (core Firefox developer) explores the privacy implications of different options:
[Option 1: Disabling LocalStorage won't work because] Many sites will just assume that they know a given useragent supports localstorage, so they'll be surprised and break. This will mean that a user can't use certain sites.
[Option 2: Enabling LocalStorage with 0 quota] will enable sites to know that the user is browsing in private, which is probably also a violation of the user's trust model. If I were to be browsing in private, I wouldn't want most sites to know that I'm doing this.
[Option 4 or 5: Starting with existing LocalStorage data] means the site will know who you are (on average), and is almost certainly never what the user wants.
Jonas Sicking (Mozilla/Firefox) tentatively states:
For what it's worth, I believe we're currently planning on doing 2 in firefox.
Brady Eidson concludes:
I strongly share Jonas' concern that we'd tell web applications that we're storing there data when we already know we're going to dump it later. For 3 and 4 both, we're basically lying to the application and therefore the user.
... So far I'm standing by WebKit choosing #5 for now.
Drew Wilson summarizes his thoughts on the matter:
I think the #1 goal for incognito mode has to be "maximum compatibility" -- let sites continue to work, which kills options #1 & 2. A secondary goal for incognito mode would be "don't let sites know the user is in incognito mode" -- this kills approach #1 and #5, and possibly #2 (depending on whether there are significant non-incognito use cases that also have 0 local storage quota).
For my part, I agree with Drew, and I would add this: I use Google Chrome's "incognito mode" quite frequently when I'm developing websites. It's an easy way to test from a "blank slate" with no cookies and no cache, and it's much easier than juggling multiple profiles. If data in my LocalStorage "bleeds" into incognito mode, this use case would become unreliable and web development would be harder for me. (Bil Corry makes this point too.)
On a more philosophical level, it's nobody's business that I'm in private browsing mode. (Scott Hess makes this point too.) If authors can detect it, I consider that a serious bug. (Imagine the ha.ckers.org headline: "Safari Hole Allows Sites To Detect 'Private' Browsing, Punish Users.") Even worse, if LocalStorage could be used as a "super-cookie" for less-than-honorable sites to track me from normal usage to incognito usage, then it's not really "private browsing" in any sense of the word that matters.
In the early days of Greasemonkey, there were discussions of whether Greasemonkey should send or provide some detectable signal to page authors that Greasemonkey was running and the user had active scripts modifying the current page. To which I replied:
If Greasemonkey makes any overtures towards allowing web publishers to "opt out" or override my browsing experience in any way, I will immediately fork it and make it my life's mission to maintain the fork as long as possible.
Tune in next week for another exciting episode of "This Week in HTML 5."