
Usability testing HTML5

by Ian Hickson in WHATWG

Over the past few weeks, Google has been preparing and then running a usability study to test the microdata feature of HTML5.


We first created three different variants based on the original microdata proposal:

  1. One based on what the spec said (documentation)
  2. One putting types in an explicit itemtype="" attribute, moving "about" to item="", and replacing itemfor="" with simply having multiple item=""s with the same name (documentation)
  3. One trying to remove types altogether and using item as a boolean attribute. (documentation)

Our plan was to run six studies, two for each variant, with each participant running through the following steps:

  1. Read and comment on a couple of motivating slides explaining why one would care about microdata
  2. Read the provided documentation for the variant being tested
  3. Look at and comment on the animals example with microdata (variant 2, variant 3)
  4. Exercise: try to extract the data from the "flickr" example (variant 2, variant 3)
  5. Exercise: try to annotate the blog example (variant 2, variant 3)
  6. Exercise: try to annotate the review example (variant 2, variant 3)
  7. Compare and contrast the "yelp" example with microdata to the equivalent of one of the other two variants (variant 2, variant 3)

We made some changes along the way. After the first three sessions, it became clear that "about" was a very confusing term for the item's global identifier, so we changed the documentation and examples to use "itemid" instead (which turned out to be much less confusing). Early on, we also added some documentation text explaining the differences between the variants in the last exercise, because simply showing participants the two side by side wasn't yielding anything useful (1 to 3, 2 to 1, 2 to 3, 3 to 1).

After our sixth participant canceled on us, we decided to create a fourth variant (documentation) based on what we'd learnt with the first five, and to get two more participants to test this variant specifically. For these participants, we used the following methodology:

  1. Read and comment on a couple of motivating slides explaining why one would care about microdata
  2. Read the provided documentation for the variant being tested
  3. Look at and comment on the animals example with microdata
  4. Exercise: try to extract the data from the "flickr" example
  5. Exercise: try to extract the data from the review example
  6. Exercise: try to annotate the blog example
  7. Exercise: try to annotate the "yelp" example


Some interesting things came out of this study. First, as mentioned above, the term "about" turns out to be highly non-intuitive. I originally took the word from RDFa, on the principle that they knew more about this than I did, but our participants had a lot of trouble with that term. When we changed it to "itemid", there was a marked improvement in people's understanding of the concept.

Second, people were much less confused about types than I thought they would be. In preparing for this study I discussed microdata with a number of people, and I found that one major area of confusion was the concept of types vs the concept of properties. This is why variant 3 has no types: I wanted to find out whether people had trouble with them or not. Well, not only did people not have problems with types, several participants went out of their way to specify the type of an item, for example using the attribute name "type" instead of "item" in variant 1.

It seems that while reasoning about types at the theoretical level is somewhat confusing, it isn't so confusing that the concept should be kept out of the language. Instead, types should just be more explicitly mentioned. This is why we renamed "item" to "itemtype".

Third, people were confused by the scoping nature of the "item" attribute. Some of our participants never understood scoping at all, and most of the participants who understood the concept were still quite confused by the "item" attribute. We were encouraged, however, by one variant 1 participant's sudden enlightenment when they saw variant 3's "itemscope" attribute, and by the reaction of the variant 3 participant to the "itemscope" attribute compared to the reactions that the other two variants' participants had to their "item" attributes. This is why we split "item" into "itemtype" and "itemscope", instead of just using "itemtype".
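To make the scoping rule concrete, here is a minimal Python sketch of how itemscope bounds an item. It is a toy, not the spec's microdata processing model: each itemprop attaches to the nearest enclosing element with itemscope, and an element carrying both itemprop and itemscope is itself an item, nested as a property of the outer one. It uses itemprop for property names, as the final HTML5 syntax does; the class name, the sample markup, and the URL http://example.com/Animal are illustrative assumptions, and void elements, itemref, and itemid are deliberately left out.

```python
from html.parser import HTMLParser


class MicrodataScopeParser(HTMLParser):
    """Toy extractor showing how itemscope limits which item a property joins.

    Simplifications (assumptions, not the spec's full algorithm): every element
    is explicitly closed (no void elements such as <img>), a property's value is
    its element's text content, and itemref/itemid are ignored.
    """

    def __init__(self):
        super().__init__()
        self.stack = []      # one frame per open element
        self.top_level = []  # items that are not a property of another item

    def _nearest_item(self):
        # A property belongs to the closest enclosing element with itemscope.
        for frame in reversed(self.stack):
            if frame["item"] is not None:
                return frame["item"]
        return None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Look up the owner before pushing, so an element carrying both
        # itemprop and itemscope becomes a property of the *outer* item.
        owner = self._nearest_item()
        item = None
        if "itemscope" in attrs:
            item = {"type": attrs.get("itemtype"), "properties": {}}
        self.stack.append({"attrs": attrs, "owner": owner, "item": item, "text": []})

    def handle_data(self, data):
        if self.stack:
            self.stack[-1]["text"].append(data)

    def handle_endtag(self, tag):
        if not self.stack:
            return  # ignore stray end tags
        frame = self.stack.pop()
        text = "".join(frame["text"])
        if self.stack:
            self.stack[-1]["text"].append(text)  # parent's text includes ours
        names = frame["attrs"].get("itemprop")
        value = frame["item"] if frame["item"] is not None else text.strip()
        if names and frame["owner"] is not None:
            for name in names.split():  # itemprop may list several names
                frame["owner"]["properties"].setdefault(name, []).append(value)
        elif frame["item"] is not None:
            self.top_level.append(frame["item"])


html = """
<div itemscope itemtype="http://example.com/Animal">
  <span itemprop="name">Fluffy</span>
  <div itemprop="owner" itemscope>
    <span itemprop="name">Sam</span>
  </div>
</div>
"""
parser = MicrodataScopeParser()
parser.feed(html)
parser.close()
print(parser.top_level)
```

The inner "name" ends up on the owner item rather than on the animal. That boundary is what participants struggled to see with the old "item" attribute, and what "itemscope" made easier to spot.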

We found that people who understood microdata's basic features also understood "itemfor", but while we were doing the study, it was pointed out on the WHATWG list that "itemfor" makes it impossible to find the properties of an item without scanning the whole document. This is why we tested the <itemref> idea in variant 4. People were at least as able to understand this as "itemfor".
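The scanning cost is easy to see in a simplified model. The sketch below assumes that itemfor="" names the id of the item's element and that itemref="" lists the ids of the elements holding the item's properties. The Element class and the sample data are hypothetical, and only elements that directly carry itemprop are handled (no descendant crawling). With itemfor, finding an item's properties means examining every element. With itemref, only the listed ids are looked up.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Element:
    """Hypothetical flattened stand-in for an HTML element."""
    id: Optional[str] = None
    itemprop: Optional[str] = None
    text: str = ""
    itemfor: Optional[str] = None  # old design: property names its item's id
    itemref: list = field(default_factory=list)  # new design: item lists its properties' ids


def props_via_itemfor(document, item_id):
    """Any element anywhere may claim item_id, so every element is examined."""
    props, examined = {}, 0
    for el in document:
        examined += 1
        if el.itemfor == item_id and el.itemprop:
            props[el.itemprop] = el.text
    return props, examined


def props_via_itemref(by_id, item):
    """Only the ids the item lists are looked up, via an id index."""
    props, examined = {}, 0
    for ref in item.itemref:
        el = by_id[ref]
        examined += 1
        if el.itemprop:
            props[el.itemprop] = el.text
    return props, examined


album = Element(id="album", itemref=["artist", "year"])
filler = [Element(text=f"unrelated {i}") for i in range(1000)]
artist = Element(id="artist", itemprop="artist", text="Example Band", itemfor="album")
year = Element(id="year", itemprop="year", text="2009", itemfor="album")
document = [album, *filler, artist, year]
by_id = {el.id: el for el in document if el.id}

print(props_via_itemfor(document, "album"))  # same props, 1003 elements examined
print(props_via_itemref(by_id, album))       # same props, 2 elements examined
```

The id index stands in for the one browsers already keep for getElementById. Building it here is a one-off cost rather than a scan repeated for every lookup.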

In general, the changes we made for variant 4 were all quite successful. With one exception, that's what HTML5 now says. The one exception is that I hoisted the "itemid" property to an attribute like "itemtype". The argument was that people may want to scan a document for the item with a particular "itemid", and if "itemid" were a property, <itemref> would make that impossible without building the microdata graph for the entire page.
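Here is a sketch of the lookup this enables, assuming itemid sits on the same element as itemscope. Finding an item by its global identifier becomes a flat pass over start tags with Python's html.parser, with no need to resolve scopes or itemref. The class name and the example.com URLs are hypothetical. Had itemid remained a property, the matching value could live anywhere on the page and be pulled in through itemref, so this shortcut would not be safe.

```python
from html.parser import HTMLParser


class ItemIdFinder(HTMLParser):
    """Collect the item elements whose itemid attribute equals a target URL.

    Because itemid sits on the same element as itemscope, a flat pass over
    start tags is enough: no scope tracking, no itemref resolution, and no
    microdata graph for the rest of the page.
    """

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.matches = []  # (tag name, itemtype) for each matching item

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs and attrs.get("itemid") == self.target:
            self.matches.append((tag, attrs.get("itemtype")))


page = """
<section itemscope itemtype="http://example.com/Book" itemid="http://example.com/books/1">
  <h1 itemprop="title">First book</h1>
</section>
<section itemscope itemtype="http://example.com/Book" itemid="http://example.com/books/2">
  <h1 itemprop="title">Second book</h1>
</section>
"""
finder = ItemIdFinder("http://example.com/books/1")
finder.feed(page)
finder.close()
print(finder.matches)  # [('section', 'http://example.com/Book')]
```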

One thing we weren't trying to test but which I was happy to see is that people really don't have any problems dealing with URLs as property names. In fact, they didn't even complain about URLs being long, which reassured me that microdata's lack of URL shortening mechanisms is probably not an issue.

Overall, this was a good and useful experience. I hope we can use usability studies to test other parts of HTML5 in the future.


(Added based on Twitter feedback.) Some people have asked to see the raw data we collected in this study. I've uploaded the raw files as they were at the end of each participant's one-hour session. This data on its own isn't especially useful; what matters is how the participants reached their conclusions. There are seven hours' worth of video to document that, but we can't publish the video online, since that would be a violation of the legal agreement we have with the participants to protect their privacy.

The study was conducted by one of Google's usability study moderators, and the participants were screened and recruited by a separate team of usability study recruiters specifically for this study. Our criteria were intended to find Web developers who were somewhat comfortable with HTML and who had at most a passing knowledge of the HTML5 effort.

Bear in mind, when looking at the raw data, that the participants had just one hour to go from not knowing about this at all, to being expected to read and write code in a new syntax, with no hints other than the examples and the documentation (which most only glanced at!).

7 Responses to “Usability testing HTML5”

  1. Hixie, did I miss something??? Who took part in the “study” (people from Google)? There were only seven people in your study (of whom one didn’t show up)?

    Please tell me I am reading all this incorrectly and you are not drawing real evidence/conclusions from this “study”?

  2. Flaws:

    Videos can’t be viewed outside Google. Bias on the part of the creators of the study. Lack of outside involvement. No information about where the people taking the study are employed. Lack of diversity of demographics. Lack of proper, and neutral, oversight. Interpretation by person or persons without proper background, and neutrality. Single study, only.

    To make changes to the HTML5 specification without discussion with the HTML WG was unconscionable.

  3. I think it’s awesome that you’re doing usability studies on APIs. Usability is important to coders, too. 🙂

  4. If I understand correctly, you only used 7 people to do this test? This means the margin of error is something like +/-30% (perhaps more, this is a conservative number).
    But besides this, tests like these are non-representative of real-world cases. Consider the fact that out in the real world, people don’t get “assignments” as you have given your test subjects.
    I suggest you do real-time use-case testing for this kind of quantitative analysis.