The WHATWG Blog — Ian Hickson

Author Archive

DRM and Web security

Wednesday, September 21st, 2016

For a few years now, the W3C has been working on a specification that extends the HTML standard to add a feature that literally, and intentionally, does nothing but limit the potential of the Web. They call this specification "Encrypted Media Extensions" (EME). It's essentially a plug-in mechanism for proprietary DRM modules.

Much has been written on how DRM is bad for users because it prevents fair use, on how it is technically impossible to ever actually implement, on how it's actually a tool for controlling distributors, a purpose for which it is working well (as opposed to preventing copyright violations, a purpose for which it isn't working at all), and on how it is literally an anti-accessibility technology (it is designed to make content less accessible, to prevent users from using the content as they see fit, even preventing them from using the content in ways that are otherwise legally permissible, e.g. in the US, for parody or criticism). Much has also been written about the W3C's hypocrisy in supporting DRM, and on how it is a betrayal of all Web users. It is clear that the W3C allowing DRM technologies to be developed at the W3C is just a naked ploy for the W3C to get more (paying) member companies to join. These issues all remain. Let's ignore them for the rest of this post, though.

One of the other problems with DRM is that, since it can't work technically, DRM supporters have managed to get the laws in many jurisdictions changed to make it illegal to even attempt to break DRM. For example, in the US, there are the DMCA provisions 17 U.S.C. §§ 1201 and 1203: "No person shall circumvent a technological measure that effectively controls access to a work protected under this title", and "Any person injured by a violation of section 1201 or 1202 may bring a civil action in an appropriate United States district court for such violation".

This has led to a chilling effect in the security research community, with scientists avoiding studying anything that might relate to a DRM scheme, lest they be sued. The more technology embeds DRM, therefore, the less secure our technology stack will be, with each DRM-impacted layer getting fewer and fewer eyeballs looking for problems.

We can ill afford a chilling effect on Web browser security research. Browsers are continually attacked. Everyone who uses the Web uses a browser, and everyone would therefore be vulnerable if security research on browsers were to stop.

Since EME introduces DRM to browsers, it introduces this risk.

A proposal was made to avoid this problem. It would simply require each company working on the EME specification to sign an agreement that they would not sue security researchers studying EME. The W3C already requires that members sign a similar agreement relating to patents, so this is a simple extension. Such an agreement wouldn't prevent members from suing for copyright infringement, and it wouldn't reduce the influence of content producers over content distributors; all it would do is attempt to address this even more critical issue that would lead to a reduction in security research on browsers.

The W3C is refusing to require this. We call on the W3C to change their mind on this. The security of the Web technology stack is critical to the health of the Web as a whole.

- Ian Hickson, Simon Pieters, Anne van Kesteren

Posted in Multimedia, W3C

Make patent commitments for the URL standard

Tuesday, September 2nd, 2014

The WHATWG is starting down the road of getting patent commitments for its standards. You can be part of this! First, create an account with the W3C's community group system. Then, join the WHATWG community group. Then make the patent commitment by following the instructions on this page (pick the first radio button, then click "Record my choice"). That's all there is to it! Google, Mozilla, and Opera have already signed the patent commitment agreement. Anyone can sign up, but it's even more useful if you are an employee of a big patent-holding company and can convince your company to sign up!

Posted in WHATWG

HTML is the new HTML5

Wednesday, January 19th, 2011

In 2009 we announced that the HTML5 specification at the WHATWG was progressing to Last Call. The plan at the time was to finish the specification this year and publish a snapshot of "HTML5" in 2012. However, shortly after that we realised that the demand for new features in HTML remained high, and so we would have to continue maintaining HTML and adding features to it before we could call "HTML5" complete. As a result we moved to a new development model, where the technology is not versioned and instead we just have a living document that defines the technology as it evolves.

As there is still interest in publishing a snapshot of HTML5, the W3C is still working on that (in conjunction with the WHATWG).

Because the specification is now a living document, we are today announcing two changes:

The HTML specification will henceforth just be known as "HTML", with the URL http://whatwg.org/html. (We will also continue to maintain the Web Applications 1.0 specification that contains HTML and a number of related APIs like Web Storage, Web Workers, and Server-Sent Events.)
The WHATWG HTML spec can now be considered a "living standard". It's more mature than any version of the HTML specification to date, so it made no sense for us to keep referring to it as merely a draft. We will no longer be following the "snapshot" model of spec development, with the occasional "call for comments", "call for implementations", and so forth.

In practice, the WHATWG has basically been operating like this for years, and indeed we were going to change the name last year but ended up deciding to wait a bit since people still used the term "HTML5" a lot. However, the term is now basically being used to mean anything Web-standards-related, so it's time to move on!

If you have any questions please don't hesitate to ask them in the comments or on IRC. We'll update the FAQ with the most commonly asked questions.

Posted in WHATWG

HTML5 at Last Call

Tuesday, October 27th, 2009

For a brief period today, there were no outstanding e-mails or bugs on the specs, and so I took that opportunity to transition us here at the WHATWG to the next stage of HTML5's development: Last Call! This affects three specs at the WHATWG:

There's also a version of the spec called Web Applications 1.0 (for nostalgic reasons) that has all of the above as well as a number of other specs, namely Web Storage, Web Database, Server-sent Events, and the Web Sockets API and protocol, all together in one document. With the exception of the Web Database spec, they're all now in last call at the WHATWG.

So if you've been waiting to see if someone else would report the problem that you had seen, well, if it's not fixed, they didn't! So you should now send that feedback in yourself.

There are two ways to send feedback. If your feedback is something short and simple, you can just load up the spec in your browser, click on the section with the problem, then type in your message using the review comments box that appears at the bottom of the window, and hit the "Submit Review Comments" button. This works for the HTML5 and Web Applications 1.0 specs. (Thanks to the W3C HTML Working Group for making their bug database available to us for this purpose.)

If your feedback is more elaborate, then you should subscribe to the mailing list and then send your feedback there.

Note: Lest there be any confusion, the W3C HTML WG has not yet transitioned HTML5 to Last Call at the W3C. HTML5 is a joint effort of W3C and WHATWG groups, but we have different issues lists and different criteria for going to Last Call. For more details on the W3C HTML WG's processes, see the W3C HTML WG charter.

Posted in WHATWG

Usability testing HTML5

Sunday, October 4th, 2009

Over the past few weeks, Google has been preparing and then running a usability study to test the microdata feature of HTML5.

Methodology

We first created three different variants based on the original microdata proposal:

One based on what the spec said (documentation)
One trying to put types in an explicit itemtype="" attribute and moving "about" to item="", and replacing itemfor="" with just having multiple item=""s with the same name (documentation)
One trying to remove types altogether and using item as a boolean attribute. (documentation)

Our plan was to run six studies, two for each variant, with each participant running through the following steps:

Read and comment on a couple of motivating slides explaining why one would care about microdata
Read the provided documentation for the variant being tested
Look at and comment on the animals example with microdata (variant 2, variant 3)
Exercise: try to extract the data from the "flickr" example (variant 2, variant 3)
Exercise: try to annotate the blog example (variant 2, variant 3)
Exercise: try to annotate the review example (variant 2, variant 3)
Compare and contrast the "yelp" example with microdata to the equivalent of one of the other two variants (variant 2, variant 3)

We made some changes along the way. After the first three sessions, it became clear that "about" was a very confusing term to use for giving the item's global identifier, and so we changed the documentation and examples to use "itemid" instead (which turned out to be much less confusing). Early on we also introduced some documentation text to explain the differences between the variants in the last exercise, because just showing them the two side by side wasn't getting us anything useful (1 to 3, 2 to 1, 2 to 3, 3 to 1).

After our sixth participant canceled on us, we decided to create a fourth variant (documentation) based on what we'd learnt with the first five, and to get two more participants to test this variant specifically. For these participants, we used the following methodology:

Read and comment on a couple of motivating slides explaining why one would care about microdata
Read the provided documentation for the variant being tested
Look at and comment on the animals example with microdata
Exercise: try to extract the data from the "flickr" example
Exercise: try to extract the data from the review example
Exercise: try to annotate the blog example
Exercise: try to annotate the "yelp" example

Conclusions

Some interesting things came out of this study. First, as mentioned above, the term "about" turns out to be highly non-intuitive. I originally took the word from RDFa, on the principle that they knew more about this than I did, but our participants had a lot of trouble with that term. When we changed it to "itemid", there was a marked improvement in people's understanding of the concept.

Second, people were much less confused about types than I thought they would be. In preparing for this study I discussed microdata with a number of people, and I found that one major area of confusion was the concept of types vs the concept of properties. This is why variant 3 has no types: I wanted to find out whether people had trouble with them or not. Well, not only did people not have problems with types, several participants went out of their way to specify the type of an item, for example using the attribute name "type" instead of "item" in variant 1.

It seems that while reasoning about types at the theoretical level is somewhat confusing, it isn't so confusing that the concept should be kept out of the language. Instead, types should just be more explicitly mentioned. This is why we renamed "item" to "itemtype".

Third, people were confused by the scoping nature of the "item" attribute. Some of our participants never understood scoping at all, and most of the participants who understood the concept were still quite confused by the "item" attribute. We were encouraged, however, by one variant 1 participant's sudden enlightenment when they saw variant 3's "itemscope" attribute, and by the reaction of the variant 3 participant to the "itemscope" attribute compared to the reactions that the other two variants' participants had to their "item" attributes. This is why we split "item" into "itemtype" and "itemscope", instead of just using "itemtype".

We found that people who understood microdata's basic features also understood "itemfor", but while we were doing the study, it was pointed out on the WHATWG list that "itemfor" makes it impossible to find the properties of an item without scanning the whole document. This is why we tested the <itemref> idea in variant 4. People were at least as able to understand this as "itemfor".

In general, the changes we made for variant 4 were all quite successful. With one exception, that's what HTML5 now says. The one exception is that I hoisted the "itemid" property to an attribute like "itemtype", based on the argument that if people want to scan a document for the item with a particular "itemid", <itemref> would make it impossible to do it for the property without creating the microdata graph for the entire page.

One thing we weren't trying to test but which I was happy to see is that people really don't have any problems dealing with URLs as property names. In fact, they didn't even complain about URLs being long, which reassured me that microdata's lack of URL shortening mechanisms is probably not an issue.
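To make the attributes discussed above concrete, here is a minimal sketch of what "extracting the data" from an annotated page could look like. It is not the extraction algorithm from the HTML5 spec, and it is not the material used in the study. The sample markup, the example.org URLs, and the names `MicrodataSketch` and `extract_items` are all made up for illustration, and the `itemprop` attribute name comes from the microdata specification rather than from this post. The sketch assumes well-formed markup and only handles properties whose value is text content or a nested item.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not be pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class MicrodataSketch(HTMLParser):
    """Collects top-level items; assumes every non-void element is closed."""

    def __init__(self):
        super().__init__()
        self.items = []    # top-level items, in document order
        self._scopes = []  # items whose element is still open, innermost last
        self._stack = []   # one (opened_item, capture) pair per open element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return  # simplification: values such as <img src> are ignored
        attributes = dict(attrs)
        prop = attributes.get("itemprop")
        parent = self._scopes[-1] if self._scopes else None
        opened, capture = None, None
        if "itemscope" in attributes:
            # itemscope starts a new item; itemtype and itemid describe it.
            opened = {"type": attributes.get("itemtype"),
                      "id": attributes.get("itemid"),
                      "properties": {}}
            if prop is not None and parent is not None:
                # A nested item is the value of the enclosing item's property.
                parent["properties"].setdefault(prop, []).append(opened)
            else:
                self.items.append(opened)
            self._scopes.append(opened)
        elif prop is not None and parent is not None:
            # Text-valued property: gather text until this element closes.
            capture = (parent, prop, [])
        self._stack.append((opened, capture))

    def handle_data(self, data):
        # Text belongs to every property element that is currently open.
        for _, capture in self._stack:
            if capture is not None:
                capture[2].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or not self._stack:
            return
        opened, capture = self._stack.pop()
        if capture is not None:
            item, prop, chunks = capture
            text = " ".join("".join(chunks).split())  # collapse whitespace
            item["properties"].setdefault(prop, []).append(text)
        if opened is not None:
            self._scopes.pop()


def extract_items(html):
    """Parse an HTML string and return the top-level items found in it."""
    parser = MicrodataSketch()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical markup; the URLs are placeholders, not a real vocabulary.
SAMPLE = """
<div itemscope itemtype="http://example.org/animal" itemid="urn:example:hedral">
  <h1 itemprop="http://example.org/name">Hedral</h1>
  <p itemprop="http://example.org/desc">A <b>black</b> cat.</p>
</div>
"""

items = extract_items(SAMPLE)
```

With this sample, `items` holds a single item whose type and global identifier come straight from the `itemtype` and `itemid` attributes, and whose property names are the full URLs given in the markup.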

Overall, this was a good and useful experience. I hope we can use usability studies to test other parts of HTML5 in the future.

Update

(Added based on Twitter feedback.) Some people have asked to see the raw data we collected in this study. I've uploaded the raw files as they were at the end of each participant's one-hour session. This data on its own isn't especially useful; what matters is how the participants reached their conclusions. There are seven hours' worth of video to document that, but we can't publish the video online, since that would be a violation of the legal agreement we have with the participants to protect their privacy.

The study was conducted by one of Google's usability study moderators, and the participants were screened and recruited by a separate team of usability study recruiters specifically for this study. Our criteria were intended to find Web developers who were somewhat comfortable with HTML and who had at most a passing knowledge of the HTML5 effort.

Bear in mind, when looking at the raw data, that the participants had just one hour to go from not knowing about this at all, to being expected to read and write code in a new syntax, with no hints other than the examples and the documentation (which most only glanced at!).

Posted in WHATWG