The WHATWG Blog

Please leave your sense of logic at the door, thanks!

Archive for the ‘What’s Next’ Category

Progressing Streams

Wednesday, July 5th, 2017

Back in 2014 we announced the Streams Standard. It's about time for an update on where we are and what's coming up.

Streaming the response to fetch() via the response.body attribute was standardized last year and is now implemented in several major browsers. Recently, streaming uploads have been added to the Fetch Standard: fetch(url, {method: 'POST', body: readable}) will start an upload. The expected properties of a streaming function apply:

Service Workers are another area where Streams are indispensable. They are used internally to allow the page to start processing bytes as soon as any are available, and are being used by developers as a powerful tool to synthesize responses. A Response object can be constructed with a ReadableStream body and then passed to fetchEvent.respondWith() like any other response.
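Here is a minimal sketch of both halves of this. It assumes a runtime with the standard ReadableStream, Response, TextEncoder and TextDecoder globals, which current browsers and recent versions of Node.js provide. synthesizedResponse() builds a Response whose body is generated on demand, which is the kind of object a service worker could hand to fetchEvent.respondWith(). readIncrementally() consumes response.body chunk by chunk, as a page could do with the result of fetch(). Both function names are made up for this example.

```javascript
// Build a Response whose body is produced chunk by chunk, the way a service
// worker might synthesize one. `parts` is hypothetical page content.
function synthesizedResponse(parts) {
  const encoder = new TextEncoder();
  const iterator = parts[Symbol.iterator]();
  const body = new ReadableStream({
    // pull() is only called when the consumer is ready for more data.
    pull(controller) {
      const { done, value } = iterator.next();
      if (done) controller.close();
      else controller.enqueue(encoder.encode(value));
    },
  });
  return new Response(body, {
    headers: { "Content-Type": "text/html; charset=utf-8" },
  });
}

// Consume response.body incrementally instead of waiting for the whole body.
async function readIncrementally(response) {
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let text = "";
  let chunks = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks++;
    text += decoder.decode(value, { stream: true }); // value is a Uint8Array
  }
  return { text: text + decoder.decode(), chunks };
}

// In a service worker (browser only):
// self.addEventListener("fetch", (event) => {
//   event.respondWith(synthesizedResponse(["<!doctype html>", "<p>Hello</p>"]));
// });
```

Because the body is pulled lazily, the page can start processing the first chunk before the later ones have even been generated.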

The real power of Streams is unlocked when sources, sinks and transforms from disparate authors are combined in novel ways. The most exciting action is not in platform built-ins but in streams created in the wider developer ecosystem. With this in mind, every aspect of the standard has been fine-tuned for productivity. Take the following example:

let appendChildWritableStream = new WritableStream({
  // parentNode: an existing DOM element that receives each written node
  write(domNode) {
    parentNode.appendChild(domNode);
  }
});

Notice what isn't there:

With a recent browser you can see this in action in a live demo (video). At the time of writing, this demo and the others in this post work in Chrome stable version 59 and Safari stable version 10.1.
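The appendChildWritableStream example needs a real DOM. To try the same shape outside a browser, here is a sketch where a plain array stands in for parentNode and plain objects stand in for DOM nodes. Both are assumptions made purely for illustration. A ReadableStream of these "nodes" is then connected to the sink with pipeTo().

```javascript
// Hypothetical stand-ins: `parent` is an array playing the role of parentNode,
// and each "node" is a plain object rather than a real DOM node.
function makeAppendSink(parent) {
  return new WritableStream({
    write(node) {
      parent.push(node); // in a page: parentNode.appendChild(node)
    },
  });
}

// A source that produces one fake <li> "node" per label.
function nodeSource(labels) {
  return new ReadableStream({
    start(controller) {
      for (const label of labels) {
        controller.enqueue({ tagName: "LI", textContent: label });
      }
      controller.close();
    },
  });
}

async function appendAll(labels) {
  const parent = [];
  // pipeTo() resolves once every chunk has been written and the sink closed.
  await nodeSource(labels).pipeTo(makeAppendSink(parent));
  return parent.map((node) => node.textContent);
}

appendAll(["one", "two", "three"]).then(console.log);
```

The sink knows nothing about where its nodes come from, so the same object could just as well sit at the end of a pipe from a network response.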

The WritableStream API is now stable. We've added a getWriter() method which is the analogue of ReadableStream's getReader(). It adds locking semantics so that multiple writers cannot interfere with each other. Recent work has focused on predictability, for example by preventing underlying sink methods from running concurrently, and robustness, like dealing with badly-behaving strategy size functions that call into other methods reentrantly.
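To show both properties together, here is a small sketch. The sink and its simulated 5 ms delay are made up for illustration. Calling getWriter() a second time on a locked stream throws a TypeError. Three writes issued back to back still reach the sink's write() one at a time, and in order.

```javascript
// Demonstrates WritableStream locking and serialized sink writes.
// The sink and its 5 ms delay are made up for illustration.
async function lockingAndOrdering() {
  let active = 0;
  let maxActive = 0;
  const written = [];
  const stream = new WritableStream({
    async write(chunk) {
      active++;
      maxActive = Math.max(maxActive, active);
      await new Promise((resolve) => setTimeout(resolve, 5)); // simulated slow I/O
      written.push(chunk);
      active--;
    },
  });

  const writer = stream.getWriter();
  const lockedWhileWriting = stream.locked;

  let secondWriterError = null;
  try {
    stream.getWriter(); // fails: the stream is already locked to `writer`
  } catch (error) {
    secondWriterError = error;
  }

  // Issue three writes without awaiting in between; the stream queues them
  // and calls the sink's write() for one chunk at a time.
  await Promise.all(["a", "b", "c"].map((chunk) => writer.write(chunk)));
  await writer.close();
  return { lockedWhileWriting, secondWriterError, maxActive, written };
}

lockingAndOrdering().then(console.log);
```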

The strength of the algorithmic style of specification is that even unintended behavior will be the same between implementations. On the other hand, when specifying the pipeTo() method of ReadableStream, providing latitude for browsers to optimize was a high priority. As well as bypassing JavaScript when copying data between built-in streams, user agents may need to change the timing or ordering of calls to underlying methods to get the best performance for their architecture. For this reason, we specified pipeTo() in a requirements style. This presents its own challenges, for example how to specify the "least work" that an implementation can do and still be compliant.

Streams also challenge our fundamental assumptions about how the web platform works. You may not want to have to modify the DOM directly if you already have a template engine producing HTML. Shouldn't you be able to pipe a stream of HTML to an element?

We don't yet know how this capability would fit in the web platform, but Jake Archibald has created a custom element providing a compelling vision of what we could do with it. The demo shows a stream of HTML being inserted directly from the server.

Depending on your environment, you may have seen some significant jank in that demo. The problem is that the server supplies data faster than the browser can lay out and render it. This is where backpressure comes in. Any data sink can apply backpressure just by returning a promise from its write() method. In many cases this happens as a natural consequence of the implementation. In this case, we want that promise to resolve only once the browser has had a chance to render the HTML. With a slight modification to the custom element, the page becomes much smoother: demo (side-by-side video).
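Here is a minimal sketch of that idea. It assumes a sink whose write() returns a promise that settles "after rendering". In a page, requestAnimationFrame would be a natural signal to wait for; that choice is an assumption, not necessarily what the demo does. In the sketch, setTimeout stands in for it so the code runs outside a browser. The producer waits on writer.ready, which stays pending while the sink is busy. streamWithBackpressure and afterRender are illustrative names.

```javascript
// Stand-in for "wait until the browser has rendered". In a page this might be
// requestAnimationFrame (an assumption); setTimeout keeps the sketch portable.
const afterRender = () => new Promise((resolve) => setTimeout(resolve, 10));

async function streamWithBackpressure(htmlChunks) {
  const rendered = [];
  const sink = new WritableStream(
    {
      write(chunk) {
        rendered.push(chunk); // stand-in for inserting the HTML
        return afterRender(); // a pending promise signals backpressure
      },
    },
    new CountQueuingStrategy({ highWaterMark: 1 })
  );

  const writer = sink.getWriter();
  let sent = 0;
  let maxLag = 0; // how far the producer ever got ahead of rendering
  for (const chunk of htmlChunks) {
    await writer.ready; // pauses the producer while the sink is busy
    maxLag = Math.max(maxLag, sent - rendered.length);
    writer.write(chunk);
    sent++;
  }
  await writer.close();
  return { rendered, maxLag };
}

streamWithBackpressure(["<p>1</p>", "<p>2</p>", "<p>3</p>"]).then(console.log);
```

If you remove the await on writer.ready, the loop writes every chunk immediately and the producer runs ahead of rendering. This is the situation that produced the jank.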

It's clear that we should prioritize interactivity when adding content to an existing page. Maybe browsers need a special low-jank path for streaming HTML. But what about initial page load? You've probably seen pages that didn't respond to input because they were still performing some expensive layout below the fold. Should we prioritize interactivity there, too? We're still working through all the implications.

Ensuring low friction for all participants has really helped drive the progress of the standard. From bug fixes to the BYOB readable byte stream design from implementers, to large-scale contributions from external contributors, the benefits of the community process are clear.

Transform streams are the final key piece needed to make the stream ecosystem complete. We have a working, tested reference implementation that we are using as the basis for active design discussions. Full standardization and implementer adoption are expected to follow in the next few months.
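Transform streams have since shipped as TransformStream in browsers and Node.js. The sketch below uses the API as it exists today, which may differ in detail from the reference implementation discussed here. uppercaseTransform, fromArray and collect are illustrative names. A transform sits between a readable source and a consumer via pipeThrough().

```javascript
// A transform that upper-cases each string chunk.
function uppercaseTransform() {
  return new TransformStream({
    transform(chunk, controller) {
      controller.enqueue(chunk.toUpperCase());
    },
  });
}

// Helper: a readable stream that yields the items of an array, then closes.
function fromArray(items) {
  return new ReadableStream({
    start(controller) {
      for (const item of items) controller.enqueue(item);
      controller.close();
    },
  });
}

// Helper: drain a readable stream into an array.
async function collect(readable) {
  const reader = readable.getReader();
  const out = [];
  for (;;) {
    const { done, value } = await reader.read();
    if (done) return out;
    out.push(value);
  }
}

collect(fromArray(["streams", "are", "fun"]).pipeThrough(uppercaseTransform()))
  .then(console.log);
```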

In the past two years, streams have gone from being a promising idea to having multiple independent implementations and wide adoption. Implementation work is accelerating, and there is already a critical mass of shipping functionality.

We're looking to widen developer involvement with Streams. Check out the examples, contribute some web platform tests, or help improve the documentation.

Posted in What's Next | No Comments »

The Developer’s Edition of HTML makes a comeback

Wednesday, June 28th, 2017

Back in 2011, Ben Schwarz took on the ambitious project of curating an edition of the HTML Standard specifically for web developers. It omitted details aimed specifically at browser vendors, and had several additional features to make it more pleasant to read.

Ben did an amazing job maintaining this for many years, but some time ago it fell behind the changes to the HTML Standard. Since the move to make HTML more community-driven, we've been hoping to find a way to synchronize the developer's edition with the mainstream specification. That day has finally arrived!

We've deployed an initial version of the new developer's edition at a new URL, https://html.spec.whatwg.org/dev/. It's rough around the edges, missing several of the features of the old version. And it needs some curation to omit implementer-specific sections; many have crept in during the downtime. We're tracking these and other issues in the issue tracker. But now, the developer's edition is integrated into our build process and editing workflow, and will forever remain synchronized with the HTML Standard itself.

Hereby we issue a call to the community to help us with the revitalized developer's edition. Two of the biggest areas of potential improvement are helping us properly mark up the source according to the guidelines for what goes in the developer's edition, and contributing to the design of the developer's edition in order to make it more beautiful and usable.

Finally, I want to thank Michael™ Smith for getting this process started, via a series of pull requests to our build tools which did most of the foundational work. And of course Ben Schwarz, without whom none of this would have happened in the first place.

Posted in Tutorials, What's Next | No Comments »

Infra

Wednesday, November 16th, 2016

Welcome to the newest standard maintained by the WHATWG: the Infra Standard! Standards such as DOM, Fetch, HTML, and URL have a lot of common low-level infrastructure and primitives. As we went about defining things in more detail, we realized it would be useful to gather all the low-level functionality and put it in one place. Infra seemed like a good name, as it’s short for infrastructure but also means below in Latin, which is exactly where it sits relative to the other work we do.

In the long term this should help align standards in their vocabulary, make standards more precise, and also shorten them as their fundamentals are now centrally defined. Hopefully this will also make it easier to define new standards as common operations such as “ASCII lowercase” and data structures such as maps and sets no longer need to be defined. They can simply be referenced from the Infra Standard.
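As an example of why a shared, precise definition matters, here is a sketch of "ASCII lowercase". In Infra, this operation changes only the code points A to Z. JavaScript's toLowerCase() applies full Unicode case mapping, so the two disagree on characters such as U+212A KELVIN SIGN. The function name asciiLowercase is chosen for this example.

```javascript
// Infra-style "ASCII lowercase": only U+0041 to U+005A (A to Z) are changed.
function asciiLowercase(input) {
  return input.replace(/[A-Z]/g, (c) => String.fromCharCode(c.charCodeAt(0) + 0x20));
}

console.log(asciiLowercase("Content-TYPE")); // "content-type"
console.log(asciiLowercase("\u212A"));       // Kelvin sign, left unchanged
console.log("\u212A".toLowerCase());         // "k": Unicode lowercasing differs
```

This difference matters when matching things like header names or keywords. A spec that says "ASCII lowercase" does not expect a non-ASCII character to turn into "k".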

We would love your help improving the Infra Standard on GitHub. What language can further be deduplicated? What is common boilerplate in standards that needs to be made consistent and shared? What data types are missing? Please don’t hesitate to file an issue or write a pull request!

Posted in What's Next, WHATWG | Comments Off on Infra

XHTML5 in a nutshell

Sunday, July 25th, 2010

The WHATWG Wiki portal has a nice section describing the differences between HTML and XHTML, as well as the specifics of a polyglot HTML5 document that can also be served as a valid XML document. I'd like to review what it takes to turn an HTML5 polyglot document into a valid XHTML5 document; it appears that 'XHTML5' has finally become an official name.

The W3C First Public Working Draft of "Polyglot Markup" describes a polyglot HTML document as one that conforms to both the HTML and XHTML syntaxes by using a common subset of the two. In a nutshell, the HTML5 polyglot document is:

A polyglot document can be served as either HTML or XHTML, depending on browser support and MIME type. Polyglot HTML5 code essentially becomes an XHTML5 document when it is served with the XML MIME type application/xhtml+xml. In a nutshell, the XHTML5 document is:

Finally, the basic XHTML5 document would look like this:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="UTF-8" />
<title>XHTML5 example</title>
</head>
<body>
<svg xmlns="http://www.w3.org/2000/svg">
<rect stroke="black" fill="blue" x="45px" y="45px" width="200px" height="100px" stroke-width="2" />
</svg>
</body>
</html>

The XML declaration <?xml version="1.0" encoding="UTF-8"?> is not required when the default UTF-8 encoding is used; an XHTML5 validator will not complain if it is omitted. However, it is strongly recommended to declare the encoding in the server's HTTP Content-Type header. Alternatively, the character encoding can be declared in the document with a meta tag, <meta charset="UTF-8" />. A polyglot document needs this encoding declaration so that it is treated as UTF-8 whether it is served as HTML or as XHTML.
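A polyglot document can go out as either text/html or application/xhtml+xml, so one way a server could choose is by looking at the request's Accept header. The sketch below is a hypothetical helper (chooseContentType is not part of any library). It serves XHTML only when the browser explicitly lists application/xhtml+xml with a nonzero q-value, because a wildcard like */* does not mean the browser can actually render XHTML. Either way, it declares UTF-8 in the header, as recommended above.

```javascript
// Hypothetical helper: pick a Content-Type for a polyglot document.
// XHTML is chosen only if application/xhtml+xml is listed explicitly with
// q > 0; a wildcard such as */* is not treated as proof of XHTML support.
function chooseContentType(acceptHeader = "") {
  const acceptsXhtml = acceptHeader
    .split(",")
    .map((entry) => entry.split(";").map((part) => part.trim()))
    .some(([type, ...params]) => {
      if (type.toLowerCase() !== "application/xhtml+xml") return false;
      const q = params.find((p) => p.toLowerCase().startsWith("q="));
      return q === undefined || parseFloat(q.slice(2)) > 0;
    });
  // Declare UTF-8 in the HTTP header in either case.
  return acceptsXhtml
    ? "application/xhtml+xml; charset=UTF-8"
    : "text/html; charset=UTF-8";
}

console.log(chooseContentType("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"));
console.log(chooseContentType("text/html, */*"));
```

In a Node.js http server, this could be used as res.setHeader("Content-Type", chooseContentType(req.headers.accept)). If responses are cached, also send Vary: Accept so caches keep the two variants apart.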

The Total Validator tool (a Firefox plugin and desktop app) now has a user-selectable option for XHTML5-specific validation.

I would say that the main advantage of using XHTML5 is the ability to extend HTML5 with XML-based technologies such as SVG and MathML. The disadvantages are the lack of Internet Explorer support, more verbose code, and unforgiving error handling (XML well-formedness errors are fatal). Unless we need that extensibility, HTML5 is the way to go.

Posted in Syntax, What's Next, WHATWG | 26 Comments »

What’s next in HTML, episode 2: who’s been peeing in my sandbox?

Tuesday, January 26th, 2010

Welcome back to “What’s Next in HTML,” where I’ll try to summarize the major activity in the ongoing standards process in the WHAT Working Group. With HTML5 in Last Call, the WHATWG has moved to an unversioned development model for HTML. While browser vendors are busy implementing HTML5, let’s talk about what’s next.

The big news in HTML this week is r1643. ... Well, technically that revision is over 20 months old, but there has been a flurry of updates that affect the underlying feature. What feature, you might ask? Sandboxing untrusted content.

The sandbox attribute, when specified [on an <iframe> element], enables a set of extra restrictions on any content hosted by the iframe. ... When the attribute is set, the content [hosted by the iframe] is treated as being from a unique origin, forms and scripts are disabled, links are prevented from targeting other browsing contexts, and plugins are disabled.

This could be useful for all kinds of scenarios. The HTML5 spec gives blog comments as an example, but I think that’s mostly a red herring. Think about what’s hosted in iframes today: third-party advertising and third-party widgets. In each case, a web author wants to embed something on their page that they have little or no control over. In practice, that usually works fine. Advertising iframes don’t do anything (except display ads). Most widgets are well-behaved, and most widget frameworks (like Google Gadgets) enforce terms of service that forbid widgets from “taking over” the parent page in which they are embedded. Still, that’s a social/legal solution, not a technical one. Sandboxing is a complementary technical solution, where the parent page can actually tell the browser “Hey, I don’t fully trust this thing, but I’m embedding it anyway. Can you reduce its privileges?”

What privileges? Well, by default, “sandboxed” iframes can not

There are ways for the parent page to add back each of these privileges, if the third-party content needs it.

[The sandbox attribute’s] value must be an unordered set of unique space-separated tokens. The allowed values are allow-same-origin, allow-forms, and allow-scripts. The allow-same-origin keyword allows the content to be treated as being from the same origin instead of forcing it into a unique origin, and the allow-forms and allow-scripts keywords re-enable forms and scripts respectively (though scripts are still prevented from creating popups).
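An empty sandbox attribute turns every privilege off, and each keyword turns one back on. As a sketch of that rule, here is a hypothetical checker (parseSandbox is a made-up name). It reports which privileges a value re-enables and flags values that break the "unordered set of unique space-separated tokens" requirement. This is the kind of check a conformance checker might do; it is not a description of how browsers process the attribute.

```javascript
// Keywords allowed at the time of writing; later revisions of HTML add more.
const SANDBOX_KEYWORDS = new Set(["allow-same-origin", "allow-forms", "allow-scripts"]);

// Hypothetical conformance check for a sandbox attribute value.
function parseSandbox(value) {
  // Space-separated tokens: split on ASCII whitespace, drop empty strings.
  const tokens = value.split(/[\t\n\f\r ]+/).filter(Boolean);
  const seen = new Set();
  const errors = [];
  for (const token of tokens) {
    if (seen.has(token)) errors.push(`duplicate token: ${token}`);
    else if (!SANDBOX_KEYWORDS.has(token)) errors.push(`unknown token: ${token}`);
    seen.add(token);
  }
  return {
    allowSameOrigin: seen.has("allow-same-origin"),
    allowForms: seen.has("allow-forms"),
    allowScripts: seen.has("allow-scripts"),
    errors,
  };
}

console.log(parseSandbox(""));                          // every privilege off
console.log(parseSandbox("allow-scripts allow-forms")); // scripts and forms back on
```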

So it’s a security feature. You could restrict an advertising iframe to have no privileges whatsoever, but you could give a widget iframe privileges to execute its own scripts or embed its own forms.

If it’s a security feature, won’t older browsers still be insecure?

Yes. Well, no more than they are now. In fact, very few browsers support the sandbox attribute today, so we’re not just talking about users of older browsers; we’re talking about pretty much everyone. But that’s OK. The sandbox attribute is designed to be an incremental security feature. It’s an additional layer of security, not the only layer. Browsers have supported iframes for a long time, and thousands of web authors are using them despite the very real risks of embedding untrusted content. Advertising networks can be and have been hacked; malicious widgets can be and have been published; bad actors can and do try to do bad things to as many people as possible until they’re caught and taken down. You need to keep doing all the things you’re doing now to prevent iframe-based attacks. Then add sandbox, too.

I can’t do any filtering or sanitizing. Can I rely solely on browser-based sandboxing?

Someday, you might — might! — be able to throw out all your sanitizing code and rely solely on the sandbox attribute. Of course, you can’t do that today, because users of older browsers would still be vulnerable. So we need a “clean break” solution — a way to serve untrusted content to supporting browsers while absolutely, positively, 100% ensuring that older browsers never render the untrusted content under any circumstances. Enter the text/html-sandboxed MIME type.

All HTML pages are served with the text/html MIME type. It’s part of the HTTP headers, normally invisible to end users, but nevertheless sent by web servers every time a client requests a page. Every resource type (images, scripts, CSS files) has its own MIME type. Untrusted content could have its own MIME type. And this is where text/html-sandboxed comes in. If my web server serves up an HTML page with a MIME type of text/html, your browser will render it. If my web server serves up the same HTML page with a MIME type of text/html-sandboxed, your browser will download it (or offer to download it). Your browser doesn’t recognize that MIME type, so it falls back to the default action, which is to download it and save it as a file on your local disk. We can use this behavior to our advantage.

As browsers start supporting the sandbox attribute, they can also start supporting the text/html-sandboxed MIME type. What does it mean to “support” this new MIME type? If a user navigates directly to a page served with the new MIME type, don’t do anything special. Just download it, which is what happens already. BUT... if the user navigates to a page that includes an <iframe> element, AND the iframe has a sandbox attribute, AND the src of the iframe points to an HTML page that is served with the text/html-sandboxed MIME type, THEN render the iframe as normal (but still subject to the restrictions listed in the sandbox attribute).
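The rule above reduces to a small decision table. Here is a hypothetical model of it. The function name, field names and return values are made up, and no browser exposes anything like this. It compares only the MIME type's essence, ignoring parameters and letter case.

```javascript
// Hypothetical model of the proposal above; not a real browser API.
function handleResponse({ mimeType, inIframe, hasSandboxAttribute }) {
  // Compare only the MIME type essence, ignoring parameters and case.
  const essence = mimeType.split(";")[0].trim().toLowerCase();
  if (essence !== "text/html-sandboxed") return "handle-normally";
  if (inIframe && hasSandboxAttribute) return "render-sandboxed";
  return "download"; // direct navigation, or an iframe without sandbox
}

console.log(handleResponse({ mimeType: "text/html-sandboxed", inIframe: false, hasSandboxAttribute: false }));
console.log(handleResponse({ mimeType: "text/html-sandboxed", inIframe: true, hasSandboxAttribute: true }));
```

The important property is the default. Anything other than a sandboxed iframe falls through to "download", which matches what browsers that don't know the MIME type already do.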

Older browsers will download (or offer to download) the untrusted content. From a security perspective, that’s a good thing — at least, it means the content won’t be rendered as HTML. From a usability perspective, that’s terrible. Who wants to go to a page and suddenly have the browser offering to download a bunch of useless files? That means that you won’t really be able to use this technique until all users have upgraded to a browser that supports both the sandbox attribute and the text/html-sandboxed MIME type. That will be... a while. But it might happen someday!

Iframes suck. Can’t I just include the untrusted content inline?

There have been a number of proposals for a <sandbox> element, which you could wrap around untrusted content. All such proposals suffer from fatal flaws, stemming from how today’s browsers parse HTML markup. You, the author who wants to “wrap” untrusted content, would need to ensure that the content did not “break out” of the sandbox. For instance, it could include a </sandbox> end tag. (Hey, it’s untrusted! That’s why we’re here in the first place.) There are a surprising number of variations of markup that are recognized as end tags (having to do with inserting whitespace characters in strange places), and you would be responsible for sanitizing all of these variations. Furthermore, you would need to ensure that the untrusted content did not include a script that called document.write(), which could be used for writing out a matching </sandbox> end tag programmatically. Think about the number of ways that script could be obfuscated, and pretty soon you’re asking individual web authors to solve the halting problem just to wrap some untrusted content.
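To make the problem concrete, here is a sketch of two hypothetical filters. Both names are made up, and neither is a recommended sanitizer. The HTML tokenizer matches end tag names ASCII case-insensitively and accepts whitespace, attributes, or a trailing "/" after the name. A literal string check therefore misses most variants. A regex that follows those rules catches the variants listed here but still cannot see an end tag that a script assembles at runtime.

```javascript
// A naive filter: reject input that contains the literal end tag.
const naiveIsSafe = (html) => !html.includes("</sandbox>");

// Closer to how the HTML tokenizer sees end tags: the tag name is matched
// ASCII case-insensitively and may be followed by whitespace, "/" or ">".
const betterIsSafe = (html) => !/<\/sandbox[\t\n\f\r \/>]/i.test(html);

// All of these are tokenized as a </sandbox> end tag.
const endTagVariants = [
  "</sandbox>",
  "</SANDBOX>",
  "</sandbox >",
  "</sandbox\t>",
  '</sandbox class="x">',
  "</sandbox/>",
];

// Script can assemble the end tag at runtime, so no markup filter sees it.
const scripted = '<script>document.write("<" + "/sandbox>")</script>';

console.log(endTagVariants.filter(naiveIsSafe).length);  // 5 variants slip through
console.log(endTagVariants.filter(betterIsSafe).length); // 0
console.log(betterIsSafe(scripted));                     // true: wrongly judged safe
```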

If a wrapper element is the wrong solution, what’s the right one? This is where the “flurry of updates” has been happening. The current solution is r4619: the srcdoc attribute (with minor updates in r4623, r4624, and r4626). The best way to explain it is by example:

<iframe sandbox srcdoc="<p>Markup in an attribute, woohoo!</p>"></iframe>

Yeah, that’s pretty janky. But it has the following nice qualities:

It also has the following not-so-nice qualities:

There is one exception to that last rule. There are a few comment systems that are entirely client-side. That is, the comments are not part of the page markup that comes down from the web server; they are programmatically added after the page is rendered. Such comment systems could use JavaScript-based feature detection to check whether the browser supported the srcdoc attribute, and write out the appropriate markup either way. I wrote the book on HTML5 feature detection. (No really! A whole fscking book!) Detecting srcdoc support would use detection technique #2:

if ("srcdoc" in document.createElement("iframe")) { ... }

But this would only help in the case where you were adding untrusted content to the page at runtime, on the client side. Server-side cases will have to wait until everybody upgrades.
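For the client-side case, here is a sketch of how a comment system might write out the appropriate markup either way. The function names are hypothetical. The fallback, which shows the untrusted markup as escaped, inert text, is an assumption; a real system might choose differently. The untrusted markup must be escaped as an attribute value before it goes into srcdoc, which means at least & and ".

```javascript
// Hypothetical helpers for a client-side comment system.
function escapeAttribute(value) {
  // Escape "&" first so the "&" in "&quot;" is not escaped a second time.
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

function escapeText(value) {
  return value.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Returns markup for one untrusted comment. With srcdoc support the comment
// is rendered inside a sandboxed iframe; otherwise (an assumed fallback) it
// is shown as inert, escaped text.
function commentMarkup(untrustedHtml, supportsSrcdoc) {
  if (supportsSrcdoc) {
    return `<iframe sandbox srcdoc="${escapeAttribute(untrustedHtml)}"></iframe>`;
  }
  return `<pre>${escapeText(untrustedHtml)}</pre>`;
}

// In a page:
// const supportsSrcdoc = "srcdoc" in document.createElement("iframe");
// list.insertAdjacentHTML("beforeend", commentMarkup(comment, supportsSrcdoc));
```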

So when can I use all this stuff?

Hahahahahaha. You must be new here.

No really, when?

There are several pieces here, each with their own compatibility story.

  1. The sandbox attribute, for reducing privileges of untrusted content. Chromium and Google Chrome support the sandbox attribute (I tested the dev channel version 4.0.302.3); Safari, Firefox, Internet Explorer, and Opera ignore it. So you can start using the sandbox attribute today; just be sure to test in Chromium or Google Chrome to ensure you’ve set the sandbox privileges properly. It won’t have any effect in other browsers, but that’s OK. Remember, the sandbox attribute isn’t designed to be your only line of defense; it’s a complement to your existing defenses. Keep doing whatever you’re doing now (sanitizing input, auditing code, enforcing legal terms with your partners, etc.), then add sandbox for extra protection.
  2. The text/html-sandboxed MIME type, for ensuring that users can’t navigate to untrusted content. There are two parts to this. First, browsers must not render pages served with a text/html-sandboxed MIME type, if you navigate to the page directly. This part works in all browsers, today; they all download (or offer to download) the page markup instead of rendering it. Second, browsers that support the sandbox attribute need to render iframes served with the text/html-sandboxed MIME type (subject to the privilege restrictions listed in the sandbox attribute). No browser supports this yet, not even Google Chrome. (It renders the parent page but downloads the iframe content instead of rendering it within the frame.) So you can’t use this technique yet, until Google updates Chrome to support it. (In theory, other browser vendors will implement support for this at the same time they implement support for the sandbox attribute, but I suppose we’ll just have to wait and see.)
  3. The srcdoc attribute, for including untrusted content inline. Since the fallback behavior in legacy browsers for this feature is “render nothing at all” (by design), this attribute won’t be useful until pretty much all of your visitors upgrade to browsers that support the attribute. At the moment, no current browser supports the srcdoc attribute, so it’ll be a while. If I had to guess, I’d say January 29, 2022, at 4:37pm. Plus or minus 10 years.

And now you know “What’s Next in HTML.”

Posted in What's Next | 24 Comments »