Back in 2011, Ben Schwarz took on the ambitious project of curating an edition of the HTML Standard specifically for web developers. It omitted details aimed specifically at browser vendors, and had several additional features to make the experience more pleasant to read.
Ben did an amazing job maintaining this for many years, but some time ago it fell behind the changes to the HTML Standard. Since the move to make HTML more community-driven, we've been hoping to find a way to synchronize the developer's edition with the mainstream specification. That day has finally arrived!
We've deployed an initial version of the new developer's edition at a new URL, https://html.spec.whatwg.org/dev/. It's rough around the edges, missing several of the features of the old version. And it needs some curation to omit implementer-specific sections; many have crept in during the downtime. We're tracking these and other issues in the issue tracker. But now, the developer's edition is integrated into our build process and editing workflow, and will forever remain synchronized with the HTML Standard itself.
Hereby we issue a call to the community to help us with the revitalized developer's edition. Two of the biggest areas of potential improvement are helping us properly mark up the source according to the guidelines for what goes in the developer's edition, and contributing to the design of the developer's edition in order to make it more beautiful and usable.
Finally, I want to thank Michael™ Smith for getting this process started, via a series of pull requests to our build tools which did most of the foundational work. And of course Ben Schwarz, without whom none of this would have happened in the first place.
You’d think that the HTML Standard would be pretty far removed from shared memory considerations, but as it happens HTML defines a parser for HTML which is intertwined with script execution, defines a way to instantiate new global objects through the
iframe element, defines a way to instantiate new threads (and even processes, depending on the implementation) with workers, and all the various infrastructure pieces that go along with that. Finally, it also defines a message-based communication channel to communicate between those threads and processes.
SharedArrayBuffer class: a sibling to
ArrayBuffer, with the ability to be accessed from several threads at once. And on top of that we needed to do some work to make it play nicely with all the various globals the web platform provides and make sure it worked with the message-passing system (which you probably know as
postMessage()), all while trying to avoid violating constraints that would make programming with
SharedArrayBuffer objects a nightmare.
We ended up making several changes (and to make sure they all end up being interoperable we wrote accompanying tests):
- Changed the “structured cloning” algorithm into distinct serialization and deserialization algorithms, thereby introducing an intermediate, serialized, form. This was a long overdue refactoring needed to define
MessageChannel objects properly, since when using those you don't always know at the start where you'll end up. For
SharedArrayBuffer objects this was critical, since we needed to enforce that they can't be sent across process boundaries.
- Redefined the way worker ownership works, so it’s effectively a chain of parent-based ownership rather than all workers being owned by documents. This was necessary as we needed to separate dedicated workers nested in shared workers (not widely supported) from those nested in documents, as memory sharing works differently in these two cases.
- Defined the boundaries between which globals you can share memory. For the record, the web platform has many global objects:
ServiceWorkerGlobalScope, and soon various subclasses of
WorkletGlobalScope. A simplified (and slightly inaccurate) description would be that a window can share with any of its same-origin windows in
iframe elements, and any descendant dedicated workers (if there’s no shared/service worker in that chain). A worker (dedicated, shared, or service) can share with any descendant dedicated workers (again, as long as there’s no shared worker in that chain). As worklets aren’t finished yet you’ll have to read up on the actual pull request for the ongoing deliberations. We might post an update when they’re shipping if there’s interest.
- Defined a new
messageerror event that basically ensures that when message-passing goes wrong that error does not get lost. These errors happen when you cannot allocate enough memory in the destination, or try to pass a
SharedArrayBuffer object across a (theoretical) process boundary. As this event is dispatched on the receiving end it’s not the best, but if we detect that libraries often end up passing this information back to the sender we might take care of that at the standards-level at some point. For now messaging errors back was deemed too complicated and not important enough given the conditions under which these occur.
- Actually defined how these
SharedArrayBuffer objects get serialized and then deserialized, how various platform objects integrate with that, and how all the existing APIs that deal with serialization and deserialization in some manner integrate with that. E.g., passing
SharedArrayBuffer objects to
pushState() ends up throwing, because we don’t want to store them to disk, but
postMessage() should generally work (although initial implementations will have limitations here, especially with
As always, nothing is perfect and there are some gotchas without a good solution:
- Imagine you have a window with a descendant
iframe element that has further descendant dedicated workers that all collaborate together with shared memory and then the
iframe element gets navigated. This ends up stopping the workers without the ability to do cleanup. Some workarounds are available, but in general it’s a somewhat fragile setup that deserves a better solution.
- Aborting scripts: browsers typically let users abort scripts that are detected to significantly slow down their computer through some heuristic. This can violate some of the invariants the shared memory design tries to provide.
Although the above covers the integration of shared memory into the foundations of the web platform, there is still ongoing work on allowing specific APIs to accept and operate on shared memory. This requires changes to IDL to introduce a mechanism for safelisting APIs that can operate on
SharedArrayBuffer objects, as well as updating specifications to use that new safelisting mechanism, and of course writing tests for these spec changes. This work is still ongoing, but at least now it can build on top of a solid foundation.
In a previous post we’ve already explained how interoperability is important to the WHATWG. Without it, we’re writing fiction, and in the world of standards that is no good.
From a similar perspective, we’ve now more clearly documented how the WHATWG creates standards. The Working Mode document describes what is expected of editors and contributors, what criteria any changes to standards must fulfill, and gives guidelines for conflicts and tests.
What has changed the most since 2004 is requiring tests and implementer support for any changes made. These should help ensure that decisions need not be revisited again. Documenting our processes is also new and is born out of necessity due to the wider range of standards the WHATWG maintains.
We appreciate any feedback on the Working Mode document as it can undoubtedly be refined further.
The goal of the WHATWG’s Living Standards is to achieve interoperable implementations. With an ever-evolving web platform, we want changes to our standards to reach all implementations quickly and reliably, but from time to time there have been mishaps:
Three months ago, we changed the process for the HTML Standard to encourage writing tests and filing browser bugs for normative changes. (Normative means that implementations are affected.) This was the first step on a path towards improving interoperability and shortening the feedback cycle, and it has thus far exceeded our own expectations:
- “Tests and bugs speed up the turnaround time a lot. Without this it could go years before a browser vendor picked up or objected to a change (even if someone from that browser had given an a-OK for the change). Now some changes have patches in browsers before the change has landed in the spec.”
- “Writing tests also increases the quality of the spec, since problems become clear in the tests. It also seems reasonable to assume that the tests help getting interoperable implementations, which is the goal of the spec.”
- “I feel much more sure that when I make a spec change, I am doing all I can to get it implemented everywhere, in precisely the manner I intended.”
As an example, see Remove "compatibility caseless" matching where 3 of the 4 browser bugs are now fixed, or Add <script nomodule> to prevent script evaluation where all vendors have indicated support, and WebKit has a patch to implement the proposed feature and are contributing their tests to web-platform-tests—even before the standard’s pull request has landed.
Note in particular that this has not amounted to WHATWG maintainers writing all new tests. Rather, we are a community of maintainers, implementers and other contributors, where tests can be written to investigate current behavior before even discussing a change to the standard, or where the most eager implementer writes tests alongside the implementation.
We have been using this process successfully for other WHATWG standards too, such as Fetch, URL, and Streams. And today, we are elevating this process to all WHATWG standards, as now documented in the WHATWG contributor guidelines.
Welcome to the newest standard maintained by the WHATWG: the Infra Standard! Standards such as DOM, Fetch, HTML, and URL have a lot of common low-level infrastructure and primitives. As we go about defining things in more detail we realized it would be useful to gather all the low-level functionality and put it one place. Infra seemed like a good name as it’s short for infrastructure but also means below in Latin, which is exactly where it sits relative to the other work we do.
In the long term this should help align standards in their vocabulary, make standards more precise, and also shorten them as their fundamentals are now centrally defined. Hopefully this will also make it easier to define new standards as common operations such as “ASCII lowercase” and data structures such as maps and sets no longer need to be defined. They can simply be referenced from the Infra Standard.
We would love your help improving the Infra Standard on GitHub. What language can further be deduplicated? What is common boilerplate in standards that needs to be made consistent and shared? What data types are missing? Please don’t hesitate to file an issue or write a pull request!