The WHATWG Blog

Please leave your sense of logic at the door, thanks!

Archive for the ‘Browsers’ Category

The state of fieldset interoperability

Wednesday, September 19th, 2018

As part of my work at Bocoup, I recently started working with browser implementers to improve the state of fieldset, a 21-year-old HTML feature that provides form accessibility benefits to users of assistive technologies like screen readers. It suffers from a number of interoperability bugs that make it difficult for web developers to use.

Here is an example form grouped with a <legend> caption in a <fieldset> element:

Pronouns

And here is the corresponding markup for the above example:

<fieldset>
 <legend>Pronouns</legend>
 <label><input type=radio name=pronouns value=he> He/him</label>
 <label><input type=radio name=pronouns value=she> She/her</label>
 <label><input type=radio name=pronouns value=they> They/them</label>
 <label><input type=radio name=pronouns value=other> Write your own</label>
 <input type=text name=pronouns-other placeholder=&hellip;>
</fieldset>

The element is defined in the HTML standard, along with rendering rules in the Rendering section. Further developer documentation is available on MDN.

Usage

Based on a query of the HTTP Archive data set, containing the raw content of the top 1.3 million web pages, we find the relative usage of each HTML element. The fieldset element is used on 8.41% of the web pages, which is higher than other popular features, such as the video and canvas elements; however, the legend element is used on only 2.46% of web pages, which is not ideal for assistive technologies. Meanwhile, the form element appears on 70.55% of pages, and we believe that if interoperability bugs were fixed, correct and semantic fieldset and legend use would increase, and have a positive impact on form accessibility for the web.
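As a rough illustration of what "used on N% of pages" means, the sketch below counts the share of pages whose markup contains at least one start tag for a given element. The sample pages and the usagePercent helper are hypothetical; the actual HTTP Archive query is not shown in this post and may well work differently.

```javascript
// Hypothetical helper: percentage of pages containing at least one
// start tag for tagName. A plain regex scan, not a real HTML parser.
function usagePercent(pages, tagName) {
  const pattern = new RegExp("<" + tagName + "[\\s>/]", "i");
  const hits = pages.filter((html) => pattern.test(html)).length;
  return (100 * hits) / pages.length;
}

// Made-up sample data, for illustration only.
const samplePages = [
  "<form><fieldset><legend>Pronouns</legend></fieldset></form>",
  "<form><fieldset><input name=q></fieldset></form>",
  "<form><input name=q></form>",
  "<p>No form here</p>",
];

console.log(usagePercent(samplePages, "form"));     // 75
console.log(usagePercent(samplePages, "fieldset")); // 50
console.log(usagePercent(samplePages, "legend"));   // 25
```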

Fieldset standards history

In January 1997, HTML 3.2 introduces forms and some form controls, but does not include the fieldset or legend elements.

In July 1997, the first draft of HTML 4.0 introduces the fieldset and legend elements:

The FIELDSET element allows form designers to group thematically related controls together. Grouping controls makes it easier for users to understand their purpose while simultaneously facilitating tabbing navigation for visual user agents and speech navigation for speech-oriented user agents. The proper use of this element makes documents more accessible to people with disabilities.

The LEGEND element allows designers to assign a caption to a FIELDSET. The legend improves accessibility when the FIELDSET is rendered non-visually. When rendered visually, setting the align attribute on the LEGEND element aligns it with respect to the FIELDSET.

In December 1999, HTML 4.01 is published as a W3C Recommendation, without changing the definitions of the fieldset and legend elements.

In December 2003, Ian Hickson extends the fieldset element with the disabled and form attributes in the Proposed XHTML Module: XForms Basic, later renamed to Web Forms 2.0.

In September 2008, Ian Hickson adds the fieldset element to the HTML standard.

In February 2009, Ian Hickson specifies rendering rules for the fieldset element. The specification has since gone through some minor revisions, e.g., specifying that fieldset establishes a block formatting context in 2009 and adding min-width: min-content; in 2014.

In August 2018, I proposed a number of changes to the standard to better define how it should work, and resolve ambiguity between browser implementer interpretations.

Current state

As part of our work at Bocoup to improve the interoperability of the fieldset element and its legend child element, we talked to web developers and browser implementers, proposed changes to the standard, and wrote a lot of tests. At the time of this writing, 26 issues have been reported on the HTML specification for the fieldset element, and the tests that we wrote show a clear lack of interoperability among browser engines.

The results for fieldset and legend tests show some tests failing in all browsers, some tests passing in all browsers, and some passing and failing in different browsers.

Of the 26 issues filed against the specification, 17 are about rendering interoperability. These rendering issues affect use cases such as making a fieldset scrollable, which currently results in broken scroll rendering in some browsers. These issues also affect consistent legend rendering, which is causing web developers to avoid using the fieldset element altogether. Since the fieldset element is intended to help people who use assistive technologies to navigate forms, the current situation is less than ideal.

HTML spec rendering issues

In April of this year, Mozilla developers filed a meta-issue on the HTML specification “Need to spec fieldset layout” to address the ambiguities which have been leading to interoperability issues between browser implementations. During the past few weeks of work on fieldset, we made initial proposed changes to the rendering section of the HTML standard to address these 17 issues. At the time of this writing, these changes are under review.

Proposal to extend -webkit-appearance

Web developers also struggle with changing the default behaviors of fieldset and legend and seek ways to turn off the “magic” to have the elements render as normal elements. To address this, we created a proposal to extend the -webkit-appearance CSS property with a new value called fieldset and a new property called legend that are together capable of giving grouped rendering behavior to regular elements, as well as resetting fieldset/legend elements to behave like normal elements.

fieldset {
  -webkit-appearance: none;
  margin: 0;
  padding: 0;
  border: none;
  min-inline-size: 0;
}
legend {
  legend: none;
  padding: 0;
}

The general-purpose proposed specification for an "unprefixed" CSS ‘appearance’ property has been blocked by Mozilla's statement that it is not web-compatible as currently defined, meaning that implementing appearance would break the existing behavior of websites that are currently using CSS appearance in a different way.

We asked the W3C CSS working group for feedback on the above approach, and they had some reservations and will develop an alternative proposal. When there is consensus for how it should work, we will update the specification and tests accordingly.

We had also considered defining new display values for fieldset and legend, but care needs to be taken to preserve web compatibility. There are thousands of pages in HTTP Archive that set ‘display’ to something on fieldset or legend, but browsers typically behave as if display: block were set. For example, specifying display: inline on the legend needs to render the same as it does by default.

In parallel, we authored an initial specification for the ‘-webkit-appearance’ property in Mike Taylor's WHATWG Compatibility standard (which reverse engineers web platform wonk into status quo specifications), along with accompanying tests. More work needs to be done on ‘-webkit-appearance’ (or unprefixed ‘appearance’) to define what the values mean and to reach interoperability on the supported values.

Accessibility Issues

We have started looking into testing accessibility explicitly, to ensure that the elements remain accessible even when they are styled in particular ways.

This work has uncovered ambiguities in the specification, and we have submitted a proposal to address them. We have also identified interoperability issues in the accessibility mapping in implementations, which we have reported.

Implementation fixes

Meta bugs have been reported for each browser engine (Gecko, Chromium, WebKit, EdgeHTML), which depend on more specific bugs.

As of September 18, 2018, the following issues have been fixed in Gecko:

In Gecko, the bug Implement fieldset/legend in terms of '-webkit-appearance' currently has a work-in-progress patch.

The following issues have been fixed in Chromium:

The WebKit and Edge teams are aware of bugs, and we will follow up with them to track progress.

Conclusion

The fieldset and legend elements are useful to group related form controls, in particular to aid people who use assistive technologies. They are currently not interoperable and are difficult for web developers to style. With our work and proposal, we aim to resolve the problems so that they can be used without restrictions and behave the same in all browser engines, which will benefit browser implementers, web developers, and end users.

(This post is cross-posted on Bocoup's blog.)

Posted in Browsers, Forms, WHATWG | 1 Comment »

Implementation progress on the HTML5 <ruby> element

Friday, November 13th, 2009

If you don't know what the HTML5 ruby element is, you might want to take a minute to first read the section about the ruby element in the HTML5 specification and/or the Wikipedia article on ruby characters. To quote from the HTML5 description of the ruby element:

The ruby element allows one or more spans of phrasing content to be marked with ruby annotations. Ruby annotations are short runs of text presented alongside base text, primarily used in East Asian typography as a guide for pronunciation or to include other annotations. In Japanese, this form of typography is also known as furigana.

I give a specific example further down, but for now I want to first say that the really great news about the ruby element is that last week, Google Chrome developer Roland Steiner checked in a change (r50495, and see also related bug 28420) that adds ruby support to the trunk of the WebKit source repository, thus making the ruby feature available in WebKit nightlies and Chrome dev-channel releases.

A simple example

The following is a simple example of what you can do with the ruby element; make sure to view it in a recent WebKit nightly or Chrome dev-channel release. Note that the text is an excerpt from the source of a ruby-annotated online copy of the short story Run, Melos, Run by the writer Osamu Dazai, which I came across by way of Piro's info page for his XHTML Ruby add-on for Firefox (which I say a bit more about further below).

?????????????<ruby>??<rp>?</rp>
<rt>????</rt><rp>?</rp></ruby>????
<ruby>??<rp>?</rp><rt>????</rt><rp>?</rp>
</ruby>???? ??????????????????? 
??????????<ruby>????<rp>?</rp>
<rt>??????</rt><rp>?</rp></ruby>?<ruby>??
<rp>?</rp><rt>????</rt><rp>?</rp></ruby>
??????????

If you don't happen to have Japanese fonts installed, here's a screenshot of the source for reference:

ruby source markup

Notice that the actual annotative ruby text (which I've highlighted in yellow in the source just for the sake of emphasis) is marked up using the rt element as a child of the ruby element, and the text being annotated is the node that is a previous sibling of that rt content within the ruby element. The final new element in the mix is the rp element, which is simply a way to wrap the annotative ruby text in parentheses, for graceful fallback in browsers that don't support ruby.

So here's the rendered view of that same text:

??????????????????????????????????????????????????????????????????????????????????????????????????????????

And here is a screenshot of how it should look in a recent WebKit nightly or Chrome dev-channel release:

ruby rendered view

Notice that the annotative ruby text is displayed above the ruby base it annotates. If you instead view this page in a browser that doesn't support the ruby feature, you'll see that the ruby text is just shown inline, in parentheses following the ruby base it annotates. So the feature falls back gracefully in older browsers.
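To make the ruby/rt/rp structure and its fallback concrete, here is a minimal sketch. The rubyMarkup and fallbackText helpers are hypothetical names made up for this illustration, the sample word is not taken from the excerpt above, and HTML escaping is left out for brevity.

```javascript
// Build ruby markup for one base/annotation pair, wrapping the rt
// element in rp elements that carry the fallback parentheses.
function rubyMarkup(base, annotation) {
  return "<ruby>" + base + "<rp>(</rp><rt>" + annotation + "</rt><rp>)</rp></ruby>";
}

// Approximate what a browser without ruby support displays: it does not
// know the tags, so only their text content is rendered, inline.
function fallbackText(markup) {
  return markup.replace(/<[^>]+>/g, "");
}

const rubySample = rubyMarkup("漢字", "かんじ");
console.log(rubySample);
// <ruby>漢字<rp>(</rp><rt>かんじ</rt><rp>)</rp></ruby>
console.log(fallbackText(rubySample));
// 漢字(かんじ)
```

A browser that does support ruby hides the rp content, so the parentheses only show up in the fallback case.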

Support in other browsers

Current versions of Microsoft Internet Explorer also have native support for ruby, and you can also get ruby support in Firefox by installing Piro's XHTML Ruby add-on (and for more details, see his XHTML ruby add-on info page) — so we are well on the way to seeing the HTML5 ruby feature supported across a range of browsers. If you're not accustomed to reading printed books and magazines and such in Japanese, that might not sound like such a big deal. But for authors and developers and content providers in Japan who want to finally be able to use on the Web this very common feature of Japanese page layout from the print world, getting ruby support into another major browser engine is a huge win, and something to be very excited about.

Posted in Browsers, Elements | 3 Comments »

Sniffing for RSS 1.0 feeds served as text/html

Tuesday, September 29th, 2009

I recently found myself testing how browsers sniff for RSS 1.0 feeds that are served with an incorrect MIME type. (Yes, my life is full of delicious irony.) I thought I'd share my findings so far.

Firefox

Firefox's feed sniffing algorithm is located in nsFeedSniffer.cpp. As you can see, starting at line 353, it takes the first 512 bytes of the page and looks for a root tag called rss (for RSS 2.0), feed (for Atom 0.3 and 1.0), or rdf:RDF (for RSS 1.0). The RSS 1.0 marker is really a generic RDF marker, so it then does some additional checks for the two required namespaces of an RSS 1.0 feed, http://www.w3.org/1999/02/22-rdf-syntax-ns# and http://purl.org/rss/1.0/. This check is quite simple; it literally just checks for the presence of both strings, not caring whether they are the value of an xmlns attribute (or indeed any attribute at all).
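The behavior described above can be approximated in a few lines. This is only a simplified sketch, not the actual logic of nsFeedSniffer.cpp: the function names are made up, "root tag" is approximated as the first start tag after comments are removed, and the input is a string, so the 512-character window equals 512 bytes only for ASCII content. Atom's root element is named feed, which is why the sketch looks for that tag name.

```javascript
const FX_RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
const FX_RSS_NS = "http://purl.org/rss/1.0/";

// Name of the first element start tag. Processing instructions and
// doctypes are skipped because they do not start with a letter.
function firstTagName(text) {
  const withoutComments = text.replace(/<!--[\s\S]*?-->/g, "");
  const match = /<([A-Za-z][^\s>\/]*)/.exec(withoutComments);
  return match ? match[1] : null;
}

// Returns "rss2", "atom", "rss1", or null if nothing feed-like is found.
function sniffFeedLikeFirefox(body) {
  const head = body.slice(0, 512);
  const root = firstTagName(head);
  if (root === "rss") return "rss2";
  if (root === "feed") return "atom";
  if (root === "rdf:RDF") {
    // Plain substring checks: the namespaces may appear anywhere.
    return head.includes(FX_RDF_NS) && head.includes(FX_RSS_NS) ? "rss1" : null;
  }
  return null;
}

console.log(sniffFeedLikeFirefox('<?xml version="1.0"?><rss version="2.0">')); // rss2
console.log(sniffFeedLikeFirefox(
  '<rdf:RDF xmlns:rdf="' + FX_RDF_NS + '" xmlns="' + FX_RSS_NS + '">')); // rss1
console.log(sniffFeedLikeFirefox("<html><body>Hello</body></html>")); // null
```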

Firefox has an additional feature which tripped up my testing until I understood it. IE and Safari both have a mode where they essentially say "I detected this page as a feed and tried to parse it, but I failed, so now I'm giving up, and here's an error message describing why I gave up." Firefox does not have a mode like this. As far as I can tell, if it decides that a resource is a feed but then fails to parse the resource as a feed, it reparses the resource with feed handling disabled. So a non-well-formed feed served as application/rss+xml will actually trigger a "Do you want to download this file" dialog, because Firefox tried to parse it as a feed, failed, then reparsed it as some-random-media-type-that-I-don't-handle. A non-well-formed feed served as text/html will actually render as HTML, but only after Firefox silently tries (and fails) to parse it as a feed.

There's nothing wrong with this approach; in fact, it seems much more end-user-friendly than throwing up an incomprehensible error message. I just mention it because it tripped me up while testing.

Internet Explorer

Internet Explorer's feed sniffing algorithm is documented by the Windows RSS team. About RSS 1.0, it states:

IE7 detects a RSS 1.0 feed using the content types application/xml or text/xml. ... The document is checked for the strings <rdf:RDF, http://www.w3.org/1999/02/22-rdf-syntax-ns# and http://purl.org/rss/1.0/. IE7 detects that it is a feed if all three strings are found within the first 512 bytes of the document. ... IE7 also supports other generic Content-Types by checking the document for specific Atom and RSS strings.

Now that I understand IE's algorithm, I have to concede that this documentation is 100% accurate. However, it doesn't tell the full story. Here's what actually happens. If the Content-Type is

...then IE will trigger its feed sniffing. Once IE triggers its feed sniffing, it will never change its mind (unlike Firefox). If feed parsing fails, IE will throw up an error message complaining of feed coding errors or an unsupported feed format. The presence or absence of a charset parameter in the Content-Type header made absolutely no difference in any of the cases I tested.

And how exactly does IE detect an RSS 1.0 feed, once it decides to sniff? The documentation on MSDN is literally true: "The document is checked for the strings <rdf:RDF, http://www.w3.org/1999/02/22-rdf-syntax-ns# and http://purl.org/rss/1.0/. IE7 detects that it is a feed if all three strings are found within the first 512 bytes of the document." Combined with our knowledge of which Content-Types IE considers "generic," we can conclude that the following page, served as text/html, will be treated as a feed in IE:

<!-- <rdf:RDF -->
<!-- http://www.w3.org/1999/02/22-rdf-syntax-ns# -->
<!-- http://purl.org/rss/1.0/ -->
<script>alert('Hi!');</script>

[live demonstration]
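The check quoted from the documentation is simple enough to sketch directly: three substring tests over the first 512 bytes. The function name below is made up, and the sketch works on a string rather than raw bytes, which is equivalent only for ASCII content. It shows why the page above qualifies even though every marker sits inside a comment.

```javascript
const IE_RSS1_MARKERS = [
  "<rdf:RDF",
  "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
  "http://purl.org/rss/1.0/",
];

// True if all three marker strings occur within the first 512 characters.
function ieDetectsRss1(body) {
  const head = body.slice(0, 512);
  return IE_RSS1_MARKERS.every((marker) => head.includes(marker));
}

const commentOnlyPage = [
  "<!-- <rdf:RDF -->",
  "<!-- http://www.w3.org/1999/02/22-rdf-syntax-ns# -->",
  "<!-- http://purl.org/rss/1.0/ -->",
  "<script>alert('Hi!');</script>",
].join("\n");

console.log(ieDetectsRss1(commentOnlyPage)); // true
console.log(ieDetectsRss1("<script>alert('Hi!');</script>")); // false
```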

Why Bother?

I am working with Adam Barth and Ian Hickson to update draft-abarth-mime-sniff-01 (the content sniffing algorithm referenced by HTML5) to sniff RSS 1.0 feeds served as text/html. It is unlikely that we will adopt IE's algorithm, since it seems unnecessarily pathological. I am proposing the following change, which would bring the content sniffing specification in line with Firefox's sniffing algorithm:

In the "Feed or HTML" section, insert the following steps between step 10 and step 11:

10a. Initialize /RDF flag/ to 0.

10b. Initialize /RSS flag/ to 0.

10c. If the bytes with positions pos to pos+23 in s are exactly equal to 0x68, 0x74, 0x74, 0x70, 0x3A, 0x2F, 0x2F, 0x70, 0x75, 0x72, 0x6C, 0x2E, 0x6F, 0x72, 0x67, 0x2F, 0x72, 0x73, 0x73, 0x2F, 0x31, 0x2E, 0x30, 0x2F respectively (ASCII for "http://purl.org/rss/1.0/"), then:

  1. Increase pos by 23.
  2. Set /RSS flag/ to 1.

10d. If the bytes with positions pos to pos+42 in s are exactly equal to 0x68, 0x74, 0x74, 0x70, 0x3A, 0x2F, 0x2F, 0x77, 0x77, 0x77, 0x2E, 0x77, 0x33, 0x2E, 0x6F, 0x72, 0x67, 0x2F, 0x31, 0x39, 0x39, 0x39, 0x2F, 0x30, 0x32, 0x2F, 0x32, 0x32, 0x2D, 0x72, 0x64, 0x66, 0x2D, 0x73, 0x79, 0x6E, 0x74, 0x61, 0x78, 0x2D, 0x6E, 0x73, 0x23 respectively (ASCII for "http://www.w3.org/1999/02/22-rdf-syntax-ns#"), then:

  1. Increase pos by 42.
  2. Set /RDF flag/ to 1.

10e. Increase pos by 1.

10f. If /RDF flag/ is 1, and /RSS flag/ is 1, then the /sniffed type/ of the resource is "application/rss+xml". Abort these steps.

10g. If pos points beyond the end of the byte stream s, then continue to step 11 of this algorithm.

10h. Jump back to step 10c of this algorithm.
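Steps 10a through 10h can be sketched as a single loop. This is an illustration of the proposed steps only, not text from the sniffing draft: the function names are made up, s is assumed to be a Uint8Array holding the bytes being sniffed, and pos is assumed to be supplied by the preceding steps of the algorithm (it defaults to 0 here).

```javascript
const RSS_NS_BYTES = new TextEncoder().encode("http://purl.org/rss/1.0/");
const RDF_NS_BYTES = new TextEncoder().encode("http://www.w3.org/1999/02/22-rdf-syntax-ns#");

// True if needle occurs in s starting exactly at position pos.
function bytesMatchAt(s, pos, needle) {
  if (pos + needle.length > s.length) return false;
  for (let i = 0; i < needle.length; i++) {
    if (s[pos + i] !== needle[i]) return false;
  }
  return true;
}

// Returns "application/rss+xml", or null to mean "continue to step 11".
function sniffRss1(s, pos = 0) {
  let rdfFlag = false; // 10a
  let rssFlag = false; // 10b
  while (true) {
    if (bytesMatchAt(s, pos, RSS_NS_BYTES)) { // 10c
      pos += RSS_NS_BYTES.length - 1; // "Increase pos by 23"
      rssFlag = true;
    }
    if (bytesMatchAt(s, pos, RDF_NS_BYTES)) { // 10d
      pos += RDF_NS_BYTES.length - 1; // "Increase pos by 42"
      rdfFlag = true;
    }
    pos += 1; // 10e
    if (rdfFlag && rssFlag) return "application/rss+xml"; // 10f
    if (pos >= s.length) return null; // 10g
    // 10h: loop back to 10c
  }
}

const rss1Feed = new TextEncoder().encode(
  '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" ' +
  'xmlns="http://purl.org/rss/1.0/">');
console.log(sniffRss1(rss1Feed)); // application/rss+xml
```

As in the Firefox check, the two namespace strings may appear in either order and anywhere in the scanned bytes.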

Further Reading

You can see the results of my research to date and test the feeds for yourself. Because my research results are plain text with embedded HTML tags, I have added 512 bytes of leading whitespace to the page to foil browsers' plain-text-or-HTML content sniffing. Mmmm -- delicious, delicious irony.

Update: Belorussian translation is available.

Posted in Browsers | 4 Comments »

Help Test HTML5 Parsing in Gecko

Wednesday, July 8th, 2009

The HTML5 parsing algorithm is meant to demystify HTML parsing and make it uniform across implementations in a backwards-compatible way. The algorithm has had “in the lab” testing, but so far it hasn’t been tested inside a browser by a large number of people. You can help change that now!

A while ago, an implementation of the HTML5 parsing algorithm landed on mozilla-central preffed off. Anyone who is testing Firefox nightly builds can now opt to turn on the HTML5 parser and test it.

How to Participate?

First, this isn’t release-quality software. Testing the HTML5 parser carries all the same risks as testing a nightly build in general, and then some. It may crash, it may corrupt your Firefox profile, etc. If you aren’t comfortable with taking the risks associated with running nightly builds, you shouldn’t participate.

If you are still comfortable with testing, download a trunk nightly build, run it, navigate to about:config and flip the preference named html5.enable to true. This makes Gecko use the HTML5 parser when loading pages into the content area and when setting innerHTML. The HTML5 parser is not used for HTML embedded in feeds, Netscape bookmark import, View Source, etc., yet.

The html5.enable preference doesn’t require a restart to take effect. It takes effect the next time you load a page.

What to Test?

The main thing is getting the HTML5 parser exposed to a wide range of real Web content that people browse. This may turn up crashes or compatibility problems.

So the way to help is to use nightly builds with the HTML5 parser for browsing as usual. If you see no difference, things are going well! If you see a page misbehaving—or, worse, crashing—with the HTML5 parser turned on but not with it turned off, please report the problem.

Reporting Bugs

Please file bugs in the “Core” product under the “HTML: Parser” component with “[HTML5] ” at the start of the summary.

Known Problems

First and foremost, please refer to the list of known bugs.

However, I’d like to highlight a particular issue: Support for comments ending with --!> is in the spec, but the patch hasn’t landed, yet. Support for similar endings of pseudo-comment escapes within script element content is not in the spec yet. The practical effect is that the rest of the page may end up being swallowed up inside a comment or a script element.

Another issue is that the new parser doesn’t yet inhibit document.write() in places where it shouldn’t be allowed per spec but where the old parser allowed it.

Is There Anything New?

So what’s fun if success is that you notice no change? There are important technical things under the hood—like TCP packet boundaries not affecting the parse result and there never being unnotified nodes in the tree when the event loop spins—but you aren’t supposed to notice.

However, there is a major new visible feature, too. With the HTML5 parser, you can use SVG and MathML in text/html pages. This means that you can:

And yes, you can even put SVG inside MathML <annotation-xml> or MathML inside <foreignObject>. The mixing you’ve seen in XML is now supported in HTML, too.

If you aren’t concerned with taking the steps to make things degrade nicely in browsers that don’t support SVG and MathML in HTML, you can simply copy and paste XML output from your favorite SVG or MathML editor into your HTML source as long as the editor doesn’t use namespace prefixes for elements and uses the prefix xlink for XLink attributes.

If you don’t use the XML empty element syntax and you put your SVG text nodes in CDATA sections, the page will degrade gracefully in older HTML browsers so that the image simply disappears but the rest of the page is intact. You can even put a fallback bitmap as <img> inside <desc>. Unfortunately, there isn’t a similar technique for MathML, though if you want to develop one, I suggest experimenting with the <annotation> as your <desc>-like container.

There are known issues with matching camelCase names with Selectors or getElementsByTagName, though.

Posted in Browsers, Processing Model, Syntax | 8 Comments »

Supporting New Elements in Firefox 2

Wednesday, March 18th, 2009

We have previously talked about how to get Internet Explorer to play ball when using the new HTML5 elements, but today I'm going to talk about Firefox 2.

Firefox 2 (or any other Gecko-based browser with a Gecko version pre 1.9b5) has a parsing bug where it will close an unknown element when it sees the start tag of a "block" element like p, h1, div, and so forth. So if you have:

<body>
 <header>
  <h1>Test</h1>
 </header>
 <article>
  <p>...</p>
  ...
 </article>
 <nav>
  <ul>...</ul>
 </nav>
 <footer>
  <p>...</p>
 </footer>
</body>

...then in Firefox 2 it will be parsed as if it were:

<body>
 <header>
  </header><h1>Test</h1>
 
 <article>
  </article><p>...</p>
  ...
 
 <nav>
  </nav><ul>...</ul>
 
 <footer>
  </footer><p>...</p>
 
</body>

So if you style the new elements with CSS it will probably look completely broken in Firefox 2.

If you care about Firefox 2 then there are some ways to fix this:

  1. Go back to using div elements
  2. Use content type negotiation between text/html and application/xhtml+xml
  3. Fix up the DOM with scripting

(1) is probably wise if your content structure changes between pages or over time. (2) also works but means that users will be exposed to the Yellow Screen of Death should a markup error slip through your system. Otherwise, (3) can be worth considering.

Fixing up Firefox 2's DOM is actually pretty simple if you have a consistent structure. Using the same markup as above it could look something like this:

<body>
 <header>
  <h1>Test</h1>
 </header>
 <article>
  <p>...</p>
  ...
 </article>
 <nav>
  <ul>...</ul>
 </nav>
 <footer>
  <p>...</p>
 </footer>
 <!--[if !IE]>--><script>
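  // dom fixup for gecko pre 1.9b5
  var n = document.getElementsByTagName('header')[0];
  if (n.childNodes.length <= 1) { // the element was closed early
    // each entry names the element expected to follow the one being repaired
    var tags = ['ARTICLE', 'NAV', 'FOOTER', 'SCRIPT'];
    for (var i = 0; i < tags.length; ++i) {
      // move stray siblings back into n until the next expected element
      while (n.nextSibling && n.nextSibling.nodeName != tags[i]) {
        n.appendChild(n.nextSibling);
      }
      n = n.nextSibling; // continue with the next early-closed element
    }
  }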
 </script><!--<![endif]-->
</body>

You might think that this script would work for IE, too, when not using the createElement hack, but apparently IE throws an exception when trying to append a child to an unknown element. So you still have to use the createElement hack for IE.

If you want to move the script to head and run it on load and you don't have anything after the footer then you would replace 'SCRIPT' in the tags array with undefined to make it work.

(If you want to do content type negotiation and want to just serve XHTML to Gecko-based browsers with this bug then you should look for the substrings "Gecko/" and "rv:1.x" where x is less than 9, or "rv:1.9pre" or "rv:1.9a" or "rv:1.9bx" where x is less than 5.)
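The substring rules in the parenthetical above translate fairly directly into a pair of regular expressions. The sketch below is one possible reading of those rules; the function name is made up, and the user agent strings are illustrative examples of the usual format rather than an exhaustive list.

```javascript
// True if the user agent string looks like a Gecko build older than 1.9b5.
function hasGeckoParsingBug(userAgent) {
  if (!userAgent.includes("Gecko/")) return false;
  // rv:1.0 through rv:1.8, including point releases such as rv:1.8.1.20
  if (/rv:1\.[0-8](?!\d)/.test(userAgent)) return true;
  // rv:1.9pre, rv:1.9a*, and rv:1.9b1 through rv:1.9b4
  return /rv:1\.9(pre|a|b[0-4](?!\d))/.test(userAgent);
}

const firefox2 = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.20) Gecko/20081217 Firefox/2.0.0.20";
const firefox3 = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5";

console.log(hasGeckoParsingBug(firefox2)); // true
console.log(hasGeckoParsingBug(firefox3)); // false
```

Requiring "Gecko/" with the slash keeps out browsers whose user agent merely says "like Gecko".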

Posted in Browsers, DOM, Elements | 17 Comments »