Help Test HTML5 Parsing in Gecko
The HTML5 parsing algorithm is meant to demystify HTML parsing and make it uniform across implementations in a backwards-compatible way. The algorithm has had “in the lab” testing, but so far it hasn’t been tested inside a browser by a large number of people. You can help change that now!
A while ago, an implementation of the HTML5 parsing algorithm landed on mozilla-central preffed off. Anyone who is testing Firefox nightly builds can now opt to turn on the HTML5 parser and test it.
How to Participate?
First, this isn’t release-quality software. Testing the HTML5 parser carries all the same risks as testing a nightly build in general, and then some. It may crash, it may corrupt your Firefox profile, etc. If you aren’t comfortable with taking the risks associated with running nightly builds, you shouldn’t participate.
If you are still comfortable with testing, download a trunk nightly build, run it, navigate to about:config and flip the preference named html5.enable to true. This makes Gecko use the HTML5 parser when loading pages into the content area and when setting innerHTML. The HTML5 parser is not yet used for HTML embedded in feeds, Netscape bookmark import, View Source, etc.
The html5.enable preference doesn’t require a restart to take effect. It takes effect the next time you load a page.
What to Test?
The main thing is getting the HTML5 parser exposed to a wide range of real Web content that people browse. This may turn up crashes or compatibility problems.
So the way to help is to use nightly builds with the HTML5 parser for browsing as usual. If you see no difference, things are going well! If you see a page misbehaving—or, worse, crashing—with the HTML5 parser turned on but not with it turned off, please report the problem.
Reporting Bugs
Please file bugs in the “Core” product under the “HTML: Parser” component with “[HTML5] ” at the start of the summary.
Known Problems
First and foremost, please refer to the list of known bugs.
However, I’d like to highlight a particular issue: Support for comments ending with --!> is in the spec, but the patch hasn’t landed yet. Support for similar endings of pseudo-comment escapes within script element content is not in the spec yet. The practical effect is that the rest of the page may end up being swallowed up inside a comment or a script element.
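To make the comment issue concrete, here is a minimal test page. It is only a sketch: the paragraph text is made up for illustration and is not taken from any bug report. Per the spec, the comment should end at --!>, so both paragraphs should render. In a nightly where the patch hasn’t landed, the second paragraph may disappear into the comment.

```html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Comment end test</title>
</head>
<body>
<p>This paragraph comes before the comment.</p>
<!-- This comment ends with an exclamation mark before the final bracket --!>
<p>If you can read this, the comment was closed at the marker above.</p>
</body>
</html>
```

Loading the page with html5.enable set to true and then to false is a quick way to compare the two parsers on this input.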
Another issue is that the new parser doesn’t yet inhibit document.write() in places where it shouldn’t be allowed per spec but where the old parser allowed it.
Is There Anything New?
So what’s fun if success is that you notice no change? There are important technical things under the hood—like TCP packet boundaries not affecting the parse result and there never being unnotified nodes in the tree when the event loop spins—but you aren’t supposed to notice.
However, there is a major new visible feature, too. With the HTML5 parser, you can use SVG and MathML in text/html pages.
This means that you can:
Use SVG graphics inline without having to change your HTML content to work with XML parsing and without having to develop an alternative page for IE.
Use properly laid out math without having to change your HTML content to work with XML parsing.
And yes, you can even put SVG inside MathML <annotation-xml> or MathML inside <foreignObject>. The mixing you’ve seen in XML is now supported in HTML, too.
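As a rough illustration of the points above, the following page is a sketch of inline SVG, inline MathML, and MathML nested inside <foreignObject> in a single text/html document. The shapes, sizes, and formulas are arbitrary choices made for this example. Note that it uses no xmlns declarations, since the HTML5 parser assigns the namespaces itself.

```html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>SVG and MathML in text/html</title>
</head>
<body>
<p>An inline SVG circle:</p>
<svg width="100" height="100" viewBox="0 0 100 100">
  <circle cx="50" cy="50" r="40" fill="teal"></circle>
</svg>

<p>An inline MathML expression:</p>
<math>
  <msup><mi>x</mi><mn>2</mn></msup>
  <mo>+</mo>
  <mn>1</mn>
</math>

<p>MathML inside an SVG foreignObject:</p>
<svg width="200" height="60">
  <foreignObject x="0" y="0" width="200" height="60">
    <math>
      <mfrac><mn>1</mn><mi>x</mi></mfrac>
    </math>
  </foreignObject>
</svg>
</body>
</html>
```

With the old parser, the same markup is treated as unknown HTML elements, so comparing the page with html5.enable on and off should make the difference visible.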
If you aren’t concerned with taking the steps to make things degrade nicely in browsers that don’t support SVG and MathML in HTML, you can simply copy and paste XML output from your favorite SVG or MathML editor into your HTML source as long as the editor doesn’t use namespace prefixes for elements and uses the prefix xlink for XLink attributes.
If you don’t use the XML empty element syntax and you put your SVG text nodes in CDATA sections, the page will degrade gracefully in older HTML browsers so that the image simply disappears but the rest of the page is intact. You can even put a fallback bitmap as <img> inside <desc>. Unfortunately, there isn’t a similar technique for MathML, though if you want to develop one, I suggest experimenting with the <annotation> element as your <desc>-like container.
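The degradation technique for SVG might look something like the sketch below. The file name circle.png and the label text are hypothetical placeholders. Every element has an explicit end tag instead of the XML empty element syntax, the SVG text node sits in a CDATA section, and the fallback bitmap is an <img> inside <desc>.

```html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Inline SVG with a bitmap fallback</title>
</head>
<body>
<p>Text before the graphic.</p>
<svg width="120" height="120" viewBox="0 0 120 120">
  <!-- Fallback bitmap; circle.png is a hypothetical file name -->
  <desc><img src="circle.png" alt="A red circle" width="120" height="120"></desc>
  <!-- Explicit end tag instead of the XML empty element syntax -->
  <circle cx="60" cy="50" r="40" fill="red"></circle>
  <!-- Text node wrapped in a CDATA section -->
  <text x="60" y="110" text-anchor="middle"><![CDATA[A red circle]]></text>
</svg>
<p>Text after the graphic.</p>
</body>
</html>
```

The idea is that a browser without SVG-in-HTML support shows the bitmap and the surrounding paragraphs, while the HTML5 parser builds the SVG subtree and the <img> stays hidden inside <desc>.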
There are known issues with matching camelCase names with Selectors or getElementsByTagName, though.
Hello,
I have very little knowledge of Gecko internals, but I was wondering: is this HTML5 parser being developed to replace the legacy HTML parser in Gecko? Is it planned for Firefox 4 or maybe Firefox 4.next?
The advantages of this new parser look promising. I guess it would be a better technical foundation for further improvements. At the same time, this could mean a pretty harsh transition for a future version of Firefox, if say Firefox 4 “breaks” (arguably badly coded) websites that used to work in previous versions.
Sorry if this is discussed elsewhere.
Florent V, yes, this will eventually replace the existing parser in future Firefox builds. I don’t know whether it will make it into Firefox 4, though. There’s a chance it might, depending on Mozilla’s roadmap.
Hm…it’s not working for me. I’m running the 20090708 Shiretoko nightly on Linux. There wasn’t an html5.enable property in my about:config, so I created it as a boolean value and set it to true.
Is there anything else I need to do to get this to work?
Henri,
Maybe you could add to your post that people can create a new profile specifically for testing Firefox nightlies.
Justin, Shiretoko is the codename for the Firefox 3.5 branch. You should use the latest trunk build, codenamed Minefield.
Lachlan: Oh, I see! Thanks for clearing that up for me.
Florent V., if enabling the HTML5 parser breaks websites that used to work, then it’s either a bug in Firefox (if it doesn’t follow the spec) or in the HTML5 spec itself. And, I assume, either way it will be fixed.
Hi there,
Just wanted to mention that there are now two versions: one is the branch, which in time will become Firefox 3.6, and the other is the trunk, which in time will become Firefox 4.0. You can get them at http://forums.mozillazine.org/viewforum.php?f=23