Monday, May 10th, 2010
I've started a page on the wiki to document the rationale for the decisions made about the HTML specification.
There are two goals for this document:
- Explain why things are the way they are
- Explain the difference between multiple similar elements by providing example usages.
One person can not possibly write the entire thing so I hope that this becomes a group process where anyone interested can contribute so go sign up, log in, and edit.
Posted in Syntax, WHATWG | 5 Comments »
Wednesday, July 8th, 2009
The HTML5 parsing algorithm is meant to demystify HTML parsing and
make it uniform across implementations in a backwards-compatible way.
The algorithm has had “in the lab” testing, but so far it hasn’t
been tested inside a browser by a large number of people. You
can help change that now!
A while ago, an implementation of the HTML5 parsing algorithm
landed on mozilla-central
preffed off. Anyone who is testing Firefox nightly builds can now opt
to turn on the HTML5 parser and test it.
How to Participate?
First, this isn’t release-quality software. Testing the HTML5
parser carries all the same risks as testing a nightly build in
general, and then some. It may crash, it may corrupt your Firefox
profile, etc. If you aren’t comfortable with taking the risks
associated with running nighly builds, you shouldn’t participate.
If you are still comfortable with testing, download a trunk
nightly
build, run it, navigate to about:config
and flip the
preference named html5.enable
to true
. This
makes Gecko use the HTML5 parser when loading pages into the content
area and when setting innerHTML
. The HTML5 parser is not
used for HTML embedded in feeds, Netscape bookmark import, View
Source, etc., yet.
The html5.enable
preference doesn’t require a
restart to take effect. It takes effect the next time you load a
page.
What to Test?
The main thing is getting the HTML5 parser exposed to a wide range
of real Web content that people browse. This may turn up crashes or
compatibility problems.
So the way to help is to use nightly builds with the HTML5 parser
for browsing as usual. If you see no difference, things are going
well! If you see a page misbehaving—or, worse, crashing—with the
HTML5 parser turned on but not with it turned off, please report the
problem.
Reporting Bugs
Please file bugs in the
“Core” product under “HTML: Parser” component with “[HTML5]
” at the start of the summary.
Known Problems
First and foremost, please refer to the list
of known bugs.
However, I’d like to highlight a particular issue: Support for
comments ending with --!>
is in the spec, but the
patch
hasn’t landed, yet. Support for similar endings of
pseudo-comment escapes within script
element content is
not in
the spec yet. The practical effect is that the rest of the page
may end up being swallowed up inside a comment or a script
element.
Another issue is that the new parser doesn’t yet inhibit
document.write()
in places where it shouldn’t be
allowed per spec but where the old parser allowed it.
Is There Anything New?
So what’s fun if success is that you notice no change? There are
important technical things under the hood—like TCP packet
boundaries not affecting the parse result and there never being
unnotified nodes in the tree when the event loop spins—but you
aren’t supposed to notice.
However, there is a major new visible feature, too. With the HTML5
parser, you can use SVG and MathML in text/html
pages.
This means that you can:
And yes, you can even put SVG inside MathML <annotation-xml>
or MathML inside <foreignObject>
. The mixing
you’ve seen in XML is now supported in HTML, too.
If you aren’t concerned with taking the steps to make things
degrade nicely in browsers that don’t support SVG and MathML in
HTML, you can simply copy and paste XML output from your favorite SVG
or MathML editor into your HTML source as long as the editor doesn’t
use namespace prefixes for elements and uses the prefix xlink
for XLink attributes.
If you don’t use the XML empty element syntax and you put you
SVG text nodes in CDATA sections, the page will degrade gracefully in
older HTML browser so that the image simply disappears but the rest
of the page is intact. You can even put a fallback bitmap as <img>
inside <desc>
. Unfortunately, there isn’t a
similar technique for MathML, though if you want to develop one, I
suggest experimenting with the <annotation>
as
your <desc>
-like container.
There are known issues with matching camelCase names with
Selectors
or getElementByTagName
,
though.
Posted in Browsers, Processing Model, Syntax | 8 Comments »
Monday, May 25th, 2009
Version 1.2.1 of the Validator.nu HTML Parser is now available. It fixes an incompatibility with the DOM implementation of the latest Xerces.
Posted in DOM, Processing Model, Syntax | Comments Off on Validator.nu HTML Parser 1.2.1
Friday, March 27th, 2009
I put together a new release of the Validator.nu HTML Parser. This is a highly recommended update for everyone who is using a previous version the parser in an application.
- Fixed an issue where under rare circumstances attribute values were leaking into element content.
- Fixed a bug where
isindex
processing added attributes to all elements that were supposed to have no attributes.
- Implemented spec changes. (Too numerous to enumerate, but, as a highlight, framesets parse much better now.)
- Moved to WebKit-style foster parenting.
- Changed the API for tree builder subclasses again due to new constraints. If you have previously written your own tree builder subclass, you need to change it.
- Fixed the bundled XML serializer.
- Made it possible to generate a C++ version that does not leak memory from the Java source.
- Removed the C++ translator from the release. (Get it from SVN.)
Posted in Processing Model, Syntax | Comments Off on Validator.nu HTML Parser 1.2.0
Wednesday, January 7th, 2009
Internet Explorer poses a small challenge when it comes to making use of the new elements introduced in HTML5. Among others, these include elements like section
, article
, header
and footer
. The problem is that due to the way parsing works in IE, these elements are not recognised properly and result in an anomalous DOM representation.
To illustrate, consider this simple document fragment:
<body>
<section>
<p>This is an example</p>
</section>
</body>
Strangely, IE 6, 7 and 8 all fail to parse the section
element properly and the resulting DOM looks like this.
BODY
SECTION
P
#text
: This is an example
/SECTION
Notice how IE actually creates 2 empty elements. One named SECTION
and the other named /SECTION
. Yes, it really is parsing the end tag as a start tag for an unknown empty element.
There is a handy workaround available to address this problem, which was first revealed in a comment by Sjoerd Visscher.
The basic concept is that by using document.createElement(tagName)
to create each of the unknown elements, the parser in IE then recognises those elements and parses them in a more reasonable and useful way. e.g. By using the following script:
document.createElement("section");
The resulting DOM for the fragment given above looks like this:
BODY
section
P
#text
: This is an example
This same technique works for all unknown elements in IE 6, 7 and 8. Note that there is a known bug that prevented this from working in IE 8 beta 2, but this has since been resolved in the latest non-public technical preview.
For convenience, Remy Sharp has written and published a simple script that provides this enhancement for all new elements in the current draft of HTML5, which you can download and use.
This script is not needed for other browsers. Opera 9, Firefox 3 and Safari 3 all parse unknown elements in a more reasonable way by default. Note, however, that Firefox 2 does suffer from some related problems, for which there is unfortunately no known solution; but it is hoped that given the faster upgrade cycle for users of Firefox, relatively speaking compared with IE, Firefox 2 won't pose too much of a problem in the future.
Posted in Browsers, DOM, Elements, Events, Syntax | 25 Comments »