Wednesday, March 9th, 2011
There is now a new release of the Validator.nu HTML Parser. The new release contains files that were missing from the previous release package by accident. It also contains one tree builder correctness fix and one error reporting improvement.
Posted in Syntax | No Comments »
Thursday, January 13th, 2011
After over a year without proper releases, there is now a new release of the Validator.nu HTML Parser. There have been numerous changes to the HTML5 spec and, consequently, to the parser since the previous release. All users of the parser should update to the latest release in order to run a version that corresponds to the current spec.
Posted in Syntax | No Comments »
Sunday, July 25th, 2010
The WHATWG wiki has a useful section describing the differences between HTML and XHTML, as well as the specifics of a polyglot HTML document: an HTML5 document that can also be served as a valid XML document. I'd like to review what it takes to turn an HTML5 polyglot document into a valid XHTML5 document. It appears that 'XHTML5' has finally become an official name.
The W3C First Public Working Draft of "Polyglot Markup" describes a polyglot HTML document as one that conforms to both the HTML and XHTML syntaxes by using a common subset of the two. In a nutshell, an HTML5 polyglot document has:
- HTML5 doctype/namespace
- XHTML well-formed syntax
A polyglot document can be served as either HTML or XHTML, depending on browser support and MIME type. Polyglot HTML5 markup essentially becomes an XHTML5 document when it is served with the XML MIME type
application/xhtml+xml.
In a nutshell, an XHTML5 document has:
- HTML doctype/namespace: The
<!DOCTYPE html>
declaration is optional, but it is useful in a polyglot document because it keeps browsers out of quirks mode.
- XHTML well-formed syntax
- XML MIME type:
application/xhtml+xml.
This MIME type is not visible in the source code; it appears in the HTTP Content-Type header, which can be configured on the server. Note that the current version of Internet Explorer does not yet support the XML MIME type, although IE can render XHTML documents served as text/html.
- Default XHTML namespace:
<html xmlns="http://www.w3.org/1999/xhtml">
- Secondary namespaces such as SVG, MathML, XLink, etc. To me this is like a litmus test: if you don’t need these namespaces in your document, then using XHTML is overkill in the first place.
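The server-side choice of MIME type mentioned in the list above can be sketched roughly as follows. This is only an illustration written against Python's WSGI calling convention, not a recommended production setup. The names `pick_mime_type`, `app`, and `POLYGLOT_DOC` are hypothetical, and the Accept header check is deliberately naive (it ignores q-values and wildcards).

```python
# Hypothetical sketch: serve one polyglot document as XHTML5 or HTML5.
POLYGLOT_DOC = (
    '<!DOCTYPE html>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml">\n'
    '<head><title>Polyglot</title><meta charset="UTF-8" /></head>\n'
    '<body><p>Hello</p></body>\n'
    '</html>\n'
)

XHTML_MIME = "application/xhtml+xml"
HTML_MIME = "text/html"


def pick_mime_type(accept_header):
    """Return the XML MIME type only if the client lists it explicitly."""
    # Naive check (assumption): ignores q-values and wildcards such as */*.
    offered = [part.split(";")[0].strip().lower()
               for part in accept_header.split(",")]
    return XHTML_MIME if XHTML_MIME in offered else HTML_MIME


def app(environ, start_response):
    """WSGI application that sets the Content-Type header per request."""
    mime = pick_mime_type(environ.get("HTTP_ACCEPT", ""))
    body = POLYGLOT_DOC.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", mime + "; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

A browser that lists application/xhtml+xml in its Accept header would receive the bytes as XHTML5; any other client would receive the same bytes as HTML5.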
Finally, the basic XHTML5 document would look like this:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
<meta charset="UTF-8" />
</head>
<body>
<svg xmlns="http://www.w3.org/2000/svg">
<rect stroke="black" fill="blue" x="45px" y="45px" width="200px" height="100px" stroke-width="2" />
</svg>
</body>
</html>
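One quick way to see that the document above satisfies the "XHTML well-formed syntax" requirement is to feed it to a generic XML parser. The sketch below uses Python's standard xml.etree.ElementTree; the helper name `is_well_formed` is hypothetical. It checks XML well-formedness and namespaces only, not validity against HTML5 content models, so it is not a substitute for an XHTML5 validator.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# The basic XHTML5 document from the post, copied verbatim.
DOCUMENT = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
<meta charset="UTF-8" />
</head>
<body>
<svg xmlns="http://www.w3.org/2000/svg">
<rect stroke="black" fill="blue" x="45px" y="45px" width="200px" height="100px" stroke-width="2" />
</svg>
</body>
</html>"""


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup without error."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


root = ET.fromstring(DOCUMENT)
# Elements are addressed by namespace URI, not by prefix.
rect = root.find(".//{%s}rect" % SVG_NS)
print(root.tag)
print(rect.get("fill"))
```

Markup that is fine as HTML but not as XML, such as an unclosed <br>, makes `is_well_formed` return False.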
The XML declaration <?xml version="1.0" encoding="UTF-8"?> is not required if the default UTF-8 encoding is used: an XHTML5 validator will not object if it is omitted. However, it is strongly recommended to declare the encoding in the server's HTTP Content-Type header. Failing that, the character encoding can be declared in the document itself with a meta tag: <meta charset="UTF-8" />. A polyglot document needs this encoding declaration so that it is treated as UTF-8 whether it is served as HTML or as XHTML.
The Total Validator tool (Firefox plugin and desktop app) now has a user-selectable option for XHTML5-specific validation.
I would say that the main advantage of XHTML5 is the ability to extend HTML5 with XML-based technologies such as SVG and MathML. The disadvantages are the lack of Internet Explorer support, more verbose code, and stricter error handling. Unless we need that extensibility, HTML5 is the way to go.
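As a rough illustration of the in-document encoding declaration, the sketch below looks for a meta charset attribute using Python's standard html.parser module. The names `MetaCharsetFinder` and `declared_charset` are hypothetical. Note that html.parser is a lenient tokenizer, not a conforming HTML5 parser, so this shows only where the declaration lives, not how a browser sniffs the encoding.

```python
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Record the first charset attribute found on a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; its default
        # handle_startendtag forwards <meta ... /> here as well.
        if tag == "meta" and self.charset is None:
            self.charset = dict(attrs).get("charset")


def declared_charset(markup):
    """Return the charset declared via <meta charset>, or None."""
    finder = MetaCharsetFinder()
    finder.feed(markup)
    return finder.charset
```

For the sample document in this post, `declared_charset` would return the string UTF-8, and it returns None for markup without such a meta tag.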
Posted in Syntax, What's Next, WHATWG | 20 Comments »
Monday, May 10th, 2010
I've started a page on the wiki to document the rationale for the decisions made about the HTML specification.
There are two goals for this document:
- Explain why things are the way they are
- Explain the difference between multiple similar elements by providing example usages.
One person cannot possibly write the entire thing, so I hope this becomes a group effort to which anyone interested can contribute. Go sign up, log in, and edit.
Posted in Syntax, WHATWG | 5 Comments »
Wednesday, July 8th, 2009
The HTML5 parsing algorithm is meant to demystify HTML parsing and
make it uniform across implementations in a backwards-compatible way.
The algorithm has had “in the lab” testing, but so far it hasn’t
been tested inside a browser by a large number of people. You
can help change that now!
A while ago, an implementation of the HTML5 parsing algorithm
landed on mozilla-central
preffed off. Anyone who is testing Firefox nightly builds can now opt
to turn on the HTML5 parser and test it.
How to Participate?
First, this isn’t release-quality software. Testing the HTML5
parser carries all the same risks as testing a nightly build in
general, and then some. It may crash, it may corrupt your Firefox
profile, etc. If you aren’t comfortable with taking the risks
associated with running nightly builds, you shouldn’t participate.
If you are still comfortable with testing, download a trunk
nightly
build, run it, navigate to about:config and flip the
preference named html5.enable to true. This
makes Gecko use the HTML5 parser when loading pages into the content
area and when setting innerHTML. The HTML5 parser is not
used for HTML embedded in feeds, Netscape bookmark import, View
Source, etc., yet.
The html5.enable preference doesn’t require a
restart to take effect. It takes effect the next time you load a
page.
What to Test?
The main thing is getting the HTML5 parser exposed to a wide range
of real Web content that people browse. This may turn up crashes or
compatibility problems.
So the way to help is to use nightly builds with the HTML5 parser
for browsing as usual. If you see no difference, things are going
well! If you see a page misbehaving—or, worse, crashing—with the
HTML5 parser turned on but not with it turned off, please report the
problem.
Reporting Bugs
Please file bugs in the
“Core” product under the “HTML: Parser” component with “[HTML5]”
at the start of the summary.
Known Problems
First and foremost, please refer to the list
of known bugs.
However, I’d like to highlight a particular issue: Support for
comments ending with --!> is in the spec, but the
patch
hasn’t landed yet. Support for similar endings of
pseudo-comment escapes within script element content is
not in
the spec yet. The practical effect is that the rest of the page
may end up being swallowed up inside a comment or a script
element.
Another issue is that the new parser doesn’t yet inhibit
document.write() in places where it shouldn’t be
allowed per spec but where the old parser allowed it.
Is There Anything New?
So what’s fun if success is that you notice no change? There are
important technical things under the hood—like TCP packet
boundaries not affecting the parse result and there never being
unnotified nodes in the tree when the event loop spins—but you
aren’t supposed to notice.
However, there is a major new visible feature, too. With the HTML5
parser, you can use SVG and MathML in text/html pages.
This means that you can:
And yes, you can even put SVG inside MathML <annotation-xml>
or MathML inside <foreignObject>. The mixing
you’ve seen in XML is now supported in HTML, too.
If you aren’t concerned with taking the steps to make things
degrade nicely in browsers that don’t support SVG and MathML in
HTML, you can simply copy and paste XML output from your favorite SVG
or MathML editor into your HTML source as long as the editor doesn’t
use namespace prefixes for elements and uses the prefix xlink
for XLink attributes.
If you don’t use the XML empty element syntax and you put your
SVG text nodes in CDATA sections, the page will degrade gracefully in
older HTML browsers so that the image simply disappears but the rest
of the page is intact. You can even put a fallback bitmap as <img>
inside <desc>. Unfortunately, there isn’t a
similar technique for MathML, though if you want to develop one, I
suggest experimenting with the <annotation> as
your <desc>-like container.
There are known issues with matching camelCase names with
Selectors
or getElementsByTagName,
though.
Posted in Browsers, Processing Model, Syntax | 8 Comments »