Thursday, April 2nd, 2009
Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.
The big news for the week of March 23rd is that SVG can once again be included directly in HTML 5 documents served as text/html:
I've made the following changes to HTML5:
- Uncommented out the XXXSVG bits, reintroducing the ability to have SVG content in text/html.
- Defined
<script>
processing for SVG <script>
in text/html by deferring to the SVG Tiny 1.2 spec and blocking synchronous document.write()
. The alternative to this is to integrate the SVG script processing model with the (pretty complicated) HTML script processing model, which would require changes to SVG and might result in a dependency from SVG to HTML5. Anne would like to do this, but I'm not convinced it's wise, and it certainly would be more complex than what we have now. If we ever want to add async=""
or defer=""
to SVG scripts, then this would probably be a necessary part of that process, though.
- Added a paragraph suggesting: "To enable authors to use SVG tools that only accept SVG in its XML form, interactive HTML user agents are encouraged to provide a way to export any SVG fragment as a namespace-well-formed XML fragment."
- Added a paragraph defining the allowed content model for SVG
<title>
elements in text/html documents.
r2904 (and, briefly, r2910) give all the details of this solution. There are still a number of differences between the text in HTML 5 and the proposal brought by the SVG working group. Some of these are addressed further down in the announcement:
- SVG-in-XML is case-preserving; SVG-in-HTML is not.
- SVG-in-XML requires quoted attribute values; SVG-in-HTML does not.
- When SVG-in-XML uses CDATA blocks, they show up as CDATA nodes in the DOM; when SVG-in-HTML uses CDATA blocks, they show up in the DOM as conventional text nodes. [Clarified based on Henri's feedback]
- The
<svg>
element can not be the root element of a text/html document.
Doug Schepers, who has been the SVG working group's HTML 5 liason, does not like this solution:
To be honest, I think it's not a good use of the SVG WG's time to provide feedback when Ian already has his mind made up, even if I don't believe that he is citing real evidence to back up his decision. What I see is this: one set of implementers and authors (the SVG WG) and the majority of the author and user community (in public comments) asking for some sort of preservation of SVG as an XML format, even if it's looser and error-corrected in practice, and a few implementers (Jonas and Lachy, most notably) disagreeing, and Ian giving preference to the minority opinion. Maybe there is sound technical rationale for doing so, but I haven't been satisfied on that score.
Turning to technical matters, one of the features of web forms in HTML 5 is allowing the attributes for form submission on either the <form>
element (as in HTML 4) or on the submit button (new in HTML 5). Originally, the attributes for submit buttons were named action
, enctype
, method
, novalidate
, and target
, which exactly mirrored the attribute names that could be declared on the <form>
element.
However, in January 2008, Hallvord R. M. Steen (Opera developer) noted that "INPUT action [attribute] breaks web applications frequently. Both GMail and Yahoo mail (the new Oddpost-based version) use input/button.action and were seriously broken by WF2's action attribute."
Following up in November 2008, Ian Hickson replied, "I notice that Opera still supports 'action' and doesn't seem to have problems in GMail; is this still a problem?" to which Hallvord replied, "GMail fixed it on their side a while ago. It is still a problem with Yahoo mail, breaking most buttons in their UI for a browser that supports 'action'. We work around this with a browser.js hack. ('Still a problem' means 'I tested this again a couple of weeks ago and things were still broken without this patch'.)"
Ian replied, "This is certainly problematic. It's unclear what we should do. It's hard to use another attribute name, since the whole point is reusing existing ones... can we trigger this based on quirks mode, maybe? Though I hate to add new quirks." Hallvord did not like that idea: "In my personal opinion, I don't see why re-using attribute names is considered so important if we can find an alternative that feels memorable and usable. How does this look? <input type="submit" formaction="http://www.example.com/">
"
Finally, in March 2009, Ian replied:
That seems reasonable. I've changed "action", "method", "target", "enctype" and "novalidate" attributes on <input> and <button> to start with "form" instead: "formaction", "formmethod", "formtarget", "formenctype" and "formnovalidate".
And thus we have r2890: Rename attributes for form submission to avoid clashes with existing usage.
Other interesting changes this week:
- r2889 adds support for
select.add(e)
and select.options.add(e)
with no second argument.
- r2888 defines how to determine the character encoding of Web Worker scripts. Briefly, it says to look for a Byte Order Mark, then look at the Content-Type HTTP header, then fall back to UTF-8.
- r2898, r2899, r2901, r2914, and r2916 define a locking mechanism to allow thread-safe read/write access to
document.cookie and
.localStorage
. The lock is acquired during page fetching (which sets the cookie based on HTTP headers) and released once the cookie is set. It is also released automatically whenever something modal happens (such as window.alert()
). (I first mentioned the discussion of this issue in episode 27. The problem is that Web Workers allows threaded client-side script execution, which means access to shared storage like document.cookie
needs to be made explicitly thread-safe with some sort of locking mechanism.)
Tune in next week for another exciting episode of "This Week in HTML 5."
Posted in Weekly Review | 3 Comments »
Tuesday, October 14th, 2008
Welcome back to "This Week in HTML 5," where I'll try to summarize the major activity in the ongoing standards process in the WHATWG and W3C HTML Working Group.
Most of the changes in the spec this week revolve around the <textarea>
element.
Shelley Powers pointed out that I haven't mentioned the issue of distributed extensibility yet. (The clearest description of the issue is Sam Ruby's message from last year, which spawned a long discussion.) The short version: XHTML (served with the proper MIME type, application/xhtml+xml
) supports embedding foreign data in arbitrary namespaces, including SVG and MathML. None of these technologies (XHTML, SVG, or MathML) have had much success on the public web. Despite Chris Wilson's assertion that "we cannot definitively say why XHTML has not been successful on the Web," I think it's pretty clear that Internet Explorer's complete lack of support for the application/xhtml+xml
MIME type has something to do with it. (Chris is the project lead on Internet Explorer 8.)
Still, it is true that XHTML does support distributed extensibility, and many people believe that the web would be richer if SVG and MathML (and other as-yet-unknown technologies) could be embedded and rendered in HTML pages. The key phrase here is "as-yet-unknown technologies." In that light, the recent SVG-in-HTML proposal (which I mentioned several weeks ago) is beside the point. The point of distributed extensibility is that it does not require approval from a standards body. "Let a thousand flowers bloom" and all that, where by "flowers," I mean "namespaces." This is an unresolved issue.
Other interesting changes this week:
- r2314 ensures that the
required
attribute only applies to form controls whose value can change.
- r2316 defines the
name
attribute for form controls.
- r2317 defines the
disabled
attribute for form controls.
- r2320 defines all the different ways that a form control can fail to satisfy its constraints. For example, an
<input maxlength=20>
element with a 21-character value.
- r2322 defines exactly how form data should be encoded before being submitted to the server. I've previously mentioned character encoding in this series; this revision marks the first time that an HTML specification has acknowledged the existence of
<input type=hidden name=_charset_>
method of specifying the character encoding of submitted form data.
- r2319 removes support for data templates and repetition templates. These were inventions in the original Web Forms 2 specification, but they were never picked up by any major browser.
Around the web:
Tune in next week for another exciting episode of "This Week in HTML 5."
Posted in Weekly Review | 2 Comments »
Monday, August 25th, 2008
I have released a new version of the Validator.nu HTML Parser (an implementation of the HTML5 parsing algorithm in Java). The new release supports SVG and MathML subtrees, is faster than the old version, fixes bugs, is more portable and supports applications that want to do document.write()
.
The parser comes with a sample app that makes it possible to use XSLT programs written for XHTML5+SVG+MathML with text/html
.
Warning! The internal APIs have changed. Please refer to the Upgrade Guide below.
Change Log
- Made the SAX, DOM and XOM parser entry point constructors default to altering the infoset instead of throwing when the input needs coercing to be an XML 1.0 4th ed. plus Namespaces infoset.
- Isolated Java IO dependent code from the parser core. The parser core now compiles on Google Web Toolkit.
- Refactored the tokenizer to use a
switch
branch per state instead of method per state.
- Made various performance tweaks to the tokenizer.
- Implemented support for MathML and SVG foreign content. (Note that the SVG part is based on spec text that has been commented out from the spec at the request of the SVG WG.)
- Made the parser suspendable after any input character.
- Made it possible for custom
TreeBuilder
subclasses to request parser suspension. (Applications wishing to implement document.write()
should provide their own TreeBuilder
subclass and a document.write()
-aware replacement of the Driver
class. Look in the gwt-src/
directory for sample code.)
- Made changes to the parser core to make it more suitable for mechanical translation into other object-oriented programming languages that have C-like control structures but not necessarily a garbage collector (with focus on targeting C++). This work is not complete.
- Made the HTML serializer do the right thing when input represents a conforming XHTML+SVG+MathML tree. (Results may be bad for non-conforming input trees.)
- Developed sample programs for converting between HTML5 and XHTML5 when the input is known to be conforming.
- Provided an XML serializer so that the sample code no longer depends on the Xalan serializer.
- Improved API documentation.
- Fixed bugs in the tokenizer, tree builder and the input stream character encoding decoder.
- Made coercion to an XML infoset work according to the HTML5 spec.
- Added ID uniqueness checking.
- Various other fixes.
Upgrade Guide from 1.0.7 to 1.1.0
In all cases, you need to check that your application does not break when it receives SVG or MathML subtrees.
- If you use the parser through the SAX, DOM or XOM API and do not pass an explicit
XmlViolationPolicy
to the constructor of HtmlParser
, HtmlDocumentBuilder
or HtmlBuilder
:
If you really wanted the old default behavior, you should now pass XmlViolationPolicy.FATAL
to the constructor.
If you did not really want to have fatal errors by default, you do not need to do anything, since ALTER_INFOSET
is now the default.
- If you use the parser through the SAX, DOM or XOM API and do pass an explicit
XmlViolationPolicy
to the constructor of HtmlParser
, HtmlDocumentBuilder
or HtmlBuilder
:
You do not need to change your code to upgrade.
- If you have your own subclass of
TreeBuilder
:
The abstract methods on TreeBuilder
now have additional arguments for passing the namespace URI. You should upgrade your subclass to deal with the namespace URIs. (The URI is always an interned string, so you can use ==
to compare.)
The entry point for passing in a SAX InputSource
has moved from the Tokenizer
class to the Driver
class (in the io
package), so you should change your references from Tokenizer
to Driver
.
- If you have your own implementation of
TokenHandler
:
Please refer to the JavaDocs of TokenHandler
. Also note the new separation of Tokenizer
and Driver
mentioned above.
Posted in Syntax | Comments Off on Validator.nu HTML Parser 1.1.0