The WHATWG Blog

Please leave your sense of logic at the door, thanks!

What Makes the Application of HTML 5 Different?

by Sean Fraser in WHATWG

All markup languages have three aspects: theory, application and philosophy.

Most web developers do not concern themselves with the theory of markup languages, e.g., HTML, XHTML and XML. That is for those who write the specifications and UAs, or User Agents. UAs make things work. Most web developers are not interested in how things work as long as they work.

Most web developers, however, are concerned with the practical application of markup languages in websites they construct. Specification requirements are easier to understand in that practical context.

HTML 5 isn't any different. It has its theory, but that is not what this article is about. (The theory of HTML 5 can be saved for a later day.) The following addresses the application of HTML 5 by web developers and attempts to understand what makes HTML 5 different.

Three fundamental considerations are made by web developers. They are:

1. Document Type Declaration

The W3C DTD, or Document Type Definition, specifies either “Standards Mode” or “Quirks Mode” for UAs parsing Cascading Style Sheets (CSS). (“Standards Mode” is the default for XHTML [except when an XML declaration has been included above the DTD, which will then trigger “Quirks Mode”].) Web developers who have chosen HTML 4.01 use DTDs which trigger “Standards Mode”. HTML5 specifies a Document Type Declaration, i.e., <!DOCTYPE html>, which triggers “Standards Mode”. (This DocType was not invented by the WHATWG; it existed previously.) Further, DTDs are unnecessary for elements and attributes. All elements and attributes are recognized by UAs, e.g., browsers, with this DocType.
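To make the difference between the long and short declarations concrete, here is a minimal sketch that splits a doctype declaration into its parts. The names `Doctype` and `parse_doctype` are made up for illustration, and the pattern is a simplification that handles only the PUBLIC form with double-quoted identifiers; it is not how any browser actually sniffs doctypes.

```python
import re
from typing import NamedTuple, Optional


class Doctype(NamedTuple):
    name: str                  # root element name, e.g. "html"
    public_id: Optional[str]   # public identifier, if present
    system_id: Optional[str]   # system identifier (the DTD URL), if present


# Simplified, hypothetical pattern: PUBLIC form with double quotes only.
_DOCTYPE = re.compile(
    r'<!DOCTYPE\s+(?P<name>[^\s>]+)'
    r'(?:\s+PUBLIC\s+"(?P<public>[^"]*)"(?:\s+"(?P<system>[^"]*)")?)?'
    r'\s*>',
    re.IGNORECASE,
)


def parse_doctype(declaration: str) -> Optional[Doctype]:
    """Split a doctype declaration into its parts, or return None."""
    match = _DOCTYPE.match(declaration.strip())
    if match is None:
        return None
    return Doctype(match.group("name"), match.group("public"), match.group("system"))


short = parse_doctype("<!DOCTYPE html>")
strict = parse_doctype(
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">'
)
print(short)   # no public or system identifier at all
print(strict)  # both identifiers present
```

The short declaration carries only a name, which is why it is so much easier to type from memory than the HTML 4.01 one.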

2. MIME Type

HTML 4.01 is primarily sent as “text/html”. XHTML 1.0 is primarily sent as “text/html”. Web Applications 1.0, 1.4.1 - HTML vs XHTML states that all documents sent as “text/html” are HTML5.
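The role of the MIME type can be sketched as a simple dispatch on the Content-Type header. The function name `parser_for` and its return labels are assumptions for illustration only; real user agents do considerably more (content sniffing, charset handling) than this shows.

```python
def parser_for(content_type: str) -> str:
    """Pick a processing path from a Content-Type header value (sketch)."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return "html"   # handled by the HTML parser
    if media_type in ("application/xml", "text/xml") or media_type.endswith("+xml"):
        return "xml"    # handled by an XML processor
    return "other"


print(parser_for("text/html; charset=utf-8"))
print(parser_for("application/xhtml+xml"))
```

Note that the markup itself is never inspected here: the same bytes take a different path depending only on the declared type.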

3. Well-Formedness

XHTML introduced the concept of “well-formedness”. (See XHTML™ 1.0 The Extensible HyperText Markup Language (Second Edition), Appendix C.) “Well-formedness” was simple. However, these days, “well-formedness” has come to include all of the requirements set forth in Appendix C and in Section 4, “Differences with HTML 4”, too. It is one of the cardinal principles of web standards. “Well-formed” sites define web standards.
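In the strict XML sense, well-formedness can be checked mechanically by handing the markup to an XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` is made up, and the check covers only XML well-formedness, not the broader Appendix C guidelines mentioned above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<p>Hello<br/></p>"))  # empty element is closed
print(is_well_formed("<p>Hello<br></p>"))   # <br> is never closed
```

Markup that HTML parsers accept without complaint, such as an unclosed `<br>`, fails this check.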

Conclusion

Most web developers want their CSS displayed in a “standards-compliant mode”; most web developers will continue sending documents as “text/html”; and most will not veer from writing “well-formed” code.

It – Then - seems that all one needs to do for using HTML 5 is:

  1. Replace the W3C Document Type Definition with <!DOCTYPE html>.
  2. Continue sending documents as “text/html”.
  3. Do not alter “well-formed” source code.
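Of the three steps above, only the first changes anything in the source. A rough sketch of that step follows; the helper `use_short_doctype` is hypothetical, and the pattern assumes the doctype is the first thing in the file and has no internal subset.

```python
import re

# Matches a doctype declaration at the very start of the document.
_LEADING_DOCTYPE = re.compile(r"\A\s*<!DOCTYPE[^>]*>", re.IGNORECASE)


def use_short_doctype(document: str) -> str:
    """Swap a leading doctype declaration for <!DOCTYPE html>."""
    return _LEADING_DOCTYPE.sub("<!DOCTYPE html>", document, count=1)


old = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    "<html></html>"
)
print(use_short_doctype(old))
```

A document with no doctype, or with an XML declaration ahead of it, is returned unchanged by this sketch.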

Nevertheless, all website pages remain (X)HTML but with a different DocType.

So, theory aside, the application of HTML 5 isn’t any different from common, existing practices followed by authors who use web standards.

It’s philosophically different. That's all.

16 Responses to “What Makes the Application of HTML 5 Different?”

  1. Hi, Sean.

    A couple of points that conflict with my (maybe flawed?) understanding…

    Quirks Mode will always be used where a proper DOCTYPE is not provided. Antique UAs will not know what to do with “<!DOCTYPE html>” and resort to Quirks Mode. IE6 renders in Quirks when faced with that DOCTYPE.

    Also, to the best of my knowledge, the XML declaration does not in itself trigger Quirks Mode. The XML declaration triggers Quirks in IE6 because IE does not have any knowledge of how to handle it and expects the DOCTYPE declaration in its place, not following it. XML declarations consumed by UAs with knowledge of it render Standards Mode (assuming a DOCTYPE is specified as well).

    As someone who is examining this from a web developer’s perspective, I have particular concerns about HTML5 support in all currently popular browsers; so the issue of what will and will not trigger the Quirks Mode bugbear is very important to me and I want to understand the scenario as thoroughly as possible before considering whether to adopt HTML5 for my own web applications.

  2. “XHTML 1.0 is primarily sent as ‘text/html’.”

    That may be true, but I think it would be A Good Thing to educate people that this is the wrong thing to do.

  3. Trevor: The HTML 5 DocType <!DOCTYPE html> triggers standards-compliant mode because it is not recognized. Please read the table in MSDN’s !DOCTYPE reference and note that an “Unrecognized DOCTYPE”, i.e., <!DOCTYPE html>, is “On” (for standards-compliant mode).

    An XML declaration does trigger “Quirks Mode” in IE, and unless the document is sent as XML (e.g., with the Content-Type application/xhtml+xml), there isn’t any benefit to including it.

    Web developers can use <!DOCTYPE html> this afternoon. This blog uses it, I’ve used it on my home page and Sam Ruby’s Intertwingly uses it. [There may be others but I haven’t found them.] There isn’t any difference in UA rendering: it’s standards-compliant mode. I tested it in IE6, IE7, Safari, Mac/Firefox, PC/Firefox, Mac/Opera and PC/Opera. Works fine.

    There are two requirements in the present version HTML5 that need to be met: embedded content and significant text. The (X)HTML Conformance Checker helps. And, there’s one caveat. The W3C HTML Validation service has difficulties with <!DOCTYPE html>.

    I hope that answered your concerns.

    Julian: I agree that it is wrong, but it has become a religious war, hasn’t it? HTML 5 removes any need for discussion. Web Applications 1.0, 1.4.1. HTML vs XHTML has this sentence:

    “If a document is transmitted with the MIME type text/html, then it will be processed as an ‘HTML5’ document by Web browsers.”

    Sent as text/html, HTML 4.01 and XHTML 1.0 are treated identically.

  4. The mime type issue doesn’t make as much sense to me as I’d like it to.

    Last I knew, HTML5 allows authors to write it in XML syntax. If that’s still true*, then it should be allowed to be served with an XML mime type. If not, then what’s the point of allowing it to be authored in the XML syntax? XML parsers aren’t required to process anything that comes in as text/html. So an XML parser will likely ignore all HTML5 documents even when they’re perfectly suitable for an XML parser.

    * – I haven’t checked the draft in the past couple of weeks, so I don’t know if this point has changed.

  5. Devon: That Web Applications 1.0 HTML vs XHTML section has this:

    “The second concrete syntax uses XML, and is known as “XHTML5”. When a document is transmitted with an XML MIME type, such as application/xhtml+xml, then it is processed by an XML processor by Web browsers, and treated as an “XHTML5” document. Generally speaking, authors are discouraged from trying to use XML on the Web, because XML has much stricter syntax rules than the “HTML5” variant described above, and is relatively newer and therefore less mature.”

    So, if one writes XML syntax, e.g., XHTML 2, and serves it with the MIME type application/xhtml+xml, it is XHTML 5.

    It still remains that the MIME type dictates User Agent parsing and rendering.

  6. Okay, I’ve recalled where I saw the concise nugget about what will and will not trigger quirks mode. From Wikipedia:

    Most often, browsers determine which rendering mode to use based on the presence of a Document Type Declaration in the page; if a full DOCTYPE is present the browser will use standards mode, and if it is absent the browser will use quirks mode. For example, a web page which began with the following DOCTYPE would trigger standards mode:

    While this DOCTYPE (which does not contain either the version of HTML in use, or the URL of an HTML Document Type Definition) would trigger quirks mode:

    Additionally, a web page which does not include a DOCTYPE at all will render in quirks mode.

    However, clicking through to a related article by our Stylin’ Savior, Eric Meyer, it appears that IE6 will indeed give standards mode a go if there’s an unrecognized DOCTYPE in play, matching the MSDN info you provided.

    That is fantastic news, as the threat of having to consider quirks mode again was really a deal-breaker. Thanks for sorting it out with me.

  7. Trevor: Wikipedia’s correct, but it isn’t written in a user-friendly way. Actually, it seems that nearly everyone uses DTD when they mean DocType.

    A DocType Declaration may include a definition; and, it may include its appropriate URL.

    Example:

    !DOCTYPE HTML PUBLIC = Document Type Declaration
    "-//W3C//DTD HTML 4.0 Transitional//EN" = Document Type Definition
    "http://www.w3.org/TR/html4/loose.dtd" = Document (Source) URL

    If there’s anything else you would like to know specifically about HTML 5, The WHATWG Forums would be a good resource.

  8. I see what you mean about the loose wording; however, the facts now seem muddy again.

    Going by the earlier MSDN reference (second entry in the table), omitting any version information (such as “-//W3C//DTD HTML 4.01//EN”) will result in quirks mode.

    So then, is “!DOCTYPE html” considered an unrecognized DOCTYPE, and not a version-less one? Why?

    (Thanks; I created a forum account earlier this week, and I’ve been reading the mailing list as well.)

  9. Hmm… I’ll try to clear up some things that don’t seem quite right. 🙂

    The W3C DTD, or Document Type Definition, specifies either “Standards Mode” or “Quirks Mode” for UAs parsing Cascading Style Sheets (CSS).

    DTDs are for completely different things (validation, inclusion and infoset augmentation). That browsers trigger different rendering modes depending on what happens to be the doctype declaration is mostly a coincidence (they could equally well have sniffed for something else). If it wasn’t for browsers sniffing for doctypes then HTML5 wouldn’t have one at all. (Indeed, browsers don’t sniff for doctypes in XML and hence XHTML5 doesn’t require a doctype.)

    Web Applications 1.0, 1.4.1 – HTML vs XHTML states that all documents sent as “text/html” are HTML5.

    That section is non-normative. It also just says that browsers will treat any text/html as HTML5, not that any text/html is HTML5.

    XHTML introduced the concept of “well-formedness”.

    No, XML introduced the concept. The XHTML 1.0 spec is misleading, but then again Appendix C is also non-normative.

    It – Then – seems that all one needs to do for using HTML 5 is: […]

    Doing this is completely pointless. All you’re doing is changing one doctype that triggers standards mode to another doctype that also triggers standards mode, which is exactly equivalent to switching between the HTML4 and XHTML1 doctypes — we should know by now that that doesn’t bring any benefits and the same applies here.

    Web developers can use <!DOCTYPE html> this afternoon.

    I want to emphasize that there’s nothing special about this doctype. It’s just a short one that triggers standards mode in browsers. (I just use it because it’s less to type; I even used it before it was in the spec.)

    There are two requirements in the present version HTML5 that need to be met: embedded content and significant text.

    Actually there are lots more requirements in HTML5. Some are not even machine-checkable (so won’t be caught by a conformance checker). Also, “embedded content” is not a requirement, it’s just a fancy way of saying “images” (or, more specifically, it’s a group of elements — just like “block-level elements” is).

    So, if one writes XML syntax, e.g., XHTML 2, and serves it with the MIME type application/xhtml+xml, it is XHTML 5.

    No. An XML MIME type just says that it’s XML. Then you look at namespaces to find out which elements are XHTML, XHTML2, SVG, etc.

  10. Oh, ok. So the point of the part about mime types in this post is actually that even if one uses an XML syntax and serves it as text/html it is HTML5 and not XHTML5. I get it now. I was confused.

  11. Trevor: Yes, a <!DOCTYPE html> declaration is considered an unrecognized DOCTYPE (just as <!DOCTYPE bramble> would be). And no, it is not version-less. <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML//EN"> would be a version-less example because “DTD HTML” does not include a version, e.g., 4.01. Or, simplified, think of version-less as HTML 1. [Strict mode was not invented until HTML 2.]

  12. <!DOCTYPE bramble> triggers the quirks mode in Gecko. I did not even bother testing with other browsers.

    As far as doctype sniffing in Gecko and WebKit go (cannot verify IE and Opera due to source not being open), public ids are treated as opaque strings. Anything that might appear to be a version number to a human is just part of the string as far as sniffing goes. FWIW, as far as SGML goes, they are supposed to be opaque strings.

    The concept of standards mode was invented after HTML 4.01.

    P.S. I agree with what Simon Pieters said. It is pointless to slap a new doctype on old-style content. It’s like Appendix C: if something “new” works in legacy browsers, it isn’t really new. Something new and better has been achieved only when browsers start supporting the new features of HTML5.

    Let’s not be satisfied with celebrating the emperor’s new clothes (as happened with the infamous Appendix C) but instead aim to achieve true improvements over what browsers already support.

  13. Henri: Thanks for the clarification. [I misread my notes about standards-mode.]

    I don’t understand your comment,

    It is pointless to slap a new doctype on old-style content.

    Why is it pointless? The WHATWG Blog has a new DocType on old-style content?

  14. Maybe I am missing something, why include DTDs at all?

    DTDs are so last year, it’s all RelaxNG these days baby.

  15. Why is it pointless? The WHATWG Blog has a new DocType on old-style content?

    It is pointless in the sense that nothing new happens in browsers compared to other doctypes that trigger the standards mode.

    This blog does actually use new form input types from HTML5. And they already work in Opera 9. Moreover, this blog serves as a dogfood testbench for gaining experience in what it takes to convert WordPress to HTML5.

    A couple of years from now, browsers will probably support more features of HTML5. At that point, it is not pointless to use the HTML5 doctype routinely whether or not a particular document uses markup features that were introduced in HTML5. On the contrary, it would then be pointless to put energy into deciding which doctype to use. For some test cases I write, I use the HTML5 doctype today, because I want the standards mode and the HTML5 doctype is the one that I can type off the top of my head without copying and pasting it from somewhere. (So that’s an example of a useful use case.)

    However, taking existing content and slapping a new doctype onto it is not a useful activity. It doesn’t lead to any new technical effect in browsers. It just makes view source more fashionable.

    Of course, if you want to use the HTML5 doctype to make a political statement about DTDs, that’s cool. 🙂

    DTDs are so last year, it’s all RelaxNG these days baby.

    Indeed.