Declaring the Character Encoding

February 22nd, 2007 by Lachlan Hunt

HTML requires that authors declare the character encoding of the file either using HTTP headers (when served over HTTP) or metadata in the file. In previous versions of HTML, authors could specify the character encoding using a relatively complex meta element like this:

<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">

The idea of the http-equiv attribute was that it would act as a substitute for real HTTP headers. However, in practice, that is not entirely true. Only a few headers actually have any effect in browsers. In fact, HTML4 even suggested that servers use this attribute to gather information for HTTP response message headers; but in reality, no known server ever did this.

Although the MIME type is included in the value for the Content-Type header above, it has no effect in browsers. The only useful and practical piece of information in that element is: charset=UTF-8.

In order to simplify the meta element and remove unnecessary markup, HTML5 has changed it slightly. The new way to declare the character encoding in the file will be to use the following:

<meta charset="UTF-8">

Obviously, that is much shorter and easier to remember. Luckily, due to the way encoding detection has been implemented by browsers, it is backwards compatible and believed to be supported by all known browsers.

Along with this, the spec has recently defined how encoding detection must be implemented by browsers and imposed a few additional restrictions for documents to be considered conforming.

  • When serialised, the charset attribute and its value must be contained completely in the first 512 bytes of the file.
  • The attribute value must be serialised without the use of character entity references of any kind. For example, you cannot use <meta charset="&#x55;&#x54;&#x46;&#x2D;&#x38;"> to declare UTF-8, because encoding detection occurs before the actual parsing begins and therefore does not decode character references.
  • The character encoding used must be a rough superset of US-ASCII; for example, you can’t use this for EBCDIC-encoded files.
  • User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU encodings.
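
The first two restrictions can be illustrated with a small sketch. This is not the detection algorithm from the spec, only a simplified stand-in that works on raw bytes; the function name prescan_charset and the regular expression are assumptions made for illustration.

```python
import re

# Hypothetical helper: a simplified stand-in for a browser's encoding prescan.
# It is NOT the spec's algorithm; it only illustrates two of the restrictions:
# the declaration must sit in the first 512 bytes, and the value is read as
# raw bytes, so character references are never decoded.
PRESCAN_LIMIT = 512

_META_CHARSET = re.compile(
    rb"""<meta\s+charset\s*=\s*["']?([^"'\s/>]+)""",
    re.IGNORECASE,
)


def prescan_charset(data: bytes):
    """Return the declared encoding name, or None if no usable one is found."""
    head = data[:PRESCAN_LIMIT]
    match = _META_CHARSET.search(head)
    if match is None:
        return None
    # If the value runs up to the edge of the window, it may be truncated.
    if match.end() == len(head) and len(data) > PRESCAN_LIMIT:
        return None
    value = match.group(1)
    # "&#x55;..." stays as literal bytes here, so it cannot name an encoding.
    if b"&" in value:
        return None
    return value.decode("ascii", errors="replace").lower()
```

A declaration pushed past the 512-byte window, or one written with character references, is simply not seen by this kind of byte-level scan.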

If the encoding is either UTF-8, UTF-16 or UTF-32, then authors can use a BOM at the start of the file to indicate the character encoding.
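
As a rough sketch of how a BOM can identify the encoding, the following checks the leading bytes of a file against the BOM constants in Python's standard codecs module. The function name sniff_bom and the returned labels are assumptions for illustration, not part of any spec.

```python
import codecs

# Longer BOMs are listed first: the UTF-32 LE BOM (FF FE 00 00) begins with
# the UTF-16 LE BOM (FF FE), so testing UTF-16 first would misreport it.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_bom(data: bytes):
    """Return an encoding label if data starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```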

12 Responses to “Declaring the Character Encoding”

  1. Devon Young Says:

    This seems to complicate things.

    Instead of creating a whole new attribute, why not just require HTML5 conforming UAs to parse the file as UTF-8 (or whatever charset you want) by default if there’s no encoding found? The only time one would need to use the meta element, then, would be if they used a charset different from the default and needed to specify it directly in the file (in case someone saves the file offline) as well as in the HTTP header. Then people who don’t know what they’re doing will get it right more often than not. People who do know what they’re doing usually get it right more often than not, even if it takes a few extra characters in a line to do it.

    Most authors don’t care about character encoding and don’t know anything about it. They shouldn’t need to, unless they’re doing something unusual.

  2. Simon Pieters Says:

    Devon, most HTML pages on the Web that don’t declare their encoding are windows-1252, so requiring browsers to default to utf-8 would break a lot of pages. Windows-1252 is also what most browsers use by default for HTML.

    You can change the default in your browser though; try changing it to utf-8.

  3. jgraham Says:

    Instead of creating a whole new attribute, why not just require HTML5 conforming UAs to parse the file as UTF-8 (or whatever charset you want) by default if there’s no encoding found

    What do you mean by “no encoding found”? Do you mean “found at the transport level, i.e. in the HTTP headers”? That would be incompatible with legacy browsers and hence incompatible with the design of HTML5. (It would also, IMHO, be poor design, as the file itself should be able to carry around its own encoding metadata; according to Ruby’s Postulate, encoding metadata in the file is significantly more likely to be correct than that at the transport layer.) I already worry about the 512-byte limit causing problems, not to mention the possibility the algorithm leaves for the detected encoding to depend on the connection speed.

  4. Devon Young Says:

    Yes, I meant ISO-8859-1 is the default HTTP encoding. Keep in mind that the default for application/xhtml+xml is UTF-8. Since ISO-8859-1 is a subset of UTF-8, there shouldn’t be a serious issue with it and it will encourage internationalization.

    In addition, it would also minimize any complications between using the HTML5 and XHTML5 syntaxes, so it doesn’t become another text/xml vs application/xml type of issue.

    If a file is Windows-1252, which isn’t any Spec’s default that I’m aware of, it should be declared in a meta tag that it is the encoding. But, is there a real need to add a whole new attribute just for one value and one value only? I doubt that. It only complicates the markup language. Not much of a complication, but a little here and a little there…next thing you know, you’ve got a lot of split ends you need to tidy up. An http-equiv attribute makes perfect sense for anything that is, well, a markup equivalent to an HTTP header.

    I already have my browser’s default set to UTF-8, and there’s rarely ever any problem on any webpage. It’s not really an issue.

  5. Sean Fraser Says:

    How will HTML5-compliant browsers handle charset (or, character encoding) mismatches, e.g., server=iso-8859-1 but meta element=UTF-8? Which takes precedence?

  6. Devon Young Says:

    When served by a web server, HTTP headers would take precedence over meta elements. In the absence of either of those, there should be some default, or else it’ll be like HTML 4.01, where the UA itself defines the default, and that’s just messy. That’s what I’m thinking would make sense. If someone wants something other than the default, they should define it. And I’m not convinced there should be an extra attribute just to define a character encoding. It doesn’t quite make enough sense. I’d have pointed this out on the mailing list, but my e-mails don’t seem to go through (and I am subscribed).

  7. Blake Winton Says:

    Since ISO-8859-1 is a subset of UTF-8

    That’s not exactly true… ISO-8859-1 is a subset of Unicode, sure, but if I wanted to write out, say, lower-case-thorn lower-case-y-umlaut, in ISO-8859-1, that would be 0xFE 0xFF, which is the big-endian UTF-16 byte-order mark and is not valid UTF-8 at all. In UTF-8, those two characters are represented by 0xC3 0xBE 0xC3 0xBF.

    Now, ASCII, being 7-bit, is a subset of both UTF-8 and ISO-8859-1. Was that what you meant to say?

    Later,
    Blake.

  8. Anne van Kesteren Says:

    Devon Young, you seem to be missing the fact that this attribute is already widely supported (and many sites rely on it!). Besides that, it obviously decreases the complexity of authoring, although there may be a small learning curve for old-timers.

  9. Lachlan Hunt Says:

    Devon, the ISO-8859-1 encoding is not a fully compatible subset of the UTF-8 encoding. Although the ISO-8859-1 repertoire and code positions are the same as the first 256 characters in Unicode, only the US-ASCII subset of both is fully compatible. Characters in the range from 128 to 255 are encoded differently. For example, a copyright symbol (©, U+00A9) is encoded with a single octet in ISO-8859-1 (0xA9), and 2 octets in UTF-8 (0xC2 0xA9).

    Thus, if your file were saved as ISO-8859-1, but a browser read it as UTF-8, then it would be an error and the browser would render a replacement character: (U+FFFD). Conversely, if your file was saved as UTF-8, but read as ISO-8859-1, the browser would display the 2 characters: ©. (Try it now. Change your browser to read this page as ISO-8859-1 instead of UTF-8 and see how those characters get rendered. But remember to change it back to UTF-8 before posting another comment)
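
    Both mismatches can be reproduced with a short Python sketch. The variable names are only illustrative, and errors="replace" is used here to stand in for a browser's substitution of U+FFFD.

```python
copyright_sign = "\u00a9"  # the copyright symbol

# One octet in ISO-8859-1, two octets in UTF-8.
latin1_bytes = copyright_sign.encode("iso-8859-1")  # b"\xa9"
utf8_bytes = copyright_sign.encode("utf-8")         # b"\xc2\xa9"

# Saved as ISO-8859-1 but read as UTF-8: a lone 0xA9 is a stray
# continuation byte, so the decoder substitutes U+FFFD.
misread_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# Saved as UTF-8 but read as ISO-8859-1: each octet becomes its own
# character, U+00C2 followed by U+00A9.
misread_as_latin1 = utf8_bytes.decode("iso-8859-1")
```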

    Windows-1252 is a superset of ISO-8859-1, both of which are supersets of US-ASCII. Both are single octet encodings and encode characters in the same way. Windows-1252 defines extra characters in the range from 128 to 159; whereas in ISO-8859-1, they are C1 Control Characters.
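
    The difference in the 128 to 159 range can be seen by decoding the same octet both ways. This is a small sketch using Python's built-in codecs; the helper name decode_both is an assumption. Note that Python's cp1252 codec leaves a handful of octets in that range undefined, which is why errors="replace" is passed.

```python
def decode_both(byte_value: int):
    """Decode one octet as ISO-8859-1 and as Windows-1252."""
    raw = bytes([byte_value])
    as_latin1 = raw.decode("iso-8859-1")
    # Undefined cp1252 octets come back as U+FFFD rather than raising.
    as_cp1252 = raw.decode("cp1252", errors="replace")
    return as_latin1, as_cp1252


# 0x41 and 0xA9 agree in both encodings; 0x80 is a C1 control character
# in ISO-8859-1 but the euro sign (U+20AC) in Windows-1252.
samples = {hex(b): decode_both(b) for b in (0x41, 0x80, 0xA9)}
```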

    For more information, see my Guide to Unicode.

    If a file is Windows-1252, which isn’t any Spec’s default that I’m aware of, it should be declared in a meta tag that it is the encoding. But, is there a real need to add a whole new attribute just for one value and one value only?

    Windows-1252 is not the only other encoding used, although it is the most common in western cultures. There’s also ISO-8859-2 to -15, Windows-1250 to -1258, Shift_JIS, GB2312, and many, many more. Just see how many choices your browser offers to manually set the encoding.

  10. Siegfried Says:

    Why invent new attributes? Only slightly longer, but fully backwards compatible would be:

    <meta name="charset" content="utf-8">

  11. lockoom Says:

    Siegfried, your proposal may be backward compatible with the HTML4 specification of the meta element, but it’s not compatible with browsers. By this I mean browsers won’t understand that this meta is describing the character encoding. In contrast, as Lachlan Hunt wrote:

    Luckily, due to the way encoding detection has been implemented by browsers, it [] is backwards compatible and believed to be supported by all known browsers.

  12. j.j. Says:

    Do major (and minor) search engines support this?
