Scatter/Gather thoughts

by Johan Petersson

Why you may want that superfluous meta http-equiv

One of Michael Kaplan's readers points out that his blog lacks a meta tag specifying the character set to use. There are a lot of "deadly sins" being committed on the web, but this is not one of them. Character encoding should always be specified explicitly for web pages, but doing so through the HTTP Content-Type header is sufficient. Actually, it's a superior way.

The appropriate header field can be set in your web server configuration (e.g. through Apache's AddDefaultCharset or ForceType directives), or on a page-by-page basis in CGI programs and server-side scripting languages. In PHP, you use the header function:

header('Content-Type: text/html; charset=UTF-8');

In ASP, you set the ContentType and Charset properties of the Response object:

Response.ContentType = "text/html"
Response.Charset     = "UTF-8"

I could give more examples, but your favorite web application platform will have a mechanism for doing this. In the unlikely case that it does not, I strongly advise you to find a new favorite. Failure to set the content type with an explicit charset parameter is one of those deadly sins referred to earlier and a common source of problems.
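As one more example, here is a minimal sketch for Python's standard WSGI interface (PEP 3333), assuming the application runs under any WSGI-compliant server. The function name `app` and the sample body are illustrative only. The principle is the same as in the PHP and ASP snippets: the charset parameter goes into the Content-Type header, and the body is encoded to match it.

```python
def app(environ, start_response):
    """Minimal WSGI application (hypothetical) that labels its HTML output as UTF-8."""
    # Encode the body with the same encoding the header announces.
    body = "<p>R\u00e4ksm\u00f6rg\u00e5s</p>".encode("utf-8")
    headers = [
        # The charset parameter tells the browser how to decode the bytes.
        ("Content-Type", "text/html; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

WSGI bodies are byte strings, so the encoding step is explicit; forgetting it, or encoding with something other than what the header claims, produces exactly the garbled text the header is meant to prevent.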

Needless to say, I take care to set an accurate Content-Type in the HTTP header on all my sites. However, if you take a look at the HTML code for this site, you'll probably see the following:

<meta http-equiv="Content-Type" 
      content="application/xhtml+xml; charset=UTF-8"/>

If you use Internet Explorer, you might see something different. That is because Internet Explorer doesn't handle XHTML properly, so visitors using IE are currently served pseudo-XHTML tag soup as if it were HTML. It's an embarrassing hack that I hope to fix soon, but it's not relevant to the issue being discussed.

The question is, why bother with this meta element if it's not strictly needed? Removing it would save almost 100 bytes per page as well as make my site maintenance marginally easier. The HTTP header field should always be set, but I know of two good reasons for adding the corresponding meta element as well.

One reason is that doing so makes it possible to view an HTML document stored as a local file and still be sure to get the correct character encoding. The HTTP header is conceptually part of the web page but not part of the HTML document. If an HTML file not labeled with a character encoding is viewed locally, the browser is forced to either use a default or guess, with unpredictable results.
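To see why the meta element matters for a local file, consider a simplified model of how a client might pick an encoding: use the charset parameter from the HTTP header if there is one, otherwise look for a meta element near the start of the document, otherwise fall back to a default. This is a sketch under stated assumptions, not any browser's actual algorithm (real browsers also check byte order marks and apply their own heuristics). The function names, the 1024-byte scan window and the windows-1252 fallback are hypothetical choices for illustration.

```python
import re
from email.message import Message

# Simplified pattern: finds "charset=..." inside a <meta ...> tag, covering both
# <meta charset="..."> and the http-equiv form used on this site.
META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE)


def header_charset(content_type):
    """Return the charset parameter of a Content-Type header value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_param("charset")


def detect_charset(raw, content_type=None, default="windows-1252"):
    """Pick an encoding for the bytes in `raw` (hypothetical, simplified order)."""
    # 1. A charset in the HTTP Content-Type header takes precedence.
    charset = header_charset(content_type)
    if charset:
        return charset.lower()
    # 2. No usable header (e.g. a local file): scan the start of the document.
    match = META_CHARSET.search(raw[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 3. Nothing declared: fall back to a hypothetical default, i.e. a guess.
    return default
```

A local file is the case where `content_type` is None. With the meta element from this site's pages, `detect_charset` returns "utf-8"; without it, the sketch has nothing to go on but the fallback guess.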

The second reason you may want to do this is to work around a somewhat obscure Internet Explorer bug. Under certain circumstances, Internet Explorer will fail to use the character encoding from the HTTP header when showing a page from cache. The situation is then similar to that of a local HTML file: IE will attempt to autodetect the character encoding unless you help it along with a meta element.

When you run into this particular bug, you may find that everything is displayed correctly the first time the page is viewed but that the text is garbled on subsequent visits (until the cached page expires or the browser cache is cleared). I'm not sure if this affects all versions, but I have seen it happen with Internet Explorer 6. The bug could be fixed by the time you read this, but there are an awful lot of old browsers out there and webmasters will generally want to cater for all reasonably recent IE versions.

You'd think that any problems with character sets and encoding affecting Sorting It All Out would have been discovered – especially so for an Internet Explorer bug. Michael works at Microsoft and since he writes about internationalization issues his pages frequently feature exotic characters. Surely someone would have noticed any character encoding problems by now?

Well... you are not likely to experience the Internet Explorer bug on Michael's site, because it only shows up for cached pages, and neither Community Server nor its predecessor .Text generates cache-friendly pages. But that's a web sin I'll have to deal with another time.

23 September, 2005