HTML character encoding

This article was first written in July 2004 for the BeezNest technical
website (http://glasnost.beeznest.org/articles/139).

To display all characters correctly in an HTML page, even when it mixes several languages (English, Japanese, Russian, …), a good approach is to encode everything in Unicode, using the UTF-8 encoding.

Server & client config

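In the Apache configuration file httpd.conf, the AddDefaultCharset directive must be in one of the following states: commented out, switched off, or set to utf-8.

    #AddDefaultCharset on
    AddDefaultCharset off
    AddDefaultCharset utf-8

An active "AddDefaultCharset on" makes Apache add charset=iso-8859-1 to the HTTP Content-Type header, and browsers generally give that header priority over the meta element in the page.

More info: Apache AddDefaultCharset directive

In your HTML page, define:

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">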
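
When a response declares a charset both in the HTTP Content-Type header and in the meta element, browsers generally use the header's value. The following Python sketch illustrates that precedence so you can see why a server default of ISO-8859-1 defeats a UTF-8 meta element. The function names header_charset and effective_charset are hypothetical, and the sketch ignores other signals a browser may use, such as a byte order mark.

```python
from email.message import Message
from typing import Optional


def header_charset(content_type: Optional[str]) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    if content_type is None:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def effective_charset(content_type: Optional[str],
                      meta_charset: Optional[str]) -> Optional[str]:
    """Simplified model: prefer the HTTP header, fall back to the meta element."""
    from_header = header_charset(content_type)
    if from_header is not None:
        return from_header
    return meta_charset.lower() if meta_charset else None


if __name__ == "__main__":
    # Server sends a default charset: the meta element is overridden.
    print(effective_charset("text/html; charset=ISO-8859-1", "utf-8"))
    # Server sends no charset: the meta element decides.
    print(effective_charset("text/html", "utf-8"))
```

With the header set to ISO-8859-1 the first call reports iso-8859-1 even though the page asks for utf-8; with no charset in the header, the second call reports utf-8.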

Working with Unicode databases

To display the content of a Unicode (UTF-8) database, there is no need to decode the data. For example, in PHP there is no need to use utf8_decode(). Likewise, if you use forms to update a Unicode database, there is no need to encode the POST data. For example, in PHP there is no need to call utf8_encode($_POST['var']).
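
Converting data that is already UTF-8 is not just unnecessary, it corrupts the text. The Python sketch below shows the effect under the assumption that the form submits UTF-8 bytes. The helper latin1_to_utf8 is a hypothetical stand-in for an ISO-8859-1 to UTF-8 conversion such as PHP's utf8_encode().

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Reinterpret bytes as ISO-8859-1 and re-encode them as UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")


# Bytes as a UTF-8 page would submit them for the word "café".
form_value = "caf\u00e9".encode("utf-8")

# Correct for a UTF-8 database: store the bytes unchanged.
stored = form_value
assert stored.decode("utf-8") == "caf\u00e9"

# Wrong: converting again treats each UTF-8 byte as a Latin-1 character.
double_encoded = latin1_to_utf8(form_value)
garbled = double_encoded.decode("utf-8")  # "cafÃ©" instead of "café"
```

The two bytes that encode "é" in UTF-8 are read as two separate Latin-1 characters, which is the typical "Ã©" garbage seen on double-encoded pages.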

Working with non-Unicode databases

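To display the content of a non-Unicode database in a UTF-8 page, you need to convert the data to UTF-8 before displaying it. For example, in PHP use utf8_encode(), which converts ISO-8859-1 to UTF-8. If you use forms to update a non-Unicode database, you need to convert the POST data from UTF-8 to the database encoding before sending it to the database. For example, in PHP use utf8_decode($_POST['var']), which converts UTF-8 to ISO-8859-1. Both functions only handle ISO-8859-1, so they apply only when that is the encoding of the database.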
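
PHP's utf8_encode() converts ISO-8859-1 to UTF-8, and utf8_decode() converts UTF-8 to ISO-8859-1. The Python sketch below mirrors both directions for a database assumed to hold ISO-8859-1 data. The function names are hypothetical, and the replacement of unrepresentable characters with "?" is this sketch's choice for showing the data loss.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Database to page: ISO-8859-1 bytes become UTF-8 bytes."""
    return data.decode("iso-8859-1").encode("utf-8")


def utf8_to_latin1(data: bytes) -> bytes:
    """Form to database: UTF-8 bytes become ISO-8859-1 bytes.

    Characters that ISO-8859-1 cannot represent are replaced by "?".
    """
    return data.decode("utf-8").encode("iso-8859-1", errors="replace")


# Reading: a Latin-1 value from the database, prepared for a UTF-8 page.
db_value = b"caf\xe9"
page_bytes = latin1_to_utf8(db_value)

# Writing: a UTF-8 form value, prepared for the Latin-1 database.
form_value = "caf\u00e9".encode("utf-8")
db_bytes = utf8_to_latin1(form_value)

# Japanese text does not fit in ISO-8859-1 and is lost on the way in.
lossy = utf8_to_latin1("\u65e5\u672c".encode("utf-8"))
```

The last line shows the practical limit of this setup: any text outside ISO-8859-1 cannot be stored faithfully, whatever conversion is applied.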

Remark

The simplest way to avoid encoding and decoding altogether is to use the same character encoding on the client (HTML page) as on the server (database).

Links

W3C: Q&A: Checking HTTP Headers
UTF-8 explained