This article was first written in July 2004 for the BeezNest technical
website (http://glasnost.beeznest.org/articles/139).
To display all characters correctly in an HTML page, even when it mixes several languages (English, Japanese, Russian, …), a good approach is to encode everything in Unicode, using the UTF-8 encoding.
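As a small illustration of why UTF-8 suits a multilingual page, the following Python sketch checks which encodings can represent a greeting in each of the three languages named above. The sample strings and the helper name fits are made up for this example; they are not part of the original article.

```python
# Sample greetings in the three languages named above (illustrative text only).
samples = {
    "English": "Hello",
    "Japanese": "こんにちは",
    "Russian": "Привет",
}


def fits(text: str, encoding: str) -> bool:
    """Return True if every character of text can be encoded in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# For each language: (encodable as UTF-8, encodable as ISO-8859-1).
report = {
    language: (fits(text, "utf-8"), fits(text, "iso-8859-1"))
    for language, text in samples.items()
}

for language, (utf8_ok, latin1_ok) in report.items():
    print(f"{language}: utf-8={utf8_ok}, iso-8859-1={latin1_ok}")
```

Only the English sample fits in ISO-8859-1, while all three fit in UTF-8, which is why a single UTF-8 page can carry all of them.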
Server & client config
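In the Apache configuration file httpd.conf, AddDefaultCharset must be set to either off or utf-8 (one of the two, not both), and the "on" setting must stay disabled:
#AddDefaultCharset on
AddDefaultCharset off
AddDefaultCharset utf-8
With "on", Apache adds charset=iso-8859-1 to the Content-Type header of text/html and text/plain responses, and that HTTP header takes precedence over the meta tag in the page. With "off", Apache adds no charset, so the declaration in the page applies. With "utf-8", Apache itself announces UTF-8.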
More info:
Apache AddDefaultCharset directive
In the <head> of your HTML page, declare the UTF-8 encoding:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
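To see how the server header and the meta tag interact, here is a minimal Python sketch that extracts the charset from a Content-Type value and applies the rule that the HTTP header wins over the meta declaration. The function names charset_from_content_type and effective_charset are hypothetical, and the precedence logic is deliberately simplified (it ignores byte order marks and browser heuristics).

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


def effective_charset(http_header: str, meta_content: str) -> Optional[str]:
    """Simplified precedence: the HTTP header charset wins, else the meta charset."""
    return (charset_from_content_type(http_header)
            or charset_from_content_type(meta_content))


meta = "text/html; charset=utf-8"

# Server configured with "AddDefaultCharset on": the header overrides the page.
print(effective_charset("text/html; charset=ISO-8859-1", meta))

# Server configured with "AddDefaultCharset off": the meta declaration applies.
print(effective_charset("text/html", meta))
```

The first call shows why leaving the default charset on can silently defeat a correct meta tag.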
Working with Unicode databases
To display the content of a Unicode (UTF-8) database in a UTF-8 page, there is no need to decode the data.
For example, in PHP there is no need to call utf8_decode().
If you use forms to update a Unicode database, there is no need to encode the POST data either, because the browser submits the form in the encoding of the page.
For example, in PHP there is no need to call utf8_encode($_POST['var']).
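Working with non-Unicode databases
To display the content of a non-Unicode database (ISO-8859-1, for example) in a UTF-8 page, you need to convert the data to UTF-8 before displaying it.
For example, in PHP you must use utf8_encode(), which converts ISO-8859-1 to UTF-8.
If you use forms to update a non-Unicode database, you need to convert the POST data, which arrives as UTF-8, back to the database encoding before sending it to the database.
For example, in PHP you must use utf8_decode($_POST['var']), which converts UTF-8 to ISO-8859-1 and replaces any character that ISO-8859-1 cannot represent with a question mark.
Both functions only handle ISO-8859-1 and are deprecated as of PHP 8.2; mb_convert_encoding() covers other encodings.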
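The two conversion directions are easy to mix up, so here is a rough Python sketch of them at the byte level. It assumes the database encoding is ISO-8859-1; latin1_to_utf8 mirrors what PHP's utf8_encode() does and utf8_to_latin1 approximates utf8_decode(). Both function names are invented for this example.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """ISO-8859-1 bytes (from the database) to UTF-8 bytes (for the page)."""
    return data.decode("iso-8859-1").encode("utf-8")


def utf8_to_latin1(data: bytes) -> bytes:
    """UTF-8 bytes (from the form) to ISO-8859-1 bytes (for the database).

    Characters that ISO-8859-1 cannot represent become "?".
    Unlike PHP, this sketch raises UnicodeDecodeError on invalid UTF-8 input.
    """
    return data.decode("utf-8").encode("iso-8859-1", errors="replace")


# Database to page: "café" stored as ISO-8859-1 becomes valid UTF-8.
db_value = b"caf\xe9"
page_bytes = latin1_to_utf8(db_value)      # b"caf\xc3\xa9"

# Form to database: the UTF-8 POST data is converted back.
post_value = "café".encode("utf-8")
stored = utf8_to_latin1(post_value)        # b"caf\xe9"

# Text outside ISO-8859-1 is lost in this direction.
lossy = utf8_to_latin1("日本".encode("utf-8"))  # b"??"
```

The last line shows the real cost of a non-Unicode database: anything outside its character set is replaced, not stored.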
Remark
The best way to avoid worrying about encoding and decoding is to use the same character encoding on the client (HTML page) and on the server (database).
Links
W3C: Q&A: Checking HTTP Headers
UTF-8 explained