mbstring vs iconv

I was wondering today why we use mbstring rather than iconv in Dokeos, and honestly I didn't remember exactly why I had chosen mbstring in the past. Finding information about the *differences* between the two turned out to be hard:
* mbstring is a library that deals with multi-byte character strings
* iconv is also a library that deals with multi-byte character strings
However, their respective documentation doesn't mention the other one. Searching a bit more, I found a PPT presentation from Carlos Hoyos on Google saying this:
PHP supports multi byte in two extensions: iconv and mbstring
* iconv uses an external library (supports more encodings but less portable)
* mbstring has the library bundled with PHP (less encodings but more portable)
If that's the case (I assume he's right, as he was giving that talk at New York PHP), then the choice I made a few years ago was the right one. We want maximum portability, so we're going to continue using mbstring until there is a major reason not to. Although I asked for more info on the general PHP list (which has traffic of a few thousand mails per month), I only got one answer, and it was a personal opinion. So I guess the difference really boils down to a few minor things (including the licence), which makes this article almost useless, at least as a comparison of the libraries.
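
To make the portability argument concrete, here is a minimal sketch of how an application could prefer mbstring and only fall back to iconv when mbstring is not loaded. The function name portable_strlen() is hypothetical (it is not part of Dokeos or of PHP), and the last-resort fallback to strlen() is an assumption made only to keep the sketch self-contained.

```php
<?php
// Hypothetical helper, for illustration only: count characters with
// whichever multi-byte extension happens to be loaded.
function portable_strlen(string $text, string $encoding = 'UTF-8'): int
{
    // Preferred: mbstring, whose library is bundled with PHP.
    if (function_exists('mb_strlen')) {
        return mb_strlen($text, $encoding);
    }
    // Fallback: iconv, which relies on an external library.
    if (function_exists('iconv_strlen')) {
        $length = iconv_strlen($text, $encoding);
        if ($length !== false) {
            return $length;
        }
    }
    // Last resort: a byte count, which is wrong for multi-byte text.
    return strlen($text);
}

// "héllo" written with explicit UTF-8 bytes: 5 characters, 6 bytes.
echo portable_strlen("h\xC3\xA9llo"), "\n";
```

With either extension available, the call counts characters rather than bytes; with neither, it silently degrades to a byte count, which a real application would probably want to report instead.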

Comments

In Zend Framework and applications depending on it, like Magento, they prefer iconv over the old mbstring. Although I guess there are good reasons why Zend has chosen the "new" iconv lib, mbstring still has a lot more helpful functions to offer.
Perhaps Zend will rely on libiconv in future PHP releases - just think of mb support in PHP6....

Interesting to know. I just sent a mail to php-general's list to see if there's someone there who knows more about actual benchmarks or anything useful in terms of comparison.

Actually, nothing really came out of that post. It seems to be a general lack of knowledge, a lack of interest or a fear of starting a flame war (although I would not tend to believe it's the latter, considering how easily flame wars are initiated).

The thread archive can be found here:
http://lists-archives.org/php-general/330773-mbstring-vs-iconv-any-exis…
and as you can see, only one person replied, and didn't give much info.

Just for fun this evening I wrote a little benchmarking program for the various strlen versions (strlen, mb_strlen and iconv_strlen). I was expecting something boring but the results were quite surprising: on my PC, mb_strlen() is about 4 to 5 times faster than iconv_strlen().

I applied each strlen variant line by line to a UTF-8 encoded text from Gutenberg (R.U.R., I don't know what it is), and this is a sample of my results:

dummy_strlen 5853 (4485 lines and 0 chars)
strlen 4853 (4485 lines and 150846 chars)
mb_strlen 8574 (4485 lines and 140932 chars)
iconv_strlen 38774 (4485 lines and 140932 chars)

The dummy_strlen() is an empty callback. The times are expressed in microseconds.
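
For readers who want to try something similar, here is a rough sketch of this kind of benchmark. It is not the original program: the helper names (time_callback, utf8_mb_strlen, utf8_iconv_strlen) and the stand-in input are assumptions made for illustration, and it needs both the mbstring and iconv extensions loaded.

```php
<?php
// Sketch only, not the program used for the figures above.
function dummy_strlen(string $line): int
{
    return 0; // measures loop and call overhead only
}

function utf8_mb_strlen(string $line): int
{
    return mb_strlen($line, 'UTF-8');
}

function utf8_iconv_strlen(string $line): int
{
    // iconv_strlen() returns false on invalid input; count that as 0.
    return (int) iconv_strlen($line, 'UTF-8');
}

// Returns [elapsed microseconds, sum of the lengths the callback returned].
function time_callback(callable $callback, array $lines): array
{
    $total = 0;
    $start = microtime(true);
    foreach ($lines as $line) {
        $total += $callback($line);
    }
    $elapsed = (int) round((microtime(true) - $start) * 1000000);
    return [$elapsed, $total];
}

// Stand-in input: "café\n" repeated (5 characters, 6 bytes per line).
// For a real run, replace it with file('path/to/a-utf8-text.txt').
$lines = array_fill(0, 1000, "caf\xC3\xA9\n");

foreach (['dummy_strlen', 'strlen', 'utf8_mb_strlen', 'utf8_iconv_strlen'] as $name) {
    [$elapsed, $chars] = time_callback($name, $lines);
    printf("%s %d (%d lines and %d chars)\n", $name, $elapsed, count($lines), $chars);
}
```

Timings from a single pass like this are noisy, so it is probably worth repeating the loop several times and comparing the best runs before drawing conclusions.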

Yannick, if you give me some reference I can send you the program: I fear posting it here will result in an ugly, unreadable comment.

Mmmh, that's great! You can send that to me at yannick dot warnier -at- dokeos dot com. It would probably result in an ugly, unreadable comment indeed, so I'll try to see if I can put it in the post itself nicely (wordpress isn't really *good* at code display, but still... I can fake indentation) or attach it as a file.
Don't forget to mention under which license you are providing this bit of code (I suggest LGPL, should be free enough to do whatever one might want to do).

One difference between iconv() and mb_convert_encoding() is that iconv supports flags such as //IGNORE and //TRANSLIT, which can be used to deal with characters that are present in one character set but not in the other, e.g.:

$iso = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $utf8_string);
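
A slightly fuller sketch of the same idea, comparing both flags with mb_convert_encoding(). The sample string is an assumption chosen because the euro sign has no ISO-8859-1 equivalent. What //TRANSLIT substitutes, and whether //IGNORE succeeds or returns false, depends on the iconv implementation PHP was built against, so the script only dumps the results instead of promising a particular output.

```php
<?php
// "café €5" in UTF-8; the euro sign does not exist in ISO-8859-1.
$utf8_string = "caf\xC3\xA9 \xE2\x82\xAC5";

// //TRANSLIT asks iconv to approximate characters it cannot map.
// The replacement it picks varies between iconv implementations.
$translit = @iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $utf8_string);

// //IGNORE asks iconv to drop characters it cannot map.
// Some implementations still report an error, in which case PHP returns false.
$ignored = @iconv('UTF-8', 'ISO-8859-1//IGNORE', $utf8_string);

// mb_convert_encoding() has no such flags; unmappable characters are replaced
// by the mbstring substitute character ("?" unless configured otherwise).
$mb = mb_convert_encoding($utf8_string, 'ISO-8859-1', 'UTF-8');

// Show the raw bytes as hex so the terminal encoding does not matter.
foreach (['TRANSLIT' => $translit, 'IGNORE' => $ignored, 'mbstring' => $mb] as $label => $result) {
    echo $label, ': ', $result === false ? 'false' : bin2hex($result), "\n";
}
```

The substitute character used by mbstring can be changed with mb_substitute_character(), which is the closest thing it offers to the iconv flags.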
