mbstring vs iconv benchmarking

Following up on my previous post about the differences between the mbstring and iconv libraries for international characters (which tentatively concluded that nobody really knew what those differences were), and particularly on the comments by Nicola, we have joined forces (most of the effort was Nicola's, actually) to offer you a small benchmark, in case it helps you decide. Nicola wrote the following script (which he gladly releases into the public domain) to test the efficiency of the mbstring and iconv extensions for PHP 5:
<?php
ini_set('iconv.internal_encoding', 'UTF-8');
ini_set('mbstring.internal_encoding', 'UTF-8');

// Empty callback, used as a baseline for the function-call overhead
function dummy_strlen($str) { }

function scan_file($file, $strlen) {
    fseek($file, 0);
    $chars = 0;
    $lines = 0;
    $start_time = microtime(true);
    while (!feof($file)) {
        $chars += $strlen(fgets($file));
        ++$lines;
    }
    // Calculating elapsed time
    $sec = microtime(true) - $start_time;
    $usec = (int) ($sec * 1000000);
    // Formatting
    $strlen = str_pad($strlen, 20);
    $usec = str_pad($usec, 10);
    echo "$strlen$usec($lines lines and $chars chars)\n";
}

// Text: 13083-utf8.txt
// From: http://www.gutenberg.org/files/13083/13083-utf8.txt
$file = fopen('13083-utf8.txt', 'r');
scan_file($file, 'dummy_strlen');
scan_file($file, 'strlen');
scan_file($file, 'mb_strlen');
scan_file($file, 'iconv_strlen');
fclose($file);
?>
Although we were expecting boring results, they are actually quite interesting. The following are the results for the script above, each set prefixed by the system details. The first number on each line is the elapsed time in microseconds.
Linux amd64 2.6.26-ntd #1 PREEMPT Mon Jul 28 18:54:30 CEST 2008 x86_64 x86_64 x86_64 GNU/Linux
PHP 5.2.4 (cli)
iconv library version => 2.5
Multibyte regex (oniguruma) version => 4.4.4

dummy_strlen        5853      (4485 lines and 0 chars)
strlen              4853      (4485 lines and 150846 chars)
mb_strlen           8574      (4485 lines and 140932 chars)
iconv_strlen        38774     (4485 lines and 140932 chars)

------------

PHP 5.2.4 (CLI version), iconv 2.7 for glibc, mbstring regex oniguruma 4.4.4 (string engine libmbfl), on an Apple Mac Book Pro (II), at 2.1GHz

dummy_strlen        15965     (4485 lines and 0 chars)
strlen              12470     (4485 lines and 150846 chars)
mb_strlen           34167     (4485 lines and 140932 chars)
iconv_strlen        48865     (4485 lines and 140932 chars)

Second execution (caching might help):

dummy_strlen        15692     (4485 lines and 0 chars)
strlen              12532     (4485 lines and 150846 chars)
mb_strlen           14727     (4485 lines and 140932 chars)
iconv_strlen        40086     (4485 lines and 140932 chars)

Tenth execution (stabilised):

dummy_strlen        15661     (4485 lines and 0 chars)
strlen              8369      (4485 lines and 150846 chars)
mb_strlen           6906      (4485 lines and 140932 chars)
iconv_strlen        34092     (4485 lines and 140932 chars)

------------

PHP 5.2.4 (Apache 2 version), iconv 2.7 for glibc, mbstring regex oniguruma 4.4.4 (string engine libmbfl), on an Apple Mac Book Pro (II), at 2.1GHz

dummy_strlen        38218     (4485 lines and 0 chars)
strlen              35356     (4485 lines and 150846 chars)
mb_strlen           35804     (4485 lines and 140932 chars)
iconv_strlen        66275     (4485 lines and 140932 chars)

And second execution:

dummy_strlen        52164     (4485 lines and 0 chars)
strlen              34219     (4485 lines and 150846 chars)
mb_strlen           35706     (4485 lines and 140932 chars)
iconv_strlen        65080     (4485 lines and 140932 chars)

Tenth execution:

dummy_strlen        59885     (4485 lines and 0 chars)
strlen              35341     (4485 lines and 150846 chars)
mb_strlen           36179     (4485 lines and 140932 chars)
iconv_strlen        66632     (4485 lines and 140932 chars)
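A side note on the character counts: strlen reports 150846 while mb_strlen and iconv_strlen both report 140932, because strlen counts bytes and the other two count characters. Here is a minimal sketch of that difference, assuming the mbstring and iconv extensions are loaded. The sample string is made up for illustration and is written with byte escapes so that it does not depend on how this file is saved.

```php
<?php
// Hypothetical sample: "naïve café" spelled out as UTF-8 bytes.
// "ï" is C3 AF and "é" is C3 A9, so two characters take two bytes each.
$sample = "na\xC3\xAFve caf\xC3\xA9";

// strlen() counts bytes.
$bytes = strlen($sample);

// mb_strlen() and iconv_strlen() count characters. The encoding is passed
// explicitly here instead of relying on the ini settings.
$mbChars    = mb_strlen($sample, 'UTF-8');
$iconvChars = iconv_strlen($sample, 'UTF-8');

printf("bytes=%d mb=%d iconv=%d\n", $bytes, $mbChars, $iconvChars);
// prints "bytes=12 mb=10 iconv=10"
```

The same gap, scaled up to the whole Gutenberg text, would account for the two different totals in the tables.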
The evident conclusion from this set of results is that iconv is considerably slower than mbstring: iconv_strlen took between roughly 1.4 and 5 times as long as mb_strlen in every run. We do not have enough insight to tell whether this is due to a compilation difference or something similar. Still, our configurations are the kind most people will have, so the results are very likely to be the same in many cases. As such, iconv should be avoided for efficiency reasons unless some of its very specific functions are required.
You can download the text file used in this example from here: http://mirror3.mirrors.tds.net/pub/gutenberg.org/1/3/0/8/13083/13083-utf8.txt You will have to copy and paste the script above (there is no way to upload a simple text file here).
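Because the timings shift noticeably between the first and the tenth execution, it may help to repeat each measurement inside the script and keep the fastest run. What follows is only a sketch of that idea, not the script Nicola wrote. The helper best_time_usec() and the placeholder lines are hypothetical, and the code assumes PHP 7 or later (for the type declarations) with mbstring and iconv loaded. All three functions are wrapped in closures so that they pay the same call overhead, and the encoding is passed explicitly because the two internal_encoding ini settings have been deprecated since PHP 5.6.

```php
<?php
// Hypothetical helper: call $fn on every line, repeat $runs times,
// and return the fastest run in microseconds.
function best_time_usec(callable $fn, array $lines, int $runs = 10): int
{
    $best = PHP_INT_MAX;
    for ($i = 0; $i < $runs; ++$i) {
        $start = microtime(true);
        foreach ($lines as $line) {
            $fn($line);
        }
        $usec = (int) round((microtime(true) - $start) * 1000000);
        $best = min($best, $usec);
    }
    return $best;
}

// Placeholder data so the sketch runs without the Gutenberg file;
// swap in file('13083-utf8.txt') to time the text used in this post.
$lines = array_fill(0, 1000, "na\xC3\xAFve caf\xC3\xA9 \xE2\x82\xAC\n");

// Each candidate is wrapped the same way so the comparison stays fair.
$candidates = [
    'strlen'       => function ($s) { return strlen($s); },
    'mb_strlen'    => function ($s) { return mb_strlen($s, 'UTF-8'); },
    'iconv_strlen' => function ($s) { return iconv_strlen($s, 'UTF-8'); },
];

$results = [];
foreach ($candidates as $name => $fn) {
    $results[$name] = best_time_usec($fn, $lines);
    echo str_pad($name, 20), $results[$name], "\n";
}
```

Keeping the minimum rather than the average is a design choice: it discards runs disturbed by caching or by other processes, at the cost of hiding how variable the timings are.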

Comments

Good article Yannick: let's check if we are the only ones curious about this!
It is worth mentioning that the empty callback (dummy_strlen) is a lot slower than I was expecting: using built-in functions has a great advantage.
Also, it is logical to get some overhead from the more complex encoding, but mb_strlen() is so quick that the difference compared with plain strlen() is negligible.


Good. Now we have a good reason to use mbstring rather than iconv in cases where they serve the same purpose. I also prefer using mbstring, though I didn't know it would be faster than iconv :).
