Following up on my previous post about the differences between the mbstring and iconv international character libraries (which ended with the tentative conclusion that nobody really knew what those differences were), and particularly on Nicola's comments, we have joined forces (mostly Nicola's effort, actually) to provide a little benchmarking that may help you decide between them.
Nicola wrote the following script (which he gladly releases into the public domain) to test the efficiency of both the mbstring and iconv extensions for PHP5:
<?php
ini_set('iconv.internal_encoding', 'UTF-8');
ini_set('mbstring.internal_encoding', 'UTF-8');

// Intentionally empty: measures the cost of the loop and of a user-function call
function dummy_strlen($str) { }

function scan_file($file, $strlen) {
    fseek($file, 0);
    $chars = 0;
    $lines = 0;
    $start_time = microtime(true);
    while (!feof($file)) {
        $chars += $strlen(fgets($file));
        ++$lines;
    }
    // Calculating elapsed time
    $sec = microtime(true) - $start_time;
    $usec = (int) ($sec * 1000000);
    // Formatting
    $strlen = str_pad($strlen, 20);
    $usec = str_pad($usec, 10);
    echo "$strlen$usec($lines lines and $chars chars)\n";
}

// Text: 13083-utf8.txt
// From: http://www.gutenberg.org/files/13083/13083-utf8.txt
$file = fopen('13083-utf8.txt', 'r');
scan_file($file, 'dummy_strlen');
scan_file($file, 'strlen');
scan_file($file, 'mb_strlen');
scan_file($file, 'iconv_strlen');
fclose($file);
?>

Although we were expecting boring results, they are actually quite interesting. The following are the results for the script above, each prefixed by the system details. The first number on each line is the elapsed time in microseconds.
Linux amd64 2.6.26-ntd #1 PREEMPT Mon Jul 28 18:54:30 CEST 2008 x86_64 x86_64 x86_64 GNU/Linux
PHP 5.2.4 (cli)
iconv library version => 2.5
Multibyte regex (oniguruma) version => 4.4.4

dummy_strlen        5853      (4485 lines and 0 chars)
strlen              4853      (4485 lines and 150846 chars)
mb_strlen           8574      (4485 lines and 140932 chars)
iconv_strlen        38774     (4485 lines and 140932 chars)

------------

PHP 5.2.4 (CLI version)
iconv 2.7 for glibc
mbstring regex oniguruma 4.4.4 (string engine libmbfl)
on an Apple Mac Book Pro (II), at 2.1GHz

dummy_strlen        15965     (4485 lines and 0 chars)
strlen              12470     (4485 lines and 150846 chars)
mb_strlen           34167     (4485 lines and 140932 chars)
iconv_strlen        48865     (4485 lines and 140932 chars)

Second execution (caching might help):

dummy_strlen        15692     (4485 lines and 0 chars)
strlen              12532     (4485 lines and 150846 chars)
mb_strlen           14727     (4485 lines and 140932 chars)
iconv_strlen        40086     (4485 lines and 140932 chars)

Tenth execution (stabilised):

dummy_strlen        15661     (4485 lines and 0 chars)
strlen              8369      (4485 lines and 150846 chars)
mb_strlen           6906      (4485 lines and 140932 chars)
iconv_strlen        34092     (4485 lines and 140932 chars)

-----

PHP 5.2.4 (Apache 2 version)
iconv 2.7 for glibc
mbstring regex oniguruma 4.4.4 (string engine libmbfl)
on an Apple Mac Book Pro (II), at 2.1GHz

dummy_strlen        38218     (4485 lines and 0 chars)
strlen              35356     (4485 lines and 150846 chars)
mb_strlen           35804     (4485 lines and 140932 chars)
iconv_strlen        66275     (4485 lines and 140932 chars)

And second execution:

dummy_strlen        52164     (4485 lines and 0 chars)
strlen              34219     (4485 lines and 150846 chars)
mb_strlen           35706     (4485 lines and 140932 chars)
iconv_strlen        65080     (4485 lines and 140932 chars)

Tenth execution:

dummy_strlen        59885     (4485 lines and 0 chars)
strlen              35341     (4485 lines and 150846 chars)
mb_strlen           36179     (4485 lines and 140932 chars)
iconv_strlen        66632     (4485 lines and 140932 chars)

The evident conclusion from this set of results is that iconv is considerably slower than mbstring, although we do not have enough insight to tell whether this is due to a compilation difference or something similar. In any case, our configurations are identical to what most people will have, so the results are likely to be the same in many cases. As such, iconv should be avoided for efficiency reasons, unless some of its very specific functions are required.

You can download the text file used in this example from here: http://mirror3.mirrors.tds.net/pub/gutenberg.org/1/3/0/8/13083/13083-utf8.txt

You will have to copy and paste the script above (there is no way to upload a simple text file here).
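Two things in the results are worth illustrating. First, strlen reports 150846 where mb_strlen and iconv_strlen report 140932, because strlen counts bytes while the other two count UTF-8 characters. Second, the timings move around a lot between executions. The sketch below is not Nicola's script: it is a hypothetical variation, assuming PHP 7+ with both extensions loaded, that works on lines held in memory, passes the encoding explicitly instead of relying on ini settings, and keeps the fastest of several runs. The helper name best_time_usec() and the sample line are made up for illustration.

```php
<?php
// Sketch only, assuming PHP 7+ with the mbstring and iconv extensions loaded.
// best_time_usec() and the sample data are hypothetical, not part of the original script.

function best_time_usec(callable $strlen, array $lines, int $runs = 5): array
{
    $best = PHP_INT_MAX;
    $chars = 0;
    for ($run = 0; $run < $runs; ++$run) {
        $chars = 0;
        $start = microtime(true);
        foreach ($lines as $line) {
            $chars += $strlen($line);
        }
        $usec = (int) ((microtime(true) - $start) * 1000000);
        $best = min($best, $usec); // keep the fastest run to damp warm-up noise
    }
    return [$best, $chars];
}

// 11 characters but 13 bytes in UTF-8: "ï" and "é" take two bytes each.
$lines = array_fill(0, 5000, "na\u{00EF}ve caf\u{00E9}\n");

// Every candidate goes through a closure so all three pay the same call overhead.
$candidates = [
    'strlen'       => function ($s) { return strlen($s); },
    'mb_strlen'    => function ($s) { return mb_strlen($s, 'UTF-8'); },
    'iconv_strlen' => function ($s) { return iconv_strlen($s, 'UTF-8'); },
];

$results = [];
foreach ($candidates as $name => $fn) {
    list($usec, $chars) = best_time_usec($fn, $lines);
    $results[$name] = $chars;
    echo str_pad($name, 20), str_pad((string) $usec, 10), "($chars chars)\n";
}
```

With this sample data, strlen should total 65000 and the two multibyte functions 55000 each. The timings will depend on the machine and PHP build, so they are best read as relative figures only.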
Comments
Good article Yannick: let's check if we are the only ones curious about this!
It is worth mentioning that the empty callback (dummy_strlen) is a lot slower than I was expecting: using built-in functions has a great advantage.
Also, it's logical to get some overhead because of the more complex encoding, but mb_strlen() is so quick that the difference compared to plain strlen() is negligible.
Good. Now we have a good reason to use mbstring rather than iconv in cases where they serve the same purpose. I also prefer using mbstring, though I didn't know it would be faster than iconv :).
Thanks for the comparison!