PHP benchmarking

While working on my framework, I got to a point where I had to store quite a lot of data during the execution of the script. My first thought was to use an associative array, but then I wondered whether objects might be faster or use less memory. So I decided to test this. I used Xdebug to get information about memory usage, and I am running WampServer x64 on Windows 7. I quickly whipped up a script to fill an array with 1000 keys and arbitrary values, and another one to create an object and give it 1000 properties.

<?php
// Initial memory
$memory1 = xdebug_memory_usage();

$data = array();
for ($i = 0; $i < 100; $i++) {
    for ($j = 0; $j <= 1000; $j++) {
        $data[$i][md5($j)] = microtime();
    }
}
$array = xdebug_memory_usage() - $memory1;
$time1 = microtime(TRUE);
for ($i = 0; $i < 100; $i++) {
    for ($j = 0; $j <= 1000; $j++) {
        $var = md5(rand(0, 1000));
        $var = $data[$i][$var];
    }
}
$time2 = microtime(TRUE) - $time1;
echo $array.PHP_EOL;
echo $time2.PHP_EOL;
?>

And for objects it's very similar:

<?php
$data = array();
$memory2 = xdebug_memory_usage();
for ($i = 0; $i < 100; $i++) {
    $data[$i] = new stdClass;
    for ($j = 0; $j <= 1000; $j++) {
        $prop = md5($j);
        $data[$i]->$prop = microtime();
    }
}
$object = xdebug_memory_usage() - $memory2;
$time3 = microtime(TRUE);
for ($i = 0; $i < 100; $i++) {
    for ($j = 0; $j <= 1000; $j++) {
        $var = md5(rand(0, 1000));
        $var = $data[$i]->$var;
    }
}
$time4 = microtime(TRUE) - $time3;
echo $object.PHP_EOL;
echo $time4.PHP_EOL;
?>

In these tests I create 100 arrays (then objects) and give each of them 1000 values: the current time (in string format), keyed by a 32-character MD5 hash. I measure the memory before and after, and the amount of memory used by the arrays or objects is the difference. Then I do another loop and copy a value from the array each time, to test the read performance of arrays and objects. The two values are printed on two different lines. However, one value is useless by itself: maybe my computer had an extra load for the duration of a test (a background AV check, say), maybe the test hit a rare bottleneck in memory allocation, and so on. To do a proper benchmark, you have to run each test multiple times, then take the average and calculate the standard deviation. To run the tests multiple times I decided to use cURL to remotely load the URLs and parse the results into an array.

<?php
if (!isset($_GET['q'])) {
    echo 'You must give a URL to test';
    die();
}
$url = $_GET['q'];
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

$data = array();
$nr = isset($_GET['nr']) ? $_GET['nr'] : 10;
for ($i = 0; $i < $nr; $i++) {
    $val = curl_exec($ch);
    $val = explode(PHP_EOL, trim($val));
    foreach ($val as $key => $el) {
        $data[$key][] = $el;
    }
}

// close cURL resource, and free up system resources
curl_close($ch);

foreach ($data as $key => $values) {
    $key++;
    echo "Param {$key}:".PHP_EOL;
    echo 'Maximum is: '.max($values).PHP_EOL;
    echo 'Minimum is: '.min($values).PHP_EOL;
    echo 'Arithmetic mean is: '.arithmetic_mean($values).PHP_EOL;
    echo 'Median is: '.median($values).PHP_EOL;
    echo 'Population standard deviation is: '.standard_deviation($values).PHP_EOL;
    echo 'Sample standard deviation is: '.sd($values).PHP_EOL;
    echo PHP_EOL;
}

function arithmetic_mean($a) {
    return array_sum($a) / count($a);
}

function median($a) {
    sort($a, SORT_NUMERIC);
    return (count($a) % 2) ?
        $a[floor(count($a)/2)] :
        ($a[floor(count($a)/2)] + $a[floor(count($a)/2) - 1]) / 2;
}

function standard_deviation($aValues, $bSample = false)
{
    $fMean = array_sum($aValues) / count($aValues);
    $fVariance = 0.0;
    foreach ($aValues as $i) {
        $fVariance += pow($i - $fMean, 2);
    }
    $fVariance /= ($bSample ? count($aValues) - 1 : count($aValues));
    return (float) sqrt($fVariance);
}

function sd_square($x, $mean) {
    return pow($x - $mean, 2);
}

function sd($array) {
    return sqrt(array_sum(array_map("sd_square", $array,
        array_fill(0, count($array), array_sum($array) / count($array))))
        / (count($array) - 1));
}
?>

Don't ask me about the two functions to calculate the standard deviation; I found them in the PHP manual (because the Stats extension has no Windows binaries X( ). This script takes the URL passed in the q GET parameter and loads it several times (10 times by default, or the value of the nr GET parameter if it is set). Then, for each value output on a new line, it calculates the maximum, the minimum, the arithmetic mean, the median and two types of standard deviation.
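For example, assuming the array test above is saved as array_test.php and this runner as benchmark.php (both file names are placeholders of mine, not part of the original setup), a 30-run benchmark could be started from the browser like this:

http://localhost/benchmark.php?q=http://localhost/array_test.php&nr=30

The results I got for 30 runs are: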

As you can see, there is virtually no difference in memory usage (a 0.01601% difference). Reading from the array seems to be slightly faster (a 4.623% difference). After this I ran another series of tests, to see which method is faster for looping through associative arrays: foreach, while, or a simple for loop. And I was quite shocked. I created a single array, similar to the ones above:

$data = array();
for ($i = 0; $i < 100000; $i++) {
    $data[md5($i)] = microtime();
}

And then I did the following three tests.

Foreach:

foreach ($data as $key => $value) {
    $data[$key] .= 'a';
}

While:

while (list($key) = each($data)) {
    $data[$key] .= 'a';
}

For:

$key = array_keys($data);
$size = sizeOf($key);
for ($i = 0; $i < $size; $i++) {
    $data[$key[$i]] .= "a";
}

(I left the timing bits out of the three snippets; they follow the same microtime() pattern as the earlier scripts.)
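As a rough sketch of what the omitted timing wrapper looks like (my own reconstruction; the variable names are not from the original), here is the foreach variant timed end to end:

// Build the test array as above
$data = array();
for ($i = 0; $i < 100000; $i++) {
    $data[md5($i)] = microtime();
}

// Time the loop variant under test, here foreach
$start = microtime(TRUE);
foreach ($data as $key => $value) {
    $data[$key] .= 'a';
}
echo (microtime(TRUE) - $start).PHP_EOL;

The results are interesting: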

While is the slowest, by faaaar, and for is about 38% faster than foreach. Quite a difference. I will continue to do some benchmarks so that the rolisz framework can be one of the fastest ones around :D