PHP benchmarking

While working on my framework, I got to a point where I had to store quite a lot of data during the execution of the script. My first thought was to use an associative array, but then I thought maybe objects are faster or use less memory, so I decided to test this. I used XDebug to get information about memory usage, and I'm running WampServer x64 on Windows 7. I quickly whipped up a script that fills arrays with 1000 keys and arbitrary values, and another one that creates objects and gives each of them 1000 properties.

<?php
// Initial memory usage, before the arrays are built
$memory1 = xdebug_memory_usage();

// Build 100 arrays with ~1000 entries each: an MD5 hash as the key, the current time as the value
$data = array();
for ($i = 0; $i < 100; $i++) {
    for ($j = 0; $j <= 1000; $j++) {
        $data[$i][md5($j)] = microtime();
    }
}
$array = xdebug_memory_usage() - $memory1;

// Read ~1000 random entries from each array and time it
$time1 = microtime(TRUE);
for ($i = 0; $i < 100; $i++) {
    for ($j = 0; $j <= 1000; $j++) {
        $var = md5(rand(0, 1000));
        $var = $data[$i][$var];
    }
}
$time2 = microtime(TRUE) - $time1;

echo $array.PHP_EOL;
echo $time2.PHP_EOL;
?>

And for objects it's very similar:

<?php
$data = array();
// Initial memory usage, before the objects are built
$memory2 = xdebug_memory_usage();

// Build 100 stdClass objects with ~1000 properties each
for ($i = 0; $i < 100; $i++) {
    $data[$i] = new stdClass;
    for ($j = 0; $j <= 1000; $j++) {
        $prop = md5($j);
        $data[$i]->$prop = microtime();
    }
}
$object = xdebug_memory_usage() - $memory2;

// Read ~1000 random properties from each object and time it
$time3 = microtime(TRUE);
for ($i = 0; $i < 100; $i++) {
    for ($j = 0; $j <= 1000; $j++) {
        $var = md5(rand(0, 1000));
        $var = $data[$i]->$var;
    }
}
$time4 = microtime(TRUE) - $time3;

echo $object.PHP_EOL;
echo $time4.PHP_EOL;
?>
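
As a side note, the memory figures here come from XDebug's xdebug_memory_usage(). If XDebug isn't available, I'd expect PHP's built-in memory_get_usage() to work for the same kind of before/after measurement, though the absolute numbers may not match XDebug's exactly. A minimal sketch:

<?php
// Rough sketch: same before/after pattern, using the built-in function instead of XDebug
$before = memory_get_usage();

$data = array();
for ($j = 0; $j <= 1000; $j++) {
    $data[md5($j)] = microtime();
}

// Memory attributed to the array we just built, in bytes
echo (memory_get_usage() - $before).PHP_EOL;
?>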

In these tests I create 100 arrays (then objects) and give each of them 1000 values: the current time (as a string), stored under a 32-character key (an MD5 hash). I measure the memory before and after, and the difference is the amount of memory used by the arrays or objects. Then I do another loop and read a value from the array on each iteration, to test the read performance of arrays and objects. The two values are printed on two different lines. However, a single value is useless by itself: maybe my computer had some extra load for the duration of a test (a background AV check, say), maybe the test hit a rare bottleneck in memory allocation, and so on. To do a proper benchmark, you have to run each test multiple times, take the average and calculate the standard deviation. To run the tests multiple times, I decided to use cURL to load the URLs remotely and parse the results into an array.

<?php
if (!isset($_GET['q'])) {
    echo 'You must give a URL to test';
    die();
}
$url = $_GET['q'];
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

$data = array();
// How many times to run the test: 10 by default, or the nr GET parameter
$nr = isset($_GET['nr']) ? (int) $_GET['nr'] : 10;
for ($i = 0; $i < $nr; $i++) {
    $val = curl_exec($ch);
    // Each line of the test script's output is one measured parameter
    $val = explode(PHP_EOL, trim($val));
    foreach ($val as $key => $el) {
        $data[$key][] = $el;
    }
}

// close cURL resource, and free up system resources
curl_close($ch);

foreach ($data as $key => $values) {
    $key++;
    echo "Param {$key}: ";
    echo 'Maximum is: '.max($values).' ';
    echo 'Minimum is: '.min($values).' ';
    echo 'Arithmetic mean is: '.arithmetic_mean($values).' ';
    echo 'Median is: '.median($values).' ';
    echo 'Population standard deviation is: '.standard_deviation($values).' ';
    echo 'Sample standard deviation is: '.sd($values).' ';
    echo ' ';
}

function arithmetic_mean($a) {
    return array_sum($a) / count($a);
}

function median($a) {
    sort($a, SORT_NUMERIC);
    return (count($a) % 2) ?
        $a[floor(count($a) / 2)] :
        ($a[floor(count($a) / 2)] + $a[floor(count($a) / 2) - 1]) / 2;
}

// Standard deviation: population by default, sample if $bSample is true
function standard_deviation($aValues, $bSample = false)
{
    $fMean = array_sum($aValues) / count($aValues);
    $fVariance = 0.0;
    foreach ($aValues as $i)
    {
        $fVariance += pow($i - $fMean, 2);
    }
    $fVariance /= ($bSample ? count($aValues) - 1 : count($aValues));
    return (float) sqrt($fVariance);
}

function sd_square($x, $mean) {
    return pow($x - $mean, 2);
}

// Sample standard deviation (divides by n - 1)
function sd($array) {
    $mean = array_sum($array) / count($array);
    $squares = array_map("sd_square", $array, array_fill(0, count($array), $mean));
    return sqrt(array_sum($squares) / (count($array) - 1));
}
?>
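
To run the whole thing, I just point this script at one of the test scripts over HTTP. Assuming (hypothetically) that the array test is saved as array_test.php and this runner as benchmark.php on the same WampServer install, a run looks something like:

http://localhost/benchmark.php?q=http://localhost/array_test.php&nr=30

Param 1 then corresponds to the memory figure and Param 2 to the read time, since that is the order in which the test scripts echo them.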

Don't ask me about the two functions to calculate the standard deviation; I found them in the PHP manual (because the Stats extension has no Windows binaries X( ). This script loads the URL passed in the q GET parameter several times (10 times by default, or the value of the nr GET parameter if it is given). Then, for each value output on a new line, it calculates the maximum, the minimum, the arithmetic mean, the median and two types of standard deviation (a small worked example of the difference between the two follows the results). The results I got for 30 runs are:

  • Arrays:
      • Memory usage:     Maximum is: 14838656     Minimum is: 14838656     Arithmetic mean is: 14838656     Median is: 14838656     Population standard deviation is: 0     Sample standard deviation is: 0
      • Time spent reading:     Maximum is: 0.56820487976074     Minimum is: 0.3705358505249     Arithmetic mean is: 0.46606066226959     Median is: 0.46564090251923     Population standard deviation is: 0.094189880474856     Sample standard deviation is: 0.099284851613188
  • Objects:
      • Memory usage:     Maximum is: 14841032     Minimum is: 14841032     Arithmetic mean is: 14841032     Median is: 14841032     Population standard deviation is: 0     Sample standard deviation is: 0
      • Time spent looping:     Maximum is: 0.59881019592285     Minimum is: 0.38148784637451     Arithmetic mean is: 0.48865029811859     Median is: 0.50128650665284     Population standard deviation is: 0.095435870065359     Sample standard deviation is: 0.10059823996214
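
As promised, a quick word on the two standard deviation figures: the population version divides the sum of squared deviations by the number of runs, the sample version by that number minus one. A tiny worked example (made-up numbers, not from the benchmark):

<?php
// Made-up sample of 8 measurements, just to show the difference
$values = array(2, 4, 4, 4, 5, 5, 7, 9);
$mean = array_sum($values) / count($values);          // 5
$squares = 0.0;
foreach ($values as $v) {
    $squares += pow($v - $mean, 2);                   // total: 32
}
echo sqrt($squares / count($values)).PHP_EOL;         // population: 2
echo sqrt($squares / (count($values) - 1)).PHP_EOL;   // sample: ~2.14
?>

The more runs you do, the closer the two figures get.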

As you can see, there is virtually no difference in memory usage (a 0.01601% difference), and reading from the array seems to be slightly faster (a 4.623% difference), both percentages relative to the object figures. After this I ran another series of tests, to see which method is fastest for looping through associative arrays: foreach, while, or a simple for loop. And I was quite shocked. I created a single array, similarly to the ones above:

$data = array();
for ($i=0; $i<100000; $i++) {
    $data[md5($i)] = microtime();
}

And then I did the following three tests.

Foreach:

foreach ($data as $key => $value) {
    $data[$key].='a';
}

While:

while (list($key) = each ($data)) {
    $data[$key].='a';
}

For:

$key = array_keys($data);
$size = sizeOf($key);
for ($i=0; $i<$size; $i++) {
    $data[$key[$i]] .= "a";
}
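
For reference, a minimal sketch of how one of these runs would be wrapped for timing, using the same microtime() pattern as the earlier scripts (shown here for the foreach case):

<?php
// Hypothetical standalone harness for one of the three loop tests
$data = array();
for ($i = 0; $i < 100000; $i++) {
    $data[md5($i)] = microtime();
}

$start = microtime(TRUE);
foreach ($data as $key => $value) {
    $data[$key] .= 'a';
}
// Elapsed time, printed on its own line so the cURL runner can pick it up
echo (microtime(TRUE) - $start).PHP_EOL;
?>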

(I left the timing bits out of the three snippets; they're the same as above.) The results are interesting:

  • Foreach     Maximum is: 0.14385986328125     Minimum is: 0.097252130508423     Arithmetic mean is: 0.12040016651153     Median is: 0.12920439243317     Population standard deviation is: 0.019063768371205     Sample standard deviation is: 0.020094976279628
  • While     Maximum is: 0.33422708511353     Minimum is: 0.22083210945129     Arithmetic mean is: 0.25493462085724     Median is: 0.22828495502472     Population standard deviation is: 0.041753633338567     Sample standard deviation is: 0.044012193979137
  • For     Maximum is: 0.11151194572449     Minimum is: 0.070990085601807     Arithmetic mean is: 0.086783051490784     Median is: 0.080439567565918     Population standard deviation is: 0.015436115114854     Sample standard deviation is: 0.01627109399583

While is the slowest by faaaar, and for is the fastest, beating foreach by about 38%. Quite a difference. I will continue to do some benchmarks so that the rolisz framework can be one of the fastest ones around :D