
PHP benchmarking

While working on my framework, I got to a point where I had to store quite a lot of data during the execution of a script. My first thought was to use an associative array, but then I wondered whether objects might be faster or use less memory. So I decided to test this. I used Xdebug to get information about memory usage, and I am running WampServer x64 on Windows 7. I quickly whipped up one script that fills arrays with about 1000 keys and arbitrary values, and another one that creates objects with about 1000 properties each.

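<?php
// Initial memory (bytes), as reported by Xdebug
$memory1 = xdebug_memory_usage();

// Build 100 arrays with 1001 entries each: MD5 key => microtime() string
$data = array();
for ($i = 0; $i < 100; $i++) {
    for ($j = 0; $j <= 1000; $j++) {
        $data[$i][md5($j)] = microtime();
    }
}
$array = xdebug_memory_usage() - $memory1;

// Read a random existing key from each array, 1001 times per array
$time1 = microtime(TRUE);
for ($i = 0; $i < 100; $i++) {
    for ($j = 0; $j <= 1000; $j++) {
        $var = md5(rand(0, 1000));
        $var = $data[$i][$var];
    }
}
$time2 = microtime(TRUE) - $time1;

// Line 1: memory used (bytes); line 2: read time (seconds)
echo $array.PHP_EOL;
echo $time2.PHP_EOL;
?>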

And for objects it's very similar:

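<?php
$data = array();
// Initial memory (bytes), as reported by Xdebug
$memory2 = xdebug_memory_usage();

// Build 100 stdClass objects with 1001 properties each: MD5 name => microtime() string
for ($i = 0; $i < 100; $i++) {
    $data[$i] = new stdClass;
    for ($j = 0; $j <= 1000; $j++) {
        $prop = md5($j);
        $data[$i]->$prop = microtime();
    }
}
$object = xdebug_memory_usage() - $memory2;

// Read a random existing property from each object, 1001 times per object
$time3 = microtime(TRUE);
for ($i = 0; $i < 100; $i++) {
    for ($j = 0; $j <= 1000; $j++) {
        $var = md5(rand(0, 1000));
        $var = $data[$i]->$var;
    }
}
$time4 = microtime(TRUE) - $time3;

// Line 1: memory used (bytes); line 2: read time (seconds)
echo $object.PHP_EOL;
echo $time4.PHP_EOL;
?>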
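The two scripts above rely on Xdebug for the memory figures. A minimal sketch of the same comparison that needs only core PHP follows; it assumes PHP 7.3 or later (for hrtime()), and the helper names build_arrays(), build_objects() and measure() are hypothetical. The read loop picks array or object access with a flag, which adds the same small cost to both cases. Its numbers will not match the Xdebug-based ones exactly, so treat it as a way to re-run the experiment rather than to reproduce the figures.

```php
<?php
// Sketch: array-vs-object comparison without Xdebug.
// Assumes PHP 7.3+ (hrtime()); memory_get_usage() is part of core PHP.

// 100 arrays with 1001 MD5-keyed entries each, like the array script above
function build_arrays(): array
{
    $data = [];
    for ($i = 0; $i < 100; $i++) {
        for ($j = 0; $j <= 1000; $j++) {
            $data[$i][md5((string) $j)] = microtime();
        }
    }
    return $data;
}

// 100 stdClass objects with 1001 MD5-named properties each
function build_objects(): array
{
    $data = [];
    for ($i = 0; $i < 100; $i++) {
        $data[$i] = new stdClass();
        for ($j = 0; $j <= 1000; $j++) {
            $prop = md5((string) $j);
            $data[$i]->$prop = microtime();
        }
    }
    return $data;
}

// Returns [bytes used by the structure, seconds spent on 100 * 1001 random reads]
function measure(callable $build, bool $isObject): array
{
    $before = memory_get_usage();
    $data = $build();
    $bytes = memory_get_usage() - $before;

    $start = hrtime(true); // monotonic clock, nanoseconds
    for ($i = 0; $i < 100; $i++) {
        for ($j = 0; $j <= 1000; $j++) {
            $key = md5((string) rand(0, 1000));
            $value = $isObject ? $data[$i]->$key : $data[$i][$key];
        }
    }
    $seconds = (hrtime(true) - $start) / 1e9;
    return [$bytes, $seconds];
}

[$arrayBytes, $arrayTime] = measure('build_arrays', false);
[$objectBytes, $objectTime] = measure('build_objects', true);
echo $arrayBytes . PHP_EOL . $arrayTime . PHP_EOL;
echo $objectBytes . PHP_EOL . $objectTime . PHP_EOL;
```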

In these tests I create 100 arrays (then objects) and give each of them 1001 values: the current time as a string (from microtime()), stored under a 32-character key (the hexadecimal MD5 hash of the loop counter). I measure the memory before and after, and the difference is the amount of memory used by the arrays or objects. Then I run another loop that reads a randomly chosen value on every iteration, to test the read performance of arrays and objects. The two results (memory used and read time) are printed on separate lines.

However, a single value is useless by itself. Maybe my computer had an extra load for the duration of a test (a background antivirus scan, say), or maybe the test hit a rare bottleneck in memory allocation. So to do a proper benchmark, you have to run each test multiple times, then take the average and calculate the standard deviation. To run the tests multiple times, I decided to use cURL to load the URLs remotely and parse the results into an array.

<?php
// Usage: ?q=<URL of the benchmark script>&nr=<number of runs, default 10>
if (!isset($_GET['q'])) {
    echo 'You must give a URL to test';
    die();
}
$url = $_GET['q'];
$ch = curl_init($url);
// Make curl_exec() return the response body instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

// $data[$line] collects, across all runs, the value printed on that output line
$data = array();
$nr = isset($_GET['nr']) ? (int) $_GET['nr'] : 10;
for ($i = 0; $i < $nr; $i++) {
    $val = curl_exec($ch);
    $val = explode(PHP_EOL, trim($val));
    foreach ($val as $key => $el) {
        $data[$key][] = $el;
    }
}

// close cURL resource, and free up system resources
curl_close($ch);

foreach ($data as $key => $values) {
    $key++;
    echo "Param {$key}: ";
    echo 'Maximum is: '.max($values).' ';
    echo 'Minimum is: '.min($values).' ';
    echo 'Arithmetic mean is: '.arithmetic_mean($values).' ';
    echo 'Median is: '.median($values).' ';
    echo 'Population standard deviation is: '.standard_deviation($values).' ';
    echo 'Sample standard deviation is: '.sd($values).' ';
    echo '<br>'.PHP_EOL; // one line per parameter in the browser
}

function arithmetic_mean($a) {
    return array_sum($a)/count($a);
}

// Middle value of the sorted list; for an even count, the mean of the two middle values
function median($a) {
    sort($a, SORT_NUMERIC);
    $n = count($a);
    $mid = (int) ($n / 2);
    return ($n % 2) ? $a[$mid] : ($a[$mid] + $a[$mid - 1]) / 2;
}

// Divides by n (population) by default, or by n - 1 when $bSample is true
function standard_deviation($aValues, $bSample = false)
{
    $fMean = array_sum($aValues) / count($aValues);
    $fVariance = 0.0;
    foreach ($aValues as $i)
    {
        $fVariance += pow($i - $fMean, 2);
    }
    $fVariance /= ( $bSample ? count($aValues) - 1 : count($aValues));
    return (float) sqrt($fVariance);
}

// Helper for sd(): squared deviation of $x from $mean
function sd_square($x, $mean) {
    return pow($x - $mean,2);
}

// Sample standard deviation (divides by n - 1); same result as standard_deviation($array, true)
function sd($array) {
    return sqrt(array_sum(array_map("sd_square", $array,
        array_fill(0,count($array), (array_sum($array) / count($array)) ) )
        ) / (count($array)-1) );
}
?>

Don't ask me about the two functions to calculate the standard deviation. I found them in the PHP manual (because the Stats extension has no Windows binaries X( ). This script loads the URL passed in the q GET parameter several times (10 times by default, or the value of the nr GET parameter if it is set). Then, for each value output on its own line, it calculates the maximum, the minimum, the arithmetic mean, the median and two types of standard deviation (population and sample). These are the results I got, each computed from 10 runs:

Arrays:
Memory usage (bytes): maximum 14838656, minimum 14838656, arithmetic mean 14838656, median 14838656, population SD 0, sample SD 0
Time spent reading (seconds): maximum 0.56820487976074, minimum 0.3705358505249, arithmetic mean 0.46606066226959, median 0.46564090251923, population SD 0.094189880474856, sample SD 0.099284851613188
Objects:
Memory usage (bytes): maximum 14841032, minimum 14841032, arithmetic mean 14841032, median 14841032, population SD 0, sample SD 0
Time spent reading (seconds): maximum 0.59881019592285, minimum 0.38148784637451, arithmetic mean 0.48865029811859, median 0.50128650665284, population SD 0.095435870065359, sample SD 0.10059823996214
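The percentages discussed next can be recomputed from these figures. The sketch below takes both differences relative to the object values, which is the baseline that reproduces the quoted 4.623%. It also uses the fact that the sample standard deviation equals the population one multiplied by sqrt(n / (n - 1)), so the ratio of the two SD columns reveals how many runs n they came from. It assumes the numbers were copied exactly as printed; the variable names are illustrative.

```php
<?php
// Arithmetic means copied from the results above
$arrayMemory  = 14838656;          // bytes
$objectMemory = 14841032;          // bytes
$arrayRead    = 0.46606066226959;  // seconds
$objectRead   = 0.48865029811859;  // seconds

// Relative differences, as a percentage of the object figures
$memoryDiff = ($objectMemory - $arrayMemory) / $objectMemory * 100;
$readDiff   = ($objectRead - $arrayRead) / $objectRead * 100;

// Sample SD = population SD * sqrt(n / (n - 1)); solve ratio^2 = n / (n - 1) for n
$popSd    = 0.094189880474856;     // arrays, time spent reading
$sampleSd = 0.099284851613188;
$ratio    = $sampleSd / $popSd;
$n        = $ratio ** 2 / ($ratio ** 2 - 1);

printf("memory: %.5f%%\n", $memoryDiff);  // memory: 0.01601%
printf("read:   %.3f%%\n", $readDiff);    // read:   4.623%
printf("runs:   %d\n", (int) round($n));  // runs:   10
```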

As you can see, there is virtually no difference in memory usage (0.01601%). Reading from the array seems to be slightly faster (4.623%), but that gap of about 0.023 seconds is smaller than the standard deviation of either test (about 0.1 seconds), so it may well be noise. After this I ran another series of tests to see which method is fastest for looping through an associative array: foreach, while, or a simple for loop. And I was quite shocked. I created a single array, similar to the ones above:

$data = array();
for ($i=0; $i<100000; $i++) {
    $data[md5($i)] = microtime();
}

Then I ran the following three tests.

Foreach:

foreach ($data as $key => $value) {
    $data[$key] .= 'a';
}

While:

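// each() returns the current key/value pair and advances the internal pointer.
// It was deprecated in PHP 7.2 and removed in PHP 8.0, so this only runs on older versions.
while (list($key) = each($data)) {
    $data[$key] .= 'a';
}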

For:

$key = array_keys($data);
$size = count($key);
for ($i = 0; $i < $size; $i++) {
    $data[$key[$i]] .= 'a';
}

(I left the timing bits out; they're the same as above.) The results are interesting:

Foreach (seconds): maximum 0.14385986328125, minimum 0.097252130508423, arithmetic mean 0.12040016651153, median 0.12920439243317, population SD 0.019063768371205, sample SD 0.020094976279628
While (seconds): maximum 0.33422708511353, minimum 0.22083210945129, arithmetic mean 0.25493462085724, median 0.22828495502472, population SD 0.041753633338567, sample SD 0.044012193979137
For (seconds): maximum 0.11151194572449, minimum 0.070990085601807, arithmetic mean 0.086783051490784, median 0.080439567565918, population SD 0.015436115114854, sample SD 0.01627109399583
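The relative speeds can be worked out from the arithmetic means above. A small sketch of that arithmetic follows (the variable names are illustrative); note that "X% slower" and "X% faster" give different numbers depending on which loop you take as the baseline.

```php
<?php
// Arithmetic means from the results above, in seconds
$foreach = 0.12040016651153;
$while   = 0.25493462085724;
$for     = 0.086783051490784;

// Extra time foreach needs, as a percentage of the for time
$foreachVsFor = ($foreach - $for) / $for * 100;
// The same gap as a percentage of the foreach time
$forSaving = ($foreach - $for) / $foreach * 100;
// How many times longer the while/each() loop takes
$whileVsForeach = $while / $foreach;
$whileVsFor     = $while / $for;

printf("foreach takes %.1f%% longer than for\n", $foreachVsFor);   // 38.7%
printf("for saves %.1f%% of the foreach time\n", $forSaving);       // 27.9%
printf("while: %.2fx foreach, %.2fx for\n", $whileVsForeach, $whileVsFor); // 2.12x, 2.94x
```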

While is the slowest, by faaaar: it takes about twice as long as foreach and almost three times as long as for. For is the fastest, with foreach taking about 38.7% longer. Quite a difference. I will continue to do some benchmarks so that the rolisz framework can be one of the fastest ones around :D
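The while/each() test cannot be re-run on PHP 8.0 or later. A minimal sketch for repeating the comparison on a current PHP follows, assuming PHP 7.3+ for hrtime(). The "while" variant here walks the array's internal pointer with key() and next() instead of each(); that is a substitute technique, not the code benchmarked above, so its timings are not directly comparable. The function names are hypothetical. Each function receives the array by value, so each one pays for a single copy of the array when it first writes to it; that cost is the same for all three.

```php
<?php
// Builds the test array: MD5 key => microtime() string
function make_data(int $n): array
{
    $data = [];
    for ($i = 0; $i < $n; $i++) {
        $data[md5((string) $i)] = microtime();
    }
    return $data;
}

function with_foreach(array $data): array
{
    foreach ($data as $key => $value) {
        $data[$key] .= 'a';
    }
    return $data;
}

// Substitute for while/each(): step the internal pointer with key()/next()
function with_while(array $data): array
{
    reset($data);
    while (($key = key($data)) !== null) {
        $data[$key] .= 'a';
        next($data);
    }
    return $data;
}

function with_for(array $data): array
{
    $keys = array_keys($data);
    $size = count($keys);
    for ($i = 0; $i < $size; $i++) {
        $data[$keys[$i]] .= 'a';
    }
    return $data;
}

$data = make_data(100000);
foreach (['with_foreach', 'with_while', 'with_for'] as $fn) {
    $start = hrtime(true); // nanoseconds
    $result = $fn($data);
    printf("%-12s %.4f s\n", $fn, (hrtime(true) - $start) / 1e9);
}
```

Running it several times (or through the cURL script above) gives the spread as well as the mean.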
