rolisz's site

URL routing with PHP

I still remember the first website I made, and I shudder when I think back to those days. Lots and lots of copy-pasted code, nonexistent or crappy documentation ("paginarea efectiv [the actual pagination] and don't mess with it"), poorly named variables ($cv, $cva, $x). But considering I was the only developer and probably the only guy who ever saw the code, it wasn't that bad: I was the one regretting it later on, when I had to upgrade the site :)). There was one thing that was awful for the users too, though: the URLs. At the time (4 years ago) I didn't know about pretty URLs. I could filter the content on the site by 6-7 variables and I didn't set any default values, so each URL contained all the values, even when they were the default ones. The site had query strings like ?ord=DESC+&vb=Test+&an=2010+&poz=0+&q=1+&conf=1. Ugly, isn't it?

Since then, however, through contact with Drupal and WordPress, I have learned about making URLs pretty, like /DESC/Test/2010/0/1/1. Much better, isn't it?

My first idea for doing this was to use mod_rewrite to turn the pretty URL back into a proper query string, filled with the content between the slashes. You would create a .htaccess file in your webroot and put the following in it:

    Options +FollowSymlinks
    RewriteEngine on
    RewriteRule ^files/([^/]+)/([^/]+) /index.php?section=$1&file=$2 [NC]

This would rewrite /files/test/part2 to /index.php?section=test&file=part2. All of this happens on the server side; the user doesn't see any of it. The disadvantage is that it is highly inflexible. If you need 5 variables in your URL, you need a regex with 5 capture groups. You need 10 variables? An 80-character regex. Your variables are also fixed: the first thing between slashes will always go to the section $_GET variable. If you want to send, for example, the URLs that start with /blog/123 to blog.php?q=123 and the URLs that start with /projects/flock/ to projects.php?q=flock, then you need two rewrite rules.
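To see what that rule does to a path, here is the same pattern applied with PHP's preg_replace. This is only an illustration of the regex, since Apache does the real work; rewrite_files_url is a made-up name. It assumes the path arrives without its leading slash, as it does for rules in a .htaccess file, and the i modifier stands in for [NC].

```php
<?php
// Hypothetical stand-in for the RewriteRule above, for illustration only.
function rewrite_files_url($path) {
    return preg_replace(
        '#^files/([^/]+)/([^/]+)#i',       // two capture groups, one per variable
        '/index.php?section=$1&file=$2',   // $1 and $2 are filled from the groups
        $path
    );
}

echo rewrite_files_url('files/test/part2'), "\n"; // rewritten to the query-string form
echo rewrite_files_url('blog/123'), "\n";         // no match, so the path comes back unchanged
```

Every extra variable means another capture group in the pattern and another placeholder in the target, which is exactly the part that gets unwieldy.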

But there is another way: you do the URL interpretation in PHP (that's my favorite server-side language, but it's similar in other languages too). You send all URLs to index.php with the following .htaccess:

    RewriteEngine On
    RewriteBase /framework/
    RewriteCond %{REQUEST_FILENAME} !-l
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule .* index.php [L,QSA]

From here on, it's all in PHP. We get the part of the URL that interests us and store it in the $url variable.
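Unpacked into steps, the idea is: cut the script's directory off the front of the request URI, trim the slashes, and split what is left on '/'. A rough sketch as a function follows. The name url_segments is made up, and dropping the query string is an addition of this sketch that the one-liner in this post does not do.

```php
<?php
// Hypothetical helper: returns the URL segments that come after the script's directory.
function url_segments($requestUri, $scriptName) {
    // Drop the query string, if any (an addition in this sketch).
    $parts = explode('?', $requestUri, 2);
    $path = $parts[0];

    // "/framework/index.php" -> "/framework"
    $base = preg_replace('#(.*)/.+#', '$1', $scriptName);

    // "/framework/blog/123" -> "/blog/123" -> "blog/123"
    $rest = trim((string) substr($path, strlen($base)), ' /');

    return explode('/', $rest);
}

// segments: "blog", "123"
print_r(url_segments('/framework/blog/123?page=2', '/framework/index.php'));
```

The version used in the rest of this post does the same thing in one line, reading both values from $_SERVER: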

    $url = explode('/', trim(substr($_SERVER['REQUEST_URI'], strlen(preg_replace('/(.*)\/.+/', '$1', $_SERVER['SCRIPT_NAME']))), ' /'));

We define routes with the function route($pattern, $funcs). Its first parameter is a string containing the URL pattern we want to match. The second is either a string naming the functions and files to run when that URL is requested, or an anonymous function (PHP 5.3 and later).

    function route($pattern, $funcs) {
        global $routes;
        // Validate the handlers before storing them
        if (is_string($funcs)) {
            // A string: one or more handlers separated by '|'
            foreach (explode('|', $funcs) as $func) {
                if (substr_count($func, '.php') != 0) {
                    // Handler lives in an external PHP script
                    $file = strstr($func, '.php', TRUE) . '.php';
                    if (!is_file($file)) {
                        // Invalid route handler
                        trigger_error($file . ' is not a valid file');
                        return;
                    }
                }
                elseif (!is_callable($func)) {
                    // Invalid route handler
                    trigger_error($func . ' is not a valid function');
                    return;
                }
            }
        }
        elseif (!is_callable($funcs)) {
            // Not a string and not callable either
            trigger_error('The handler for ' . $pattern . ' is not a valid function');
            return;
        }

        $routes[$pattern] = $funcs;
    }
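For a feel of what registrations look like, here are a few calls. The patterns and handler names (show_about, show_post, log_request) are invented for the example, and the function_exists guard only defines a bare stand-in for route(), without the validation, so that the snippet runs by itself.

```php
<?php
if (!function_exists('route')) {
    // Stand-in: stores the handler without the checks done by the real route().
    function route($pattern, $funcs) {
        global $routes;
        $routes[$pattern] = $funcs;
    }
}

// Hypothetical handlers.
function show_about() { echo "about page\n"; }
function show_post($id) { echo "post $id\n"; }
function log_request() { /* would write to a log */ }

$routes = array();

// A plain function name.
route('about', 'show_about');
// Several functions separated by '|' run one after the other.
route('blog/:id', 'log_request|show_post');
// An anonymous function; '*' matches whatever follows.
route('files/*', function () { echo "some file\n"; });
```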

The first part of the function checks that the handlers are valid: functions have to be callable, and files have to exist so they can be included later. If everything checks out, the handlers are stored in the $routes array, with the URL pattern as the key. To check for matches, we use the following function. It splits both the current URL and the given route on '/' and then compares the two piece by piece. If it reaches the end of both arrays successfully, we have a match. If it reaches a wildcard character (*) in the route, we have a match no matter what follows. Otherwise it fails.

    function matches($route) {
        global $url, $args;
        $pattern = $route;
        $route = explode('/', $route);
        $i = 0;

        while (isset($route[$i]) && isset($url[$i])) {
            if ($route[$i] === '*') {
                // Wildcard: everything after this point matches
                return true;
            }
            elseif (strpos($route[$i], ':') === 0) {
                // Variable segment: remember the value from the URL
                $args[$pattern][] = $url[$i];
                $i++;
            }
            elseif ($route[$i] === $url[$i]) {
                // Literal segment matches
                $i++;
            }
            else {
                unset($args[$pattern]);
                return false;
            }
        }
        if (isset($route[$i]) == isset($url[$i])) {
            // Both arrays ended at the same time
            return true;
        }
        elseif (!isset($url[$i]) && $route[$i] === '*') {
            // The URL ended right before a trailing wildcard
            return true;
        }
        else {
            unset($args[$pattern]);
            return false;
        }
    }
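To try the matching rules without the globals, here is a condensed sketch that takes the URL segments as a parameter and returns the captured values, or null when the route doesn't match. match_route is a hypothetical name, and this is a simplification of matches(), not a drop-in replacement.

```php
<?php
// Hypothetical, global-free variant of matches(), for experimenting.
// Returns the captured ':' values on a match, or null otherwise.
function match_route($pattern, array $url) {
    $route = explode('/', $pattern);
    $captured = array();
    foreach ($route as $i => $part) {
        if ($part === '*') {
            return $captured;           // wildcard: accept whatever is left
        }
        if (!isset($url[$i])) {
            return null;                // URL is shorter than the route
        }
        if (strpos($part, ':') === 0) {
            $captured[] = $url[$i];     // variable segment
        } elseif ($part !== $url[$i]) {
            return null;                // literal segment differs
        }
    }
    // Every route segment matched; the URL must not have extra segments.
    return count($url) === count($route) ? $captured : null;
}

var_dump(match_route('blog/:id', array('blog', '123')));    // match, captures "123"
var_dump(match_route('files/*', array('files', 'a', 'b'))); // match, nothing captured
var_dump(match_route('blog/:id', array('blog')));           // no match
```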

If a route segment starts with ":", that place is a variable, and its value from the URL is passed as an argument to the called functions. The function that calls the handlers we defined for a route and passes them our arguments is quite simple. It just has to deal with three cases: a file with functions in it, a plain function name, and an anonymous function.

    function call($funcs, $args) {
        if (!is_array($args)) {
            $args = array($args);
        }
        if (is_string($funcs)) {
            // Call each code segment
            foreach (explode('|', $funcs) as $func) {
                if (substr_count($func, '.php') != 0) {
                    // Run external PHP script
                    $file = strstr($func, '.php', TRUE) . '.php';
                    // Function names follow the file name, e.g. file.php:first:second
                    $functions = (string) substr(strstr($func, '.php'), 5);
                    include $file;
                    if ($functions === '') {
                        // Only a file was given; including it was all there is to do
                        continue;
                    }
                    foreach (explode(':', $functions) as $function) {
                        if (!is_callable($function)) {
                            // Invalid route handler
                            trigger_error($function . ' is not a valid function');
                            return;
                        }
                        call_user_func_array($function, $args);
                    }
                }
                else {
                    // Call named function
                    call_user_func_array($func, $args);
                }
            }
        }
        else {
            // Call lambda function
            call_user_func_array($funcs, $args);
        }
    }
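The handler string format that call() expects is easiest to see by taking one apart. The sketch below does only that: parse_handlers and the sample names are made up, and nothing is included or called. It assumes the character right after .php is the ':' separator, which is what the offset of 5 passed to substr() in call() implies.

```php
<?php
// Hypothetical helper: splits a handler string into files and function names.
function parse_handlers($funcs) {
    $parsed = array();
    foreach (explode('|', $funcs) as $func) {
        $pos = strpos($func, '.php');
        if ($pos === false) {
            // Plain function name, no file involved.
            $parsed[] = array('file' => null, 'functions' => array($func));
            continue;
        }
        $file = substr($func, 0, $pos) . '.php';
        // Skip ".php" plus the separator that follows it.
        $names = (string) substr($func, $pos + 5);
        $parsed[] = array(
            'file' => $file,
            'functions' => $names === '' ? array() : explode(':', $names),
        );
    }
    return $parsed;
}

print_r(parse_handlers('log_request|blog.php:load_post:show_post'));
```

So '|' separates independent handlers, while ':' lists the functions to call from one included file.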

To run our application we call the function run() at the end. It loops over all the routes, collects the ones that match, and calls the first match's handlers with its arguments.

    function run() {
        global $routes, $url, $args;
        // Process routes
        if (empty($routes)) {
            trigger_error('No routes set!');
            return;
        }
        if (!is_array($args)) {
            $args = array();
        }

        $valid_routes = array();
        // Check for matching routes
        foreach (array_keys($routes) as $route) {
            if (matches($route)) {
                $valid_routes[] = $route;
            }
        }

        if (empty($valid_routes)) {
            trigger_error('Page not found!');
            return;
        }

        // Reverse sort: for typical patterns, literal segments end up
        // before ':' variables and '*' wildcards
        rsort($valid_routes);
        $best = $valid_routes[0];
        if (!isset($args[$best])) {
            $args[$best] = array();
        }
        // The variable parts of the URL are passed to the functions as arguments
        call($routes[$best], $args[$best]);
    }
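The rsort() call is what decides which route wins when several match. It is a plain reverse string sort, so with typical lowercase patterns a literal segment beats a ':' variable, which in turn beats the '*' wildcard. The patterns below are made up to show the ordering:

```php
<?php
// Three hypothetical patterns that would all match the URL blog/new.
$valid_routes = array('blog/*', 'blog/new', 'blog/:id');

// Same call as in run(): reverse string order.
rsort($valid_routes);

// Order after sorting: blog/new, blog/:id, blog/*
print_r($valid_routes);
```

This relies on character codes: lowercase letters sort after ':' and ':' sorts after '*'. A literal segment that starts with a digit sorts before ':', so a route like blog/2010 would lose to blog/:id.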

And that's it: simple URL routing in PHP, with variables and wildcards. This can be extended, of course, to provide named URLs for easier writing of URLs, and it can be made into a nice little class. That is how I use it, by integrating it into my own framework. But that is a matter for another time, because it's not ready yet. Hopefully, in a week or two, you will all be able to see the results of my work ;;) The source code with a little test can be downloaded here.