URL path parsing
November 9, 2007 at 10:22AM
Over the years I’ve built up a fairly substantial library of code I call AFK, which includes a tiny framework (we’re talking in the region of 20-30 lines here) that’s built on top of the libraries. I wrote it mostly because I wanted to try out stuff that no other PHP framework appeared to do or do well, and partly out of inevitable developer vanity. It’s served me well over the years and has morphed an awful lot since I started it.
One of the things it includes is a class called AFK_Routes, which, as you might expect, does URL routing. The route format is more-or-less the same as Joe Gregorio’s URI templates, so you’d specify a path like this:
/weblog/{year}/{month}/{entry}
Simple enough (and that’s relative to the root of the application, BTW). When specifying a template, you can also give a list of arbitrary data to be combined with the values extracted from the template variables, and a list of patterns that the various variables must adhere to. For example, if I were giving the routing specification for a weblog, the code might be:
function routes() {
    // Per-variable patterns shared by the routes below.
    $patterns = array('year' => '\d{4}', 'month' => '[1-9]|1[012]');
    $r = new AFK_Routes();
    $r->route('/', array('_view' => 'frontpage'));
    $r->route('/{year}/', array('_view' => 'year'), $patterns);
    $r->route('/{year}/{month}/', array('_view' => 'month'), $patterns);
    $r->route('/{year}/{month}/{entry}', array('_view' => 'entry'), $patterns);
    return $r;
}
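AFK_Routes itself isn’t shown here, so to make the semantics concrete, here’s a minimal sketch of how a route table like this could work. MiniRoutes is a hypothetical stand-in, not the AFK code: it turns each template into an anchored regex, uses the per-variable pattern where one is given, falls back to [^/]+ (a single path segment, which is my assumption) where one isn’t, and merges the route’s extra data with whatever the first matching route captures.

```php
<?php
// Hypothetical sketch: MiniRoutes only illustrates the idea; it is not AFK_Routes.
class MiniRoutes {
    private $routes = array();

    // Compile a template like '/{year}/{month}/' into an anchored regex and
    // remember it along with the route's extra data.
    public function route($template, array $data = array(), array $patterns = array()) {
        // With PREG_SPLIT_DELIM_CAPTURE, odd-indexed pieces are variable
        // names and even-indexed pieces are the literal text between them.
        $parts = preg_split('/\{([_a-z][_a-z0-9]*)\}/i', $template, -1,
                            PREG_SPLIT_DELIM_CAPTURE);
        $regex = '`^';
        foreach ($parts as $i => $part) {
            if ($i % 2 == 0) {
                $regex .= preg_quote($part, '`');
            } else {
                // Assumed default when no pattern is given: one path segment.
                $p = isset($patterns[$part]) ? $patterns[$part] : '[^/]+';
                $regex .= "(?P<$part>$p)";
            }
        }
        $this->routes[] = array($regex . '$`', $data);
    }

    // Try routes in order; return the first match's data merged with the
    // captured variables, or false if nothing matches.
    public function match($path) {
        foreach ($this->routes as $route) {
            list($regex, $data) = $route;
            if (preg_match($regex, $path, $m) === 1) {
                foreach ($m as $key => $value) {
                    if (is_string($key)) {   // skip the numbered captures
                        $data[$key] = $value;
                    }
                }
                return $data;
            }
        }
        return false;
    }
}

$patterns = array('year' => '\d{4}', 'month' => '[1-9]|1[012]');
$r = new MiniRoutes();
$r->route('/', array('_view' => 'frontpage'));
$r->route('/{year}/', array('_view' => 'year'), $patterns);
$r->route('/{year}/{month}/', array('_view' => 'month'), $patterns);
$r->route('/{year}/{month}/{entry}', array('_view' => 'entry'), $patterns);
print_r($r->match('/2007/11/url-path-parsing'));
```

Routes are tried in the order they were added, so a path such as /2007/13/ falls through every route (13 isn’t a valid month) and match() returns false.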
This has problems. Firstly, there’s no effective way to cache the compiled templates and only recompile them when the routes change. Secondly, the PHP code is mostly noise that obscures the routing information. Thirdly, it doesn’t play well with version control systems.
My solution to these is to move the routing information out into a configuration file. No XML crap, mind you, just a simple plaintext list of templates. While I was thinking about this, it struck me that I could improve the templates themselves by allowing the patterns to be inlined.
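With the routes in their own file, the compiled form can be cached and rebuilt only when that file changes. Here’s one rough way to do it, assuming a cache file alongside the config and using modification times to detect staleness. load_routes() and the $compileAll callback are hypothetical names, and the compiled routes are assumed to be a plain array that var_export() can serialise.

```php
<?php
// Hypothetical sketch: cache compiled routes in a PHP file and rebuild them
// only when the config file is newer than the cache.
function load_routes($configPath, $cachePath, callable $compileAll) {
    if (is_file($cachePath) && filemtime($cachePath) >= filemtime($configPath)) {
        // The cache file is plain PHP that returns the compiled array.
        return include $cachePath;
    }
    $compiled = $compileAll(file_get_contents($configPath));
    // var_export() emits valid PHP source for arrays of scalars and arrays.
    file_put_contents($cachePath,
        '<?php return ' . var_export($compiled, true) . ';', LOCK_EX);
    return $compiled;
}
```

Because filemtime() has one-second granularity on many filesystems, an edit made in the same second the cache was written can go unnoticed; for a config file that’s usually tolerable.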
In this new world order, the template variable syntax is extended so that a pattern can optionally follow the variable name, with the two separated by =:
/
_view = frontpage
/{year=\d{4}}/
_view = year
/{year=\d{4}}/{month=[1-9]|1[012]}/
_view = month
/{year=\d{4}}/{month=[1-9]|1[012]}/{entry}
_view = entry
[Lines starting with ‘/’, ‘#’, and ‘@’ are special, marking the start of a template specification, a comment, and a metadata specification respectively.]
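Reading that format back in is straightforward. Here’s a sketch of a parser for it; parse_routes() is a hypothetical name, and it assumes that data lines take the form key = value, that blank lines are ignored, and that ‘@’ metadata lines can simply be kept verbatim, since their syntax isn’t specified here.

```php
<?php
// Hypothetical parser for the plaintext route format; the "key = value"
// data syntax and verbatim '@' metadata handling are assumptions.
function parse_routes($text) {
    $routes = array();
    $meta = array();
    $current = null;    // index of the route that data lines attach to
    foreach (preg_split('/\r\n|\r|\n/', $text) as $i => $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#') {
            continue;                                   // blank line or comment
        }
        if ($line[0] === '@') {
            $meta[] = substr($line, 1);                 // metadata, kept as-is
        } elseif ($line[0] === '/') {
            $routes[] = array('template' => $line, 'data' => array());
            $current = count($routes) - 1;              // start a new route
        } else {
            $pair = explode('=', $line, 2);
            if ($current === null || count($pair) !== 2) {
                throw new InvalidArgumentException(
                    'Malformed line ' . ($i + 1) . ': ' . $line);
            }
            $routes[$current]['data'][trim($pair[0])] = trim($pair[1]);
        }
    }
    return array('routes' => $routes, 'meta' => $meta);
}

$config = <<<'ROUTES'
# Weblog routes
/
_view = frontpage
/{year=\d{4}}/
_view = year
ROUTES;

print_r(parse_routes($config));
```

Because template lines are recognised by their leading ‘/’ before anything else is checked, the = inside an inline pattern is never mistaken for a data line.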
Here’s the code for compiling a template to a regular expression:
function compile($template) {
    // Find every {name} or {name=pattern} placeholder, with its byte offset.
    preg_match_all(
        '/{([_a-z][_a-z0-9]*)(?:=((?:[^{}]+(?:{\d*,?\d*})?)+))?}/i',
        $template, $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
    $start = 0;
    $pattern = '`^';
    foreach ($matches as $m) {
        $patt_len = strlen($m[0][0]);
        $pos = $m[0][1];
        $name = $m[1][0];
        // Quote the literal text before the placeholder, then open a named group.
        $pattern .=
            preg_quote(substr($template, $start, $pos - $start), '`') .
            "(?P<$name>";
        $start = $pos + $patt_len;
        if (isset($m[2])) {
            // An inline pattern was given, so use it verbatim.
            $pattern .= $m[2][0];
        } elseif ($start < strlen($template)) {
            // If there's no pattern specified, it delimits this pattern by
            // the next character following it. This is what you want in
            // all but 1% of cases.
            $delim = substr($template, $start, 1);
            if ($delim == '{') {
                throw new AFK_RouteParsingException(
                    'If you put two placeholders next to one another, ' .
                    'the first of the two must have a pattern.');
            }
            $pattern .= '[^' . preg_quote($delim, '`') . ']*';
        } else {
            // Placeholder at the very end: match whatever is left.
            $pattern .= '.*';
        }
        $pattern .= ')';
    }
    // Quote any literal text after the last placeholder and anchor the end.
    $pattern .= preg_quote(substr($template, $start), '`') . '$`';
    return $pattern;
}
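function parse($pattern, $path) {
    if (preg_match($pattern, $path, $matches) !== 1) {
        return false;
    }
    // preg_match() also returns numbered captures; keep only the named ones.
    return array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);
}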
Mind, I hacked this together during Prison Break last night, so there are probably plenty of opportunities to improve it.
And here’s a demonstration of it in use:
$t = '/adfafsf{foo}fds/{bar=fds|af?ds}/{baz}/';
$re = compile($t);
print_r(parse($re, '/adfafsf---fds/ads/4444/'));
Which gives us:
Array
(
    [foo] => ---
    [bar] => ads
    [baz] => 4444
)
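That example doesn’t lean on quantifier braces inside an inline pattern, so here’s a quick look at how a placeholder regex like the one in compile() splits {name=pattern} placeholders that contain them. placeholders() is a hypothetical helper written for this illustration; note that the regex only allows one level of {m,n}-style braces inside a pattern, which covers things like \d{4} but not deeper nesting.

```php
<?php
// Hypothetical helper: map each placeholder name to its inline pattern (or null).
function placeholders($template) {
    // {name} or {name=pattern}; the pattern may contain {m,n}-style braces,
    // but only one level deep.
    preg_match_all(
        '/{([_a-z][_a-z0-9]*)(?:=((?:[^{}]+(?:{\d*,?\d*})?)+))?}/i',
        $template, $matches, PREG_SET_ORDER);
    $result = array();
    foreach ($matches as $m) {
        // Group 2 is absent (or empty) when there's no "=pattern" part.
        $result[$m[1]] = isset($m[2]) && $m[2] !== '' ? $m[2] : null;
    }
    return $result;
}

print_r(placeholders('/{year=\d{4}}/{month=[1-9]|1[012]}/{entry}'));
```

Variables without an inline pattern come back as null, which is where compile() falls back to delimiting by the next character.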
This system gives you most of the power of regular expressions with the simplicity of Rails-style route specifications.