Search Engine Friendly URLs

Search Engine Friendly URLs with mod_rewrite

Implementation:

mod_rewrite is an Apache module that allows you to use regular expressions within an .htaccess file to rewrite your URLs. It is certainly the most powerful and most flexible of the methods, but there is a bit of a learning curve to using it.

Regular expressions are how computers do complex pattern matching. Explaining them in detail is beyond the scope of this article, but suffice it to say they can appear cryptic at first, and that can be intimidating. Also, a common piece of programming advice is never to use regular expressions when a simpler string function will do. The reason is that regular expressions generally require more computer power than plain string functions, so you don't want to use them unless you need to.

Now, I've never done any benchmarks on these methods, and I do not know of anyone who has. However, my gut tells me that mod_rewrite, because of its reliance on regular expressions, is likely a little slower than the other methods. That's just something to keep in mind.

Most Apache installations include mod_rewrite, but your hosting company might not have it enabled, so be sure to check before you put work into this method.

So again, if you had a URL like http://www.example.com/article.php?article=999&page_num=12 and you wanted to turn it into http://www.example.com/article/999/12/ you could do this with mod_rewrite.

To do this you'd simply put the following in your .htaccess file:

RewriteEngine On
RewriteRule ^article/(.*)/(.*)/ /article.php?article=$1&page_num=$2

The first line turns mod_rewrite on. The second line is your first rule (you can have as many as needed). Each rule takes two required arguments separated by a space; an optional third argument, not used here, holds flags. The first argument is a pattern describing the URL format you want to use, and the second argument contains the URL format your backend requires.
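To make the two-argument idea concrete, here is a rough model of a rule list in Python. This is only a sketch, not how Apache works internally: the rewrite function, the RULES list, and the second (author) rule are made up for illustration, Python's re module is close to but not identical to the regular expression engine Apache uses, and stopping at the first matching rule is a simplification.

```python
import re

# Each entry mirrors the two RewriteRule arguments: (pattern, substitution).
# The second rule is hypothetical, added only to show that rules can stack.
RULES = [
    (r"^article/(.*)/(.*)/", "/article.php?article=$1&page_num=$2"),
    (r"^author/(.*)/", "/author.php?name=$1"),
]

def rewrite(path):
    """Return the backend URL for the first rule that matches, else None."""
    for pattern, substitution in RULES:
        match = re.search(pattern, path)
        if match:
            # Swap each $N placeholder for the text captured by group N.
            return re.sub(r"\$(\d)",
                          lambda m: match.group(int(m.group(1))),
                          substitution)
    return None

print(rewrite("article/999/12/"))  # /article.php?article=999&page_num=12
print(rewrite("contact/"))         # None
```

The path is written without a leading slash because that is how a rule in .htaccess sees it.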

First, let's examine the first argument, ^article/(.*)/(.*)/. The caret, ^, simply marks the beginning of the path being matched, and the important things within this argument are inside the parentheses. With regular expressions, parentheses indicate tagged expressions (also called capture groups), or matches you want to use later. They are given numbers based on order of appearance, so the first set of parentheses becomes tagged expression 1, the second tagged expression 2.

Within these tagged expressions is a . and a *. The period indicates that the expression matches any character, and the asterisk indicates that it can match any number of times, including zero. So basically there are no limits on the string you can put in there. In short, with regular expressions periods are wildcards for any character, and asterisks mean unlimited matches.
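If you want to poke at the period, the asterisk, and the numbering of tagged expressions on their own, a few lines of Python will do. Treat this as a sketch: whole_match and tagged are helper names I made up, and Python's re module stands in for Apache's engine, which behaves the same way for these basics as far as I know (by default the period does not match a newline).

```python
import re

def whole_match(pattern, text):
    """True when the pattern accounts for the entire text."""
    return re.fullmatch(pattern, text) is not None

def tagged(pattern, text):
    """Return the tagged expressions in order of appearance, or None."""
    match = re.match(pattern, text)
    return match.groups() if match else None

print(whole_match(r".", "a"))        # True: one character
print(whole_match(r".", "ab"))       # False: a lone period is exactly one character
print(whole_match(r".*", ""))        # True: the asterisk allows zero repetitions
print(whole_match(r".*", "999/12"))  # True: any characters, slashes included

print(tagged(r"(.*)-(.*)", "999-12"))  # ('999', '12'): group 1, then group 2
```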

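So, our first argument is saying to match URLs whose path begins with article, then a forward slash, then any number of characters tagged as expression 1, then another forward slash, then any number of characters tagged as expression 2, then a final forward slash. Because the pattern ends with that slash, a request for article/999/12/ matches, but article/999/12 without the trailing slash does not.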
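Since .* puts no limits on what it swallows, it is worth seeing what the pattern does with a few different paths. The sketch below compares the pattern from the rule above with a stricter variant. The stricter pattern is my own suggestion and not part of the rule shown earlier, and Python's re module is again standing in for Apache's engine.

```python
import re

ORIGINAL = re.compile(r"^article/(.*)/(.*)/")
# Hypothetical stricter variant: [^/]+ means "one or more characters that
# are not a slash", /? makes the final slash optional, and $ marks the end.
STRICT = re.compile(r"^article/([^/]+)/([^/]+)/?$")

def captures(regex, path):
    """Return the tagged expressions, or None when the path does not match."""
    match = regex.search(path)
    return match.groups() if match else None

print(captures(ORIGINAL, "article/999/12/"))        # ('999', '12')
print(captures(ORIGINAL, "article/999/12"))         # None: trailing slash missing
print(captures(ORIGINAL, "article/999/12/extra/"))  # ('999/12', 'extra')

print(captures(STRICT, "article/999/12/"))          # ('999', '12')
print(captures(STRICT, "article/999/12"))           # ('999', '12')
print(captures(STRICT, "article/999/12/extra/"))    # None
```

The third line is the one to watch: with .* the first tagged expression happily absorbs a slash, so an extra path segment shifts what ends up in $1 and $2.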

Our second argument, /article.php?article=$1&page_num=$2, is just like any other query string, except you have access to the tagged expressions from the first argument and you can feed them to your query string like normal variables. The first one is accessed as $1, the second one is accessed as $2. It's that easy.

The beauty of this method is that you have to do no special coding within your files themselves; it is all done within .htaccess. You just code your internal site links to use the URL format you want, then you write the rewrite code to take that format and turn it into a normal query string.
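Coding your internal links in the new format can be as small as one helper function. Here is a minimal sketch in Python; article_url is a hypothetical name, and I am assuming both values are whole numbers, as in the example URL. The trailing slash is there because the pattern in the rule above ends with one.

```python
def article_url(article_id, page_num):
    """Build an internal link in the /article/<id>/<page>/ format."""
    # int() rejects anything that is not a whole number, so a stray slash
    # or other odd character cannot leak into the path.
    return "/article/{}/{}/".format(int(article_id), int(page_num))

print(article_url(999, 12))  # /article/999/12/
```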

Drawback:

The main drawback of this method, and some will dispute that it is even a drawback, has already been mentioned above: the possible performance cost of relying on regular expressions. Also, while mod_rewrite can be made to do practically anything, the rules required for more elaborate rewriting are much more complicated, and novices will likely be confused and intimidated by them.
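If you are curious about the regular expression cost, you can at least measure the general idea yourself. The sketch below times a regular expression against a plain string split in Python. It says nothing about Apache or mod_rewrite itself, the function names are made up, and the numbers will vary from machine to machine, so take it only as a way to get a feel for the difference.

```python
import re
import timeit

PATTERN = re.compile(r"^article/(.*)/(.*)/")

def parse_with_regex(path):
    """Pull the two values out with the pattern from the rule."""
    match = PATTERN.search(path)
    return match.groups() if match else None

def parse_with_split(path):
    """Pull the two values out with a plain string split."""
    parts = path.split("/")
    # "article/<id>/<page>/" splits into ["article", "<id>", "<page>", ""]
    if len(parts) == 4 and parts[0] == "article" and parts[3] == "":
        return (parts[1], parts[2])
    return None

if __name__ == "__main__":
    for fn in (parse_with_regex, parse_with_split):
        seconds = timeit.timeit(lambda: fn("article/999/12/"), number=100_000)
        print(fn.__name__, round(seconds, 4))
```

The two functions agree on the simple case used here, though they are not equivalent for every input: a path with extra slashes matches the pattern but is rejected by the split version.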