Search Engine Friendly URLs

Search Engine Friendly URLs with Custom Error Pages

Implementation:

This method takes advantage of .htaccess's ability to handle errors. In the .htaccess file of whichever directory you want to use this method in, simply put the following line:

ErrorDocument 404 /processor.php

Now make a script called processor.php and put it in that same directory. (The leading / in the ErrorDocument line makes the path relative to your document root, so adjust the path if the directory is not your root.) That's all you have to do. Let's say you have the following URL: http://www.domain.com/999/12/. Again, in this example 999 and 12 do not exist, and since you do not specify a script anywhere in the directory path, Apache will generate a 404 error. Instead of sending a generic 404 page back to the browser, Apache sees the ErrorDocument directive in the .htaccess file and calls up processor.php.

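So now we're in processor.php. In the first example we used the $PATH_INFO variable, but that won't work this time. Instead we need to use the $REQUEST_URI variable. On current PHP versions it is read as $_SERVER['REQUEST_URI']; the bare $REQUEST_URI form relied on the register_globals setting, which was removed in PHP 5.4. The variable contains everything in the URL after the domain, including any query string. So in this case: /999/12/.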

The first thing you need to do in your processor.php is send a new HTTP header. Remember Apache thought this was a 404 error and so it wants to tell the browser that it couldn't find a page.

Put the following line first thing in your processor.php:

header("HTTP/1.1 200 OK");
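As a minimal sketch of how the top of processor.php might look, the snippet below sends the 200 header and then pulls just the path out of the request URI. It assumes a current PHP version, where the value comes from $_SERVER['REQUEST_URI']. The helper name request_path() is hypothetical, and stripping the query string is an extra step that this article does not cover.

```php
<?php
// Hypothetical helper: return only the path part of a request URI,
// dropping any query string such as "?print=1".
function request_path(string $requestUri): string
{
    $path = parse_url($requestUri, PHP_URL_PATH);
    // parse_url() returns false or null when it cannot find a path.
    return is_string($path) ? $path : '/';
}

// Under Apache this value comes from the server; the fallback is only
// here so the sketch also runs from the command line.
$requestUri = $_SERVER['REQUEST_URI'] ?? '/999/12/';

// Replace Apache's 404 status before any output is sent.
if (!headers_sent()) {
    header('HTTP/1.1 200 OK');
}

$path = request_path($requestUri);
```

With the example URL above, $path would hold /999/12/ whether or not a query string was attached.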

At this point I need to point out an important fact. In the first example you could specify which script processed your URL. In this example all URLs must be processed by the same script, processor.php, which makes things a little different. Instead of creating different URLs depending on what you want to do, such as article.php/999/12 or printarticle.php/999/12 or category.php/13, you have only 1 script that must handle all of them.

So your processor.php must decide what to do with the information it gets. Usually you can do this by counting how many parameters are passed. For instance, on one of my sites, http://www.online-literature.com, I use this method to generate my pages. If there is just one parameter, such as http://www.online-literature.com/shakespeare/, I know that I need to load an author information page. If there are 2 parameters, such as http://www.online-literature.com/shakespeare/hamlet/, I know that I need to load a book information page. And if there are 3 parameters, such as http://www.online-literature.com/shakespeare/hamlet/3/, I know I need to load a chapter viewing page. Alternatively, you can just use the first parameter to indicate the type of page to display, and then process the remaining parameters based on that.
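The counting approach can be sketched roughly as follows. The function name choose_page() and the page labels are hypothetical, and the sketch assumes $params already holds only the non-empty URL parameters.

```php
<?php
// Hypothetical dispatcher: map the number of URL parameters to a page type.
// The page names are illustrative labels, not file names from this article.
function choose_page(array $params): string
{
    switch (count($params)) {
        case 1:
            return 'author';   // e.g. /shakespeare/
        case 2:
            return 'book';     // e.g. /shakespeare/hamlet/
        case 3:
            return 'chapter';  // e.g. /shakespeare/hamlet/3/
        default:
            return 'error';    // anything else gets a custom error page
    }
}

echo choose_page(['shakespeare', 'hamlet']), "\n"; // book
```

The returned label could then decide which script to include.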

There are 2 ways to count the parameters. Either way, as in the PATH_INFO method, you first need to use PHP's explode function to divide up the $REQUEST_URI variable. So if $REQUEST_URI = /shakespeare/hamlet/3/:

$REQUEST_URI = $_SERVER['REQUEST_URI']; // needed on PHP 5.4+, where register_globals is gone
$var_array = explode("/", $REQUEST_URI);

Now note because of the positioning of the /'s there are actually 5 elements in this array. The first element, element 0, is blank because it contains the information before the first /. The fifth element, element 4, is also blank because it contains the information after the last /.

So now we need to count the elements in our $var_array. PHP has two functions that let us do this. We can use the sizeof() function as in this example:

$num = sizeof($var_array); // 5

You'll notice that the sizeof() function counts every item in the array regardless of whether it is empty. The other function is count(); sizeof() is simply an alias for it, so the two behave identically. This means we cannot rely on either function here...

You see, some search engines, like AOL, will automatically remove the trailing / from your URL, and this can cause a problem if you're using these functions to count your array. For instance, http://www.online-literature.com/shakespeare/hamlet/ becomes http://www.online-literature.com/shakespeare/hamlet, and since there are now only 3 total elements in the array, our processor.php would load an author page instead of a book page. Users will also leave the slash off by accident, and you do not want to leave them hanging. So you need to be able to accurately count the number of elements in the array.
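To make the problem concrete, here is a small illustration using the example paths from this article. It only demonstrates the raw counts; nothing in it is specific to any one site.

```php
<?php
// The same book URL, with and without its trailing slash.
$with    = explode('/', '/shakespeare/hamlet/');
$without = explode('/', '/shakespeare/hamlet');

echo count($with), "\n";     // 4: "", "shakespeare", "hamlet", ""
echo count($without), "\n";  // 3: "", "shakespeare", "hamlet"

// An author URL with its trailing slash also yields 3 elements,
// so a raw element count cannot tell the two apart.
$author = explode('/', '/shakespeare/');
echo count($author), "\n";   // 3: "", "shakespeare", ""
```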

The solution is to create a function that counts only the elements in an array that hold data. That way, links that leave off the trailing / still get to the right place.

An example of such a function is below.

function count_all($arg) {

// skip if argument is empty (note: PHP also treats the string "0" as empty)
	if (!$arg) {
		return 0;
	}

// not an array, return 1 (base case)
	if (!is_array($arg)) {
		return 1;
	}

// else call recursively for all elements of $arg
	$count = 0;
	foreach ($arg as $val) {
		$count += count_all($val);
	}

	return $count;
}

Then to get your count you can access the function like this:

$num = count_all($var_array); // 3 for /shakespeare/hamlet/3/, with or without the trailing /
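A different way to get the same count, shown here only as a hedged alternative and not as the method used on my site, is to filter the empty pieces out of the array before counting. The name path_segments() is hypothetical.

```php
<?php
// Hypothetical alternative: keep only non-empty path segments and
// reindex them from 0. Comparing against '' keeps a segment like "0".
function path_segments(string $path): array
{
    $parts = explode('/', $path);
    $kept = array_filter($parts, function ($part) {
        return $part !== '';
    });
    return array_values($kept);
}

$segments = path_segments('/shakespeare/hamlet');
echo count($segments), "\n"; // 2, with or without the trailing slash
```

Note that the filtered array is reindexed, so the first parameter sits at index 0 rather than index 1.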

Once you know how many parameters are needed you can define them like this:

$author = $var_array[1];
$book = $var_array[2];
$chapter = $var_array[3];

And then you can use includes to call up the appropriate script that will query your database and set up your page.

Also if you get a result you're not expecting you can simply create your own error page for display to the browser.
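Because processor.php replaces Apache's 404 status with 200, a page that really is missing would otherwise look like a normal page to browsers and crawlers. One option, which is an assumption on top of this article rather than part of the original method, is to send a 404 status along with your custom error page. The helper status_line() and its 1-to-3 range are hypothetical and simply mirror the author/book/chapter example.

```php
<?php
// Hypothetical helper: pick the status line to send for a parameter count.
// The 1-to-3 range mirrors the author/book/chapter example in this article.
function status_line(int $paramCount): string
{
    if ($paramCount >= 1 && $paramCount <= 3) {
        return 'HTTP/1.1 200 OK';
    }
    return 'HTTP/1.1 404 Not Found';
}

$line = status_line(5);
// Headers can only be changed before any output has been sent.
if (!headers_sent()) {
    header($line);
}
echo $line, "\n"; // HTTP/1.1 404 Not Found
```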

Drawback:

The main drawback of this method is that you effectively destroy any usefulness your error logs have, as every page view will result in another line in the error log. However, this can be offset by the main benefit of this method, which is that you can build it directly off your root. All the other methods require an intermediary script, like article or category. With this one you can start right off the root, like I do on my literature site, and I think that can provide a cleaner look (and certainly shorter URLs) depending on how you structure your content.