Search Engine Friendly URLs

I first wrote this article in 2001 for SitePoint.com, and it was republished here when this site launched in 2003. It has since become what is probably the most popular search engine friendly URLs article on the Internet. Unfortunately, things change, and this article hasn't... until now. I present to you the newly updated and expanded search engine friendly URLs article.

On today's Internet, database-driven or dynamic sites are very popular. Unfortunately, the easiest way to pass information between your pages is with a query string. In case you don't know, a query string is a string of information tacked onto the end of a URL after a question mark. For instance, in the URL http://www.google.com/search?sourceid=navclient&ie=UTF-8&rls=GGLD,GGLD:2004-44,GGLD:en&q=search+engine+friendly+urls, everything after the question mark is the query string.
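To make the anatomy of a query string concrete, here is a minimal sketch that splits the Google URL above into its path and its name/value pairs. It uses Python's standard urllib.parse module purely for illustration; the variable names are my own, and the same idea applies in any server-side language.

```python
from urllib.parse import urlsplit, parse_qs

# The example URL from the paragraph above.
url = ("http://www.google.com/search?sourceid=navclient&ie=UTF-8"
       "&rls=GGLD,GGLD:2004-44,GGLD:en&q=search+engine+friendly+urls")

parts = urlsplit(url)

# Everything after the "?" is the query string.
query_string = parts.query

# parse_qs turns "name=value&name=value" into a dict of lists,
# decoding "+" back into spaces along the way.
params = parse_qs(query_string)

print(parts.path)       # the page being requested
print(query_string)     # the raw query string
for name, values in params.items():
    print(name, "=", values[0])
```

Each name/value pair is a separate variable handed to the page, which is exactly the information a friendly URL has to carry in some other way.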

So, what's the problem with that? Originally, most search engines simply would not index such URLs. This has changed, and all the major search engines now index them. However, they recognize such URLs as dynamic, and they want neither to crash your server nor to get stuck crawling an endless string of dynamic URLs. So if your URLs appear dynamic, search engines may crawl them more slowly, crawl them only if they have sufficient incoming links, or both. The longer the URL, the greater the chance that it won't get crawled.

There is also the issue of session IDs, which won't really be covered here but need to be mentioned. Many software systems, such as forums or shopping carts, require the ability to track visitors. This is most often accomplished with cookies, but when cookies are not possible (such as when the user turns them off or when search engine robots visit), the software falls back on appending a long string of characters to the URL. This is a session ID. Session IDs are not only unfriendly, but they also tend to change rapidly for search engines, causing them to see the same content at multiple URLs. In fact, session IDs in your URLs can still effectively kill your chance of being fully indexed by search engines. The solution is to identify search engines by their HTTP_USER_AGENT and turn session IDs off for them. Most of the better software packages made by search engine conscious developers include this functionality now, but you should always verify that it works.
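As a rough sketch of that idea, the following checks a visitor's User-Agent string against a short list of crawler tokens and appends a session ID only for cookieless human visitors. Everything here is an assumption made for illustration: the token list is deliberately incomplete, and the function names and the sid parameter are hypothetical rather than taken from any particular forum or cart package. It is written in Python for brevity; in PHP the header would come from $_SERVER['HTTP_USER_AGENT'].

```python
# Illustrative, incomplete list of crawler tokens (an assumption for this
# sketch; real software usually maintains a much longer list).
BOT_SIGNATURES = ("googlebot", "bingbot", "slurp", "duckduckbot")


def is_search_engine(user_agent):
    """Return True if the User-Agent header contains a known crawler token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BOT_SIGNATURES)


def build_link(path, session_id, user_agent, has_cookie):
    """Append the session ID to a URL only for cookieless human visitors."""
    if has_cookie or is_search_engine(user_agent):
        # Cookies carry the session, or the visitor is a robot:
        # keep the URL clean either way.
        return path
    separator = "&" if "?" in path else "?"
    return f"{path}{separator}sid={session_id}"


googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
browser = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

print(build_link("/forum/", "abc123", googlebot, has_cookie=False))
print(build_link("/forum/", "abc123", browser, has_cookie=False))
```

User-Agent strings can be spoofed, so treat this as a way to keep URLs clean for well-behaved robots and not as a security check.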

You should also be concerned with user friendliness. URLs with long query strings are unfriendly not only to search engines, but also to users. A user will have a far easier time remembering a URL with real words in it than a URL with cryptic numeric data and variable names. Search engine friendly URLs also make it easy to hide the type of server-side programming you're using by removing the telltale extension (such as .php) from your URLs. This means you will be able to change your backend software at a later date without changing your URLs, and it may even provide a security bonus.
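To illustrate the difference between cryptic data and real words, here is a small sketch that turns a page title into a URL-safe identifier with no file extension. The slugify and friendly_url helpers and the "articles" section name are hypothetical, chosen only for this example, and the sketch handles plain ASCII titles only.

```python
import re


def slugify(title):
    """Lowercase the title and collapse anything that is not a letter
    or digit into single hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


def friendly_url(section, title):
    """Build an extensionless path made of real words."""
    return f"/{section}/{slugify(title)}/"


# Compare with something like /article.php?id=42&cat=7
print(friendly_url("articles", "Search Engine Friendly URLs"))
```

A path built this way says nothing about the language behind it, which is what lets you swap backend software later without changing your URLs.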

Making your URLs search engine friendly is the practice of removing your query string and using meaningful identifiers for your pages. There are a few popular ways to pass information to your pages without using a query string, so that search engines will still fully index your site. I plan on covering four of these. They will all work with Linux, Apache, and PHP. They may work in other environments, but I cannot confirm that they do. The list of methods covered is as follows:

All four methods potentially make use of Apache's .htaccess file. In case you do not know, .htaccess is a file used to administer Apache access options for whichever directory it is placed in. The server administrator has a better method of doing this using his configuration files, but since most of us do not own our own server, we do not have access to what the server administrator does. The server admin can also configure what users are allowed to do with their .htaccess files, so this may not work on your server; in most cases, however, it will. If it does not work, you should contact your server administrator. Additionally, in Linux, files that start with a period are considered hidden. To view such files over SSH, use the ls -a command, which indicates that you want to see all files, including hidden ones; in your FTP program, you may need to enable the equivalent option, often -a. Also, please remember that in Linux case matters, so be sure to copy the commands exactly.