Regex to match search engines, keywords, browsers?

Dan Grossman
03-14-2006, 04:43 PM
I'm writing a log analyzer, and before I spend a ton of time on it, I was wondering if anyone has some info or knows of a script that already does what I need. I'd only use the code as an example; I'm still writing something new.

I have referring URLs to a website, and I'd like to be able to identify if that URL is of a major search engine. If so, I'd also like to be able to get the keywords used in the search.

Getting the keywords should be pretty simple: build a list of the parameter names each search engine uses for the keywords (such as q= for Google) and use PHP's functions for parsing query strings. I need to work out which search engine the URL belongs to first, though.
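
For anyone following along, the keyword step might look something like this. It is only a minimal sketch using PHP's built-in parse_url() and parse_str(); the helper name query_param() is made up for illustration, and it assumes you already know which parameter name the engine uses.

```php
<?php
// Hypothetical helper (not from any library): returns the value of one
// query-string parameter from a URL, or null if it is missing.
function query_param(string $url, string $param): ?string
{
    // parse_url() gives back just the query string when asked for PHP_URL_QUERY.
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query) || $query === '') {
        return null;
    }

    // parse_str() URL-decodes the values and turns "+" into spaces.
    parse_str($query, $params);
    if (!isset($params[$param]) || !is_string($params[$param])) {
        return null;
    }

    return trim($params[$param]);
}

// Example: a Google referrer, where the keywords are in q=
$keywords = query_param('http://www.google.com/search?q=log+analyzer&hl=en', 'q');
```

This only covers the extraction half; identifying which engine the referrer belongs to still has to happen before you know which parameter to ask for.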

I also need to get the browser and operating system out of the user agent string, which I can either log from the client with JavaScript or read on the server from $_SERVER['HTTP_USER_AGENT'].

So again: if anyone has already written similar code, or knows of a script with some of these expressions already written that I can take a look at, it'd save me a load of time. Thanks.

Dan Grossman
03-15-2006, 12:06 PM
Thought I'd share what I've found since writing this:

PHP has a handy get_browser() function that returns all kinds of information when passed a user agent string.

It uses a browscap.ini file for the patterns that match different browsers and systems, along with the info about them (the browscap setting in php.ini has to point at the file). Very regularly updated copies are available here (http://www.garykeith.com/browsers/downloads.asp).
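
A rough sketch of how the call might be wrapped, assuming browscap is configured. The wrapper name describe_user_agent() is hypothetical, and the 'browser', 'version' and 'platform' keys are the ones I'd expect from a typical browscap.ini; check your copy of the file if they come back empty.

```php
<?php
// Hypothetical wrapper around get_browser(); returns null when the
// browscap directive is not set, so the caller can fall back to something else.
function describe_user_agent(string $userAgent): ?array
{
    // get_browser() only works when php.ini's "browscap" points at a browscap.ini file.
    if (!ini_get('browscap')) {
        return null;
    }

    // Second argument true = return an array instead of an object.
    $info = get_browser($userAgent, true);
    if (!is_array($info)) {
        return null;
    }

    // Keys assumed from a typical browscap.ini; missing ones become null.
    return [
        'browser'  => $info['browser']  ?? null,
        'version'  => $info['version']  ?? null,
        'platform' => $info['platform'] ?? null,
    ];
}

// The header is not always sent, so default to an empty string.
$details = describe_user_agent($_SERVER['HTTP_USER_AGENT'] ?? '');
```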

That solves the browser/OS parsing problem.

As for identifying search engines and keywords, I found a nice list somewhere (that I've lost already) of search engine URLs and the names of their query parameters, which I hand-wrote into an array to loop through and match against. I might end up rewriting that later into something more flexible.
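
The array-and-loop approach could be sketched roughly like this. It is not the actual array from my script: the table entries are illustrative (only q= for Google is mentioned above, the Yahoo and MSN rows are assumptions), and identify_search() is a made-up name.

```php
<?php
// Illustrative table: host fragment => [engine name, keyword parameter].
// Only the Google "q" entry comes from this thread; the others are assumptions.
$engines = [
    'google.'       => ['Google', 'q'],
    'search.yahoo.' => ['Yahoo', 'p'],
    'search.msn.'   => ['MSN', 'q'],
];

// Hypothetical helper: returns ['engine' => ..., 'keywords' => ...] for a
// referrer that matches the table, or null when no entry matches.
function identify_search(string $referrer, array $engines): ?array
{
    $parts = parse_url($referrer);
    if (!is_array($parts) || empty($parts['host'])) {
        return null;
    }

    $host = strtolower($parts['host']);
    parse_str($parts['query'] ?? '', $params);

    foreach ($engines as $fragment => [$name, $param]) {
        // Plain substring match on the host name; crude, but easy to extend.
        if (strpos($host, $fragment) === false) {
            continue;
        }
        $keywords = $params[$param] ?? null;
        return [
            'engine'   => $name,
            'keywords' => is_string($keywords) ? trim($keywords) : null,
        ];
    }

    return null;
}

$hit = identify_search('http://www.google.com/search?q=php+regex', $engines);
```

The substring match is the weak spot: a host like google.example.com would also count as Google, so a stricter per-engine pattern is probably the "more flexible" rewrite to aim for.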