Traffic Analysis Primer

What to Do Every Day

Once you have your tracking software set up you can start using it, but what do you actually look for? There are several things you should check every day; the following sections describe them.

Referrers
The first thing you should check daily is your referrers. I know from personal experience that on a popular site referrers can number in the thousands, so reading through the list every day can be a chore, but it's a must. When you look at your referrers, you're looking for two things: where in the search engines people are finding you, and on what other pages people are finding you. This will help you confirm that you're maintaining your search engine positions, and it will help you identify new people linking to you.

When I find a new site linking to me, I submit that site to Google so that Google spiders it, sees the link, and gives me a boost in link popularity. Some people advise against this on the grounds that it is unethical to submit another person's site; I disagree. Some search engines used to ban a site if it was oversubmitted, but Google never did this and still does not: if a page has already been submitted, your request is simply ignored. Since nothing bad can come of submitting the page, I don't find it unethical, especially since many of the people who own these pages may not even know how to submit a site. You should make your own decision on this, however.
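If your tracking software doesn't already aggregate referrers for you, they're easy to pull out of a raw Apache access log. Here is a minimal sketch, assuming the common combined log format, where the referrer is the second-to-last quoted field on each line (the function name and its parameters are my own):

```python
import re
from collections import Counter

# Combined Log Format lines end with "referrer" "user-agent";
# this pattern captures the referrer field.
_TRAILING_FIELDS = re.compile(r'"([^"]*)" "[^"]*"$')

def top_referrers(log_lines, limit=20):
    """Count referring URLs in Apache combined-format log lines,
    skipping direct hits (logged as "-")."""
    counts = Counter()
    for line in log_lines:
        match = _TRAILING_FIELDS.search(line.strip())
        if match and match.group(1) not in ("-", ""):
            counts[match.group(1)] += 1
    return counts.most_common(limit)
```

You could feed it a log file directly, e.g. `top_referrers(open("access.log"))`, and scan the output for search-engine result pages and newly linking sites.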

IP Addresses and User Agents
The second thing you need to do is check the IP addresses and user agents of your visitors. This will tell you two things: when the search engines spider your site, and whether anyone is abusing it.

The first is important because unless you know when your site was spidered, you cannot effectively troubleshoot search engine listings that appear outdated or do not appear at all. Many people remember when they submitted to the search engines, but ask them when they were spidered and they won't have a clue. Knowing when a search engine spiders your site, and when it then updates its index, will let you predict when your listings will change.
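As a sketch of how you might track spider visits in a raw log, assuming combined-format lines and using Googlebot's user-agent token as the example (the helper name is mine):

```python
from datetime import datetime

def spider_visits(log_lines, bot_token="Googlebot"):
    """Return the sorted, unique dates on which a given spider's
    user agent appears in combined-format access log lines."""
    dates = set()
    for line in log_lines:
        if bot_token in line:
            # The timestamp looks like [10/Oct/2002:13:55:36 -0700];
            # keep only the day/month/year portion.
            stamp = line[line.find("[") + 1:line.find("]")]
            day = stamp.split(":")[0]
            dates.add(datetime.strptime(day, "%d/%b/%Y").date())
    return sorted(dates)
```

Comparing the dates it returns against the dates your listings actually change is what lets you predict the next update.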

The second is important because there are a lot of evil people out there, and there are many ways to abuse a website. One is to write a script that rips content off one site to display on another. For instance, many scripts rip news headlines off sites like CNN.com; the ripping page then displays the headlines along with a link back to CNN's site. While technically it is wrong to copy the headlines, it is easily forgiven, since the headlines are being used to link back to the source. However, it is just as easy to write a script that steals whole articles from a site and displays them as its own. If someone is doing either of these things you can usually tell, because you'll see a large number of requests from a single IP address (which will typically resolve to a web server) and hits from a user agent named "PHP," "Perl," or another scripting language. Sometimes people will download your entire site and republish it on their own server; they sometimes forget to recode some links, resulting in hits from their copy of your site to yours. One SitePoint Forums advisor recently discovered exactly this happening because he monitored his referrers.
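The two signals described above, heavy traffic from one IP and scripting-language user agents, can be checked in one pass. This is a sketch only; the threshold, the token list, and the function name are my own assumptions to tune against your traffic:

```python
from collections import Counter

# Substrings that suggest a scripted client rather than a browser.
# Purely illustrative; extend for your own logs.
SCRIPT_TOKENS = ("php", "perl", "python", "libwww")

def flag_abusers(hits, request_threshold=500):
    """hits: list of (ip, user_agent) pairs taken from the access log.
    Returns (IPs over the request threshold, agents that look like scripts)."""
    per_ip = Counter(ip for ip, _ in hits)
    heavy_ips = {ip for ip, n in per_ip.items() if n >= request_threshold}
    script_agents = {ua for _, ua in hits
                     if any(tok in ua.lower() for tok in SCRIPT_TOKENS)}
    return heavy_ips, script_agents
```

A flagged IP is only a candidate: look it up (it should resolve to a web server if it's a ripping script) before deciding it's abuse.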

On the topic of downloading an entire site, there are also site rippers out there. Often benignly named "offline browsers," much as some Trojans are named "remote administration tools," these are programs that can download your entire site, which not only steals your content but can crash or severely slow down your server. Depending on the size of your site, these can be detected by looking at IP addresses: if you see hundreds or thousands of impressions from one address, chances are it's one of these programs. You can also look for their user agents; some popular ones are Wget, Teleport, HTTrack, and Web Reaper. I should mention that Wget is a legitimate program used on Unix servers to download files, such as patches or drivers. Unless you provide such downloads on your site, however, anyone using it against your site is stealing.
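A user-agent check for these tools is a one-liner. The token list below is drawn from the programs named above, lowercased for matching; real rippers often vary their exact strings, so treat this as a starting point, not a complete list:

```python
# Substrings of the ripper user agents mentioned in the text.
RIPPER_TOKENS = ("wget", "teleport", "httrack", "reaper")

def is_ripper(user_agent):
    """Crude check: does this user agent look like a known offline browser?"""
    ua = user_agent.lower()
    return any(token in ua for token in RIPPER_TOKENS)
```

Remember the caveat about Wget: if you legitimately offer files for download, a Wget hit may be an honest user, so check what was requested before acting.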

Yet another way someone can abuse your site is to harvest email addresses off it; this is especially important if you run a community site where users often post their addresses. Just as with site rippers, you can often identify email harvesters by their user agent.

The final way someone can abuse your site is to block your advertisements. Some consider this a right of the surfer, and they're entitled to their opinion; however, I feel that it is stealing. A webmaster puts up advertisements expecting that you will view them in conjunction with the content you are viewing for free. If you block the advertisements, then ethically I don't think you should visit the site at all. Some webmasters redirect people using ad-blocking programs to a page asking them to pay for site access, and that is basically how I feel: you either pay with your wallet or with your eyeballs. Like the previous examples, this can be detected by monitoring the user agent.

Once you identify the IP addresses or user agents of those abusing your site, you can ban them using .htaccess if you are running Apache. A full explanation, however, is beyond the scope of this article.
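As a minimal taste of what such a ban looks like, here is a sketch of an .htaccess fragment; the IP address and user-agent string are placeholders, and it assumes your host allows these directives in .htaccess (mod_setenvif plus the Limit override):

```apache
# Mark requests from a known ripper user agent (placeholder string).
SetEnvIfNoCase User-Agent "HTTrack" bad_bot

# Allow everyone except the flagged agent and one abusive IP (placeholder).
Order Allow,Deny
Allow from all
Deny from 192.0.2.10
Deny from env=bad_bot
```

Banned clients receive a 403 Forbidden response instead of your pages.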