Okay, I'm now testing some new tweaks to my bad bot countermeasures.
As before, I use bot traps to catch bad bots and initially rely on MySQL to keep tabs on them. This gives me a "safety" for real people who, for some dumb reason, stumble into my bot trap. It also lets me clear a specific IP address from my bot trap once requests from that IP address have stopped for a set period of time.
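The "clear after a quiet period" rule can be sketched in plain PHP. This is a minimal illustration, assuming the trap entries are available as an array mapping each IP to the Unix timestamp of its last request; the function name `purgeQuietTrapEntries` and the one-hour default are hypothetical. With the MySQL setup described above, the same rule would more likely be a single query along the lines of `DELETE FROM bot_trap WHERE last_hit < NOW() - INTERVAL 1 HOUR` (table and column names are also hypothetical).

```php
<?php
/**
 * Hypothetical helper: drop bot-trap entries whose last request is older
 * than $quietSeconds. $entries maps IP address => Unix timestamp of the
 * last request seen from that IP.
 */
function purgeQuietTrapEntries(array $entries, int $now, int $quietSeconds = 3600): array
{
    return array_filter(
        $entries,
        // Keep an IP only if it has made a request within the quiet period.
        fn(int $lastHit): bool => ($now - $lastHit) < $quietSeconds
    );
}

// Example: one IP went quiet two hours ago, the other hit the trap a minute ago.
$now = 1700000000;
$trap = [
    '192.0.2.10'   => $now - 7200,
    '198.51.100.7' => $now - 60,
];
$remaining = purgeQuietTrapEntries($trap, $now);
print_r(array_keys($remaining)); // only 198.51.100.7 is still tracked
```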
The new addition is that once a flagged IP address makes a specific number of requests, it is added to a bad bot text file and totally banned until the IP address is manually removed from the ban list. A second text file, a bad bot log, records more details about each request from a banned bot to help me decide whether to make the ban on an IP permanent.
What I have observed is that almost all "bots" that fall into my bot trap request 11 pages and then go on their merry way without causing me any problems. Only a very small percentage of the IPs that fall into my bot trap (maybe less than 5%) request more than 12 pages.
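Given those numbers, the ban decision comes down to comparing a flagged IP's request count against a threshold set just above the typical 11-page visit. A minimal sketch, assuming the count comes from the MySQL bot-trap table; `botTrapAction` is a hypothetical name, and the default of 12 is an illustrative value drawn from the observation above, not a setting from the original code.

```php
<?php
/**
 * Hypothetical decision helper: returns 'ok' for IPs not in the trap,
 * 'flagged' for trapped IPs still at or below the threshold, and 'ban'
 * once a trapped IP has made more requests than the threshold allows.
 */
function botTrapAction(bool $inTrap, int $requestCount, int $banThreshold = 12): string
{
    if (!$inTrap) {
        return 'ok';
    }
    // Most trapped visitors stop at about 11 pages, so only the few that
    // keep going past the threshold get written to the blacklist file.
    return $requestCount > $banThreshold ? 'ban' : 'flagged';
}

echo botTrapAction(true, 11), "\n";  // flagged
echo botTrapAction(true, 13), "\n";  // ban
echo botTrapAction(false, 50), "\n"; // ok
```

Keeping the threshold as a parameter makes it easy to tune if the 11-page pattern changes.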
The code I run when an IP first crosses my bad bot ban threshold is:
Code:
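// register_globals is gone in modern PHP, so read request data from $_SERVER
$ip = $_SERVER['REMOTE_ADDR'];
$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';
$accept = $_SERVER['HTTP_ACCEPT'] ?? '';
// Ban list: one 'IP'|date entry per line; LOCK_EX guards against concurrent writes
file_put_contents("BadBotBlackList.txt", "'$ip'|" . date("r") . "\n", FILE_APPEND | LOCK_EX);
// Detailed log used to decide whether a ban should be made permanent
file_put_contents("BadBotLog.txt", "'$ip'|" . date("r") . "|'$userAgent'|'$accept'\n", FILE_APPEND | LOCK_EX);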
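Because each log line has the form `'IP'|date|'user agent'|'accept'`, the log can be summarised per IP when deciding whether a ban should become permanent. A rough sketch, assuming that record format; `countRequestsByIp` is a hypothetical name. It only reads the first field, so a user agent containing a `|` does not affect the IP that gets counted.

```php
<?php
/**
 * Hypothetical analysis helper: count how many logged requests each IP
 * made, given the contents of BadBotLog.txt. Each line is expected to look
 * like 'IP'|date|'user agent'|'accept'.
 */
function countRequestsByIp(string $logContents): array
{
    $counts = [];
    foreach (preg_split('/\R/', trim($logContents)) as $line) {
        if ($line === '') {
            continue; // skip blank lines
        }
        // Only the first field is needed, so split once and strip the quotes.
        $ip = trim(explode('|', $line, 2)[0], "'");
        $counts[$ip] = ($counts[$ip] ?? 0) + 1;
    }
    arsort($counts); // busiest IPs first
    return $counts;
}

// Illustrative sample lines (documentation-range IPs, made-up user agents).
$sample = "'192.0.2.10'|Tue, 14 Nov 2023 22:13:20 +0000|'BadBot/1.0'|'*/*'\n"
        . "'198.51.100.7'|Tue, 14 Nov 2023 22:14:01 +0000|'Mozilla/5.0'|'text/html'\n"
        . "'192.0.2.10'|Tue, 14 Nov 2023 22:15:42 +0000|'BadBot/1.0'|'*/*'\n";
print_r(countRequestsByIp($sample));
```

In practice the input would be `file_get_contents("BadBotLog.txt")`.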
On every page request, before any database connection is opened, I use the following code to check whether the IP address is in my block list and to log any request from a blocked IP:
Code:
$ip = $_SERVER['REMOTE_ADDR'];
// Avoid a warning if nobody has been banned yet and the file does not exist
$contents = is_file("BadBotBlackList.txt") ? file_get_contents("BadBotBlackList.txt") : "";
// Searching for the quoted IP stops 1.2.3.4 from matching an entry for 1.2.3.45
if (strpos($contents, "'" . $ip . "'") !== false) {
    $userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';
    $accept = $_SERVER['HTTP_ACCEPT'] ?? '';
    file_put_contents("BadBotLog.txt", "'$ip'|" . date("r") . "|'$userAgent'|'$accept'\n", FILE_APPEND | LOCK_EX);
    // 403 is "Forbidden"; "Service Unavailable" is the reason phrase for 503
    header("HTTP/1.1 403 Forbidden");
    $safeIp = htmlspecialchars($ip);
    echo "<html><head><title>403</title></head><body><h1>Access Denied</h1><p>Access to this site has been denied for requests coming from the IP address $safeIp due to abuse of our system resources. Spidering, indexing or caching this site for purposes other than publicly available Internet search engines (e.g. Google, MSN Search, etc.) is strictly prohibited.</p></body></html>";
    exit();
}
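The quoted-substring check works for blocking, but removing an IP by hand, or doing a lookup that compares whole entries, is easier if the blacklist is parsed line by line. A minimal sketch, assuming the `'IP'|date` record format used above; `loadBlacklist` and `unbanIp` are hypothetical helper names. `unbanIp` reads and then rewrites the file, so it is meant for occasional manual use rather than running on every request.

```php
<?php
/**
 * Hypothetical helpers for the blacklist file, whose lines look like
 * 'IP'|date as written by the ban code above.
 */
function loadBlacklist(string $file): array
{
    if (!is_file($file)) {
        return [];
    }
    $banned = [];
    foreach (file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        $ip = trim(explode('|', $line, 2)[0], "'");
        $banned[$ip] = true; // use the IP as a key for an exact, O(1) lookup
    }
    return $banned;
}

function unbanIp(string $file, string $ip): void
{
    if (!is_file($file)) {
        return;
    }
    $keep = [];
    foreach (file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        // Drop every entry that starts with exactly this quoted IP; keep the rest.
        if (strpos($line, "'" . $ip . "'|") !== 0) {
            $keep[] = $line;
        }
    }
    $data = $keep ? implode("\n", $keep) . "\n" : '';
    file_put_contents($file, $data, LOCK_EX);
}

// Demo on a temporary copy: unbanning 192.0.2.10 must not touch 192.0.2.100.
$file = tempnam(sys_get_temp_dir(), 'bbl');
file_put_contents($file, "'192.0.2.10'|Tue, 14 Nov 2023 22:13:20 +0000\n"
    . "'192.0.2.100'|Tue, 14 Nov 2023 22:20:05 +0000\n");
unbanIp($file, '192.0.2.10');
$banned = loadBlacklist($file);
unlink($file);
var_dump(isset($banned['192.0.2.10']));  // bool(false)
var_dump(isset($banned['192.0.2.100'])); // bool(true)
```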