
Thread: How do I write and retrieve strings from a text file using PHP?

  1. #1
    Site Contributor KLB's Avatar
    Join Date
    Feb 2006
    Location
    Saco Maine
    Posts
    1,181

    How do I write and retrieve strings from a text file using PHP?

    Okay I have a "bot trap" that writes offending IPs to a database table and then uses the table to decide to allow or deny requests to my site. Every now and then, however, a really bad bot will come along and take down my database server by making requests so fast that they exceed the number of connections I am allowed at any one point in time.

    I could go to a dedicated database server but this costs $$$ I'd rather save. What I want to do is program a "safety switch" that writes the offending IP address to a new line in a text file along with the timestamp (e.g. 123.123.123.123 | 2007-07-18 01:10:02 EDT) if the database has recorded a certain number of requests for an offending IP address. Then before any database calls are made, my script would verify the IP address against the text file and deny the IP address in question access to my site.

    So my question is, using PHP how do I write a string to a text file and then how do I query that text file (e.g. like a database table) to see if an IP address is listed in it?

    Thanks
    Ken Barbalace - EnvironmentalChemistry.com (Environmental Careers, Blog)
    InternetSAR.org: Volunteers Assisting Search and Rescue via the Internet
    My Firefox Theme Classic Compact: Based on Firefox's classic theme but uses much less window space

  2. #2
    Administrator Chris's Avatar
    Join Date
    Feb 2003
    Location
    East Lansing, MI USA
    Posts
    7,055
    Why don't you just write the IPs directly to your .htaccess for denial?
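    A minimal sketch of that approach, assuming Apache 2.2-style "Deny from" syntax and an .htaccess file the web server user is allowed to write to (the path is a placeholder):
    Code:
    // Illustrative sketch only: append a deny rule for the offending IP to .htaccess.
    $ip = $_SERVER['REMOTE_ADDR'];

    // Only write well-formed IPs so nothing else can end up in the config file
    if (filter_var($ip, FILTER_VALIDATE_IP)) {
        file_put_contents('/full/path/to/.htaccess', "Deny from $ip\n", FILE_APPEND | LOCK_EX);
    }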
    Chris Beasley - My Guide to Building a Successful Website
    Content Sites: ABCDFGHIJKLMNOP|Forums: ABCD EF|Ecommerce: Swords Knives

  3. #3
    Administrator Chris's Avatar
    Join Date
    Feb 2003
    Location
    East Lansing, MI USA
    Posts
    7,055
    In any case, this is how you load the list:

    // Load the list of IPs into the $iplist array; $url is the path to the text file
    $iplist = file($url);
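    To then check a visitor against what was loaded (a quick sketch, assuming one IP per line in the file):
    Code:
    // Entries returned by file() keep their trailing newlines, so trim them
    // before comparing against the current visitor's address.
    $iplist = array_map('trim', $iplist);

    if (in_array($_SERVER['REMOTE_ADDR'], $iplist)) {
        // The requesting IP is on the list; deny access here
    }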

    Here is how you write:

    $handle = fopen("/full/path/to/file.txt", "w");

    if (fwrite($handle, $string) === FALSE) {
    echo "Cannot write to file";
    exit;
    }

    echo "Success, wrote content to file";

    fclose($handle);

    }

    In the fopen call, the second argument, "w", opens the file for writing; if you want to append (add to the bottom instead of overwriting), use "a".
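    For the append case there is also a one-call alternative (a sketch, assuming PHP 5's file_put_contents is available):
    Code:
    // FILE_APPEND adds to the end instead of overwriting, and LOCK_EX prevents
    // concurrent requests from interleaving their writes.
    file_put_contents("/full/path/to/file.txt", $string . "\n", FILE_APPEND | LOCK_EX);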
    Chris Beasley - My Guide to Building a Successful Website
    Content Sites: ABCDFGHIJKLMNOP|Forums: ABCD EF|Ecommerce: Swords Knives

  4. #4
    Site Contributor KLB's Avatar
    Join Date
    Feb 2006
    Location
    Saco Maine
    Posts
    1,181
    Quote Originally Posted by Chris
    Why don't you just directly write the ips to your .htaccess for denial?
    I do, but it isn't fast enough. I need an automated process that detects bad bots and shuts them down until I can manually review the IP address. Last night, while I was asleep, a bad bot made over 6,000 requests in a short period of time, exceeding my simultaneous database connection limit and denying others the ability to use my database for about 40 minutes.

    Again, I could go to a dedicated database server, but why should I do that just to support some bad bots? I'd rather trap and block them using automated processes.

    Thanks for the code; I'll work with it. I'm thinking of offloading my entire bad bot detection method from the database server and using a text file for this purpose (if it doesn't put too much work on the web server).
    Ken Barbalace - EnvironmentalChemistry.com (Environmental Careers, Blog)
    InternetSAR.org: Volunteers Assisting Search and Rescue via the Internet
    My Firefox Theme Classic Compact: Based on Firefox's classic theme but uses much less window space

  5. #5
    Administrator Chris's Avatar
    Join Date
    Feb 2003
    Location
    East Lansing, MI USA
    Posts
    7,055
    Quote Originally Posted by KLB
    I do, but it isn't fast enough.
    Writing to your .htaccess file automatically would be much much faster than any method you currently use.
    Chris Beasley - My Guide to Building a Successful Website
    Content Sites: ABCDFGHIJKLMNOP|Forums: ABCD EF|Ecommerce: Swords Knives

  6. #6
    Site Contributor KLB's Avatar
    Join Date
    Feb 2006
    Location
    Saco Maine
    Posts
    1,181
    For safety reasons, I prefer not to allow automated processes to write to my .htaccess file.

    What I'm working on is figuring out how to create a text file that replaces the database table I use for detecting and tracking bad bots. If I can figure out how to change a single line of text in a text file and how to build queries against a text file, I'll be able to short-circuit the entire MySQL connection process for bad bots.
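    One possible sketch of that kind of single-line update, assuming one record per line in roughly the IP|timestamp format mentioned earlier (the helper name and file layout here are hypothetical):
    Code:
    // Hypothetical helper: refresh the timestamp on the line for $ip,
    // or append a new record if the IP is not listed yet.
    function touchBadBotRecord($file, $ip) {
        $lines = file_exists($file) ? file($file, FILE_IGNORE_NEW_LINES) : array();
        $found = false;
        foreach ($lines as $i => $line) {
            // Each record starts with the IP followed by a pipe
            if (strpos($line, $ip . '|') === 0) {
                $lines[$i] = $ip . '|' . date('r');
                $found = true;
                break;
            }
        }
        if (!$found) {
            $lines[] = $ip . '|' . date('r');
        }
        // Rewrite the whole file; LOCK_EX guards against two requests writing at once
        file_put_contents($file, implode("\n", $lines) . "\n", LOCK_EX);
    }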

    Currently I have a special bait file that my pages link to via hidden links that users would never see, and would know not to click on even if they did see them. Legitimate bots obey the robots.txt file and will not access the file in question. Bad bots, however, see these links and normally ignore the robots.txt file, so they fall into the trap. The trap records the bad bot's IP address, UA string and date/time of request.

    Each time a page request is made, the bad bot table is checked. If there is a recent entry for the IP address, the table is updated and the IP is denied access to the page. This lets me block IPs for only a short period of time (in case it is a dynamic IP) while still letting me manually find repeat offenders that might be worthy of a permanent ban.

    The thing is, too many users share IP addresses or use dynamic IPs; I don't want to block innocent users just because they share an IP with someone who is doing things they shouldn't.
    Ken Barbalace - EnvironmentalChemistry.com (Environmental Careers, Blog)
    InternetSAR.org: Volunteers Assisting Search and Rescue via the Internet
    My Firefox Theme Classic Compact: Based on Firefox's classic theme but uses much less window space

  7. #7
    Site Contributor KLB's Avatar
    Join Date
    Feb 2006
    Location
    Saco Maine
    Posts
    1,181
    Okay, I'm now testing my new tweaks to my bad bot countermeasures.

    As before, I use bot traps to catch bad bots and initially rely on MySQL to keep tabs on things. This allows me to have a "safety" for real people that for some dumb reason stumble into my bot trap. It also allows me to clear a specific IP address from my bot trap once requests from that IP address have stopped for a specific period of time.

    The new addition is that after a specific number of requests, a flagged IP address is added to a bad bot text file and is totally banned until it is manually removed from the ban list. A second bad bot log text file is used to record more details about each request by a banned bot, to help me analyze whether to make the ban on an IP permanent.

    What I have observed is that almost all "bots" that fall into my bot trap request 11 pages then go on their merry way without causing me any problems. Only a very small percentage of IPs that fall into my bot trap request more than 12 pages (maybe less than 5%).

    The code I initially use once my bad bot ban threshold has been crossed is:
    Code:
    $file="BadBotBlackList.txt";
    $handle = fopen($file, "a");
    $record="'$REMOTE_ADDR'|".date("r")."\n";
    fwrite($handle, $record);
    fclose($handle);
    
    $file="BadBotLog.txt";
    $handle = fopen($file, "a");
    $record="'$REMOTE_ADDR'|".date("r")."|'$HTTP_USER_AGENT'|'$HTTP_ACCEPT'\n";
    fwrite($handle, $record);
    fclose($handle);
    Each time a page request is made, and before any database connections are created, I use the following code to check whether an IP address is in my block list and to log any request by blocked IPs:
    Code:
    // The original relied on register_globals ($REMOTE_ADDR etc.); the superglobals below are equivalent
    $ip = $_SERVER['REMOTE_ADDR'];

    $contents = file_get_contents("BadBotBlackList.txt");
    if (stristr($contents, "'" . $ip . "'")) {
    	// Log every request made by a blocked IP
    	$file = "BadBotLog.txt";
    	$handle = fopen($file, "a");
    	$record = "'$ip'|" . date("r") . "|'" . $_SERVER['HTTP_USER_AGENT'] . "'|'" . $_SERVER['HTTP_ACCEPT'] . "'\n";
    	fwrite($handle, $record);
    	fclose($handle);

    	// 403 is "Forbidden"; "Service Unavailable" is status 503
    	header("HTTP/1.1 403 Forbidden");
    	echo "<html><head><title>403</title></head><body><h1>Access Denied</h1><p>Access to this site has been denied for requests coming from the IP address $ip due to abuses of our system resources. Spidering, indexing or caching this site for purposes other than publicly available Internet search engines (e.g. Google, MSN Search, etc.) is strictly prohibited.</p></body></html>";

    	exit();
    }
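    One caveat with logging like this: when an aggressive bot fires many requests at once, two appends to the same file can interleave. A hedged variant of the logging step using an exclusive lock (same file and variables as above) would be:
    Code:
    // Append the log record under an exclusive lock so concurrent requests
    // cannot interleave partial lines.
    $handle = fopen("BadBotLog.txt", "a");
    if ($handle !== FALSE && flock($handle, LOCK_EX)) {
    	fwrite($handle, $record);
    	flock($handle, LOCK_UN);
    }
    if ($handle !== FALSE) {
    	fclose($handle);
    }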
    Ken Barbalace - EnvironmentalChemistry.com (Environmental Careers, Blog)
    InternetSAR.org: Volunteers Assisting Search and Rescue via the Internet
    My Firefox Theme Classic Compact: Based on Firefox's classic theme but uses much less window space

  8. #8
    Junior Registered
    Join Date
    Jun 2008
    Posts
    1

    Offending IPs

    I am also faced with the same problem, and surprisingly enough came to the exact same conclusion. Given that you faced this problem first, you have almost a two-year head start on me, and I'd like to learn from your experiences. Can you please share?

    Is there already an open source database of blacklisted IPs that we can read from and add to, or could we build one? I'd love to get your input.

    Thanks!!

