
Thread: WebReaper & other programs

  1. #1
    Registered
    Join Date
    Dec 2003
    Posts
    16


    Hello,

    I am new here and hope to find some help. I recently re-designed my site, added a ton of new information, plus added a forum. This morning while checking my logs I discovered that a user downloaded my entire site including the forum using WebReaper. According to the logs, the user did not obtain admin or user information (as far as I can tell).

    Is there a way to prevent people from downloading my entire site using programs like this?

    Any help is appreciated.

    TIA,
    chelle

  2. #2
    Administrator Chris's Avatar
    Join Date
    Feb 2003
    Location
    East Lansing, MI USA
    Posts
    7,055
    There most certainly is. It's not foolproof, since many of these programs can pretend to be legitimate browsers, but there are things you can do.

    What type of server do you have and what technologies are you using to generate your site? (HTML, ASP, PHP, etc?)
    Chris Beasley - My Guide to Building a Successful Website
    Content Sites: ABCDFGHIJKLMNOP|Forums: ABCD EF|Ecommerce: Swords Knives

  3. #3
    Registered
    Join Date
    Dec 2003
    Posts
    16
    Hello Chris,

    Thank you so much for replying.

    We have an Apache server, using PHP and MySQL. The forum is generated using PHP 4.0.6 and MySQL, and the main site itself is plain .htm pages.

    If you need to look at the site you can find it here: http://finarama.com/home.htm and there is a link to the forum from that page.

    If I were to convert the main site to PHP and set it up so that users had to log in to view the material, would that prevent these programs from downloading the entire site? I was also wondering whether making the forum available only to logged-in users would help.

    Thank you again for your response.

    chelle

  4. #4
    Administrator Chris's Avatar
    Join Date
    Feb 2003
    Location
    East Lansing, MI USA
    Posts
    7,055
    Here is what I do on one site.

    Code:
    // Prefer $_SERVER (PHP 4.1+); fall back to the register_globals variable on older setups.
    $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT']
        : (isset($HTTP_USER_AGENT) ? $HTTP_USER_AGENT : '');
    $agent = strtolower($ua);

    // Substrings that identify site rippers and offline browsers.
    // Dropped as redundant: "wget" (covered by "get"), "httrack" (by "track"),
    // "strip" (by "rip"), "teleport pro" (by "teleport"), and a second "webdup".
    $blocked = array(
        "rip", "get", "icab", "ninja", "reap", "subtract", "offline", "xaldon",
        "ecatch", "msiecrawler", "rocketwriter", "track", "teleport", "webzip",
        "extractor", "lepor", "copier", "disco", "capture", "anarch", "snagger",
        "superbot", "block", "saver", "webdup", "webhook", "pavuk", "interarchy",
        "blackwidow", "w3mir", "plucker", "cherry"
    );

    foreach ($blocked as $needle) {
        if (strpos($agent, $needle) !== false) {
            // Send the ripper to the "banned" page and stop before any content is output.
            header("Location: http://www.example.com/banned/banned.php");
            exit();
        }
    }
    Including that on every page should help.

    Another thing you can do, once you catch someone doing this, is ban their IP address using .htaccess.

  5. #5
    Registered GCT13's Avatar
    Join Date
    Aug 2003
    Location
    NYC
    Posts
    480
    That's a nice little snippet of code.

    Do you run into this problem with regularity, Chris?
    ....

  6. #6
    Administrator Chris's Avatar
    Join Date
    Feb 2003
    Location
    East Lansing, MI USA
    Posts
    7,055
    Yes, mostly on my literature site but also on other sites. I hate "offline browsers."

  7. #7
    Registered
    Join Date
    Dec 2003
    Posts
    16
    Hi Chris,

    Thank you so much. Can you explain how I insert this into my htm pages? Also, below is a normal line from my log. I notice it uses the "get" command. Is that the same as the "get" command in the code you just posted? How will it affect my pages when loading?

    Again,
    Thank you so much,

    chelle

    Code:
    207.213.164.43 - - [04/Dec/2003:14:14:46 -0800] "GET /tba/identification.htm HTTP/1.1" 200 41468 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; yie6_SBC; .NET CLR 1.1.4322)"
    When previewing, the code makes it go off the page. Sorry about this.

  8. #8
    Administrator Chris's Avatar
    Join Date
    Feb 2003
    Location
    East Lansing, MI USA
    Posts
    7,055
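    That only works on PHP pages. To install it, just include it at the top of each page (register_globals needs to be turned on for $HTTP_USER_AGENT to be set).

    A "GET" is the normal method for pulling up web pages, so there is nothing abnormal about that. It is not the same as the "get" in the code I posted, which is only matched against the user agent string (the last quoted field in your log line).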
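
    To make the difference concrete, here is a minimal sketch (an illustration, not the code running on any real site) that pulls the request method and the user agent out of a log line like the one chelle posted, then runs the substring check against the user agent only. The function names `parse_log_line` and `is_ripper`, the shortened block list, and the sample "WebReaper/1.0" agent string are all made up for this example.

```php
<?php
// Hypothetical helper: pull the request method and user agent out of an
// Apache "combined" log line. Returns null if the line does not match.
function parse_log_line($line) {
    // The combined format ends with: "request" status bytes "referer" "user agent"
    $pattern = '/"([A-Z]+) [^"]*" \d+ \S+ "[^"]*" "([^"]*)"$/';
    if (!preg_match($pattern, $line, $m)) {
        return null;
    }
    return array('method' => $m[1], 'agent' => $m[2]);
}

// Hypothetical helper: true if the user agent contains any blocked substring.
function is_ripper($agent, $blocked) {
    $agent = strtolower($agent);
    foreach ($blocked as $needle) {
        if (strpos($agent, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// Shortened list, for illustration only.
$blocked = array("get", "reap", "copier");

$line = '207.213.164.43 - - [04/Dec/2003:14:14:46 -0800] '
      . '"GET /tba/identification.htm HTTP/1.1" 200 41468 "-" '
      . '"Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; yie6_SBC; .NET CLR 1.1.4322)"';
$hit = parse_log_line($line);

// The request method is GET, but only the agent is tested,
// so this ordinary Internet Explorer visitor is not blocked.
$normal_visitor_blocked = is_ripper($hit['agent'], $blocked);

// A made-up agent string containing "reap" is blocked.
$ripper_blocked = is_ripper("WebReaper/1.0", $blocked);
```

    The point of the sketch is that the word GET in the request part of the log line never reaches the check; only the text the program reports about itself does.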

  9. #9
    Senior Member chromate's Avatar
    Join Date
    Aug 2003
    Location
    UK
    Posts
    2,348
    My site's just been ripped as well.

    This is one to add to the list...

    WebCopier v3.6

  10. #10
    Administrator Chris's Avatar
    Join Date
    Feb 2003
    Location
    East Lansing, MI USA
    Posts
    7,055
    It's on the list.

    Anything with the word "copier" is.

  11. #11
    Registered
    Join Date
    Dec 2003
    Posts
    16
    I'm wondering if it is possible to make this into an external file and link to it from all pages?

  12. #12
    Senior Member chromate's Avatar
    Join Date
    Aug 2003
    Location
    UK
    Posts
    2,348
    Originally posted by Chris
    It's on the list.

    Anything with the word "copier" is.
    Oh yeah, I missed it.

  13. #13
    Senior Member chromate's Avatar
    Join Date
    Aug 2003
    Location
    UK
    Posts
    2,348
    Originally posted by chelle
    I'm wondering if it is possible to make this into an external file and link to it from all pages?
    Yes, you can do that and just include(...) it on each page. Make sure it's the very first thing on the page, though, before anything gets sent to the browser. Otherwise the redirect will fail, because a header can't be sent once output has started.
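
    A rough sketch of why the order matters, using a hypothetical helper named `try_redirect` (it is not part of Chris's code): PHP's `headers_sent()` reports whether output has already started, and once it has, `header()` can no longer add a Location header.

```php
<?php
// Hypothetical helper: redirect only if no output has been sent yet.
// Returns true if the Location header could still be set, false otherwise.
function try_redirect($url) {
    if (headers_sent()) {
        // Too late: some HTML, or even a blank line above the opening
        // PHP tag, has already gone out to the browser.
        return false;
    }
    header("Location: " . $url);
    return true;
}
```

    If the include is the first thing in the file, a check like this should always find that headers can still be sent, which is the situation the anti-rip code relies on.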

  14. #14
    Registered
    Join Date
    Dec 2003
    Posts
    16
    I took one of my htm pages into dreamweaver and resaved it as a php page rather than htm. I know absolutely nothing about php and when I looked at the source it looks the same as any regular htm, html page. I'm assuming that this is correct.

    To insert this code correctly using an external file, would it look something like this? If not, I'm sorry, but I'll need to be told exactly how to do this correctly.

    Code:
    (link rel="$agent" type="text/txt" href="anydirectory/nameoffile.txt")
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
    Thanks for your help.

    chelle

  15. #15
    Senior Member chromate's Avatar
    Join Date
    Aug 2003
    Location
    UK
    Posts
    2,348
    No, you're getting really confused here.

    PHP is a scripting language. It's what we call a "server side" language. This means that it runs on the server, before the results are sent to the user's browser, as standard html. This is why when you look at the source code it looks like regular html.

    To illustrate this by referring to the PHP code we're dealing with here, here's what happens:

    The user arrives at your site. The user's browser makes a request for one of your site's pages. The web server then receives that request for the page. If it's a .php file (which it is in this case), any PHP code within the file is run. In our case, the user's "user agent" (the name the browser or program reports about itself) is checked to make sure it's not a site ripper. If it is a site ripper, the server redirects the user to another page informing them that they can't access the site. If not, then the normal page content is sent to the browser.

    Hope that makes how it works a little clearer.

    Now, we want to include the "anti site ripper" php code in every page of the site. So use this php include code at the top of each of your pages:

    PHP Code:
    <?php include("anti_rip.php"); ?>
    This assumes that the anti_rip.php file contains the exact code Chris pasted above, wrapped in <?php ... ?> tags so the server runs it instead of printing it. It also assumes that anti_rip.php is in the same directory as the PHP file that's trying to include it. This will most likely not be convenient, so just put the path to anti_rip.php in the quoted section of the include above.

    What's happening here is the page is being constructed by "including" other files, and parsing (running) them before being sent to the browser as HTML.
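
    As a sketch of the path variant, assuming the shared file is kept in a hypothetical "includes" directory directly under the web root (both the directory name and the `anti_rip_path` helper are made up for this example):

```php
<?php
// Hypothetical helper: build an absolute path to the shared file so that
// pages in any subdirectory can include the same copy.
function anti_rip_path($document_root) {
    return rtrim($document_root, "/") . "/includes/anti_rip.php";
}

// Under Apache, DOCUMENT_ROOT holds the site's top-level directory.
$root = isset($_SERVER['DOCUMENT_ROOT']) ? $_SERVER['DOCUMENT_ROOT'] : '';
$path = anti_rip_path($root);

// Only include the file if it is really there; a missing file would
// otherwise raise a warning and leave the page unprotected.
if (is_file($path)) {
    include($path);
}
```

    Building the path from the document root means the same line can be pasted into pages at any depth of the site without adjusting "../" prefixes.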
    Last edited by chromate; 12-05-2003 at 09:25 AM.
