Thread: Does Googlebot hit your AWS Sites Hard?

  1. #1
    Banned
    Join Date
    Dec 2003
    Posts
    152

    Does Googlebot hit your AWS Sites Hard?

    So far today, Googlebot has hit my AWS sites about 45,000 times. It just extensively crawled some of these sites a few days ago. I've checked the logs, and Googlebot is often requesting pages one or more times per second, which is out of line. When Googlebot is crawling (which always seems to be during peak hours), it definitely loads the CPU. I'd contact Google, but am somewhat afraid I might incur the "thin affiliate" penalty. I'm looking into using mod_bwshare or bw_mod. Anybody else getting hit hard by Googlebot with their AWS sites?
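
To put a number on "one or more times per second", the log check can be sketched roughly as below. This is only a sketch: it assumes an Apache-style access log with the timestamp in square brackets and the user agent somewhere on the line, and the name busiest_seconds is made up for illustration.

```python
import re
from collections import Counter

# Assumes the timestamp sits in square brackets, e.g. [12/Sep/2005:14:03:07 -0400]
TIMESTAMP = re.compile(r"\[([^\]]+)\]")

def busiest_seconds(log_lines, agent="Googlebot", top=3):
    """Count requests per one-second timestamp for lines that mention `agent`."""
    hits = Counter()
    for line in log_lines:
        if agent not in line:
            continue  # some other visitor
        match = TIMESTAMP.search(line)
        if match:
            hits[match.group(1)] += 1
    # Highest counts first; each item is (timestamp, request_count)
    return hits.most_common(top)
```

Feeding it an open log file (for example `busiest_seconds(open("access.log"))`, with whatever path your server uses) lists the seconds with the most Googlebot requests.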

  2. #2
    I am made of sand
    Join Date
    Aug 2005
    Posts
    33
    I've heard that this is called "Google Bombing". I've had the "Become.com" robot do the same thing to me. I put a snippet in my robots.txt file that slows it down to one request every 10 seconds to ease the load. I bet you could do the same with Googlebot.

    This is what I used:

    User-agent: BecomeBot
    Crawl-Delay: 10

    This is straight from Become.com's web page about their bot.
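
For anyone curious how a crawler reads that directive, here is a minimal sketch using Python's standard urllib.robotparser. The helper name crawl_delay_for is made up for illustration, and this only shows how the value is parsed; each crawler decides for itself whether to honor it.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt lines quoted above, parsed from a string (no network access).
ROBOTS_TXT = """\
User-agent: BecomeBot
Crawl-Delay: 10
"""

def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay (in seconds) that applies to user_agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)

print(crawl_delay_for(ROBOTS_TXT, "BecomeBot"))
print(crawl_delay_for(ROBOTS_TXT, "Googlebot"))
```

A bot with no matching User-agent entry (and no `*` entry) gets None back, meaning the file sets no delay for it.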

  3. #3
    Banned
    Join Date
    Dec 2003
    Posts
    152
    Hmm, I searched a bit and it doesn't appear that Googlebot recognizes "crawl-delay". Yahoo does, but it appears pretty well-behaved.

  4. #4
    Banned
    Join Date
    Dec 2003
    Posts
    152
    msnbot does too, and it's also well-behaved (aside from an isolated bombing of an RSS feed).

  5. #5
    Banned
    Join Date
    Dec 2003
    Posts
    152
    This might be something to try:
    "Make sure your web server supports the If-Modified-Since HTTP header. This feature allows your web server to tell Google whether your content has changed since we last crawled your site. Supporting this feature saves you bandwidth and overhead."
    http://www.google.com/webmasters/guidelines.html

    That would avoid calling AWS for whatever time period you treat a page as unchanged.

  6. #6
    Administrator Chris
    Join Date
    Feb 2003
    Location
    East Lansing, MI USA
    Posts
    7,055
    Google Bombing is something else entirely. It refers to a massive number of identical anchor text links (usually from blogs) pushing an unrelated page to the top of the search results.

  7. #7
    I'm the oogie boogie man! James
    Join Date
    Aug 2004
    Location
    Canada
    Posts
    1,566
    Chris is right. We all remember 'miserable failure' and that had nothing to do with Google crashing the President's biography page.

    Currently, the two sites of mine that have lots of pages and whose stats I can access are:
    http://buy-video-games.net
    http://cheatfire.com

    They get crawled by search engines quite a bit (or at least, what I consider quite a bit):
    http://toolazytoblog.com/temp/buyvideogamesspiders.gif
    http://toolazytoblog.com/temp/cheatfirespiders.gif

    But I've never noticed it even slowing down the server. Everything runs fine, and I've never seen it crash.

  8. #8
    Registered
    Join Date
    Jul 2005
    Posts
    43
    Have you also noticed a change in the number of pages that are indexed in Google? It visits every page, but in the end Google reduces the number of indexed pages for AWS sites.

  9. #9
    Banned
    Join Date
    Dec 2003
    Posts
    152
    I switched from SOAP to REST calls, which seems a bit quicker. I considered caching of various sorts, but for now I just check for the If-Modified-Since header and send a 304 "Not Modified" response if the date is within a certain time period. That should hopefully minimize the impact a great deal.
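
A rough sketch of that kind of check, in Python. This is not the exact code running on the sites: the 24-hour window FRESH_FOR and the function name conditional_status are assumptions for illustration, and a real handler would plug this into whatever builds the page.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Hypothetical freshness window: treat a page as unchanged for this long.
FRESH_FOR = timedelta(hours=24)

def conditional_status(if_modified_since, now):
    """Return (status, headers) given the raw If-Modified-Since value (or None).

    `now` must be a timezone-aware UTC datetime.
    """
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: fall through to a full response
        if since is not None and since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
        if since is not None and now - since < FRESH_FOR:
            # The crawler's copy is recent enough: skip the AWS call entirely.
            return 304, {}
    # Otherwise build the page as usual and stamp it for the next visit.
    return 200, {"Last-Modified": format_datetime(now, usegmt=True)}
```

The point of answering 304 before doing any work is that the expensive part (the AWS request and page rendering) never runs for a crawler that fetched the page recently.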

  10. #10
    Does Googlebot hit your AWS Sites Hard?
    Like a pimp smacking his ho's.

  11. #11
    I'm the oogie boogie man! James
    Join Date
    Aug 2004
    Location
    Canada
    Posts
    1,566
    Like a pimp smacking his ho's.
    OH SNAP! DAS OFF DA HIZZEH!
