
Thread: AskJeeves Raped my AWS Site

  1. #1

    AskJeeves raped one of my AWS sites to the tune of 2.8 gigs of transfer so far this month, and according to my stats it has delivered no clicks. Has anyone else seen anything like this? What would be the best way to stop it?

  2. #2
    Administrator Chris | Join Date: Feb 2003 | Location: East Lansing, MI USA | Posts: 7,053
    If you really want to forgo being listed there, you can block it with robots.txt.
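A minimal sketch of that approach, assuming Ask Jeeves's spider answers to the Teoma user-agent token (the name used further down this thread). Python's standard-library urllib.robotparser is used here only to confirm that the rules shut out Teoma while leaving other crawlers alone; the URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that shuts out only Ask Jeeves's spider (Teoma).
ROBOTS_TXT = """\
User-agent: Teoma
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "http://www.example.com/some-page.html"  # placeholder URL
print(parser.can_fetch("Teoma", url))      # False: Teoma is disallowed everywhere
print(parser.can_fetch("Googlebot", url))  # True: no record matches, so access is allowed
```

Only crawlers that honor robots.txt will obey this; it does nothing against a bot that ignores the file.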
    Chris Beasley - My Guide to Building a Successful Website
    Content Sites: ABCDFGHIJKLMNOP|Forums: ABCD EF|Ecommerce: Swords Knives

  3. #3
    Do you think that would be a good idea? I just put the site up a couple of weeks ago. If it takes a gig of transfer per week at this rate and delivers no traffic, I should probably block it. Has Jeeves done anything like that to your sites? Does it eventually produce traffic, making it worthwhile?

  4. #4
    Web Monkey MarkB | Join Date: Nov 2003 | Location: London, UK | Posts: 1,783
    Jeeves sends SOME traffic my way, but all in all it's a waste of a search engine.

    (IMO)
    Stepping On Wires - the new blog

  5. #5
    Administrator Chris | Join Date: Feb 2003 | Location: East Lansing, MI USA | Posts: 7,053
    I would wait and see if you eventually get traffic.

    Personally, I have 5 terabytes of bandwidth monthly and only use a fraction of it, so this isn't something I really pay attention to.

  6. #6
    Chronic Entrepreneur | Join Date: Nov 2003 | Location: Tulsa, Oklahoma, USA | Posts: 1,112
    You could try putting this in your robots.txt file:

    Code:
    User-agent: teoma
    Crawl-delay: 60

    This asks Ask Jeeves's spider (Teoma) to wait 60 seconds between page requests. You can experiment with this number until it slows the crawl down enough.

    More details about controlling the Teoma spider can be found here: http://sp.ask.com/docs/about/tech_teoma.html
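Python's urllib.robotparser can read back a Crawl-delay value, which gives a quick way to sanity-check the rule above and to see what a given delay means over a day. This is only a sketch: the Googlebot comparison is illustrative, and the delay is a request that a well-behaved crawler honors, not something the server enforces.

```python
from urllib.robotparser import RobotFileParser

# The rules suggested above: ask teoma (Ask Jeeves) to pause between requests.
ROBOTS_TXT = """\
User-agent: teoma
Crawl-delay: 60
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

delay = parser.crawl_delay("teoma")      # 60 seconds between requests
other = parser.crawl_delay("Googlebot")  # None: no record covers this agent

# Upper bound on pages per day for a crawler that honors the delay.
pages_per_day = 24 * 60 * 60 // delay
print(delay, other, pages_per_day)  # 60 None 1440
```

Doubling the delay halves that daily ceiling, which is the knob the post suggests experimenting with.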

  7. #7
    Registered | Join Date: Dec 2004 | Posts: 37
    Jeeves made love to one of my sites last week. It spent two days on the site and shows almost 9k hits in my web log:

    AskJeeves 8872+21 31.07 MB 14 Jan 2005 - 08:26

    I have blocked almost all bots from my site except Google, Yahoo, MSN, and (for now) Jeeves. I've been checking their index every few days to see if any results show up. So far, none. I'm guessing it may take a while for them to appear.
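A sketch of a whitelist robots.txt along those lines. It assumes the crawler tokens of the time: Googlebot (Google), Slurp (Yahoo), msnbot (MSN) and Teoma (Ask Jeeves). An empty Disallow line allows everything for that agent, and the catch-all * record turns away every other bot that honors robots.txt. Python's urllib.robotparser is used only to check the result; the URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Whitelist: the named crawlers may fetch anything; every other agent is disallowed.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: Slurp
Disallow:

User-agent: msnbot
Disallow:

User-agent: Teoma
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

page = "http://www.example.com/category-140-2.html"  # placeholder URL
for agent in ["Googlebot", "Slurp", "msnbot", "Teoma", "e-SocietyRobot"]:
    print(agent, parser.can_fetch(agent, page))
# Googlebot True, Slurp True, msnbot True, Teoma True, e-SocietyRobot False
```

Dropping Jeeves later is then a matter of deleting its two-line record so it falls through to the catch-all.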

    I don't want every bot in creation indexing my sites. I've had to block a few by IP because they fail to comply with the robots.txt standard and continue right on indexing even though they are in my robots.txt file.
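Bots that ignore robots.txt can be picked out of the access log by replaying each request against the site's own rules and counting the ones that should never have been made; those IPs are the candidates for blocking at the server. This is a sketch under assumptions: the log is taken to be already parsed into (IP, user-agent token, path) tuples, and the find_noncompliant helper, the BadBot agent, and the documentation-range IPs are all hypothetical.

```python
from collections import Counter
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

def find_noncompliant(robots_lines, log_entries):
    """Hypothetical helper: count, per (IP, agent), requests that robots.txt disallows.

    Requests for /robots.txt itself are skipped, since reading it is legitimate.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)
    offenders = Counter()
    for ip, agent, path in log_entries:
        if path != "/robots.txt" and not parser.can_fetch(agent, path):
            offenders[(ip, agent)] += 1
    return offenders

# Hypothetical log entries: (client IP, user-agent token, requested path).
log = [
    ("192.0.2.10", "Googlebot", "/category-140-2.html"),
    ("198.51.100.7", "BadBot", "/robots.txt"),
    ("198.51.100.7", "BadBot", "/category-140-2.html"),
    ("198.51.100.7", "BadBot", "/category-140-3.html"),
]

for (ip, agent), hits in find_noncompliant(ROBOTS_TXT.splitlines(), log).items():
    print(f"{ip} {agent}: {hits} disallowed requests")  # candidates for an IP block
# 198.51.100.7 BadBot: 2 disallowed requests
```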

    I'm waiting to see whether any results from Jeeves show up over the next month or two. If not, Jeeves will be added to the block list too.

  8. #8
    Registered Xander | Join Date: Oct 2004 | Location: UK | Posts: 263
    I've had a similar problem with AskJeeves on my forum. It had about 10 or 20 concurrent sessions running, so one of the moderators banned its IP, and we haven't had trouble from it since (we never got much traffic from them anyway, so it was no loss). I have noticed recently, though, that as the forum has grown a lot, at least two of the major search engines' bots are practically living on the site. That's OK, since I have the spare capacity, but it is strange.

  9. #9
    Registered User davesplace1 | Join Date: Oct 2003 | Location: Seaside, Oregon | Posts: 192
    Hey, I got 3 visitors from AskJeeves yesterday, time to run out and buy a new SUV. Not much traffic comes from these minor search engines, but it's still traffic. I have never had a bot eat up a lot of bandwidth, but my sites are mostly text anyway.

  10. #10
    Registered Member moonshield | Join Date: Aug 2004 | Location: Charlotte | Posts: 1,281
    Hey Jeeves, come on over.

  11. #11
    4x4 | Join Date: Oct 2004 | Posts: 1,043
    Why ban something that may eventually pay for itself? You won't know today, and most likely not next week, but what if, a month or two down the road, Jeeves were sending you a sale a day and you were missing it?

    1 or 2 GB per week is NOTHING. If that costs you a lot, it may be time to look into a new web hosting provider or, if you have the funds and want one, a dedicated server or VPS.
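For scale, a steady 1 to 2 GB a week works out to roughly 4 to 9 GB a month. The sketch below does that conversion and, purely as an illustration, compares it with the 5 TB monthly allowance Chris mentioned earlier in the thread (assuming 1 TB counts as 1,000 GB); your own plan's limit is what actually matters.

```python
WEEKS_PER_MONTH = 52 / 12  # about 4.33 weeks in an average month
PLAN_GB = 5000             # assumption: a 5 TB monthly plan, counting 1 TB as 1,000 GB

def monthly_gb(gb_per_week):
    """Project a steady weekly transfer figure onto an average month."""
    return gb_per_week * WEEKS_PER_MONTH

for weekly in (1, 2):
    monthly = monthly_gb(weekly)
    share = 100 * monthly / PLAN_GB
    print(f"{weekly} GB/week ~ {monthly:.1f} GB/month ({share:.2f}% of {PLAN_GB} GB)")
# 1 GB/week ~ 4.3 GB/month (0.09% of 5000 GB)
# 2 GB/week ~ 8.7 GB/month (0.17% of 5000 GB)
```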

    Some things to think about.

  12. #12
    Registered Member moonshield | Join Date: Aug 2004 | Location: Charlotte | Posts: 1,281
    Yeah, search engines should be allowed to play as much as they please, since they are what help drive traffic to the sites.

  13. #13
    Registered | Join Date: Dec 2004 | Posts: 37
    Here is why you want to use robots.txt to control your site: keep the bots you want there, and get rid of the ones you don't.

    If you have a small seven-page site with some text and graphics, it would not be a problem. But if you have a site that generates dynamic content, with thousands of category pages leading to an enormous number of individual pages, then letting bots have free run is not a good idea.
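For a dynamic site like this, a middle ground between letting every bot roam and blocking it outright is to disallow only the endlessly paginated URLs. robots.txt rules match by path prefix, so one line can cover every category page. The /category- prefix is an assumption based on the URLs in the log below, and blocking category pages for all bots means crawlers have to reach the individual pages some other way.

```python
from urllib.robotparser import RobotFileParser

# Keep every crawler out of the paginated category listings only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /category-
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ["/category-140-2.html", "/category-140-3.html", "/index.html"]:
    print(path, parser.can_fetch("AnyBot", path))
# /category-140-2.html False
# /category-140-3.html False
# /index.html True
```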


    This is a small capture of the hundreds of pages it was running through, one every 9-10 seconds:

    Code:
    Host: 133.9.238.77

    /category-140-2.html
    	Http Code: 403 	Date: Jan 19 15:05:35 	Http Version: HTTP/1.1 	Size in Bytes: -
    	Referer: -
    	Agent: e-SocietyRobot(http://www.yama.info.waseda.ac.jp/~yamana/es/)

    /category-140-3.html
    	Http Code: 403 	Date: Jan 19 15:05:45 	Http Version: HTTP/1.1 	Size in Bytes: -
    	Referer: -
    	Agent: e-SocietyRobot(http://www.yama.info.waseda.ac.jp/~yamana/es/)

    /category-140-4.html
    	Http Code: 403 	Date: Jan 19 15:05:56 	Http Version: HTTP/1.1 	Size in Bytes: -
    	Referer: -
    	Agent: e-SocietyRobot(http://www.yama.info.waseda.ac.jp/~yamana/es/)

    /category-140-5.html
    	Http Code: 403 	Date: Jan 19 15:06:06 	Http Version: HTTP/1.1 	Size in Bytes: -
    	Referer: -
    	Agent: e-SocietyRobot(http://www.yama.info.waseda.ac.jp/~yamana/es/)

    I have no use for this bot; I had never seen or heard of it before. It showed up early in the morning while Google and MSN were both crawling my site, and this log is from this afternoon, when it came back to keep going.
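The timestamps in that capture are enough to measure the bot's pace. A quick Python check on just the four requests shown (a small sample) puts the gaps at 10 to 11 seconds, and shows what that rate would add up to if it were sustained for a full day.

```python
from datetime import datetime

# Request times from the four log entries above (all on Jan 19, so only clock time matters).
timestamps = ["15:05:35", "15:05:45", "15:05:56", "15:06:06"]

times = [datetime.strptime(t, "%H:%M:%S") for t in timestamps]
# Seconds between each request and the one before it.
gaps = [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
avg = sum(gaps) / len(gaps)

print(gaps)                                # [10.0, 11.0, 10.0]
print(f"{avg:.1f} s between requests")     # 10.3 s between requests
print(f"~{86400 / avg:.0f} requests/day")  # ~8361 requests/day if the pace held
```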

  14. #14
    Registered Member moonshield | Join Date: Aug 2004 | Location: Charlotte | Posts: 1,281
    Yeah, but sometimes the unknown bots don't follow robots.txt.

  15. #15
    4x4 | Join Date: Oct 2004 | Posts: 1,043
    Quote Originally Posted by thepoorman
    Yeah, but sometimes the unknown bots don't follow robots.txt.
    And sometimes they belong to new, up-and-coming search engines. If you want to deny yourself a possible head start over others who aren't being spidered yet, then go ahead and block them. For me, the bandwidth isn't noticeable, and it would be worth it in the long run even if I paid a few extra bucks a month.
