AskJeeves raped one of my AWS sites to the tune of 2.8 Gigs so far this month, and according to my stats has delivered no clicks. Anyone else seen anything like this? What would be the best way to stop it?
Peter T Davis
Do you think it would be a good idea to do that? I just put the site up a couple of weeks ago; if it takes a Gig of transfer per week at this rate and delivers no traffic, I should probably block it. Has Jeeves done anything like that to your sites? Does it eventually produce traffic, thus making it worthwhile?
Jeeves sends SOME traffic my way, but all in all it's a waste of a search engine.
(IMO)
I would wait and see if you eventually get traffic.
Personally, I have 5 terabytes of bandwidth monthly and only use a fraction of that, so this isn't something I really pay attention to.
You could try putting this in your robots.txt file:
User-Agent: teoma
crawl-delay: 60
This makes Ask Jeeves wait 60 seconds between page requests. You can experiment with changing this number until you slow it down enough.
More details about controlling the teoma spider can be found here: http://sp.ask.com/docs/about/tech_teoma.html
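If you want to sanity-check a rule like that before uploading it, here is a rough sketch using Python's standard urllib.robotparser module. It only shows how a parser that honours crawl-delay would read the two lines above. Whether the teoma spider itself interprets them exactly this way is an assumption, and the sample path is made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt lines suggested above.
ROBOTS_TXT = """\
User-Agent: teoma
crawl-delay: 60
"""


def read_rules(text):
    """Parse robots.txt text locally, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = read_rules(ROBOTS_TXT)

# Delay (in seconds) a compliant crawler identifying as "teoma" should use.
teoma_delay = rules.crawl_delay("teoma")

# There are no Disallow lines, so pages stay crawlable; the rule only slows the bot.
# "/some-page.html" is a hypothetical path.
teoma_allowed = rules.can_fetch("teoma", "/some-page.html")

# Bots not named in the group get no delay from this file.
other_delay = rules.crawl_delay("googlebot")

print(teoma_delay, teoma_allowed, other_delay)
```

The point is that crawl-delay throttles the spider without removing you from the index, so you can raise or lower the number and still wait to see whether any traffic shows up.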
Jeeves made love to one of my sites last week. Spent two days on the site, shows almost 9k hits in my web log:
AskJeeves 8872+21 31.07 MB 14 Jan 2005 - 08:26
I have blocked almost all bots from my site, except Google, Yahoo, MSN, and (for now) Jeeves. I've been checking their index every few days to see if any results show up. So far none. I'm guessing it may take a while for them to appear.
I don't want every bot in creation indexing my sites. I've had to block a few by IP because they fail to comply with the robots.txt standard and continue right on indexing even though they are in my robots.txt file.
I'm waiting to see if any results from Jeeves show up over the next month or two. If not, they will be added to the list.
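For anyone wanting to try the same allow-list approach, here is a minimal sketch of what such a robots.txt could look like, checked with Python's standard urllib.robotparser. The user-agent tokens (Googlebot, Slurp, msnbot, Teoma) are assumptions on my part, not taken from the post, so confirm them against each engine's documentation. As noted above, this only helps with bots that actually obey robots.txt; the rest still need an IP block.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical allow-list: the named bots may crawl everything, all others nothing.
# The tokens below are assumed user-agent names used for illustration.
ALLOW_LIST_ROBOTS = """\
User-agent: Googlebot
User-agent: Slurp
User-agent: msnbot
User-agent: Teoma
Disallow:

User-agent: *
Disallow: /
"""


def allowed(robots_text, agent, path):
    """Return True if a compliant bot named `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, path)


# "/index.html" is a made-up path for the check.
for agent in ("Googlebot", "Teoma", "e-SocietyRobot"):
    print(agent, allowed(ALLOW_LIST_ROBOTS, agent, "/index.html"))
```

Dropping Jeeves later would just mean deleting its User-agent line from the first group, so it falls through to the catch-all block.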
I've had a similar problem with AskJeeves on my forum. It had about 10 or 20 concurrent sessions running, so one of the moderators decided to ban the IP, and we've not had trouble from it since (we never got much traffic from them, so it was no loss). I have noticed recently, though, that as the forum has grown a lot, at least two of the major search engines' bots are practically living at the site, which is OK as I have the spare capacity, but it is strange.
Hey, I got 3 visitors from AskJeeves yesterday, time to run out and buy a new SUV. Not much traffic from these minor search engines, but it is still traffic. I have never had a bot eat up a lot of bandwidth, but my sites are mostly text anyway.
Why ban something that may eventually pay for itself in the future? You won't know today, and most likely not next week, but what if a month or two down the road Jeeves was giving you a sale a day and you were now missing it?
1 or 2 GB per week is NOTHING. If this costs you a lot, it's possibly time to look into a new web hosting provider or, if you have the funds, a dedicated server or VPS.
Some things to think about.
Yeah, search engines should be allowed to play as much as they please, for they are what help drive traffic to the sites.
Here is why you want to control your site with robots.txt: keep the bots you want there and get rid of the ones you don't.
If you have a small seven page site with some text and graphics, it would not be a problem. But if you have a site that generates dynamic content and has thousands of pages of categories that lead to an incredible amount of individual pages, then letting bots have a free run is not a good idea.
This is a small capture of hundreds of pages it was running through every 9-10 seconds:
I have no use for this bot. Never seen it, never heard of it. It came along while Google and MSN were both crawling my site early in the morning. This log is from this afternoon, when it came back to keep on going.
Code:
Host: 133.9.238.77
/category-140-2.html Http Code: 403 Date: Jan 19 15:05:35 Http Version: HTTP/1.1 Size in Bytes: - Referer: - Agent: e-SocietyRobot(http://www.yama.info.waseda.ac.jp/~yamana/es/)
/category-140-3.html Http Code: 403 Date: Jan 19 15:05:45 Http Version: HTTP/1.1 Size in Bytes: - Referer: - Agent: e-SocietyRobot(http://www.yama.info.waseda.ac.jp/~yamana/es/)
/category-140-4.html Http Code: 403 Date: Jan 19 15:05:56 Http Version: HTTP/1.1 Size in Bytes: - Referer: - Agent: e-SocietyRobot(http://www.yama.info.waseda.ac.jp/~yamana/es/)
/category-140-5.html Http Code: 403 Date: Jan 19 15:06:06 Http Version: HTTP/1.1 Size in Bytes: - Referer: - Agent: e-SocietyRobot(http://www.yama.info.waseda.ac.jp/~yamana/es/)
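To put a number on how fast a bot like this is hitting the site, a small sketch like the following pulls the timestamps out of the capture and measures the gap between requests. The log lines are trimmed to the fields that matter, the regular expression is my own guess at the format, and the year 2005 is assumed because the log itself carries none.

```python
import re
from datetime import datetime

# The four requests from the capture above, trimmed to path, status and date.
LOG_CAPTURE = """\
/category-140-2.html Http Code: 403 Date: Jan 19 15:05:35
/category-140-3.html Http Code: 403 Date: Jan 19 15:05:45
/category-140-4.html Http Code: 403 Date: Jan 19 15:05:56
/category-140-5.html Http Code: 403 Date: Jan 19 15:06:06
"""

# Matches timestamps such as "Jan 19 15:05:35" after the "Date:" label.
DATE_PATTERN = re.compile(r"Date: (\w{3} \d{1,2} \d{2}:\d{2}:\d{2})")

# The log has no year, so one is assumed in order to build full datetimes.
ASSUMED_YEAR = "2005"


def request_gaps(log_text):
    """Return the seconds elapsed between consecutive requests in the log text."""
    stamps = [
        datetime.strptime(f"{ASSUMED_YEAR} {raw}", "%Y %b %d %H:%M:%S")
        for raw in DATE_PATTERN.findall(log_text)
    ]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(stamps, stamps[1:])]


gaps = request_gaps(LOG_CAPTURE)
print(gaps)
```

For these four lines the gaps come out at 10 to 11 seconds, which is roughly the pace described above. Run over a full day's log, the same idea tells you whether a crawl-delay setting is being honoured.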
Yeah, but sometimes the unknown bots don't follow the robots.txt.
And sometimes they are for new, upcoming search engines. If you want to deny yourself a possible head start over others who are not being spidered yet, well, then you should try to block them. Myself, the bandwidth is not noticeable, and it would be worth it in the long run even if I paid a few extra bucks a month.