
Anyone know what spider this might be?



dc dalton
03-03-2006, 11:03 AM
I'm getting a strange entry in one of my site's log files, about 10 or so a day. Here is the log entry:

[ip address removed] - - [02/Mar/2006:00:54:47 -0500] "GET /robots.txt HTTP/1.1" 500 1022 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1"

When I put that IP into a browser I get a secure login prompt for "WT54RG" and of course can't get in.

It's been hitting this site quite hard all of a sudden and I'm not sure what the heck it may be. It hits the robots.txt file and then runs through a bunch of pages.

Aren't legit spiders supposed to identify themselves? I know it's a spider because the tracker isn't picking it up.

stymiee
03-03-2006, 11:06 AM
WT54RG - looks like a Linksys router

dc dalton
03-03-2006, 11:09 AM
Yeah, now this is getting WEIRD ..... I did a whois on the IP and it points to my ISP, but that's NOT my IP address. What in the heck would this be about?

Could my router somehow be doing this?

KLB
03-03-2006, 11:09 AM
Have you done a lookup using something like http://domainsdb.net/ ?

Chris
03-03-2006, 11:13 AM
Or arin.net

I'm guessing it's a private user trying to download your entire site with an offline browser (aka site ripper). This happens daily on my literature site. I ban them when I find them since it really puts a load on the server.

KLB
03-03-2006, 11:15 AM
I use a bot trap to catch this type of activity and block it.

paul
03-03-2006, 12:59 PM
> I use a bot trap to catch this type of activity and block it.

Is this something you coded yourself or is it available somewhere?

KLB
03-03-2006, 01:08 PM
I coded it myself. Basically I created a webpage that records the UA and IP address of anything that hits it. I then linked to this page from the header area of my website and via faux links, and excluded the file via the robots.txt file. Any robot that obeys the robots.txt file will not hit this page, and real users will not see links to it. Site caching programs, on the other hand, fall into the trap very quickly.

Each time any page is loaded, a query is run to see if the UA/IP address combination has hit my bot trap recently. If it has, it is denied access to the webpage in question. To prevent false positives, the first few hits after hitting the bot trap are allowed but slowed down via sleep commands such that the page takes like 30-45 seconds to load. Normal users would only slightly notice this, but it slows site cachers down to a crawl.
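
For anyone who wants to experiment, here is a minimal Python sketch of the logic described above. It is not the actual script: the trap URL, the 24-hour window, the three-hit grace period, and the names BotTrap and check are all assumptions made up for illustration, and an in-memory dict stands in for the database query.

```python
import time

# All of these values are assumptions for illustration, not from the real script.
TRAP_PATH = "/bot-trap.html"   # hypothetical page, disallowed in robots.txt
BAN_WINDOW = 24 * 60 * 60      # how long a trap hit counts as "recent" (seconds)
GRACE_HITS = 3                 # hits after the trap that are slowed, not denied
DELAY_SECONDS = 30             # delay to apply during the grace period


class BotTrap:
    """Tracks UA/IP pairs that requested the trap page."""

    def __init__(self, clock=time.time):
        self._clock = clock
        # (ip, user_agent) -> [time of trap hit, page hits since then]
        self._trapped = {}

    def check(self, ip, user_agent, path):
        """Return (action, delay): action is 'allow', 'slow' or 'deny'."""
        key = (ip, user_agent)
        now = self._clock()

        if path == TRAP_PATH:
            # Record the visitor; the trap page itself is still served.
            self._trapped[key] = [now, 0]
            return ("allow", 0)

        entry = self._trapped.get(key)
        if entry is None or now - entry[0] > BAN_WINDOW:
            # Never trapped, or the trap hit is too old to count.
            self._trapped.pop(key, None)
            return ("allow", 0)

        entry[1] += 1
        if entry[1] <= GRACE_HITS:
            # Guard against false positives: slow down instead of blocking.
            return ("slow", DELAY_SECONDS)
        return ("deny", 0)
```

The page handler would call check() on every request, sleep for the returned delay when the action is 'slow', and send an error page when it is 'deny'. The robots.txt file would also need a line such as "Disallow: /bot-trap.html" so that well-behaved spiders never request the trap.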

Masetek
03-03-2006, 04:26 PM
That's a smart script Ken, nice one :)

Fortunately I don't have this problem with any of my sites.

KLB
03-03-2006, 04:37 PM
My site has something like 20,000-50,000 theoretical pages. One site caching program can put a world of hurt on my bandwidth and performance, especially if it isn't polite and tries to pull too much too fast. Before implementing my bot trap, I had instances where one user would suck down 1 GB of bandwidth. Can you say ouch? I also had instances where caching programs knocked my site offline because they spidered too many pages too quickly and ground the DB server to a halt.

Chris
03-03-2006, 04:48 PM
I'm in the same situation with my literature site, though my solution isn't so elegant as Ken's. Another article idea there Ken ;)

Things like this and ad blocking are the type of articles I like to both read & write. Kinda like my SE-friendly URLs article or my recent article on building a cache.

It's half coding, half management. It identifies a problem and provides a step-by-step guide to implementing a solution. In one word, it's practical.

dc dalton
03-04-2006, 01:40 PM
OK, I think I have that one narrowed down to my house ... don't know which machine is doing it, but it's my freaking IP, BUT

I have a new one that is appearing every day now:


152.31.229.45 - - [04/Mar/2006:01:49:16 -0500] "POST /xmlrpc.php HTTP/1.1" 500 1864 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)"
152.31.229.45 - - [04/Mar/2006:01:49:17 -0500] "POST /blog/xmlrpc.php HTTP/1.1" 500 1864 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)"
152.31.229.45 - - [04/Mar/2006:01:49:18 -0500] "POST /blog/xmlsrv/xmlrpc.php HTTP/1.1" 500 1864 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)"
152.31.229.45 - - [04/Mar/2006:01:49:19 -0500] "POST /blogs/xmlsrv/xmlrpc.php HTTP/1.1" 500 1864 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)"
152.31.229.45 - - [04/Mar/2006:01:49:20 -0500] "POST /drupal/xmlrpc.php HTTP/1.1" 500 1864 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)"
152.31.229.45 - - [04/Mar/2006:01:49:21 -0500] "POST /phpgroupware/xmlrpc.php HTTP/1.1" 500 1864 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)"
152.31.229.45 - - [04/Mar/2006:01:49:23 -0500] "POST /wordpress/xmlrpc.php HTTP/1.1" 500 1864 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)"
152.31.229.45 - - [04/Mar/2006:01:49:24 -0500] "POST /xmlrpc.php HTTP/1.1" 500 1864 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)"
152.31.229.45 - - [04/Mar/2006:01:49:25 -0500] "POST /xmlrpc/xmlrpc.php HTTP/1.1" 500 1864 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)"
152.31.229.45 - - [04/Mar/2006:01:49:26 -0500] "POST /xmlsrv/xmlrpc.php HTTP/1.1" 500 1864 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)"


The IP is Blue Ridge Community College in NC, but why is it being accessed like this?

Obviously it's coming from a WordPress install (which I know nothing about) .... does anyone get what this thing is doing?

Dan Grossman
03-04-2006, 01:45 PM
It's probably someone scanning your web server for PHP scripts with known vulnerabilities.
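
A rough way to confirm this from the access log is to look for a single IP requesting the same script name under many different paths. The Python sketch below assumes the log uses the format shown in the post above; the names LOG_LINE and find_probers and the threshold of three paths are made up for illustration.

```python
import re
from collections import defaultdict

# Matches the start of an access log line in the format shown above:
# ip - - [timestamp] "METHOD /path PROTOCOL" status size ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def find_probers(lines, needle="xmlrpc.php", min_paths=3):
    """Return {ip: sorted paths} for IPs that requested `needle`
    under at least `min_paths` different paths."""
    paths_by_ip = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("path").endswith(needle):
            paths_by_ip[match.group("ip")].add(match.group("path"))
    return {
        ip: sorted(paths)
        for ip, paths in paths_by_ip.items()
        if len(paths) >= min_paths
    }
```

Calling find_probers(open("access.log")) on a log like the one above (the file name is a placeholder) would list the scanning IP along with every location it tried. A normal visitor or a polite spider should not show up, since they do not request the same file name from a string of guessed directories.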

dc dalton
03-04-2006, 01:50 PM
> It's probably someone scanning your web server for PHP scripts with known vulnerabilities.

Really? Well, they will have fun with that ... the site runs on server-side Java. No PHP at all!

KLB
03-04-2006, 02:04 PM
Block the IP address for a couple of weeks. This will at least clean up the server log.

dc dalton
03-04-2006, 03:17 PM
> Block the IP address for a couple of weeks. This will at least clean up the server log.

One step ahead of you. I just blocked this one and one from Japan that was running a CGI script across the server (that HAD to be bad).

Grant29
03-10-2006, 11:58 AM
> OK, I think I have that one narrowed down to my house ... don't know which machine is doing it, but it's my freaking IP, BUT
>
> I have a new one that is appearing every day now:
>
> The IP is Blue Ridge Community College in NC, but why is it being accessed like this?
>
> Obviously it's coming from a WordPress install (which I know nothing about) .... does anyone get what this thing is doing?

Most definitely scanning for a vulnerability. Many CMSs were using these files. You probably just got hit by a random scan.

Link: http://isc.sans.org/diary.php?storyid=823

Grant