Thread: Insecurity of robots.txt ?

  1. #1
    Registered
    Join Date
    Mar 2006
    Posts
    156

    Insecurity of robots.txt ?

    I just realized that anybody can access your robots.txt file and see all the pages and directories you list there.
    It's not that robots.txt itself is insecure, but from a web security perspective it gives too much information to a potential attacker. To find, let's say, the "Admin Panel" of a site, all I have to do is check its robots.txt file and see if it's listed there.

    It's not that someone will hack your site by using robots.txt (of course not!), but it's a quick and easy way for an outsider to start gathering info for an attack.
    I like the concept of "security through obscurity". It's not perfect, but I think it's better if outsiders don't even know where my admin panel is, right?

    The question is... is it worth putting our "Admin Panel" URLs (or other important directories) in robots.txt?

  2. #2
    Registered
    Join Date
    May 2005
    Posts
    81
    If you like security through obscurity, you would never even link to your admin URL anywhere on your site, giving you no reason to list it in robots.txt (since Google wouldn't know about it).

    Alternatively, you could block it with a robots <meta> tag on the page if you think Google might somehow still find it.

    Anyway, just make sure all your files are protected and you'll be fine.

  3. #3
    Registered
    Join Date
    Apr 2006
    Location
    Michigan
    Posts
    99
    Yeah, the admin panel should have authentication which would keep the robots out anyway. There is no need to list it in robots.txt.
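
    As a rough illustration of the simplest kind of authentication an admin panel could have, here is a minimal Python sketch that checks an HTTP Basic Authorization header. The function name and the credentials are made up for the example. A real setup would sit behind HTTPS and compare against a stored password hash, not a plaintext password.

```python
import base64
import hmac


def is_authorized(auth_header, expected_user, expected_password):
    """Return True if an HTTP Basic Authorization header matches the expected login."""
    if not auth_header or not auth_header.startswith("Basic "):
        return False
    try:
        # validate=True rejects characters outside the base64 alphabet.
        raw = base64.b64decode(auth_header[len("Basic "):], validate=True)
        decoded = raw.decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    user_ok = hmac.compare_digest(user.encode(), expected_user.encode())
    pass_ok = hmac.compare_digest(password.encode(), expected_password.encode())
    return user_ok and pass_ok
```

    A crawler that sends no Authorization header gets False here, so the server can answer 401 and nothing behind the login is ever fetched, whether or not the URL is known.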

  4. #4
    Administrator Chris's Avatar
    Join Date
    Feb 2003
    Location
    East Lansing, MI USA
    Posts
    7,055
    Or, if you really wanted to list it, put it in a secondary directory:

    /blocked/admin128/

    Deny access to /blocked/ in your robots.txt file; there's no need to specify admin128.
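
    To see why the parent directory is enough, here is a small sketch using Python's standard urllib.robotparser. The robots.txt content and the helper name can_crawl are only an illustration of the /blocked/ idea above, not anyone's real configuration.

```python
from urllib.robotparser import RobotFileParser

# Only the parent directory is named; "admin128" never appears in the file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blocked/
"""


def can_crawl(robots_txt, path, user_agent="*"):
    """Return True if a rule-following robot may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/blocked/admin128/", "/index.html"):
        print(path, can_crawl(ROBOTS_TXT, path))
```

    The Disallow rule is a prefix match, so everything under /blocked/ is off limits to well-behaved robots while the file itself gives away nothing about what lives inside.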
    Chris Beasley - My Guide to Building a Successful Website

  5. #5
    Registered
    Join Date
    Mar 2006
    Posts
    156
    Quote Originally Posted by polspoel View Post
    If you like security through obscurity, you would never even link to your admin URL anywhere on your site, giving you no reason to even list it in robots.txt (since google wouldn't know about it)
    Sure, but I see lots of people doing it who maybe haven't realized that listing their admin panels or other secret directories in robots.txt isn't the best idea.


    That's a great solution, Chris. I didn't think of that.


    Also, is it possible for a page that's not in your robots.txt and not linked from your site to get indexed? I don't know how... maybe through some hosting logs or something? Because if that's the case (I don't think so), just not listing your admin panel is not enough. In that case, Chris's solution could work.

  6. #6
    Chronic Entrepreneur
    Join Date
    Nov 2003
    Location
    Tulsa, Oklahoma, USA
    Posts
    1,112
    Quote Originally Posted by Nico View Post
    Also, is it possible for a page that's not in your robots.txt and not linked from your site to get indexed? I don't know how... maybe through some hosting logs or something? Because if that's the case (I don't think so), just not listing your admin panel is not enough. In that case, Chris's solution could work.
    I think that Google also discovers new URLs when people visit them with the Google toolbar installed. I don't have any proof of this, just anecdotal evidence: new pages not known to anyone and not linked from anywhere on the web have shown up in search results. The only way I can think of that Google finds them is through the toolbar reporting back when I visit the page.

  7. #7
    Registered
    Join Date
    Mar 2006
    Posts
    350
    Quote Originally Posted by Westech View Post
    I think that Google also discovers new URLs when people visit them with the Google toolbar installed. I don't have any proof of this, just anecdotal evidence: new pages not known to anyone and not linked from anywhere on the web have shown up in search results. The only way I can think of that Google finds them is through the toolbar reporting back when I visit the page.
    That's certainly an interesting theory. I'm not going to refute it, simply because I don't have evidence against it -- I've just never experienced that nor have I read about something like that happening.

    Regarding the initial question: I never block access to pages such as the admin panel. Since it's not linked anywhere, I see no reason to do so. That said, any page that displays private information should have some sort of authentication system to begin with. I think enough visits to the page with Alexa's toolbar installed may cause Alexa to pick up on it and show it in the site's Alexa listing.
    Last edited by MaxS; 04-16-2007 at 03:00 PM.
    Max

  8. #8
    Senior Member Kyle's Avatar
    Join Date
    Jun 2003
    Location
    Chicago
    Posts
    840
    Quote Originally Posted by MaxS View Post
    That's certainly an interesting theory. I'm not going to refute it, simply because I don't have evidence against it -- I've just never experienced that nor have I read about something like that happening.

    Regarding the initial question: I never block access to pages such as the admin panel. Since it's not linked anywhere, I see no reason to do so. That said, any page that displays private information should have some sort of authentication system to begin with. I think enough visits to the page with Alexa's toolbar installed may cause Alexa to pick up on it and show it in the site's Alexa listing.
    Alexa will index the page from your toolbar. I learned this the hard way with a non-password-protected admin area (for an insignificant site of mine).

    But I agree, I don't see a reason to put your admin area in your robots file.

    How would Google find it?
    Kyle

  9. #9
    Registered
    Join Date
    Mar 2007
    Posts
    7
    Well-behaved search engine programs called robots ("bots") or spiders are supposed to fetch the robots.txt file in your Web site root and follow the rules in it when they spider your site to index its content. You should probably have one, if only to reduce the 404 errors in your log files (even if it is blank, or just contains a rule saying “go to town, here’s my home page index.asp”, whatever you like). However, be careful. Once you start using robots.txt, there are some considerations you may not have thought of at first.

    Inspection of most robots.txt files indicates that many admins try to keep spiders out of certain areas. Take a look at ibm.com/robots.txt. In this case, IBM doesn’t want you in the images, cgi-bin, scripts, etc. However, they also list out /Admin, /webmaster, etc. If you blindly put in all the places you don't want people to go in a robots.txt file, it is just a broadcast to someone who has bad intentions as to where they should direct their attention. Hey, here’s my control panel, NixGuy666. Caveat admintor.
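
    One way to catch this before an outsider does is to scan your own robots.txt for entries that look like they point at a backend. The sketch below is only that, a sketch: the hint words and the function name are assumptions picked for illustration, and the sample file in the usage lines is made up, not copied from any real site.

```python
# Hypothetical hint words; adjust to whatever counts as sensitive on your site.
SENSITIVE_HINTS = ("admin", "webmaster", "controlpanel", "login", "backup")


def revealing_entries(robots_txt, hints=SENSITIVE_HINTS):
    """Return Disallow paths that contain a sensitive-looking word."""
    flagged = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if not sep or field.strip().lower() != "disallow":
            continue
        path = value.strip()
        if any(hint in path.lower() for hint in hints):
            flagged.append(path)
    return flagged


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow: /images/\nDisallow: /Admin\n"
    print(revealing_entries(sample))
```

    Anything this flags is something a curious visitor can read just as easily, so it is a candidate for moving under a neutral parent directory or dropping from the file altogether.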

    Now, that's not to say IBM made such a basic mistake -- more likely they have a tripwire here. For example, say you have a directory in your robots.txt called /controlpanel -- this suggests it holds some site backend. We might instead make /controlpanel a tripwire that cookies the user and logs a potentially threatening visitor. This ‘hidden’ directory is only visited by those who learned about it by sniffing robots.txt. You might even consider putting a honeypot-style login page there to keep the potential snoop busy. In this case, false entries serve as the wire with some cans on it: the intruder comes stumbling in and alerts you to their potential bad intentions.
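
    A bare-bones version of the tripwire idea might look like the following. It is a sketch under assumptions: the function name and the flagged-IP set are invented for the example, it records the client IP and does not set a cookie, and how it gets wired into a real web server is left out.

```python
import logging

logger = logging.getLogger("tripwire")

# Decoy directory: listed in robots.txt, linked from nowhere, holds nothing real.
TRIPWIRE_PATHS = ("/controlpanel",)


def check_tripwire(path, client_ip, flagged_ips, tripwires=TRIPWIRE_PATHS):
    """Record client_ip in flagged_ips if path falls under a decoy directory."""
    for prefix in tripwires:
        base = prefix.rstrip("/")
        # Match the directory itself or anything below it, but not
        # unrelated paths that merely share the same leading characters.
        if path == base or path.startswith(base + "/"):
            flagged_ips.add(client_ip)
            logger.warning("tripwire hit: %s requested %s", client_ip, path)
            return True
    return False
```

    Since no legitimate page links to the decoy, any hit comes from someone who read robots.txt and ignored it, which is exactly the visitor worth watching.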

    Consider adding a robots.txt file, if only to clean your logs. If you do, be smart and avoid revealing extra information and even consider adding tripwire URLs to monitor for. Someday, we hope to see true bot control. For now, we'll use robots.txt for its intended use… and other uses as well.
