
Thread: Stop the google madness!

  1. #1
    4x4 | Join Date: Oct 2004 | Posts: 1,043

    Stop the google madness!

    OK, other than a Disallow in robots.txt, what else can be done to BLOCK Google from spidering my site?

    I want/need the AdSense bot on index.php, but not on index.php?ANYTHING. Is this possible?


    This is not a joke.
    Last edited by Todd W; 02-06-2007 at 12:44 AM.

  2. #2
    Registered | Join Date: Nov 2003 | Posts: 100
    I think this is what you are looking for:

    User-agent: *
    Disallow: /index.php?
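A rule without wildcards is compared as a plain prefix of the URL's path plus query string, which is why "Disallow: /index.php?" should leave /index.php itself alone while blocking anything with a query string. The Python sketch below illustrates only that prefix check; the helper names request_target and is_blocked are made up for illustration, and real crawlers also handle wildcards, percent-encoding and Allow lines, which this ignores.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path plus query string that robots.txt rules are compared against."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


def is_blocked(url: str, disallow_prefix: str) -> bool:
    """A rule without wildcards blocks every URL whose target starts with it."""
    return request_target(url).startswith(disallow_prefix)


rule = "/index.php?"
for url in ("http://www.mysite.com/index.php",
            "http://www.mysite.com/index.php?q=3439345983457"):
    print(url, "blocked" if is_blocked(url, rule) else "allowed")
```

This prints "allowed" for the first URL and "blocked" for the second.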

  3. #3
    4x4 | Join Date: Oct 2004 | Posts: 1,043
    Quote (originally posted by kdb003):
    I think this is what you are looking for:

    User-agent: *
    Disallow: /index.php?
    Before I do that, I really want to make sure it will only block index.php?* and not index.php as well.

    Also, if Google follows a link from, say, www.yoursite.com to www.mysite.com/index.php, does it STILL read robots.txt, or does it only read robots.txt if it starts on my site and spiders my pages?

  4. #4
    4x4 | Join Date: Oct 2004 | Posts: 1,043
    To remove dynamically generated pages, you'd use this robots.txt entry:

    User-agent: Googlebot
    Disallow: /*?

    Looks like it will work. Hmm, I got the above from: http://www.google.com/support/webmas...y?answer=35303

    Yet when I go to

    http://services.google.com:8882/urlconsole/controller (Google URL remover)

    They then say:
    URLs cannot have wild cards in them (e.g. "*"). The following line contains a wild card:
    DISALLOW /*?


    So in one place Google says "do this" and in another it says "it won't work".

    Hmmm
    Last edited by Todd W; 02-06-2007 at 01:11 AM.
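The /*? form relies on the * wildcard, which Googlebot supported as an extension at the time and which RFC 9309 (the current robots.txt standard) now specifies together with a trailing $ end-of-path anchor. The URL removal tool's refusal to accept wildcards is a limitation of that tool, not of Googlebot's robots.txt parser. Below is a rough Python sketch of wildcard matching that translates a rule into a regular expression; rule_to_regex and matches are hypothetical names, and the sketch skips percent-encoding normalization.

```python
import re


def rule_to_regex(rule):
    """Translate a robots.txt path rule using '*' and a trailing '$' into a regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # '*' stands for any run of characters; everything else is matched literally.
    pattern = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.compile(pattern + (r"\Z" if anchored else ""))


def matches(rule: str, target: str) -> bool:
    """Rules match from the start of the path, so re.match (anchored at 0) is used."""
    return rule_to_regex(rule).match(target) is not None


for target in ("/index.php", "/index.php?q=3439345983457"):
    print(target, "blocked" if matches("/*?", target) else "allowed")
```

With the rule /*?, /index.php is allowed and /index.php?q=3439345983457 is blocked.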

  5. #5
    Registered | Join Date: Nov 2003 | Posts: 100
    I am very confident that this is the correct robots.txt. I suggest you use Google's robots.txt checker in Webmaster Tools to be absolutely sure.

    http://www.google.com/webmasters/sitemaps/

    And yes, Google will check robots.txt when it spiders your pages, no matter where it comes from.
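Put another way, a crawler looks up robots.txt at the root of whichever host it is about to fetch from; the page that linked to it plays no part. A minimal Python sketch of that lookup, using a hypothetical helper name:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url: same scheme and host, fixed path."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# The referring site never enters into it; only the target URL's host does.
print(robots_url_for("http://www.mysite.com/index.php?q=3439345983457"))
```

This prints http://www.mysite.com/robots.txt.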

  6. #6
    4x4 | Join Date: Oct 2004 | Posts: 1,043
    Quote (originally posted by kdb003):
    I am very confident that this is the correct robots.txt. I suggest you use Google's robots.txt checker in Webmaster Tools to be absolutely sure.

    http://www.google.com/webmasters/sitemaps/

    And yes, Google will check robots.txt when it spiders your pages, no matter where it comes from.
    Well, I first used what Google suggested for blocking dynamic content from being spidered, then ran the robots.txt analysis tool, and I get:
    URL                       Googlebot
    http://www.mysite.com/    Allowed


  7. #7
    4x4 | Join Date: Oct 2004 | Posts: 1,043
    I tried:
    User-agent: *
    Disallow: /index.php?

    That's what you suggested, and it too said Googlebot was allowed.

  8. #8
    Registered | Join Date: Nov 2003 | Posts: 100
    Quote (originally posted by ToddW):
    Well, I first used what Google suggested for blocking dynamic content from being spidered, then ran the robots.txt analysis tool, and I get:
    URL                       Googlebot
    http://www.mysite.com/    Allowed

    I am confused by your smiley. Aren't you trying to allow mysite.com/ and disallow mysite.com/index.php?...

    Try adding a few extra lines with the URLs you want to test.

    Either robots.txt should work.

  9. #9
    4x4 | Join Date: Oct 2004 | Posts: 1,043
    Quote (originally posted by kdb003):
    I am confused by your smiley. Aren't you trying to allow mysite.com/ and disallow mysite.com/index.php?...

    Try adding a few extra lines with the URLs you want to test.

    Either robots.txt should work.
    Yes, I'm trying to block index.php?*

    It's not saying anything is blocked...

  10. #10
    4x4 | Join Date: Oct 2004 | Posts: 1,043
    I added:

    User-agent: Googlebot-Image
    Disallow: /

    And now it says:

    Googlebot-Image
    Allowed Syntax not understood


    Methinks it's not working.

  11. #11
    Registered | Join Date: Nov 2003 | Posts: 100
    Quote (originally posted by ToddW):
    Yes, I'm trying to block index.php?*

    It's not saying anything is blocked...
    OK, in the robots.txt checker, add a few extra lines under http://mysite.com,

    i.e.:
    http://mysite.com/index.php
    http://mysite.com/index.php?q=3439345983457

    The last one should be the only one that is blocked.
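The same check can be scripted for a batch of test URLs. This Python sketch assumes plain prefix rules (no wildcards) and flags any entry that is not a full http(s) URL as invalid rather than guessing what it means; check_urls is a hypothetical name, not part of any Google tool.

```python
from urllib.parse import urlsplit


def check_urls(disallow_prefixes, urls):
    """Classify each test URL as 'blocked', 'allowed' or 'invalid' (not a full URL)."""
    results = {}
    for url in urls:
        parts = urlsplit(url)
        # A bare "mysite.com/index.php" has no scheme or host, so it is not testable.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            results[url] = "invalid"
            continue
        target = (parts.path or "/") + ("?" + parts.query if parts.query else "")
        blocked = any(target.startswith(prefix) for prefix in disallow_prefixes)
        results[url] = "blocked" if blocked else "allowed"
    return results


report = check_urls(["/index.php?"], [
    "http://mysite.com/index.php",
    "http://mysite.com/index.php?q=3439345983457",
    "mysite.com/index.php",
])
for url, verdict in report.items():
    print(verdict, url)
```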

  12. #12
    4x4 | Join Date: Oct 2004 | Posts: 1,043
    I got it to work now.

    It's PICKY about the URLs you want to test.

  13. #13
    4x4 | Join Date: Oct 2004 | Posts: 1,043
    For the record this works perfectly:

    User-agent: Googlebot
    Disallow: /*?

    User-agent: Googlebot-Image
    Disallow: /

    I was not typing a full valid URL to test.

    index.php works;
    index.php?anything is blocked.
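With two groups in one file, each crawler obeys only the group that names it, so Googlebot-Image gets "Disallow: /" while Googlebot gets "Disallow: /*?". Here is a simplified Python sketch of that selection: an exact, case-insensitive user-agent match, falling back to the * group. Google documents its own matching rules, which are more lenient than this, so treat it as an approximation; parse_groups and select_group are illustrative names, and only Disallow lines are read.

```python
def parse_groups(robots_txt: str) -> dict:
    """Map each lower-cased User-agent token to its list of Disallow values."""
    groups, current_agents, in_rules = {}, [], False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (s.strip() for s in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                current_agents, in_rules = [], False
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field == "disallow":
            in_rules = True
            for agent in current_agents:
                groups[agent].append(value)
    return groups


def select_group(groups: dict, crawler: str) -> list:
    """Use the group naming this crawler exactly, else the '*' group, else no rules."""
    crawler = crawler.lower()
    if crawler in groups:
        return groups[crawler]
    return groups.get("*", [])


robots = """User-agent: Googlebot
Disallow: /*?

User-agent: Googlebot-Image
Disallow: /
"""
groups = parse_groups(robots)
print(select_group(groups, "Googlebot"), select_group(groups, "Googlebot-Image"))
```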

  14. #14
    4x4 | Join Date: Oct 2004 | Posts: 1,043
    For the record... on the two sites I put this on, Googlebot re-attempted to download robots.txt this morning, one 30 minutes ago and one 2 hours ago.

    Status 404 (Not found)

    Yet if I go to /robots.txt, it's clearly there... wow, Google!! Stop ignoring it!
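For what it's worth, a 404 on robots.txt is treated as "no robots.txt", meaning no restrictions at all, so as long as Googlebot keeps receiving 404 it will not apply the Disallow rules. The mapping below follows the status-code handling described in RFC 9309; Googlebot's exact behavior may differ in details (for example rate-limit responses), and robots_policy is a hypothetical helper.

```python
def robots_policy(status: int) -> str:
    """How RFC 9309 says a crawler should treat the HTTP status of a robots.txt fetch."""
    if 200 <= status < 300:
        return "parse"         # success: obey the rules in the file
    if 300 <= status < 400:
        return "follow"        # redirects: follow them (the RFC says at least five hops)
    if 400 <= status < 500:
        return "allow-all"     # "unavailable": the crawler may access any resource
    if 500 <= status < 600:
        return "disallow-all"  # "unreachable": assume complete disallow
    return "unknown"


print(robots_policy(404))
```

This prints allow-all, which matches the symptom above: the rules exist on the server but are never applied.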

  15. #15
    Administrator | Join Date: Feb 2003 | Location: East Lansing, MI USA | Posts: 7,055
    Have you tried putting the meta robots tag directly on the pages you do not want indexed? It will not stop Google from viewing the pages, but it should stop them from indexing the pages.
    Chris Beasley - My Guide to Building a Successful Website
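Applied to the original goal (index.php indexed, index.php?ANYTHING not), the meta tag could be emitted only when a query string is present. A Python sketch with a hypothetical helper; note that Google can only see the tag on pages it is allowed to fetch, so this would be used instead of, not together with, a robots.txt Disallow for those URLs.

```python
from urllib.parse import urlsplit

NOINDEX = '<meta name="robots" content="noindex">'


def robots_meta_for(url: str) -> str:
    """Hypothetical helper: return a noindex meta tag only for URLs with a query string."""
    return NOINDEX if urlsplit(url).query else ""


for url in ("http://www.mysite.com/index.php",
            "http://www.mysite.com/index.php?q=3439345983457"):
    print(url, repr(robots_meta_for(url)))
```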
