OK, other than Disallow in robots.txt, what else can be done to BLOCK Google from spidering my site?
I want/need the AdSense bot on index.php but not on index.php?ANYTHING. Is this possible?
This is not a joke.
Last edited by Todd W; 02-06-2007 at 12:44 AM.
I think this is what you are looking for:
User-agent: *
Disallow: /index.php?
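A quick way to reason about that rule: a Disallow value is matched as a prefix of the requested path. Here is a minimal Python sketch of that idea, assuming the crawler compares the rule against the path plus query string (the function name `is_disallowed` is made up for illustration and is not part of any crawler's API):

```python
def is_disallowed(rule_path: str, request_path: str) -> bool:
    """Plain prefix match: a Disallow value blocks every path that starts with it.

    An empty Disallow value blocks nothing.
    """
    return bool(rule_path) and request_path.startswith(rule_path)


rule = "/index.php?"
for path in ["/index.php", "/index.php?q=1", "/index.php?page=2&sort=asc"]:
    print(path, "blocked" if is_disallowed(rule, path) else "allowed")
```

Under that assumption, plain /index.php stays crawlable because it does not start with "/index.php?", while anything with a query string is blocked.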
Before I do that, I really want to make sure it will only block index.php?* and not index.php too.
Also, if Google comes from, say, www.yoursite.com to www.mysite.com/index.php, does it STILL read robots.txt, or does it only read robots.txt if it's starting on my site and spidering my pages?
To remove dynamically generated pages, you'd use this robots.txt entry:
User-agent: Googlebot
Disallow: /*?
Looks like it will work. I got the above from: http://www.google.com/support/webmas...y?answer=35303
Yet when I go to
http://services.google.com:8882/urlconsole/controller (Google URL remover)
They then say:
URLs cannot have wild cards in them (e.g. "*"). The following line contains a wild card:
DISALLOW /*?
So in one place Google says "do this" and in another they say "it won't work".
Hmmm
Last edited by Todd W; 02-06-2007 at 01:11 AM.
I am very confident that this is the correct robots.txt. I suggest you use Google's robots.txt checker in the webmaster tools to be absolutely sure.
http://www.google.com/webmasters/sitemaps/
And yes, Google will check robots.txt when it spiders your pages, no matter where it comes from.
Well, I first used what Google suggested for blocking dynamic content from being spidered, then ran the robots.txt analysis tool, and I get:
URL Googlebot
http://www.mysite.com/ Allowed
I tried:
User-agent: *
Disallow: /index.php?
like you said, and it too said Googlebot allowed.
I added:
User-agent: Googlebot-Image
Disallow: /
And now it says:
Googlebot-Image
Allowed Syntax not understood
Me thinks it's not working.
OK, in the robots.txt checker, add a few extra lines under http://mysite.com
i.e.
http://mysite.com/index.php
http://mysite.com/index.php?q=3439345983457
The last one should be the only one that is blocked.
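For anyone who wants to sanity-check those three test URLs offline, here is a rough Python sketch of Google-style wildcard matching. It assumes `*` matches any run of characters and a trailing `$` anchors the end of the URL; `pattern_to_regex` and `is_blocked` are hypothetical helper names, and this is not Google's actual implementation:

```python
import re
from urllib.parse import urlsplit


def pattern_to_regex(pattern: str):
    """Translate a Disallow pattern into a compiled regex ('*' = any characters)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then rejoin the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(pattern: str, url: str) -> bool:
    """Match the pattern against the URL's path plus query string, from the start."""
    parts = urlsplit(url)
    target = (parts.path or "/") + ("?" + parts.query if parts.query else "")
    return pattern_to_regex(pattern).match(target) is not None


for url in [
    "http://mysite.com",
    "http://mysite.com/index.php",
    "http://mysite.com/index.php?q=3439345983457",
]:
    print(url, "Blocked" if is_blocked("/*?", url) else "Allowed")
```

With the pattern /*? only the last URL comes out as blocked, since it is the only one that contains a "?".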
I got it to work now.
It's PICKY about the URLs you want to test.
For the record, this works perfectly:
User-agent: Googlebot
Disallow: /*?
User-agent: Googlebot-Image
Disallow: /
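Worth noting for the AdSense part of the original question: the AdSense crawler identifies itself as Mediapartners-Google, and neither group above names it. Below is a simplified Python sketch of how a crawler might pick its group. It assumes the longest matching User-agent token wins, with `*` as the fallback; `parse_groups` and `rules_for` are made-up helper names, and real crawlers handle more fields than this:

```python
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /*?
User-agent: Googlebot-Image
Disallow: /
"""


def parse_groups(text):
    """Map each lower-cased User-agent token to its list of Disallow patterns."""
    groups = {}
    current_agents = []
    last_was_agent = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() == "user-agent":
            if not last_was_agent:
                current_agents = []  # a new group starts here
            agent = value.lower()
            current_agents.append(agent)
            groups.setdefault(agent, [])
            last_was_agent = True
            continue
        last_was_agent = False
        if field.lower() == "disallow" and value:
            for agent in current_agents:
                groups[agent].append(value)
    return groups


def rules_for(crawler, groups):
    """Pick the most specific group whose token prefixes the crawler name."""
    name = crawler.lower()
    matches = [agent for agent in groups if agent != "*" and name.startswith(agent)]
    if matches:
        return groups[max(matches, key=len)]
    return groups.get("*", [])


groups = parse_groups(ROBOTS_TXT)
for crawler in ["Googlebot", "Googlebot-Image", "Mediapartners-Google"]:
    print(crawler, rules_for(crawler, groups))
```

Under these assumptions Googlebot gets ['/*?'], Googlebot-Image gets ['/'], and Mediapartners-Google gets an empty list, i.e. no restrictions from this file.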
I was not typing a full valid URL to test.
index.php works.
index.php?anything is blocked.
For the record: on the two sites I put this on, Googlebot re-attempted to download robots.txt this morning, one 30 minutes ago and one 2 hours ago.
Status 404 (Not found)
Yet if I go to /robots.txt it's clearly there... wow, Google!! Stop ignoring it!
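One thing that can be checked here: robots.txt is looked up per host, so www.mysite.com and mysite.com each need to serve it. Whether that explains this particular 404 is only a guess. Here is a small Python sketch for comparing the status code on each host; `robots_url` and `robots_status` are hypothetical helpers, and sending "Googlebot" as the User-Agent is just an assumption to mimic the crawler's request:

```python
from urllib.error import HTTPError
from urllib.parse import urlsplit
from urllib.request import Request, urlopen


def robots_url(page_url: str) -> str:
    """Build the robots.txt URL for the scheme and host of any page URL."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def robots_status(page_url: str, user_agent: str = "Googlebot") -> int:
    """Fetch robots.txt for the page's host and return the HTTP status code."""
    request = Request(robots_url(page_url), headers={"User-Agent": user_agent})
    try:
        with urlopen(request, timeout=10) as response:
            return response.status
    except HTTPError as err:
        return err.code  # e.g. 404 when the file is missing on this host


# Example (makes real HTTP requests, so it is left commented out):
# for page in ("http://www.mysite.com/index.php", "http://mysite.com/index.php"):
#     print(robots_url(page), robots_status(page))
```

If one host returns 200 and the other 404, the file is only being served on one of them.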
Have you tried putting the meta robots tag directly on the pages you do not want indexed? It will not stop Google from viewing the pages, but it should stop them from indexing the pages.
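To illustrate the meta tag idea, here is a minimal Python sketch that decides per URL whether to emit a noindex tag. The site in this thread runs PHP, so treat this purely as an illustration of the logic; `robots_meta_for` is a made-up helper name. Note that a crawler only sees the tag if it is allowed to fetch the page, so the tag has no effect on URLs that robots.txt already blocks.

```python
from urllib.parse import urlsplit

NOINDEX_TAG = '<meta name="robots" content="noindex">'


def robots_meta_for(url: str) -> str:
    """Return the noindex tag for URLs that carry a query string, else ''."""
    return NOINDEX_TAG if urlsplit(url).query else ""


for url in ["http://mysite.com/index.php", "http://mysite.com/index.php?q=1"]:
    print(url, "->", robots_meta_for(url) or "(no tag)")
```

The returned string would go inside the page's head section for the URLs you do not want indexed.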