Spiders and websites....



mrVJ
04-15-2006, 10:26 AM
Hey all,

How do spiders recognize that they are still on the same website? For example, big websites have a lot of servers/IPs, with articles hosted across many of them.

Do the crawlers only look at the URL to see if they are still on the same website?

Regards

darker
04-15-2006, 10:39 AM
I would think so, yes, because one IP can host many different domains as well, so the domain name is what matters.
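
To make the "domain name is what matters" point concrete, here is a minimal sketch of how a crawler could compare two URLs by hostname alone, ignoring which IP served them. The function name `same_site` and the example URLs are made up for illustration; real crawlers may apply looser or stricter rules (for instance, this sketch treats www.example.com and example.com as different sites).

```python
from urllib.parse import urlsplit


def same_site(url_a: str, url_b: str) -> bool:
    """Return True when both URLs share the same hostname.

    Hypothetical helper: only the hostname in the URL is compared,
    never the IP address that the hostname resolves to.
    """
    host_a = urlsplit(url_a).hostname  # urlsplit lowercases the hostname
    host_b = urlsplit(url_b).hostname
    return host_a is not None and host_a == host_b


if __name__ == "__main__":
    # Same hostname, different pages: treated as the same site.
    print(same_site("http://example.com/article-1", "http://EXAMPLE.com/article-2"))
    # Different hostnames: different sites, even if one IP hosts both.
    print(same_site("http://example.com/", "http://example.org/"))
```

Because the IP never enters the comparison, a large site spread over many servers still counts as one site, and two domains sharing one IP still count as two.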

Chris
04-15-2006, 12:39 PM
For the most part though a crawler doesn't care if it is on the same site or not. It is judging pages, not sites.

mrVJ
04-15-2006, 01:38 PM
Would it be possible to fake a renowned website (e.g. cnn.com): write an article about a website I want to promote (with links, of course), and when the crawler sees that page, have a custom httpd display a fake URL confirming that the article is on CNN.com?

So we would be able to get links from any website, as long as we spoof the page URL.

Regards

Masetek
04-15-2006, 05:07 PM
That sort of behavior will eventually lead to your site getting banned.

Chris
04-15-2006, 05:08 PM
That doesn't work; you cannot change where domains resolve to.
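
A toy model of why the spoof fails, assuming a crawler that credits a page to the hostname it actually requested. Everything here is hypothetical: `FAKE_DNS` stands in for real DNS, `PAGES` for the servers' responses, and `spoofer.example` for the promoter's own domain. It is a sketch of the idea, not how any particular search engine works.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for DNS: hostname -> the server that answers for it.
# The spoofer controls only their own entry, not the one for cnn.com.
FAKE_DNS = {
    "cnn.com": "cnn-server",
    "spoofer.example": "spoofer-server",
}

# Hypothetical page bodies. The spoofer's page claims to live on cnn.com.
PAGES = {
    "cnn-server": "Real CNN article",
    "spoofer-server": "This article is on http://cnn.com/story (link to my site)",
}


def fetch(url: str) -> tuple[str, str]:
    """Resolve the URL's hostname and return (requested host, page body)."""
    host = urlsplit(url).hostname
    server = FAKE_DNS[host]  # resolution depends only on the requested name
    return host, PAGES[server]


def credited_host(url: str) -> str:
    """Return the host the crawler credits: the one it asked for,
    regardless of what the page body says about itself."""
    host, _body = fetch(url)
    return host


if __name__ == "__main__":
    print(credited_host("http://spoofer.example/story"))
    print(fetch("http://cnn.com/story")[1])
```

The claim inside the spoofed page never influences `credited_host`, and a request for cnn.com never reaches the spoofer's server, because the crawler does its own name resolution.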

mrVJ
04-15-2006, 05:17 PM
Would a spider keep crawling a website after hitting a page that reloads automatically?

KLB
04-16-2006, 11:28 AM
Automatic page reloading normally relies on client-side scripting like JavaScript. Since bots don't run these types of client-side scripts, the pages would not reload for them.
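
As a rough illustration, here is a sketch of a bot that only parses HTML and never executes scripts. The `LinkCollector` class, the `extract_links` helper, and the sample page are invented for this example. The reload script is just inert text to the parser, so the bot still finds the link and can carry on crawling.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags; script contents are never run."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> list[str]:
    """Parse the page once and return the links found in it."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page that reloads itself every 5 seconds in a browser.
PAGE = """<html><body>
<script>setTimeout(function () { location.reload(); }, 5000);</script>
<a href="/next-article">Next</a>
</body></html>"""

if __name__ == "__main__":
    print(extract_links(PAGE))  # ['/next-article']
```

A bot built this way reads the page a single time, so the reload never happens for it and the crawl continues through the extracted links.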

john190
04-17-2006, 06:54 AM
That is something I wouldn't consider. It is likely to get your site banned, and it may even damage the site you are faking, which is likely to get you into trouble.


Would it be possible to fake a renowned website (e.g. cnn.com): write an article about a website I want to promote (with links, of course), and when the crawler sees that page, have a custom httpd display a fake URL confirming that the article is on CNN.com?

So we would be able to get links from any website, as long as we spoof the page URL.

Regards