Hi all,
Would anyone be able to tell me the function to "crawl" another website? I know there isn't one specific function, but could you give me a basic idea of what I'd need to know?
Thanks,
Mike
Don't you just love free internet games ?
I've never done it, but I guess you would have to make an HTTP request and then read the results into a variable. Then use some string functions to find what you're looking for.
Pas mentioned a good book that discusses this in the 4fineart thread I think.
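To make the "HTTP request, then string functions" idea concrete, here is a minimal sketch in Python (the thread doesn't settle on a language, so that choice is an assumption). The names fetch_page and text_between are made up for illustration, and the URL in the usage note is only a placeholder.

```python
from urllib.request import urlopen


def fetch_page(url, timeout=10):
    """Make an HTTP GET request and return the response body as text."""
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def text_between(page, start_marker, end_marker):
    """Return the text between two markers, or None if either is missing."""
    start = page.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = page.find(end_marker, start)
    if end == -1:
        return None
    return page[start:end].strip()
```

Something like text_between(fetch_page("http://example.com/"), "<title>", "</title>") would then pull out the page title. Plain string searching is fragile if the markup changes, so treat it as a starting point only.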
Basically load the content of the page.
Parse out all html links.
Enter the URLs into an array (or db).
Cycle through the array (DB) pulling each page.
Parse out all html links....
So on and so forth.
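The steps above could be sketched roughly like this in Python (an assumed language; the thread doesn't pick one). LinkParser, extract_links, crawl and the max_pages cap are illustrative names and choices, not anything from an existing spider. The fetch argument stands for any function that makes the HTTP request and returns the page text.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html, base_url):
    """Return absolute http(s) URLs for the links found in html."""
    parser = LinkParser()
    parser.feed(html)
    urls = []
    for href in parser.links:
        # Resolve relative links and drop any #fragment part.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(absolute).scheme in ("http", "https"):
            urls.append(absolute)
    return urls


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl from start_url; returns a dict of {url: html}."""
    queue = deque([start_url])  # URLs waiting to be pulled
    seen = {start_url}          # URLs already queued, so loops are avoided
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load
        pages[url] = html
        for link in extract_links(html, url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

The seen set is what stops the crawler going round in circles when two pages link to each other, and max_pages keeps a first experiment from wandering across the whole web. In practice you would probably also restrict it to one hostname.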
Would you make an HTTP request, as chromate said, to load the content of the page, Chris?
Thanks a lot,
Mike
If you want to fill up your server, do a complete data crawl of wikipedia.org.
I may give it a go then.
Originally posted by Chris
Yes.
PHP has a file() function that can fetch the contents of a remote file (it returns the lines as an array, and needs allow_url_fopen to be enabled).
Is it alright to do it? Or could the site owners not like it because it's eating up their bandwidth?
Thanks very much,
Mike
Last edited by Mike; 03-10-2004 at 12:22 PM.
They probably wouldn't like it, but what can they do?
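One thing a crawler can do about the bandwidth worry is check the site's robots.txt file and pause between requests. robots.txt isn't mentioned elsewhere in this thread; it is the usual convention sites use to say what crawlers may fetch. A minimal sketch with Python's standard urllib.robotparser follows; load_rules, polite_delay, the "MyCrawler" agent name in the usage note and the 1 second default are all assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def load_rules(robots_txt):
    """Build a rule set from the text of a robots.txt file."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def polite_delay(rules, user_agent, default=1.0):
    """Seconds to wait between requests: the site's Crawl-delay if it sets one."""
    delay = rules.crawl_delay(user_agent)
    return default if delay is None else float(delay)
```

Before pulling a page you would call rules.can_fetch("MyCrawler", url) and skip the URL if it returns False, then time.sleep(polite_delay(rules, "MyCrawler")) between requests. This only covers what the site asks of crawlers; it says nothing about the legal side.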
Why do you need to crawl another site?
If it's just to grab everything on the site and download it, there are products that do that already. Why reinvent the wheel? If you need to be able to search that other site, there are products that do that too.
However, if you really need to parse out specific data and put it in a database, then I suppose you'd need to write your own script.
Exactly.
Originally posted by flyingpylon
However, if you really need to parse out specific data and put it in a database, then I suppose you'd need to write your own script.
Will it be legal to crawl another site then? Or could some consider it as an attack?
Thanks,
Mike
Google do it, and I don't see them being sued for it...
You are just reading their website - they have put it there for all to read.
What exactly you are doing with it might be a problem, i.e. they can't stop you just fetching their website, but they would have a problem with you just republishing it! I'm sure that's not what you were going to do, it's just an example...
Yeah, only do it on sites that either allow it, like, let's say, an affiliate program, or a site that is open source, like Wikipedia.
There are a couple of spiders already written and available on SourceForge.net...
Thanks for all the replies...
Going off topic a little here, but isn't what Google are doing against copyright laws? They are displaying part of someone's website content, aren't they?
Mike