
Thread: Web Crawling?

  1. #1
    Junior Registered
    Join Date
    Jun 2004
    Posts
    1

    Web Crawling?

    Hi,

    I would like to have a program that can search an entire given web site for a specific keyword and report back the word that follows it. On each page that contains my initial keyword, I would also like to search for three more keywords and have it report back the word following each of those as well. Then I would like all the results put into a spreadsheet-like format so I can examine them online or offline later. Does anyone know how I would go about doing this? Thank you.

    -Nick

  2. #2
    Administrator Chris
    Join Date
    Feb 2003
    Location
    East Lansing, MI USA
    Posts
    7,055
    That's quite a project. It seems like the type of thing where if you have to ask how, it is beyond your abilities.

    Basically you'll need to write your own web crawler. I've done this in PHP, but it is very complex and doesn't even work that well. You'll need to know a lot about regular expressions.

    Basically, here is how the logic would work.

    You feed the script a URL, and it fetches the page using PHP's file() function. You scan the content for anchor tags and parse out all the link URLs, then weed out the URLs that aren't on the local site. Put this list somewhere (an array or a database). Then you scan the content for your word. If it's found, you use string functions to get the words after it. (Basically I'd grab the word plus the 50 characters after it, then explode the string on a space so that each word gets its own spot in an array. Your word should then be array[0], the next word array[1], etc.) Then you could enter these words into your database or whatever.
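    As a rough illustration of the steps just described, here is a minimal sketch in Python rather than PHP. The names LinkParser, local_links and words_after are made up for this example, and it uses Python's standard HTML parser for the anchor tags instead of regular expressions. The keyword part follows the "word plus 50 characters, split on whitespace" idea literally, so it works on raw page text (HTML tags included) and will also match the keyword inside longer words.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every anchor tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def local_links(html, page_url):
    """Return absolute URLs of links that stay on the same host as page_url."""
    parser = LinkParser()
    parser.feed(html)
    site = urlparse(page_url).netloc
    found = []
    for href in parser.hrefs:
        # Resolve relative links and drop any #fragment part.
        absolute = urljoin(page_url, href).split("#")[0]
        if urlparse(absolute).netloc == site and absolute not in found:
            found.append(absolute)
    return found


def words_after(text, keyword, window=50):
    """Return the word following each occurrence of keyword (case-insensitive).

    Takes the keyword plus `window` characters after it and splits on
    whitespace, so element 0 is the keyword and element 1 is the next word.
    """
    results = []
    lower_text = text.lower()
    lower_key = keyword.lower()
    start = lower_text.find(lower_key)
    while start != -1:
        chunk = text[start:start + len(keyword) + window]
        parts = chunk.split()
        if len(parts) > 1:
            results.append(parts[1])
        start = lower_text.find(lower_key, start + len(lower_key))
    return results
```

    For real pages you would probably want to strip the HTML tags before calling words_after, otherwise the "next word" can end up being a piece of markup.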

    Next you would open up your array or database where you stored the link URLs, and load the URL at the top of the list, starting all over.
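    The "load the next URL from the list and start over" loop might look something like the sketch below, again in Python rather than PHP. Everything here is an assumption for illustration: crawl, fetch, extract_links and handle_page are hypothetical names, the list of stored URLs is an in-memory queue rather than a database, and a max_pages cap is added so the loop cannot run forever.

```python
from collections import deque
from urllib.request import urlopen


def fetch(url):
    """Download a page and return its text (assumes UTF-8, replaces bad bytes)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, extract_links, handle_page, fetch=fetch, max_pages=100):
    """Visit pages breadth-first, starting from start_url.

    extract_links(html, url) should return the local URLs found on a page.
    handle_page(url, html) is called once per fetched page, for example to
    scan for keywords and store the results.
    """
    queue = deque([start_url])   # URLs waiting to be loaded
    seen = {start_url}           # URLs already queued, to avoid loops
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()    # take the URL at the top of the list
        try:
            html = fetch(url)
        except OSError:
            continue             # skip pages that fail to download
        visited.append(url)
        handle_page(url, html)
        for link in extract_links(html, url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

    Keeping track of which URLs have already been queued matters, since most sites link back to their own home page from every page. For the spreadsheet part of the question, handle_page could collect rows into a list that is written out afterwards with Python's csv module.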

  3. #3
    Senior Member chromate
    Join Date
    Aug 2003
    Location
    UK
    Posts
    2,348
    There is an open source spider written in PHP. I think it's called PHPSpider or something similar. It's available on SourceForge. Might be worth checking out.
