Scraping info from pages



ZigE
01-28-2008, 06:19 AM
I need to transfer about 400 unique pages, all using the same page template layout (headings, etc.), and then put the data into a CSV file or database.

I have three options:

1) Create my own PHP/cURL script to do it (I don't know if I'm experienced enough for this.)
2) Use some software (like iMacros; is this possible?)
3) Copy and paste by hand

Obviously I want to avoid number 3. What I'm asking is: has anyone had experience doing this, or does anyone have other ideas, before I jump in headfirst? :sick:
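For option 1, here is a minimal sketch of the idea, written in Python with only the standard library rather than PHP/cURL. It assumes every page keeps each field in the same tag; the tag-to-field mapping (`h1` as "title", `h2` as "subtitle") and the helper names are hypothetical placeholders to adapt to the real template.

```python
import csv
import io
import urllib.request
from html.parser import HTMLParser

# Hypothetical mapping: which tag on the shared template holds which field.
FIELD_TAGS = {"h1": "title", "h2": "subtitle"}


class TemplateParser(HTMLParser):
    """Captures the text of the first occurrence of each tag in FIELD_TAGS."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # field currently being captured, if any

    def handle_starttag(self, tag, attrs):
        field = FIELD_TAGS.get(tag)
        if field and field not in self.fields:
            self._current = field
            self.fields[field] = ""

    def handle_endtag(self, tag):
        # Only the closing tag of the field being captured ends the capture.
        if self._current and FIELD_TAGS.get(tag) == self._current:
            self.fields[self._current] = self.fields[self._current].strip()
            self._current = None

    def handle_data(self, data):
        # Text inside nested tags (e.g. <b> within <h1>) is kept as well.
        if self._current:
            self.fields[self._current] += data


def extract(html):
    """Return {field: text} for one page built from the shared template."""
    parser = TemplateParser()
    parser.feed(html)
    parser.close()
    return parser.fields


def fetch(url):
    """Plain HTTP GET; a real run over ~400 pages may want delays and retries."""
    with urllib.request.urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def rows_to_csv(rows):
    """Render extracted rows as CSV text; missing fields become empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(FIELD_TAGS.values()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def scrape(urls, path="pages.csv"):
    """Fetch each URL, extract its fields and save everything to one CSV file."""
    rows = [extract(fetch(u)) for u in urls]
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(rows_to_csv(rows))
```

It would be called as `scrape(list_of_urls)`, where the URL list is however the ~400 pages are enumerated (the thread doesn't say). Loading into a database instead of CSV would only change the last step.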

Chris
01-28-2008, 06:24 AM
Hire someone to copy & paste by hand. Basic data entry is an easy thing to outsource.

rpanella
01-28-2008, 05:09 PM
You would probably be better off paying someone to write a quick script to scrape all the data than paying someone to copy and paste.

Depending on the complexity of the template and the amount of data you need to pull, it should not take someone experienced with this more than an hour or so to write.

Chris
01-29-2008, 07:07 AM
There is actually a PHP function set/library that makes it really easy to scrape sites, but I can't think of its name right now. It lets you feed in a URL and then access page elements by tag name.
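The library isn't named, so this doesn't reproduce it. It's only a sketch, in Python with the standard library, of the "feed in a URL, then access elements by tag name" idea; `TagIndex`, `get_elements_by_tag` and `load_url` are hypothetical names. It is a lightweight index rather than a full HTML tree builder, so badly nested markup may not be grouped exactly as a browser would.

```python
import urllib.request
from collections import defaultdict
from html.parser import HTMLParser

# Void elements never get an end tag, so they are not pushed on the open stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TagIndex(HTMLParser):
    """Records the text content of every element, grouped by tag name."""

    def __init__(self):
        super().__init__()
        self.by_tag = defaultdict(list)
        self._open = []  # stack of (tag, list of text pieces)

    def _record(self):
        name, parts = self._open.pop()
        self.by_tag[name].append("".join(parts).strip())

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self._open.append((tag, []))

    def handle_data(self, data):
        # Text belongs to every element that is currently open.
        for _, parts in self._open:
            parts.append(data)

    def handle_endtag(self, tag):
        # Close the nearest matching open tag plus any unclosed children;
        # stray end tags with no matching start are ignored.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                while len(self._open) > i:
                    self._record()
                return

    def close(self):
        super().close()
        while self._open:  # flush elements left unclosed at end of input
            self._record()


def get_elements_by_tag(html, tag):
    """Text of each <tag> element, in the order the elements were closed."""
    index = TagIndex()
    index.feed(html)
    index.close()
    return index.by_tag.get(tag, [])


def load_url(url):
    """Download a page so it can be passed to get_elements_by_tag."""
    with urllib.request.urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Typical use would be `get_elements_by_tag(load_url(url), "td")` to pull every table cell from a page.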

ZigE
02-23-2008, 03:38 PM
Just on this, I ended up hiring someone from getafreelancer. It came to about 400 lines of code; it was a bit more complex than I first thought. It cost me about $150.

Definitely worth outsourcing stuff like this.

phazex
03-19-2008, 12:54 AM
Chris, were you referring to Tidy?

Anyway, I did find this: http://htmlpurifier.org/

:)