Web Crawler/Page Scraping

chris_meier

Hi

Does anyone have any experience with web crawling and page scraping? I'm looking at implementing something to run against one of my suppliers' sites about once a week, instead of updating manually as I currently do. Basically it would create an XML file with various details of each product, which I would then import into Excel, modify slightly, and import into my site software.
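For what it's worth, the XML output step on its own is small. Here is a minimal sketch in Python using only the standard library. The field names (`sku`, `name`, `price`), the sample product, and the `products_to_xml` helper are all made up for illustration; the real fields would depend on what the supplier's pages expose.

```python
import xml.etree.ElementTree as ET


def products_to_xml(products):
    """Build an XML string from a list of product dicts (hypothetical helper)."""
    root = ET.Element("products")
    for product in products:
        item = ET.SubElement(root, "product")
        # One child element per field, in the order the dict provides them.
        for field, value in product.items():
            ET.SubElement(item, field).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Made-up sample data standing in for scraped product details.
sample = [{"sku": "ABC-1", "name": "Widget", "price": "99.95"}]
xml_text = products_to_xml(sample)
```

Writing `xml_text` to a file gives a flat, repeating structure with one `product` element per item, which is the kind of layout a spreadsheet XML import tends to handle most easily.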

I've spent the weekend playing around with various trials, none of which really seems to give me what I want. The open source VietSpider on the other hand seems able to do exactly what I want, but it keeps crashing.

80Legs also looks promising, but I couldn't get it configured quite right.

Anyone with experience using VietSpider, or any advice on other software packages I could try?

Thanks in advance
 
Web crawling is easy, but do ask your supplier for permission first. You can get your IP blocked if you don't play nice with them.
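Besides asking, one common way to "play nice" is to honour the site's robots.txt. A minimal sketch with Python's standard `urllib.robotparser` follows. The robots.txt content, the bot name `my-price-bot`, and the `supplier.example` URLs are invented for illustration; a real run would load the supplier's actual file, and robots.txt is no substitute for the supplier's permission.

```python
import urllib.robotparser


def make_robot_rules(robots_txt):
    """Parse robots.txt text into a rules object (hypothetical helper)."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Invented robots.txt content, so the sketch runs without network access.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /checkout/
Crawl-delay: 5
"""

rules = make_robot_rules(SAMPLE_ROBOTS)
allowed = rules.can_fetch("my-price-bot", "https://supplier.example/products/widget")
blocked = rules.can_fetch("my-price-bot", "https://supplier.example/checkout/cart")
delay = rules.crawl_delay("my-price-bot")  # seconds to wait between requests, if stated
```

With these sample rules the product page is allowed and the checkout page is not. Sleeping for `delay` seconds between requests (or a conservative default when the file gives none) keeps a weekly run well under anything a supplier is likely to object to.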
 
I just did a site that pulled price queries off a local supplier's site.
Screen scraping relies on the DOM not changing, so it's not ideal long term.

I used PHP, cURL and simplehtmldom
http://net.tutsplus.com/tutorials/p...en-scraping-with-the-simple-html-dom-library/
If you're familiar with jQuery, the methods are very similar.

It took about 15 lines of code in total, and I queried the mobile site rather than the full desktop site.
Nice and fast.
There probably won't be enough traffic for your IP to get noticed.
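To give a feel for the size of the job, here is a rough Python analogue of that approach using only the standard library. It is not the PHP/simplehtmldom code described above, and the markup (a `span` with class `price`) is an assumption; the real selectors depend on the supplier's pages and will need updating whenever their DOM changes.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collect the text of every <span> whose class attribute includes 'price'."""

    def __init__(self):
        super().__init__()
        self.prices = []
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "price" in classes:
            self._in_price = True
            self.prices.append("")

    def handle_endtag(self, tag):
        # Assumes price spans contain no nested spans.
        if tag == "span":
            self._in_price = False

    def handle_data(self, data):
        if self._in_price:
            self.prices[-1] += data.strip()


# Invented markup standing in for a fetched product page.
SAMPLE_HTML = '<div><span class="name">Widget</span><span class="price">R 99.95</span></div>'
parser = PriceParser()
parser.feed(SAMPLE_HTML)
prices = parser.prices
```

In a real run the HTML string would come from an HTTP request (for example `urllib.request.urlopen`) instead of a literal. Pointing it at the mobile version of a site usually means less markup to download and parse.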
 