Web Spider for Mac?

NameOfBeast

I have a week's access to an online mag (HTML format only). I would like to download all the issues for offline viewing (I don't feel like opening each article and saving/printing it).

The site is login-protected, and I can't get Blue Crab to submit the authentication form (username and password). Any ideas?

The site is simple: mostly text, with a few images that I can do without.
 
Scrapbook, a Firefox extension, did what I wanted (and saved me from having to archive/print 2000 pages!).
 
I'm not sure how you expect the "web spider" to bypass authentication. It's just an HTTP client, like your browser.
 

I didn't expect it to bypass it; I expected it to be able to fill in the authentication form. Blue Crab does in fact have a basic mechanism for filling in forms automatically, just not one sophisticated enough for the site I was trying to download. Scrapbook got around this with cookie authentication (I logged in via the browser, then downloaded).
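Scrapbook worked because the browser had already logged in and was holding the session cookie. The same trick can be reproduced with wget. Below is a minimal sketch, assuming the browser's cookies have been exported to a Netscape-format `cookies.txt` (the export step, the file name and the `mirror_with_cookies` helper name are assumptions for illustration, not something from this thread); wget's `--load-cookies` option reads that file format.

```shell
#!/bin/sh
# Sketch: mirror a login-protected site by reusing cookies from a browser session.
# Assumptions (hypothetical): $1 is a Netscape-format cookie file exported from
# the browser after logging in, and $2 is the site's start page.
# WGET may be set to override the wget binary (e.g. a non-default install path).
mirror_with_cookies() {
    cookie_file=$1
    site_url=$2
    # --load-cookies sends the browser's cookies (session ones included) with
    # every request; -t0 retry forever, -c resume partial files, -m mirror,
    # -k convert links for local viewing.
    "${WGET:-wget}" --load-cookies "$cookie_file" -t0 -c -m -k "$site_url"
}
```

Usage: `mirror_with_cookies cookies.txt http://www.example.com/`. This only works while that browser session is still valid, and if the crawler follows a "log out" link it may end the session partway through; wget 1.14 and later have `--reject-regex` for skipping such URLs.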
 
Aah right, I'm with you. wget can happily suck down an HTML site and convert the links for local viewing.

For a static site without any authentication, the following will grab the whole thing:

Code:
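# -t0 retry forever, -c resume partial files, -m mirror (recursive, with timestamping), -k convert links for offline viewing
wget -t0 -c -m -k www.example.com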

How you go further depends on how the site's authentication works. If it's the type where the browser pops up a username/password dialog (HTTP authentication), then something like this will work:

Code:
# --http-user/--http-password supply the credentials for HTTP (pop-up) authentication
wget -t0 -c -m -k --http-user=user --http-password=password www.example.com

It can also handle sessions (via cookies), although I haven't played with this. What is the website? I'd like to see what it looks like; maybe I can figure something else out. An OS X build of wget is available here.
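For sites that log in through an HTML form (rather than an HTTP username/password pop-up), the usual wget approach is to submit the form once with `--post-data`, save the resulting session cookie with `--save-cookies` and `--keep-session-cookies`, and then mirror with `--load-cookies`. A minimal sketch follows; `login_and_mirror` is a hypothetical helper, and the login URL and form field names (`username`, `password` in the usage line) are placeholders that have to be read from the site's actual login form (its `action` URL and `<input name=...>` attributes).

```shell
#!/bin/sh
# Sketch: log in through an HTML form, keep the session cookie, then mirror.
# Arguments (all hypothetical placeholders):
#   $1  login URL (the form's "action" attribute)
#   $2  URL-encoded form fields, e.g. 'username=user&password=password'
#   $3  start page of the site to mirror
#   $4  cookie file to create (defaults to cookies.txt)
# WGET may be set to override the wget binary.
login_and_mirror() {
    login_url=$1
    post_data=$2
    site_url=$3
    cookie_file=${4:-cookies.txt}

    # Step 1: POST the login form and save all cookies, including session
    # cookies (which wget would otherwise leave out of the file).
    # The login response page itself is discarded.
    "${WGET:-wget}" --save-cookies "$cookie_file" --keep-session-cookies \
        --post-data "$post_data" -O /dev/null "$login_url" || return 1

    # Step 2: send the saved cookies with every request of the mirror.
    "${WGET:-wget}" --load-cookies "$cookie_file" -t0 -c -m -k "$site_url"
}
```

Usage: `login_and_mirror http://www.example.com/login 'username=user&password=password' http://www.example.com/`. Forms that rely on JavaScript or on hidden per-page fields need extra work (for example, fetching the login page first to read the hidden values), so this won't cover every login form.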
 

Thanks, I didn't realise that you could get wget to do that! Amazing what command-line UNIX tools can do.

That wasn't enough for the site I was interested in, though, partly (I think) because the magazine's operator doesn't particularly want the site's content slurped off en masse; as a result, the login form is an oddity. Anyhoo, I got it all downloaded (I would have gone mad trying to download all of that material page by page!).
 
You might also want to look at Paparazzi!, which takes a screenshot of the entire web page (not just what's displayed on screen) and saves it as a JPEG, IIRC. Useful if the article is just one big page.
 
Sign up to the MyBroadband newsletter