Harvesting web sheets

ChrisGreaves
PlutoniumLounger
Posts: 11530
Joined: 24 Jan 2010, 23:23
Location: paused.undefined.exposed

Harvesting web sheets

Post by ChrisGreaves »

Forum?

I was looking for a user manual, and came to this page.
To my dismay, I am supposed to spend hours clicking on links, one by one, to expose (and then save) the pages - one by one.

I got into computers to save myself doing boring and repetitive tasks.

Is there a simple application or (firefox) add-on that will let me harvest all pages linked from the main page, or better yet, all pages linked from the main page that are housed on the same domain?

I'd like to click one button and find that all 24 pages were sitting on my hard drive.

I suspect I have a bit of VBA code that could do the trick, but first wondered if there was a proper app/add-on.
Thanks
Chris
We hate change, but love variety.

HansV
Administrator
Posts: 69475
Joined: 16 Jan 2010, 00:14
Status: Microsoft MVP
Location: Wageningen, The Netherlands

Re: Harvesting web sheets

Post by HansV »

You might reverse engineer Google... :evilgrin:
Regards,
Hans

ChrisGreaves
PlutoniumLounger
Posts: 11530
Joined: 24 Jan 2010, 23:23
Location: paused.undefined.exposed

Re: Harvesting web sheets

Post by ChrisGreaves »

HansV wrote:You might reverse engineer Google... :evilgrin:
Can't.
Got to pick up my prescriptions this afternoon ... :rofl:

Besides, how would I market something called elgoog?
We hate change, but love variety.

HansV
Administrator
Posts: 69475
Joined: 16 Jan 2010, 00:14
Status: Microsoft MVP
Location: Wageningen, The Netherlands

Re: Harvesting web sheets

Post by HansV »

elgooG exists already.

Do a Google search for 'link extractor' - you may find something useful.
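
To give an idea of what a 'link extractor' does, here is a minimal Python sketch using only the standard library. It pulls every link out of a page and keeps those on the same host as the main page. The names `LinkCollector` and `same_domain_links`, the sample HTML and the `manuals.example.com` address are all made up for illustration; a real page may need more care (redirects, frames, links built by scripts).

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_domain_links(html, base_url):
    """Return absolute, de-duplicated links on the same host as base_url."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(base_url).netloc
    kept = []
    for href in collector.hrefs:
        # Resolve relative links and drop any #fragment part.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(absolute).netloc == host and absolute not in kept:
            kept.append(absolute)
    return kept


# Hypothetical page content and address, for illustration only.
sample = (
    '<a href="page1.html">1</a> '
    '<a href="/docs/page2.html#top">2</a> '
    '<a href="https://other.example.org/x">elsewhere</a>'
)
links = same_domain_links(sample, "https://manuals.example.com/index.html")
print(links)
```

The link to `other.example.org` is dropped because its host differs from the main page's, which is the "same domain" filter asked for in the first post.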
Regards,
Hans

Rudi
gamma jay
Posts: 25169
Joined: 17 Mar 2010, 17:33
Location: Cape Town

Re: Harvesting web sheets

Post by Rudi »

Not sure if I'm yapping up the wrong tree, but might this provide a solution?

You can use ScrapBook for the following purposes:
  • Save a single Web page
  • Save snippet or portion of a single Web page
  • Save an entire Web site
  • Organize the collection in the same way as Bookmarks with folders, sub-folders
  • Full text search and fast filtering search of the entire collection
  • Editing of the collected Web page
  • Text/HTML edit feature resembling Opera’s Notes
Regards,
Rudi

If your absence does not affect them, your presence didn't matter.

PaulB
BronzeLounger
Posts: 1475
Joined: 26 Jan 2010, 20:28
Location: Ottawa ON

Re: Harvesting web sheets

Post by PaulB »

HansV wrote:elgooG exists already.
OK, that hurt my brain.
Regards,
Paul



PJ_in_FL
5StarLounger
Posts: 874
Joined: 21 Jan 2011, 16:51
Location: Florida

Re: Harvesting web sheets

Post by PJ_in_FL »

HTTrack - grabs a web site and all related links and stores them locally

https://www.httrack.com/
PJ in (usually sunny) FL

jolas
2StarLounger
Posts: 126
Joined: 02 Feb 2010, 23:58

Re: Harvesting web sheets

Post by jolas »

If you click the hyperlink of the manual's title and just verify the captcha, the ensuing page will have a PDF download link on the upper left-hand side of the page.

ChrisGreaves
PlutoniumLounger
Posts: 11530
Joined: 24 Jan 2010, 23:23
Location: paused.undefined.exposed

Re: Harvesting web sheets

Post by ChrisGreaves »

Rudi wrote:You can use ScrapBook for the following purposes:
Thanks Rudi. This add-on is churning away as I type this note!
Proof of the pudding will be when I get home and see if I can read the entire manual.
:thankyou: :xhumbug:
We hate change, but love variety.

ChrisGreaves
PlutoniumLounger
Posts: 11530
Joined: 24 Jan 2010, 23:23
Location: paused.undefined.exposed

Re: Harvesting web sheets

Post by ChrisGreaves »

PaulB wrote:
HansV wrote:elgooG exists already.
OK, that hurt my brain.
?yhW
We hate change, but love variety.

ChrisGreaves
PlutoniumLounger
Posts: 11530
Joined: 24 Jan 2010, 23:23
Location: paused.undefined.exposed

Re: Harvesting web sheets

Post by ChrisGreaves »

HansV wrote:elgooG exists already. Do a Google search for 'link extractor' - you may find something useful.
Thanks Hans.
I was not aware of the technological term! I am experimenting with Rudi's suggestion.
Cheers
Chris

P.S. Thanks for hurting PaulB's brain. :rofl:
We hate change, but love variety.

ChrisGreaves
PlutoniumLounger
Posts: 11530
Joined: 24 Jan 2010, 23:23
Location: paused.undefined.exposed

Re: Harvesting web sheets

Post by ChrisGreaves »

PJ_in_FL wrote:HTTrack - grabs a web site and all related links and stores them locally. https://www.httrack.com/
Thanks PJ.
This too is humming as we squeak, and the test will be to disconnect and see if I have the complete manual.
:thankyou: :xgrin:
We hate change, but love variety.

Rebel
4StarLounger
Posts: 454
Joined: 24 Jan 2010, 16:02
Location: Hanmer, Ontario, Canada

Re: Harvesting web sheets

Post by Rebel »

Wouldn't the simplest solution be the one that Jolas posted?
"If you click the hyperlink of the manual's title and just verify the captcha, the ensuing page will have a pdf download link on the upper left hand of the page." :scratch: Seems pretty straightforward to me.
John :canada:
A Child's Mind, Once Stretched by Imagination...
Never Regains Its Original Dimensions

ChrisGreaves
PlutoniumLounger
Posts: 11530
Joined: 24 Jan 2010, 23:23
Location: paused.undefined.exposed

Re: Harvesting web sheets

Post by ChrisGreaves »

Rebel wrote:Wouldn't the simplest solution be the one that Jolas posted?
Absolutely!
But unfortunately I had run out of my budgeted time (of the 24 hours I am assigned each day), and had already committed to the two packages suggested by Rudi and PJ.
Which is why I am back here today, trying Jolas's suggestion.
:thankyou:
We hate change, but love variety.

ChrisGreaves
PlutoniumLounger
Posts: 11530
Joined: 24 Jan 2010, 23:23
Location: paused.undefined.exposed

Re: Harvesting web sheets

Post by ChrisGreaves »

jolas wrote:If you click the hyperlink of the manual's title and just verify the captcha, the ensuing page will have a pdf download link on the upper left hand of the page.
Thank You, Jolas.
I got it to work, but I can't say it is as intuitive as a simple button that reads "Click here to download a PDF manual"!
I couldn't find the pdf by clicking on the title, but I got there at last by clicking on the image/icon off to the left-hand side of the page.
:thankyou:

P.S. Some businesses, such as http://www.taotealeaf.com/, just do not make a PDF/DOC available.
I went into the store and asked if a full list of teas was available.
Not at all, said the lady.
The store has a computer printed-on-paper list, but there is no way a humble customer can get hold of a document.
I believe that in this case the only solution is to either click-and-save on each web page, or to resort to an add-in/package such as those described by Rudi and PJ.
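
For the click-and-save route, a rough Python sketch of the repetitive part might look like the following. It is only an illustration under stated assumptions: `save_pages` and `fetch` are hypothetical names, the addresses are placeholders, and the demonstration uses a stand-in fetcher so that nothing is actually downloaded. It takes a list of page addresses and writes each one to a folder.

```python
import os
import tempfile
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch(url):
    """Download one page and return its raw bytes."""
    with urlopen(url) as response:
        return response.read()


def save_pages(urls, folder, fetcher=fetch):
    """Save each URL into folder; return the list of file paths written."""
    os.makedirs(folder, exist_ok=True)
    written = []
    for number, url in enumerate(urls, start=1):
        # Name files after the last path segment, numbered to avoid clashes.
        name = os.path.basename(urlparse(url).path) or "index.html"
        path = os.path.join(folder, f"{number:02d}_{name}")
        with open(path, "wb") as handle:
            handle.write(fetcher(url))
        written.append(path)
    return written


# Offline demonstration with a stand-in fetcher (placeholder addresses);
# leave out the fetcher argument to download for real.
demo_folder = tempfile.mkdtemp()
demo_files = save_pages(
    ["https://manuals.example.com/page1.html", "https://manuals.example.com/"],
    demo_folder,
    fetcher=lambda url: b"<html>" + url.encode() + b"</html>",
)
print(demo_files)
```

This saves only the HTML of each page, not its images or stylesheets, which is one reason a package such as ScrapBook or HTTrack is likely the more complete answer.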
:cheers:
We hate change, but love variety.