A Batch Download Workflow
I’ve used this technique a couple of times now and wanted to share and document it. I was recently doing some web research on map production guidelines because I’m trying to put some together for my GIS department.
I stumbled across this map production guidelines PDF from the Australian Local Government Association (ALGA).
It is a really nice document and I intend to use it as a model to develop my own, but I noticed that the map production guidelines are module 9, which implies there are at least modules 1-8 as well.
To confirm, I changed the URL from http://alga.asn.au/site/misc/alga/downloads/info-technology/09_Spatial_Toolkit_Module9.pdf to http://alga.asn.au/site/misc/alga/downloads/info-technology/08_Spatial_Toolkit_Module8.pdf, and that was valid.
So I turned to Excel to build download URLs for each module that I could use with the DownThemAll extension for Firefox.
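In my first column (A), I entered ‘01’ (stored as text so the leading zero is kept), in the second (B), ‘1’, and filled both down about 15 rows so the numbers count up (02 and 2, 03 and 3, and so on).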
In the third column (C), I copied the base URL into Excel up to the first number: alga.asn.au/site/misc/alga/downloads/info-technology/
In the fourth column (D), the part of the URL between the numbers:
‘_Spatial_Toolkit_Module’
In the fifth column (E) is the formula to create the URL for each PDF:
=C3&A3&D3&B3&".pdf"
which returns the url
alga.asn.au/site/misc/alga/downloads/info-technology/01_Spatial_Toolkit_Module1.pdf
The label for my hyperlink goes in the next column (F):
=A3&"_Spatial_Toolkit_Module"&B3
which returns
01_Spatial_Toolkit_Module1
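For readers who would rather script this step than use a spreadsheet, here is a minimal Python sketch of the same concatenation. It assumes the full http:// form of the base URL (as in the module 9 link above) and mirrors the "about 15 rows" of candidates; the function names are my own, not part of any tool.

```python
# Sketch: rebuild the spreadsheet columns in Python.
# BASE assumes the http:// scheme seen in the original module 9 link.
BASE = "http://alga.asn.au/site/misc/alga/downloads/info-technology/"
MIDDLE = "_Spatial_Toolkit_Module"


def module_label(n: int) -> str:
    """Column F: zero-padded number + fixed text + plain number."""
    return f"{n:02d}{MIDDLE}{n}"


def module_url(n: int) -> str:
    """Column E: base URL + label + file extension."""
    return f"{BASE}{module_label(n)}.pdf"


# Print about 15 candidate URLs, like filling the formula down 15 rows.
for n in range(1, 16):
    print(module_url(n))
```

The `{n:02d}` format does the job of the text-formatted ‘01’ column, so only one counter is needed.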
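Now I can build the hyperlink HTML in the final column (G) with a formula along these lines (doubled quotes produce a literal quote in Excel, and http:// is prepended because column C omits it):
="<a href=""http://"&E3&""">"&F3&"</a>"
which returns
<a href="http://alga.asn.au/site/misc/alga/downloads/info-technology/01_Spatial_Toolkit_Module1.pdf">01_Spatial_Toolkit_Module1</a>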
I tested the links to see what the last module is. Another way to do this is to enter the following into a Google search:
site:http://alga.asn.au/site/misc/alga/downloads/info-technology pdf
This will return all of the pdf listings at the url after site:.
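Testing the links by hand can also be scripted. The sketch below is one possible approach, not what I actually did: it sends a HEAD request to each candidate URL using Python's standard library and reports the highest module number that responds. It assumes the server answers HEAD requests (some do not) and that the URLs are still live; the function names are hypothetical.

```python
import urllib.request

# Assumed full form of the base URL, including the http:// scheme.
BASE = "http://alga.asn.au/site/misc/alga/downloads/info-technology/"


def module_url(n: int) -> str:
    """Build the PDF URL for module n, e.g. 09_Spatial_Toolkit_Module9.pdf."""
    return f"{BASE}{n:02d}_Spatial_Toolkit_Module{n}.pdf"


def url_exists(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request for url succeeds with status 200."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers HTTP errors (404 etc.), connection failures and timeouts.
        return False


def last_module(candidates, exists=url_exists):
    """Return the highest number in (number, url) pairs whose URL exists."""
    last = None
    for number, url in candidates:
        if exists(url):
            last = number
    return last
```

Calling `last_module((n, module_url(n)) for n in range(1, 16))` would check the same 15 candidates as the spreadsheet rows. The `exists` parameter is there so the check can be swapped out, for example to try the logic without touching the network.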
Once I confirmed the modules were 1-10, I performed the Google search above and found an additional PDF with a table of contents that did not follow the exact naming convention. I built one more row in Excel to cover this URL, which completed the table of hyperlinks.
Next, I started a new file in Notepad++ and pasted the column G contents in.
Then I saved this as an HTML file and opened it in Firefox, where you can see all of the hyperlinks and even test them out.
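The spreadsheet and text editor steps can be collapsed into one short script. This is a rough Python sketch under a few assumptions: the base URL uses the http:// scheme, the modules run from 1 to 10, and the output file name modules.html is a placeholder. The extra table of contents PDF is left out because its file name does not follow the pattern.

```python
from html import escape
from pathlib import Path

# Assumed full form of the base URL, including the http:// scheme.
BASE = "http://alga.asn.au/site/misc/alga/downloads/info-technology/"


def link_line(n: int) -> str:
    """Return one anchor tag for module n, followed by a line break."""
    label = f"{n:02d}_Spatial_Toolkit_Module{n}"
    url = f"{BASE}{label}.pdf"
    return f'<a href="{escape(url, quote=True)}">{escape(label)}</a><br>'


def build_page(numbers) -> str:
    """Return a minimal HTML page with one link per module number."""
    body = "\n".join(link_line(n) for n in numbers)
    return f"<!DOCTYPE html>\n<html>\n<body>\n{body}\n</body>\n</html>\n"


def write_page(path: str, numbers) -> None:
    """Write the link page to path so it can be opened in a browser."""
    Path(path).write_text(build_page(numbers), encoding="utf-8")
```

Running `write_page("modules.html", range(1, 11))` produces a file that can be opened in Firefox just like the hand-built one.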
One of my favorite extensions for Firefox is DownThemAll, which allows you to download everything linked from a given webpage based on filters you set up. I opened it on the HTML page I’d built.
Point the download to the right folder and hit start, and all the files are downloaded while you work on something else. That leaves all the PDFs ready to be combined and added to the library.
There’s some threshold where this workflow outperforms the manual download workflow, and I’ve found it very useful under the right circumstances. The example above can certainly be performed manually and probably wouldn’t take much more time than setting up this batch download workflow. The batch workflow becomes very useful when the downloads are in the hundreds. I’ve used this technique many times for GIS data downloads where I had hundreds of individual download links, and it saved me many hours of work.