How to download (grab) an entire website?

Some clients hosted on Wix want to grab their whole website, but how? You don't want just an article or an individual image; you want the entire site. Below are two options you can use to do it:

You may need to mirror the website completely, but be aware that some links may be dead.
You can use HTTrack or wget:


1) Wget is a classic command-line tool for this kind of task. It comes with most Unix/Linux systems, and you can get it for Windows too.
http://www.gnu.org/software/wget/

$ wget -r http://jonboy60.com   # replace with your own site

This downloads the pages recursively up to a maximum of 5 levels deep.

Five levels might not be enough to get everything from the site. You can use the -l switch to set how many levels deep you want to go, as follows:

$ wget -r -l10 http://jonboy60.com

If you want infinite recursion you can use the following:

$ wget -r -l inf http://jonboy60.com

You can also use 0 instead of inf; it means the same thing.
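To try different depths without retyping the command, the switches above can be wrapped in a small shell function. This is only a sketch: the function name mirror_to_depth and the DRY_RUN variable (which prints the wget command instead of running it) are made up for illustration; -r and -l are the real wget switches described above.

```shell
# Hypothetical helper: mirror_to_depth URL [DEPTH]
# DEPTH is a non-negative integer or "inf"; in wget, 0 and inf both mean unlimited.
# Set DRY_RUN=1 to print the wget command instead of running it.
mirror_to_depth() {
    url=$1
    depth=${2:-5}    # wget's own default recursion depth is 5
    case $depth in
        inf) ;;                          # explicit infinite recursion
        ''|*[!0-9]*)                     # empty, or contains a non-digit
            echo "depth must be a number or 'inf'" >&2
            return 1 ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo wget -r -l "$depth" "$url"
    else
        wget -r -l "$depth" "$url"
    fi
}
```

For example, mirror_to_depth http://jonboy60.com inf runs the same command as the -l inf example above, while a typo such as "ten" is rejected before wget is started.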

There is still one more problem. You might get all the pages locally, but the links in those pages still point to their original locations, so you cannot click from page to page in the local copy.

You can get around this by using the -k (--convert-links) switch, which rewrites the links in the downloaded pages to point to their locally downloaded equivalents:

$ wget -r -k http://jonboy60.com
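Converting links is only half of an offline copy: the images and stylesheets each page uses also have to be downloaded. Here is a sketch that adds two further wget switches, -p (--page-requisites) and -E (--adjust-extension), to the -r -k command above. The function name offline_copy and the DRY_RUN variable are hypothetical; -P only sets the directory the files are saved under.

```shell
# Hypothetical helper: offline_copy URL [DEST_DIR]
# Combines -r and -k with two extra wget switches:
#   -p (--page-requisites): also fetch images, CSS and other files a page needs
#   -E (--adjust-extension): save HTML pages with an .html suffix so they open locally
# Set DRY_RUN=1 to print the command instead of running it.
offline_copy() {
    url=$1
    dest=${2:-.}     # -P: directory prefix the downloads are saved under
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo wget -r -k -p -E -P "$dest" "$url"
    else
        wget -r -k -p -E -P "$dest" "$url"
    fi
}
```

For example, offline_copy http://jonboy60.com site saves the copy under ./site instead of the current directory.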

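If you want a complete mirror of a website, use the -m (--mirror) switch. It turns on recursion with infinite depth plus timestamping (it is equivalent to -r -N -l inf --no-remove-listing), so you no longer need -r and -l. It does not convert links, though, so add -k if you want to browse the copy offline:

$ wget -m -k http://jonboy60.com

So if you have your own website, you can make a complete backup with this one simple command.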
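If you run that backup regularly, you may want each run kept in its own folder. A minimal sketch, assuming you want one dated directory per run under a base folder of your choice; backup_site and DRY_RUN are hypothetical names. It uses -m together with -k, because the wget manual defines -m as -r -N -l inf --no-remove-listing, which does not include link conversion.

```shell
# Hypothetical helper: backup_site URL BASE_DIR
# Mirrors URL into BASE_DIR/YYYY-MM-DD so repeated backups don't overwrite each other.
# -m mirrors the site; -k converts links so the copy can be browsed offline.
# Set DRY_RUN=1 to print the command instead of running it.
backup_site() {
    url=$1
    base=${2:-}
    if [ -z "$base" ]; then
        echo "usage: backup_site URL BASE_DIR" >&2
        return 1
    fi
    dest="$base/$(date +%Y-%m-%d)"    # one directory per day
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo wget -m -k -P "$dest" "$url"
    else
        wget -m -k -P "$dest" "$url"
    fi
}
```

For example, backup_site http://jonboy60.com /backups would write today's copy under /backups/ followed by the current date.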

2) HTTrack (http://www.httrack.com/) works like a champ for copying the contents of an entire site. It can even grab the scripts and other pieces needed to make a website with active content work offline. I am amazed at how much it can replicate offline.

First, install it with your distribution's package manager:

$ sudo yum install httrack       # RHEL/CentOS/Fedora
$ sudo apt-get install httrack   # Debian/Ubuntu
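If you are not sure which of the two package managers your system has, a small helper can pick the right command. This is a hypothetical sketch (httrack_install_cmd is not part of HTTrack); it only prints the command, so you can check it before running it yourself.

```shell
# Hypothetical helper: httrack_install_cmd
# Prints the HTTrack install command for whichever package manager is on the PATH.
# apt-get (Debian/Ubuntu) is checked first, then yum (RHEL/CentOS/Fedora).
httrack_install_cmd() {
    if command -v apt-get >/dev/null 2>&1; then
        echo "sudo apt-get install httrack"
    elif command -v yum >/dev/null 2>&1; then
        echo "sudo yum install httrack"
    else
        echo "no apt-get or yum found; get HTTrack from http://www.httrack.com/" >&2
        return 1
    fi
}
```

On a machine with neither tool, the helper returns a non-zero status and points you to the HTTrack website instead.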

Now run it, following external links only one level deep:

$ httrack --ext-depth=1 http://jonboy60.com

This downloads the files jonboy60.com pulls in from external hosts, such as its CDN, but does not keep following links from those hosts out across the whole internet.
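To keep the mirror in a folder of your choosing, you can point HTTrack at an output path with its -O option. This is a hedged sketch: the function name httrack_mirror, the default folder ./mirror and the DRY_RUN variable are assumptions for illustration; the URL-first argument order follows the examples in HTTrack's own help.

```shell
# Hypothetical helper: httrack_mirror URL [OUT_DIR]
# -O sets where HTTrack writes the mirror (plus its logs and cache);
# --ext-depth=1 follows links to other hosts (such as a CDN) one level only.
# Set DRY_RUN=1 to print the command instead of running it.
httrack_mirror() {
    url=$1
    out=${2:-./mirror}    # assumed default output folder
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo httrack "$url" -O "$out" --ext-depth=1
    else
        httrack "$url" -O "$out" --ext-depth=1
    fi
}
```

For example, httrack_mirror http://jonboy60.com /tmp/jonboy60 keeps the downloaded site, its logs and cache together under /tmp/jonboy60.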
