Shell Tutorial: Downloading data using Wget



Want to learn more? Take the full course at https://learn.datacamp.com/courses/data-processing-in-shell at your own pace. More than a video, you’ll learn hands-on coding & quickly apply skills to your daily work.


Welcome back! In this lesson, we will introduce another command line tool for downloading data, called Wget. We will walk through how to install and set up Wget along with some basic usage.

Wget derives its name from World Wide Web and get.

It is a GNU project native to Linux, but it is also available on other major operating systems, including macOS and Windows.

It is another command line tool that will help you download files via HTTP and FTP.

Compared to curl, Wget is better suited to bulk downloads. It can download a single file, an entire folder, or even a webpage. Most importantly, it can download multiple files recursively.

Aside from using man, another way to check whether Wget has been installed correctly is to run which wget.

This will return the location where Wget is installed. For example, in the local user bin: /usr/local/bin/wget

If Wget has not been installed, there will simply be no output.
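
The check described above can be sketched as a small shell function. This is only an illustration: the name check_tool is hypothetical, and it uses the shell built-in command -v, which behaves much like which.

```shell
# check_tool is a hypothetical helper name, used only for illustration.
# It reports where a command is installed, or that it is missing.
check_tool() {
  if tool_path=$(command -v "$1"); then
    echo "$1 found at: $tool_path"
  else
    echo "$1 is not installed"
  fi
}

# Check for wget on this machine
check_tool wget
```

The message printed depends on whether wget is installed on the machine where this runs.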

The official documentation and source code for Wget are listed, but unless you are comfortable compiling from the source code, here are some easier alternatives.

For Linux users, it’s likely Wget is already installed for you. If not, run sudo apt-get install wget on the command line.

For Mac users, use Homebrew by running brew install wget on the command line.

For Windows users, this will not be a command line install. Rather, visit the link listed on the slide to download as part of the gnuwin32 package.
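
As a rough summary of the three install routes, the sketch below prints a suggested install step based on the operating system name. The function name suggest_install is hypothetical, the Linux branch assumes a Debian-based distribution that uses apt-get, and nothing is actually installed when this runs.

```shell
# suggest_install is a hypothetical helper name, used only for illustration.
# It prints a suggestion and does not install anything.
suggest_install() {
  case "$1" in
    Linux)  echo "sudo apt-get install wget" ;;   # assumes a Debian-based distribution
    Darwin) echo "brew install wget" ;;           # macOS with Homebrew
    *)      echo "download Wget as part of the gnuwin32 package" ;;
  esac
}

# uname -s prints the operating system name, e.g. Linux or Darwin
suggest_install "$(uname -s)"
```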

Once installation is complete, use the man command to print the Wget manual.

Remember to press Enter to scroll and to press q to exit.

The basic syntax for Wget has a similar structure to curl:

wget [option flags] [URL]

The URL is also required for the Wget command to run successfully.
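
To make the syntax concrete, here is a minimal sketch that wraps the simplest form of the command, with no option flags, in a function. The name fetch_file and the URL in the usage comment are hypothetical and do not come from the lesson.

```shell
# fetch_file is a hypothetical helper name, used only for illustration.
# It follows the general shape: wget [option flags] [URL]
fetch_file() {
  # The URL is required, so stop early if none was given
  if [ -z "$1" ]; then
    echo "usage: fetch_file URL" >&2
    return 1
  fi
  wget "$1"
}

# Example call (hypothetical URL, not run here):
# fetch_file "https://example.com/datafile.txt"
```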

Wget supports a large number of protocol calls for data stored on servers.

For a full list of the options available, refer to wget --help.

Here are some option flags unique to Wget:

-b allows your download to run in the background.

-q turns off the Wget output, which saves some disk space when that output would otherwise be written to a log file.

-c resumes a partially completed download, whether it was started by Wget or by another program.

Finally, you can link all the option flags together like this.

wget -bqc followed by the file location

Running this command on this hypothetical file location will generate the output:

Continuing in background, pid 12345
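
As a sketch of how the linked flags might be used in practice, the function below passes all three to wget in one call. The name background_fetch and the URL in the usage comment are hypothetical and do not come from the lesson.

```shell
# background_fetch is a hypothetical helper name, used only for illustration.
# -b: run the download in the background
# -q: turn off wget's output
# -c: continue a partially downloaded file
# Writing -bqc is equivalent to writing -b -q -c separately.
background_fetch() {
  wget -bqc "$1"
}

# Example call (hypothetical URL, not run here):
# background_fetch "https://example.com/datafile.txt"
```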

The pid is a unique process ID assigned to this particular data download job for your reference, in case you need to cancel the process.
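
Cancelling a background job by its process ID could look like the sketch below. To keep the example runnable without a network connection, it uses sleep as a stand-in for a real download. With wget -b, you would instead pass the pid printed in the "Continuing in background" message to kill.

```shell
# Start a long-running stand-in job in the background.
# (sleep is used here in place of a real wget download.)
sleep 60 &

# $! holds the process ID of the most recent background job
job_pid=$!
echo "Started background job, pid $job_pid"

# Cancel the job by its process ID
kill "$job_pid"

# Wait for the job to exit; its non-zero exit status is expected here
wait "$job_pid" 2>/dev/null || true
echo "Job $job_pid cancelled"
```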

In this lesson, we learned another way to download files in the command line using the tool Wget.

Up next, we will put our new knowledge to practice and learn more advanced Wget use cases!

Happy Wget-ing!
