Shell Tutorial: Advanced downloading using Wget




Want to learn more? Take the full course at https://learn.datacamp.com/courses/data-processing-in-shell at your own pace. More than a video, you’ll learn hands-on coding & quickly apply skills to your daily work.


So far, we’ve learned how to install and do basic file downloads using either curl or Wget. In this lesson, we will focus on getting the most out of Wget by going over more advanced techniques for data downloading.

A common way for data people to handle multiple file downloads is to store the file locations in a text file and pass that file to a downloading program like Wget.

In this case, all the URLs for the files we want to download are stored in the file url_list.txt. Let’s use the cat command to print and preview the URLs really quickly.

After confirming that the URLs are indeed stored in this file, we can pass the file to Wget. Note that we need to preface the file name with the -i option flag (a lowercase i), so Wget knows that it is reading URLs from a local or external file.

The command reads:

wget -i url_list.txt

Finally, do not insert any option flags between -i and the URL file. If other option flags are needed, put them before -i.
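As a rough sketch of the steps above, the block below builds a small URL list, previews it with cat, and assembles the Wget command with an extra option placed before -i. The example.com URLs, the downloads folder, and the RUN_DOWNLOAD switch are all made up for illustration; the download itself only runs if you set RUN_DOWNLOAD=1.

```shell
# Create a small URL list, one URL per line (hypothetical URLs).
cat > url_list.txt <<'EOF'
https://example.com/data/file1.csv
https://example.com/data/file2.csv
https://example.com/data/file3.csv
EOF

# Preview the URLs before downloading.
cat url_list.txt

# Extra option flags go before -i; the URL file comes right after -i.
# --directory-prefix saves the files into the named folder (assumed name: downloads).
download_cmd="wget --directory-prefix=downloads -i url_list.txt"
echo "$download_cmd"

# RUN_DOWNLOAD is a hypothetical switch for this sketch, not a Wget feature.
if [ "${RUN_DOWNLOAD:-0}" = "1" ]; then
    $download_cmd
fi
```

Printing the command first is just a convenient way to double check the flag order before any files are fetched.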

Sometimes, it’s useful to make sure Wget does not consume your entire bandwidth with the file download. You can set an upper download bandwidth limit using the --limit-rate option.

Set the limit rate equal to a number, which Wget reads as bytes per second by default. Add a k suffix for kilobytes or an m suffix for megabytes.

For example, wget --limit-rate=200k -i url_list.txt will make sure your download rate does not exceed 200 kilobytes per second as you download the files saved in the URL list.
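Here is a minimal sketch of the rate-limited download just described. The example.com URL and the RUN_DOWNLOAD switch are assumptions made for this illustration; the URL list is only created if one does not already exist.

```shell
# Make sure a URL list exists so the example stands alone (hypothetical URL).
[ -f url_list.txt ] || printf '%s\n' "https://example.com/data/big_file.zip" > url_list.txt

# Cap the download rate at 200 kilobytes per second.
# The k suffix means kilobytes; an m suffix would mean megabytes.
limit_cmd="wget --limit-rate=200k -i url_list.txt"
echo "$limit_cmd"

# RUN_DOWNLOAD is a hypothetical switch for this sketch, not a Wget feature.
if [ "${RUN_DOWNLOAD:-0}" = "1" ]; then
    $limit_cmd
fi
```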

For smaller files, enforcing a bandwidth limit doesn’t work as well. To avoid overtaxing the file hosting server, it is more useful to enforce a mandatory wait time between file downloads using --wait.

The wait time is measured in seconds by default.

For example, wget --wait=2.5 -i url_list.txt creates a 2.5 second pause between downloading each file stored in the URL list file.
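The wait option can be sketched the same way. As before, the example.com URLs and the RUN_DOWNLOAD switch are assumptions for illustration only, and the URL list is created only if it is missing.

```shell
# Make sure a URL list exists so the example stands alone (hypothetical URLs).
[ -f url_list.txt ] || printf '%s\n' \
    "https://example.com/data/small1.csv" \
    "https://example.com/data/small2.csv" > url_list.txt

# Pause 2.5 seconds between downloads; the value is in seconds by default.
wait_cmd="wget --wait=2.5 -i url_list.txt"
echo "$wait_cmd"

# RUN_DOWNLOAD is a hypothetical switch for this sketch, not a Wget feature.
if [ "${RUN_DOWNLOAD:-0}" = "1" ]; then
    $wait_cmd
fi
```

Note that --wait sits before -i, following the flag order rule from earlier in the lesson.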

As we round out this chapter, it is helpful to do a quick comparison between the two command line tools curl and Wget.

Both curl and Wget can download files over HTTP, HTTPS, and FTP, but only curl can download and upload using 20 other protocols.

Curl is also easier to install across all operating systems, compared to Wget.

Wget’s advantage is its ability to handle multiple file downloads gracefully.

It can also be used to download just about anything, from a full file directory to an HTML page.

With both curl and Wget at your disposal, you’re now an expert at downloading files on the command line! Let’s practice!
