13.2. Internet Specific Commands

Note that if DNS is not configured correctly on your machine, you will need to edit “/etc/resolv.conf” (which lists the name servers your machine uses) to make these commands work.

host

Performs a simple lookup of an internet address (using the Domain Name System, DNS). Simply type:

host ip_address

or

host domain_name

dig

The “domain information groper” tool, more advanced than host. If you give it a hostname as an argument, it outputs information about that host, including its IP address, hostname and various other information.

For example, to look up information about “www.amazon.com” type:

dig www.amazon.com

To find the host name for a given IP address (i.e. a reverse lookup), use dig with the -x option.

dig -x 100.42.30.95

This looks up the address (which may or may not exist) and returns the host name associated with it. For example, if that were the address of “slashdot.org”, it would return “slashdot.org”.
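Behind the scenes, a reverse lookup is an ordinary DNS query for a PTR record whose name is the address with its four octets reversed, followed by “in-addr.arpa”. The dig manual page describes -x as a shortcut for exactly this. The sketch below builds that name for an IPv4 address. ptr_name is a hypothetical helper written for illustration; it is not part of dig. If it is right, dig -x 100.42.30.95 should ask the same question as dig 95.30.42.100.in-addr.arpa PTR.

```shell
#!/bin/sh
# Hypothetical helper (IPv4 only): build the PTR record name that a
# reverse lookup such as "dig -x" queries for a dotted-quad address.
# The body runs in a subshell, so changing IFS does not leak out.
ptr_name() (
    IFS=.            # split the address on dots
    set -- $1        # the four octets become $1..$4
    # Reverse the octets and append the reverse-DNS domain.
    printf '%s.%s.%s.%s.in-addr.arpa.\n' "$4" "$3" "$2" "$1"
)

ptr_name 100.42.30.95    # prints 95.30.42.100.in-addr.arpa.
```

With network access, comparing the ANSWER sections of the two dig commands above is a quick way to check this.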

dig takes a huge number of options (to the point of being too many); refer to the manual page for more information.

whois

(now BW whois) is used to look up contact information in the “whois” databases; the servers are only likely to hold entries for major sites. Note that contact information is likely to be hidden or restricted, as it is often abused by crackers and others looking for ways to cause malicious damage to organisations.

wget

(GNU Web get) is used to download files from the World Wide Web.

To archive a single website, use the -m or --mirror (mirror) option.

Use the -nc (no clobber) option to stop wget from overwriting a file if you already have it.

Use the -c or --continue option to resume a file that was left unfinished by wget or another program.

Simple usage example:

wget url_for_file

This simply downloads a file from a site.

wget can also retrieve multiple files using standard wildcards of the same kind used in bash, such as *, [ ] and ?. Use wget as normal, but put single quotation marks (' ') around the URL to prevent bash from expanding the wildcards. There are complications if you are retrieving from an HTTP site (see below).
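You can see what the quotes do without touching the network. Bash tries to expand an unquoted wildcard against file names in the current directory before wget is even started, while single quotes hand the pattern over unchanged. The sketch below uses a throwaway directory with made-up file names and a hypothetical show_args function that just prints the arguments a command would receive. The same expansion would happen to a wildcard URL if it ever matched local files.

```shell
#!/bin/sh
# Demonstrate what the shell does with a wildcard before running a
# command such as wget. The directory and file names are made up.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT      # clean up when the script exits
cd "$demo_dir" || exit 1
touch a.gif b.gif

# Hypothetical helper: print how many arguments arrived, and each one.
show_args() {
    printf '%d argument(s):' "$#"
    printf ' [%s]' "$@"
    printf '\n'
}

show_args *.gif      # unquoted: prints 2 argument(s): [a.gif] [b.gif]
show_args '*.gif'    # quoted:   prints 1 argument(s): [*.gif]
```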

Advanced usage example (taken from the wget manual page):

wget --spider --force-html -i bookmarks.html

This will parse the file bookmarks.html and check that all the links exist.

Advanced usage: this is how you can download multiple files over HTTP using a wildcard.

Note: HTTP doesn't support downloading using standard wildcards, but FTP does, so wildcards work fine with FTP. A work-around for this HTTP limitation is shown below [1]:

wget -r -l1 --no-parent -A.gif http://www.website.com

This downloads recursively, to a depth of one; in other words, from the current directory and not below it. The command ignores references to the parent directory and downloads anything that ends in “.gif”. If you also wanted to download, say, anything that ends in “.pdf”, add -A.pdf before the website address (a comma-separated list such as -A.gif,.pdf works as well). To download something else, simply change the website address and the file type. Note that -A.gif is the same as -A "*.gif"; the quotes (single or double) stop bash from expanding the wildcard before wget sees it.
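The wget manual page explains why -A.gif and -A "*.gif" behave the same. Each comma-separated entry in the accept list is treated as a file-name suffix, unless it contains a wildcard character (*, ?, [ or ]), in which case it is treated as a pattern. The accepted function below is a hypothetical shell illustration of that rule, not wget's own code, and it ignores other details of wget's matching.

```shell
#!/bin/sh
# Hypothetical sketch of wget-style accept-list matching, e.g. for
# "-A .gif,.pdf". Entries containing *, ? or [ are treated as patterns;
# any other entry is treated as a file-name suffix.
# Usage: accepted FILE_NAME ACCEPT_LIST   (exit status 0 = accepted)
accepted() (
    name=$1
    IFS=,          # the list is comma-separated
    set -f         # don't let the shell glob entries against local files
    for entry in $2; do
        case $entry in
            *\**|*\?*|*\[*)
                # Wildcard entry: the whole name must match the pattern.
                case $name in $entry) exit 0 ;; esac ;;
            *)
                # Plain entry: the name must end with it.
                case $name in *"$entry") exit 0 ;; esac ;;
        esac
    done
    exit 1         # no entry matched
)

accepted logo.gif .gif && echo "logo.gif accepted (suffix)"
accepted logo.gif '*.gif' && echo "logo.gif accepted (pattern)"
accepted index.html .gif,.pdf || echo "index.html rejected"
```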

wget has many more options; refer to the examples section of the manual page, as this tool is very well documented.

Note: Alternative website downloaders

You may like to try alternatives such as httrack, a full GUI website downloader available for GNU/Linux.

curl

curl is another remote downloader. It is designed to work without user interaction, supports a variety of protocols, can both upload and download, and has a large number of tricks and work-arounds for various situations. It can access dictionary servers (DICT), LDAP servers, FTP, HTTP and Gopher; see the manual page for full details.

To read the full manual for this command (which is huge), type:

curl -M

For general usage you can use it like wget. You can also log in with a user name by using the -u option and giving your username and password like this:

curl -u username:password http://www.placetodownload/file

To upload using FTP, use the -T option:

curl -T file_name ftp://ftp.uploadsite.com

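To continue (resume) a partly downloaded file, use the -C option; the “-” after it tells curl to work out where to resume from the existing output file: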

curl -C - -o file http://www.site.com
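With -C -, curl takes the size of the file already on disk as the point to resume from. For HTTP, it then asks the server for the rest with a Range request of the form Range: bytes=N-. The hypothetical resume_offset function below only computes that N locally, as a sketch of the idea; it is not part of curl and makes no network request.

```shell
#!/bin/sh
# Hypothetical helper: the byte offset "curl -C -" would resume from,
# i.e. the size of the partial output file (0 if it doesn't exist yet).
resume_offset() {
    if [ -f "$1" ]; then
        # $(( )) strips any padding some wc implementations print.
        echo $(( $(wc -c < "$1") ))
    else
        echo 0
    fi
}

partial=$(mktemp)
printf 'hello' > "$partial"            # pretend 5 bytes were downloaded
offset=$(resume_offset "$partial")
echo "would request: Range: bytes=${offset}-"   # prints ...bytes=5-
rm -f "$partial"
```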

Notes

[1]

This way around the wildcard limitation has been adapted (with a tiny amount of editing) from the wget manual page; see [9] in the Bibliography for further information.