Wget

Last updated

Wget
Original author(s) Hrvoje Nikšić
Developer(s) Giuseppe Scrivano, Tim Rühsen, Darshit Shah
Initial releaseJanuary 1996;28 years ago (1996-01)
Stable release
1.21.4 [1]   OOjs UI icon edit-ltr-progressive.svg / 10 May 2023
Repository
Written in C
Platform Cross-platform
Type FTP client / HTTP client
License GPL-3.0-or-later [lower-alpha 1] [2]
Website www.gnu.org/software/wget/

GNU Wget (or just Wget, formerly Geturl, also written as its package name, wget) is a computer program that retrieves content from web servers. It is part of the GNU Project. Its name derives from "World Wide Web" and "get". It supports downloading via HTTP, HTTPS, and FTP.

Contents

Its features include recursive download, conversion of links for offline viewing of local HTML, and support for proxies. It appeared in 1996, coinciding with the boom of popularity of the Web, causing its wide use among Unix users and distribution with most major Linux distributions. Wget is written in portable C, and can be easily installed on any Unix-like system. Wget has been ported to Microsoft Windows, macOS, OpenVMS, HP-UX, AmigaOS, MorphOS and Solaris. Since version 1.14, Wget has been able to save its output in the web archiving standard WARC format. [3]

It has been used as the basis for graphical programs such as GWget.

History

Wget descends from an earlier program named Geturl by the same author, [4] the development of which commenced in late 1995. The name changed to Wget after the author became aware of an earlier Amiga program named GetURL, written by James Burton in AREXX.

Wget filled a gap in the inconsistent web-downloading software available in the mid-1990s. No single program could reliably use both HTTP and FTP to download files. Existing programs either supported FTP (such as NcFTP and dl) or were written in Perl, which was not yet ubiquitous. While Wget was inspired by features of some of the existing programs, it supported both HTTP and FTP and could be built using only the standard development tools found on every Unix system.

At that time many Unix users struggled behind extremely slow university and dial-up Internet connections, leading to a growing need for a downloading agent that could deal with transient network failures without assistance from the human operator.

Features

Robustness

Wget has been designed for robustness over slow or unstable network connections. If a download does not complete due to a network problem, Wget will automatically try to continue the download from where it left off, and repeat this until the whole file has been retrieved. It was one of the first clients to make use of the then-new Range HTTP header to support this feature.

Recursive download

Wget can optionally work like a web crawler by extracting resources linked from HTML pages and downloading them in sequence, repeating the process recursively until all the pages have been downloaded or a maximum recursion depth specified by the user has been reached. The downloaded pages are saved in a directory structure resembling that on the remote server. This "recursive download" enables partial or complete mirroring of web sites via HTTP. Links in downloaded HTML pages can be adjusted to point to locally downloaded material for offline viewing. When performing this kind of automatic mirroring of web sites, Wget supports the Robots Exclusion Standard (unless the option -e robots=off is used).

Recursive download works with FTP as well, where Wget issues the LIST command to find which additional files to download, repeating this process for directories and files under the one specified in the top URL. Shell-like wildcards are supported when the download of FTP URLs is requested.

When downloading recursively over either HTTP or FTP, Wget can be instructed to inspect the timestamps of local and remote files, and download only the remote files newer than the corresponding local ones. This allows easy mirroring of HTTP and FTP sites, but is considered inefficient and more error-prone when compared to programs designed for mirroring from the ground up, such as rsync. On the other hand, Wget does not require special server-side software for this task.

Non-interactiveness

Wget is non-interactive in the sense that, once started, it does not require user interaction and does not need to control a TTY, being able to log its progress to a separate file for later inspection. Users can start Wget and log off, leaving the program unattended. By contrast, most graphical or text user interface web browsers require the user to remain logged in and to manually restart failed downloads, which can be a great hindrance when transferring a lot of data.

Portability

Written in a highly portable style of C with minimal dependencies on third-party libraries, Wget requires little more than a C compiler and a BSD-like interface to TCP/IP networking.[ citation needed ] Designed as a Unix program invoked from the Unix shell, the program has been ported to numerous Unix-like environments and systems, including Microsoft Windows via Cygwin, and macOS. It is also available as a native Microsoft Windows program as one of the GnuWin packages.

Other features

GNU Wget was written by Hrvoje Nikšić with contributions by many other people, including Dan Harkless, Ian Abbott, and Mauro Tortonesi. Significant contributions are credited in the AUTHORS file included in the distribution, and all remaining ones are documented in the changelogs, also included with the program. Wget is currently maintained by Giuseppe Scrivano, Tim Rühsen and Darshit Shah. [5]

The copyright to Wget belongs to the Free Software Foundation, whose policy is to require copyright assignments for all non-trivial contributions to GNU software. [6]

License

GNU Wget is distributed under the terms of the GNU General Public License, version 3 or later, with a special exception that allows distribution of binaries linked against the OpenSSL library. The text of the exception follows: [2]

Additional permission under GNU GPL version 3 section 7

If you modify this program, or any covered work, by linking or combining it with the OpenSSL project's OpenSSL library (or a modified version of that library), containing parts covered by the terms of the OpenSSL or SSLeay licenses, the Free Software Foundation grants you additional permission to convey the resulting work. Corresponding Source for a non-source form of such a combination shall include the source code for the parts of OpenSSL used as well as that of the covered work.

It is expected[ by whom? ] that the exception clause will be removed once Wget is modified to also link with the GnuTLS library.

Wget's documentation, in the form of a Texinfo reference manual, is distributed under the terms of the GNU Free Documentation License, version 1.2 or later. The man page usually distributed on Unix-like systems is automatically generated from a subset of the Texinfo manual and falls under the terms of the same license.

Development

Wget is developed in an open fashion, most of the design decisions typically being discussed on the public mailing list [7] followed by users and developers. Bug reports and patches are relayed to the same list.

Source contribution

The preferred method of contributing to Wget's code and documentation is through source updates in the form of textual patches generated by the diff utility. Patches intended for inclusion in Wget are submitted to the mailing list [7] where they are reviewed by the maintainers. Patches that pass the maintainers' scrutiny are installed in the sources. Instructions on patch creation as well as style guidelines are outlined on the project's wiki. [8]

The source code can also be tracked via a remote version control repository that hosts revision history beginning with the 1.5.3 release. The repository is currently running Git. [9] Prior to that, the source code had been hosted on (in reverse order): Bazaar, [10] Mercurial, Subversion, and via CVS.

Release

When a sufficient number of features or bug fixes accumulate during development, Wget is released to the general public via the GNU FTP site and its mirrors. Being entirely run by volunteers, there is no external pressure to issue a release nor are there enforceable release deadlines.

Releases are numbered as versions of the form of major.minor[.revision], such as Wget 1.11 or Wget 1.8.2. An increase of the major version number represents large and possibly incompatible changes in Wget's behavior or a radical redesign of the code base. An increase of the minor version number designates addition of new features and bug fixes. A new revision indicates a release that, compared to the previous revision, only contains bug fixes. Revision zero is omitted, meaning that for example Wget 1.11 is the same as 1.11.0. Wget does not use the odd-even release number convention popularized by Linux.

Wget makes an appearance in the 2010 Columbia Pictures motion picture release, The Social Network . The lead character, a somewhat fictionalized version of Facebook co-founder Mark Zuckerberg, uses Wget to aggregate student photos from various Harvard University housing-facility directories.

Notable releases

The following releases represent notable milestones in Wget's development. Features listed next to each release are edited for brevity and do not constitute comprehensive information about the release, which is available in the NEWS file distributed with Wget. [4]

Wget2

Wget2
Initial release26 September 2021;2 years ago (2021-09-26)
Stable release
2.1.0 [13]   OOjs UI icon edit-ltr-progressive.svg / 31 August 2023
Repository gitlab.com/gnuwget/wget2
License GPL-3.0-or-later [14]
Website www.gnu.org/software/wget/

GNU Wget2 2.0.0 was released on 26 September 2021. It is licensed under the GPL-3.0-or-later license, and is wrapped around Libwget which is under the LGPL-3.0-or-later license. [14] It has many improvements in comparison to Wget, particularly, in many cases Wget2 downloads much faster than Wget1.x due to support of the following protocols and technologies: [15]


Gwget-1.0.4.png
Screenshot of GWget 1.0.4 in Fedora v12 with GNOME v2.28.2 installed

GWget

GWget is a free software graphical user interface for Wget. It is developed by David Sedeño Fernández based on the GNOME software stack. GWget supports all of the main features that Wget does, as well as parallel downloads. [16]

Cliget

Cliget is an open source Firefox addon downloader that uses Curl, Wget and Aria2. It is developed by Zaid Abdulla. [17] [18] [19]

Clones

There exist clones of GNU Wget intended for embedded systems, which are often limited in memory and storage. They support its most basic options, usually limited to downloading.


See also

Notes

  1. GPL-3.0-or-later with OpenSSL exception.

Related Research Articles

<span class="mw-page-title-main">GNU</span> Free software collection

GNU is an extensive collection of free software, which can be used as an operating system or can be used in parts with other operating systems. The use of the completed GNU tools led to the family of operating systems popularly known as Linux. Most of GNU is licensed under the GNU Project's own General Public License (GPL).

<span class="mw-page-title-main">Konqueror</span> Web browser and file manager

Konqueror is a free and open-source web browser and file manager that provides web access and file-viewer functionality for file systems. It forms a core part of the KDE Software Compilation. Developed by volunteers, Konqueror can run on most Unix-like operating systems. The KDE community licenses and distributes Konqueror under GNU GPL-2.0-or-later.

<span class="mw-page-title-main">GNU nano</span> Text editor for Unix-like computing systems

GNU nano is a text editor for Unix-like computing systems or operating environments using a command line interface. It emulates the Pico text editor, part of the Pine email client, and also provides additional functionality. Unlike Pico, nano is licensed under the GNU General Public License (GPL). Released as free software by Chris Allegretta in 1999, nano became part of the GNU Project in 2001. The logo resembles the lowercase form of the Greek letter Eta (η).

<span class="mw-page-title-main">Squid (software)</span> Caching and forwarding HTTP web proxy

Squid is a caching and forwarding HTTP web proxy. It has a wide variety of uses, including speeding up a web server by caching repeated requests, caching World Wide Web (WWW), Domain Name System (DNS), and other network lookups for a group of people sharing network resources, and aiding security by filtering traffic. Although used for mainly HTTP and File Transfer Protocol (FTP), Squid includes limited support for several other protocols including Internet Gopher, Secure Sockets Layer (SSL), Transport Layer Security (TLS), and Hypertext Transfer Protocol Secure (HTTPS). Squid does not support the SOCKS protocol, unlike Privoxy, with which Squid can be used in order to provide SOCKS support.

<span class="mw-page-title-main">OpenSSL</span> Open-source implementation of the SSL and TLS protocols

OpenSSL is a software library for applications that provide secure communications over computer networks against eavesdropping, and identify the party at the other end. It is widely used by Internet servers, including the majority of HTTPS websites.

cURL is a computer software project providing a library (libcurl) and command-line tool (curl) for transferring data using various network protocols. The name stands for "Client for URL".

OpenVPN is a virtual private network (VPN) system that implements techniques to create secure point-to-point or site-to-site connections in routed or bridged configurations and remote access facilities. It implements both client and server applications.

<span class="mw-page-title-main">GnuTLS</span> Free software library implementing TLS

GnuTLS is a free software implementation of the TLS, SSL and DTLS protocols. It offers an application programming interface (API) for applications to enable secure communication over the network transport layer, as well as interfaces to access X.509, PKCS #12, OpenPGP and other structures.

<span class="mw-page-title-main">Cyberduck</span> Software

Cyberduck is an open-source client for FTP and SFTP, WebDAV, and cloud storage, available for macOS and Windows licensed under the GPL. Cyberduck is written in Java and C# using the Cocoa user interface framework on macOS and Windows Forms on Windows. It supports FTP/TLS, using AUTH TLS as well as directory synchronization. The user interacts with the user interface (GUI), including file transfer by drag and drop and notifications via Growl. It is also able to open some files in external text editors.

lftp Free software command-line client for several file transfer protocols

lftp is a command-line program client for several file transfer protocols. lftp is designed for Unix and Unix-like operating systems. It was developed by Alexander Lukyanov, and is distributed under the GNU General Public License.

<span class="mw-page-title-main">FileZilla</span> Free software, cross-platform file transfer protocol application

FileZilla is a free and open-source, cross-platform FTP application, consisting of FileZilla Client and FileZilla Server. Clients are available for Windows, Linux, and macOS. Both server and client support FTP and FTPS, while the client can in addition connect to SFTP servers. FileZilla's source code is hosted on SourceForge.

Filesystem in Userspace (FUSE) is a software interface for Unix and Unix-like computer operating systems that lets non-privileged users create their own file systems without editing kernel code. This is achieved by running file system code in user space while the FUSE module provides only a bridge to the actual kernel interfaces.

lighttpd

lighttpd is an open-source web server optimized for speed-critical environments while remaining standards-compliant, secure and flexible. It was originally written by Jan Kneschke as a proof-of-concept of the c10k problem – how to handle 10,000 connections in parallel on one server, but has gained worldwide popularity. Its name is a portmanteau of "light" and "httpd".

<span class="mw-page-title-main">Transmission (BitTorrent client)</span> BitTorrent client

Transmission is a BitTorrent client which features a variety of user interfaces on top of a cross-platform back-end. Transmission is free software licensed under the terms of the GNU General Public License, with parts under the MIT License.

<span class="mw-page-title-main">Komodo Edit</span> Text editor for dynamic programming languages

Komodo Edit is a free and open source text editor for dynamic programming languages. It was introduced in January 2007 to complement ActiveState's commercial Komodo IDE. As of version 4.3, Komodo Edit is built atop the Open Komodo project. Komodo IDE is no longer supported and maintained by developers for Python.

vsftpd,, is an FTP server for Unix-like systems, including Linux. It is the default FTP server in the Ubuntu, CentOS, Fedora, NimbleX, Slackware and RHEL Linux distributions. It is licensed under the GNU General Public License. It supports IPv6, TLS and FTPS.

<span class="mw-page-title-main">Metalink</span> File format that describes one or more computer files available for download

Metalink is an extensible metadata file format that describes one or more computer files available for download. It specifies files appropriate for the user's language and operating system; facilitates file verification and recovery from data corruption; and lists alternate download sources.

<span class="mw-page-title-main">GNU Emacs</span> GNU version of the Emacs text editor

GNU Emacs is a free software text editor. It was created by GNU Project founder Richard Stallman, based on the Emacs editor developed for Unix operating systems. GNU Emacs has been a central component of the GNU project and a flagship project of the free software movement. Its tag line is "the extensible self-documenting text editor."

<span class="mw-page-title-main">GNU General Public License</span> Series of free software licenses

The GNU General Public License is a series of widely used free software licenses or copyleft that guarantee end users the four freedoms to run, study, share, and modify the software. The license was the first copyleft for general use and was originally written by Richard Stallman, the founder of the Free Software Foundation (FSF), for the GNU Project. The license grants the recipients of a computer program the rights of the Free Software Definition. These GPL series are all copyleft licenses, which means that any derivative work must be distributed under the same or equivalent license terms. It is more restrictive than the Lesser General Public License and even further distinct from the more widely used permissive software licenses BSD, MIT, and Apache.

Mbed TLS is an implementation of the TLS and SSL protocols and the respective cryptographic algorithms and support code required. It is distributed under the Apache License version 2.0. Stated on the website is that Mbed TLS aims to be "easy to understand, use, integrate and expand".

References

  1. Darshit Shah (11 May 2023). "wget-1.21.4 released" . Retrieved 11 May 2023.
  2. 1 2 "README file" . Retrieved 1 December 2014.
  3. 1 2 Scrivano, Giuseppe (6 August 2012). "GNU wget 1.14 released". GNU wget 1.14 released. Free Software Foundation, Inc. Retrieved 25 February 2016.
  4. 1 2 "GNU Wget NEWS – history of user-visible changes". Svn.dotsrc.org. 20 March 2005. Archived from the original on 13 March 2007. Retrieved 8 December 2012. Wget 1.4.0 [formerly known as Geturl] is an extensive rewrite of Geturl.
  5. "Wget - GNU Project". 30 November 2018. Retrieved 30 November 2018.
  6. "Why the FSF gets copyright assignments from contributors - GNU Project - Free Software Foundation (FSF)". Gnu.org. Retrieved 8 December 2012.
  7. 1 2 "Gmane Loom". News.gmane.org. Archived from the original on 9 January 2013. Retrieved 8 December 2012.
  8. "PatchGuidelines - The Wget Wgiki". Wget.addictivecode.org. 22 September 2009. Retrieved 8 December 2012.
  9. "RepositoryAccess". 31 July 2012. Retrieved 7 June 2013.
  10. "RepositoryAccess". 22 May 2010. Retrieved 20 June 2010.
  11. Niksic, Hrvoje (24 June 1996). "Geturl: Software for non-interactive downloading". comp.infosystems.www.announce. Retrieved 17 November 2016.
  12. "/wget/trunk : contents of NEWS at revision 2608". bzr.savannah.gnu.org.
  13. Tim Rühsen (31 August 2023). "wget2-2.1.0 released" . Retrieved 31 August 2023.
  14. 1 2 "GNU Wget2 2.0.0 released". gnu.org. 26 September 2021.
  15. "wget2". GitLab.
  16. "GWget Home Page". 22 March 2013.
  17. "zaidka/cliget". GitHub. Retrieved 25 August 2016.
  18. "Meet the cliget Developer :: Add-ons for Firefox". addons.mozilla.org. Retrieved 25 August 2016.
  19. "cliget". addons.mozilla.org. Retrieved 25 August 2016.
  20. "git.openwrt.org Git - project/uclient.git/summary". git.openwrt.org. Archived from the original on 6 October 2021. Retrieved 6 October 2021.
  21. 1 2 "landley/toybox". 6 October 2021 via GitHub.