HTTrack

Developer(s) Xavier Roche [1]
Initial release May 1998 [2]
Stable release 3.49.5 [3] / 27 January 2024
Written in C
Operating system Microsoft Windows, macOS, Linux, FreeBSD and Android [4]
Type Offline browser and Web crawler
License GNU General Public License Version 3
Website www.httrack.com

HTTrack is a free and open-source Web crawler and offline browser, developed by Xavier Roche and licensed under the GNU General Public License Version 3.

HTTrack allows users to download World Wide Web sites from the Internet to a local computer. [5] [6] By default, HTTrack arranges the downloaded site by the original site's relative link-structure. The downloaded (or "mirrored") website can be browsed by opening a page of the site in a browser.
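The idea of keeping a site's relative link structure can be illustrated with a minimal Python sketch. It is not HTTrack's actual code: the function names `local_path` and `relative_link`, the `mirror` root directory, and the choice to save directory URLs as `index.html` are all assumptions made for illustration.

```python
import posixpath
from urllib.parse import urlsplit


def local_path(url: str, root: str = "mirror") -> str:
    """Map a URL to a file path under `root`, keeping the site's directory layout."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        # Assumption: a directory-style URL is stored as index.html.
        path += "index.html"
    return posixpath.join(root, parts.netloc, path.lstrip("/"))


def relative_link(from_url: str, to_url: str, root: str = "mirror") -> str:
    """Compute the link from one saved page to another saved file."""
    start = posixpath.dirname(local_path(from_url, root))
    return posixpath.relpath(local_path(to_url, root), start)


if __name__ == "__main__":
    print(local_path("https://www.example.com/docs/"))
    print(relative_link("https://www.example.com/docs/guide.html",
                        "https://www.example.com/img/logo.png"))
```

Because links between saved files are relative, a mirror laid out this way can be moved to another directory or opened straight from disk without breaking navigation.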

HTTrack can also update an existing mirrored site and resume interrupted downloads. It can be configured through options and include/exclude filters, and it has an integrated help system. There is a basic command-line version and two GUI versions (WinHTTrack and WebHTTrack); the command-line version can be used in scripts and cron jobs.
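How include/exclude filtering can work is sketched below in Python. This is a simplified model, not HTTrack's matching code: it assumes each rule is a wildcard pattern prefixed with `+` (include) or `-` (exclude), that the last matching rule decides, and that URLs are allowed when no rule matches. The function name `allowed` is hypothetical.

```python
from fnmatch import fnmatchcase


def allowed(url: str, rules: list[str], default: bool = True) -> bool:
    """Return True if `url` passes the filter rules.

    Assumption: rules are checked in order and the last match wins.
    """
    verdict = default
    for rule in rules:
        sign, pattern = rule[:1], rule[1:]
        if sign not in ("+", "-"):
            raise ValueError(f"rule must start with + or -: {rule!r}")
        if fnmatchcase(url, pattern):
            verdict = sign == "+"
    return verdict


if __name__ == "__main__":
    rules = ["-*.zip", "+www.example.com/files/*.zip"]
    for url in ("www.example.com/a.zip",
                "www.example.com/files/a.zip",
                "www.example.com/index.html"):
        print(url, allowed(url, rules))
```

Under these assumptions, a broad exclusion can be followed by a narrower inclusion, so that archives are skipped everywhere except in one directory.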

HTTrack uses a Web crawler to download a website. By default it honors the robots exclusion protocol, so some parts of a site may not be downloaded unless that behavior is disabled in the program's options. HTTrack can follow links that are generated with basic JavaScript and links inside Applets or Flash, but not complex links (generated using functions or expressions) or server-side image maps.
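The effect of the robots exclusion protocol on a crawl can be shown with Python's standard `urllib.robotparser` module. This is only an illustration of the protocol, not of HTTrack's implementation: the sample `robots.txt` content, the user-agent string `ExampleMirror`, and the `obey_robots` switch are assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content served by the site being mirrored.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def make_checker(robots_txt: str, user_agent: str = "ExampleMirror",
                 obey_robots: bool = True):
    """Build a function that says whether a URL may be downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    def may_fetch(url: str) -> bool:
        if not obey_robots:
            # The user has chosen to ignore robots.txt.
            return True
        return parser.can_fetch(user_agent, url)

    return may_fetch


if __name__ == "__main__":
    may_fetch = make_checker(ROBOTS_TXT)
    print(may_fetch("https://www.example.com/index.html"))
    print(may_fetch("https://www.example.com/private/a.html"))
```

With the rules obeyed, pages under `/private/` are skipped, which is why a mirror can be incomplete by default; passing `obey_robots=False` models turning the check off.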

Related Research Articles

Konqueror: Web browser and file manager

Konqueror is a free and open-source web browser and file manager that provides web access and file-viewer functionality for file systems. It forms a core part of the KDE Software Compilation. Developed by volunteers, Konqueror can run on most Unix-like operating systems. The KDE community licenses and distributes Konqueror under GNU GPL-2.0-or-later.

Web crawler: Software which systematically browses the World Wide Web

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing.

Googlebot: Web crawler used by Google

Googlebot is the web crawler software used by Google that collects documents from the web to build a searchable index for the Google Search engine. This name is actually used to refer to two different types of web crawlers: a desktop crawler and a mobile crawler.

Wget: Computer command-line program

GNU Wget is a computer program that retrieves content from web servers. It is part of the GNU Project. Its name derives from "World Wide Web" and "get". It supports downloading via HTTP, HTTPS, and FTP.

The deep web, invisible web, or hidden web are parts of the World Wide Web whose contents are not indexed by standard web search-engine programs. This is in contrast to the "surface web", which is accessible to anyone using the Internet. Computer scientist Michael K. Bergman is credited with inventing the term in 2001 as a search-indexing term.

Fusker is a type of website or utility that extracts images in bulk from a website by systematically loading and downloading images following a pattern in the website's URL scheme. Fusking or fuskering is often used to extract private and nude photos without consent of the owner.

Notepad++: Text editor and source code editor for Windows

Notepad++ is a free and open-source text and source code editor for use with Microsoft Windows. It supports tabbed editing, which allows working with multiple open files in a single window. The product's name comes from the C postfix increment operator; it is sometimes referred to as npp or NPP.

Mozilla Firefox has features which distinguish it from other web browsers, such as Google Chrome, Safari, and Microsoft Edge.

FlashGot was an add-on for Firefox that allowed interoperability between the Firefox browser and external download managers. It is no longer compatible with later versions of Firefox. It was not itself a download manager but was designed to extend the Firefox interface so that it could connect to the selected external download manager. This avoided launching the download manager as an independent application and cutting and pasting the links of the files that needed to be downloaded. Forked browsers like Pale Moon and Waterfox were also supported.

Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites. Web scraping software may directly access the World Wide Web using the Hypertext Transfer Protocol or a web browser. While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler. It is a form of copying in which specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for later retrieval or analysis.

Browser hijacking is a form of unwanted software that modifies a web browser's settings without a user's permission, to inject unwanted advertising into the user's browser. A browser hijacker may replace the existing home page, error page, or search engine with its own. These are generally used to force hits to a particular website, increasing its advertising revenue.

Free Download Manager is a download manager for Windows, macOS, Linux and Android.

An offline reader is computer software that downloads e-mail, newsgroup posts or web pages, making them available when the computer is offline: not connected to a server. Offline readers are useful for portable computers and dial-up access.

Features of the Opera web browser: List of software application features

This article details features of the Opera web browser.

IE7Pro is an add-on for Internet Explorer 6, 7, and 8 that aims to enhance the feature set provided by the browser. IE7Pro adds features such as tab enhancement, an ad blocker and flash blocker, mouse gestures, inline search, privacy enhancements, online bookmark service, Greasemonkey-like user script support, and plug-in support. IE7Pro is available in several languages – this is made possible by user translations.

CCleaner: Suite of utilities for cleaning disk and operating system environment

CCleaner, developed by Piriform Software, is a utility used to clean potentially unwanted files and invalid Windows Registry entries from a computer. It is one of the longest-established system cleaners, first launched in 2004. It was originally developed for Microsoft Windows only, but in 2012, a macOS version was released. An Android version was released in 2014.

Web typography: Publishing considerations for the Web

Web typography, like typography generally, is the design of pages – their layout and typeface choices. Unlike traditional print-based typography, pages intended for display on the World Wide Web have additional technical challenges and – given its ability to change the presentation dynamically – additional opportunities. Early web page designs were very simple due to technology limitations; modern designs use Cascading Style Sheets (CSS), JavaScript and other techniques to deliver the typographer's and the client's vision.

A userscript is a program, usually written in JavaScript, for modifying web pages to augment browsing. Uses include adding shortcut buttons and keyboard shortcuts, controlling playback speeds, adding features to sites, and enhancing the browsing history.

Torch (web browser): Proprietary, adware-supported web browser

Torch was a Chromium-based web browser and Internet suite developed by the North Carolina–based Torch Media. As of November 2022, downloads for Torch are no longer available, and upon clicking the download button, users are redirected to the Torch Search extension on the Chrome Web Store.

archive.today is a web archiving site, founded in 2012, that saves snapshots on demand, and has support for JavaScript-heavy sites, such as Google Maps, and progressive web apps, such as Twitter. archive.today records two snapshots: one replicates the original webpage including any functional live links; the other is a screenshot of the page.

References

  1. Credits: Greetings & authors
  2. Roche, Xavier (February 8, 2014). "Re: Full History of HTTrack". HTTrack Forum. Retrieved November 21, 2016. The first release was in May 1998, but only as binaries.
  3. "Release 3.49.5".
  4. HTTrack on Google Play
  5. Engebretson, Patrick (2011). The Basics of Hacking and Penetration Testing. Elsevier. pp. 19–22. ISBN 9781597496568.
  6. Beaver, Kevin (2012). Hacking For Dummies. John Wiley & Sons. pp. 278, 280–281. ISBN 9781118380963.