Archive.today

archive.today
Screenshot of the archive.today home page
Type of site: Web archiving
Available in: Multilingual
Registration: No
Launched: May 16, 2012 [2]

archive.today (or archive.is) is a web archiving site, founded in 2012, that saves snapshots on demand, and has support for JavaScript-heavy sites, such as Google Maps, and progressive web apps, such as X. [3] archive.today records two snapshots: one replicates the original webpage including any functional live links; the other is a screenshot of the page. [4]

The website does not provide information on the identity of the operator(s). [5]

History

Archive.today was founded in 2012. The site originally branded itself as archive.today, but in May 2015, changed the primary mirror to archive.is. [6]

In January 2019, it began to deprecate the archive.is domain in favor of other mirrors. [7]

Features

Functionality

Archive.today can capture individual pages in response to explicit user requests. [8] [9] [10] Since its beginning, it has supported crawling pages with URLs containing the now-deprecated hash-bang fragment (#!). [11]

Archive.today records only text and images, excluding XML, RTF, spreadsheet (xls or ods) and other non-static content. However, videos for certain sites, like Twitter, are saved. [12] It keeps track of the history of snapshots saved, requesting confirmation before adding a new snapshot of an already saved page. [13] [14]

Pages are captured at a browser width of 1,024 pixels. CSS is converted to inline CSS, removing responsive web design and selectors such as :hover and :active. Content generated using JavaScript during the crawling process appears in a frozen state. [15] HTML class names are preserved inside the old-class attribute. When text is selected, a JavaScript applet generates a URL fragment seen in the browser's address bar that automatically highlights that portion of the text when visited again.
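
As a rough illustration of the old-class convention described above, the following Python sketch collects the original class names from a snapshot's markup. The sample fragment is hypothetical and only mimics the described structure (inline styles plus an old-class attribute); it is not taken from a real snapshot.

```python
from html.parser import HTMLParser


class OldClassCollector(HTMLParser):
    """Collect the values of `old-class` attributes from snapshot markup."""

    def __init__(self):
        super().__init__()
        self.old_classes = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "old-class" and value:
                # One attribute may carry several space-separated class names.
                self.old_classes.extend(value.split())


def collect_old_classes(html: str) -> list[str]:
    """Return every original class name found in the given markup, in order."""
    parser = OldClassCollector()
    parser.feed(html)
    parser.close()
    return parser.old_classes


# Hypothetical fragment: styles are inlined, original classes kept in old-class.
SAMPLE = (
    '<div old-class="article lead" style="width:1024px">'
    '<a old-class="link" style="color:#00f" href="https://example.com/">x</a>'
    "</div>"
)

print(collect_old_classes(SAMPLE))  # ['article', 'lead', 'link']
```

Because the original selectors no longer apply to the archived page, reading old-class is one way to recover which elements a site's stylesheet once targeted.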

Web pages cannot be copied from archive.today to web.archive.org as a second-level backup, because archive.today excludes the Wayback Machine and does not save its snapshots in WARC format. The reverse, from web.archive.org to archive.today, is possible, [16] but the copy usually takes more time than a direct capture. Some websites are retroactively removed from the Internet Archive's listings, or blocked from being saved, because of their robots.txt file; archive.today does not honor robots.txt. [10]
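
For context, the check that a robots.txt-honoring crawler performs before fetching a page can be sketched with Python's standard urllib.robotparser module. The rules, the ExampleBot user agent and the URLs below are made up for illustration; the sketch shows the exclusion mechanism in general, not the behavior of any particular archive.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A compliant crawler would skip the first URL and fetch the second.
print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/private/report.html"))
print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/index.html"))
```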

The search toolbar supports advanced keyword operators, using * as the wildcard character. A pair of quotation marks restricts the search to an exact sequence of keywords present in the title or in the body of the webpage, whereas the insite operator restricts it to a specific Internet domain. [17]

Once a web page is archived, it cannot be deleted directly by any Internet user. [18] Advertisements and popups can be removed from archived pages, and links expanded, by submitting a request to the operator on the site's blog. [19]

When a dynamic list is saved, the archive.today search box shows only one result, which links to the previous and following sections of the list (e.g. 20 links per page). [20] The other saved web pages are filtered out, and can sometimes be found through one of their occurrences. [13] [clarification needed]

The search feature is backed by Google CustomSearch. If it delivers no results, archive.today attempts to utilize Yandex Search. [21]

While saving a page, a list of URLs for individual page elements and their content sizes, HTTP statuses and MIME types is shown. This list can only be viewed during the crawling process.

Archived pages can be downloaded as a ZIP file, except for pages archived since 29 November 2019, when archive.today changed its browser engine from PhantomJS to Chromium. [22]

In July 2013, Archive.today began supporting the API of the Memento Project. [23] [24]
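
In the Memento protocol (RFC 7089), a client asks a TimeGate for the snapshot closest to a given moment by sending an Accept-Datetime header. The Python sketch below only builds such a request and does not send it. The TimeGate base URL is an assumption made for illustration and should be checked against the archive's own documentation.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

# Assumption: illustrative TimeGate endpoint, not confirmed by the article.
TIMEGATE_BASE = "https://archive.today/timegate/"


def build_timegate_request(target_url: str, when: datetime) -> Request:
    """Build (but do not send) a Memento TimeGate request for target_url.

    `when` must be a timezone-aware datetime.
    """
    # RFC 7089 expects Accept-Datetime as an HTTP-date expressed in GMT.
    accept_datetime = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(
        TIMEGATE_BASE + target_url,
        headers={"Accept-Datetime": accept_datetime},
    )


req = build_timegate_request(
    "https://example.com/",
    datetime(2013, 7, 9, 12, 0, tzinfo=timezone.utc),
)
print(req.full_url)
print(req.get_header("Accept-datetime"))  # Tue, 09 Jul 2013 12:00:00 GMT
```

A conforming TimeGate would normally answer with a redirect to the selected snapshot, so a client that actually sends the request should be prepared to follow the Location header.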

Worldwide availability

Australia and New Zealand

In March 2019, the site was blocked for six months by several internet providers in Australia and New Zealand in the aftermath of the Christchurch mosque shootings in an attempt to limit distribution of the footage of the attack. [25] [26] It has since been unblocked. [citation needed]

China

According to GreatFire.org, archive.today has been blocked in mainland China since March 2016, [27] archive.li since September 2017, [28] archive.fo since July 2018, [29] as well as archive.ph since December 2019. [30]

Finland

On 21 July 2015, the operators blocked access to the service from all Finnish IP addresses, stating on Twitter that they did this in order to avoid escalating a dispute they allegedly had with the Finnish government. [31]

Russia

In Russia, only HTTP access is possible; HTTPS connections are blocked. [32] [33] Unlike HTTPS, HTTP is not encrypted, so anyone able to observe the network path can read and modify the traffic in transit, including the URL of the requested page, the returned content, and strings that identify the sending device (such as the User-Agent header and cookies).

Cloudflare DNS availability

Between May 2018 [34] and May 2022, [35] Cloudflare's 1.1.1.1 DNS service would not resolve archive.today's web addresses, making the site inaccessible to users of the Cloudflare DNS service. Each organization claimed the other was responsible for the issue. Cloudflare staff stated that the problem lay in archive.today's DNS infrastructure, as its authoritative nameservers returned invalid records when Cloudflare's network systems made requests to archive.today. archive.today countered that the issue was due to Cloudflare's requests not being compliant with DNS standards, as Cloudflare does not send EDNS Client Subnet information in its DNS requests. [36] [37] The issue was subsequently resolved. [citation needed] [needs update]
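
To make the disputed field concrete, the Python sketch below encodes an EDNS Client Subnet option in the wire format defined by RFC 7871. It shows what a resolver that does send this information would attach to a query. The function name and the example subnet (a documentation range) are illustrative, and the sketch builds only the option, not a full DNS message.

```python
import ipaddress
import struct

ECS_OPTION_CODE = 8  # EDNS Client Subnet option code (RFC 7871)


def encode_ecs_option(subnet: str) -> bytes:
    """Encode a client subnet such as '192.0.2.0/24' as an EDNS option."""
    network = ipaddress.ip_network(subnet, strict=False)
    family = 1 if network.version == 4 else 2  # IANA address family numbers
    prefix_len = network.prefixlen
    # Only the bytes needed to cover the source prefix are transmitted.
    addr_bytes = network.network_address.packed[: (prefix_len + 7) // 8]
    # FAMILY, SOURCE PREFIX-LENGTH, SCOPE PREFIX-LENGTH (0 in queries), ADDRESS
    payload = struct.pack("!HBB", family, prefix_len, 0) + addr_bytes
    # OPTION-CODE and OPTION-LENGTH precede the payload.
    return struct.pack("!HH", ECS_OPTION_CODE, len(payload)) + payload


print(encode_ecs_option("192.0.2.0/24").hex())  # 0008000700011800c00002
```

An authoritative server that receives this option can pick an answer close to the indicated subnet; without it, the server sees only the resolver's own address.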


References

  1. @archiveis (29 October 2019). "a current list of all tor domains and clear net domains" (Tweet) via Twitter.
  2. Archive.is blog (18 February 2014). "When did the Archive-is site originally launch?". Tumblr. Archived from the original on 20 March 2021. Retrieved 10 April 2021.
  3. Brinkmann, Martin (22 April 2015). "Create publicly available web page archives with Archive.is". Ghacks. Archived from the original on 12 April 2019. Retrieved 13 June 2015.
  4. Brunelle, Justin F.; Kelly, Mat; Weigle, Michele C.; Nelson, Michael L. (25 January 2015). "The impact of JavaScript on archivability" (PDF). International Journal on Digital Libraries. 17 (2): 95–117. doi:10.1007/s00799-015-0140-8. S2CID 8433375. Archived (PDF) from the original on 27 May 2019.
  5. Patokallio, Jani (5 August 2023). "archive.today: On the trail of the mysterious guerrilla archivist of the Internet". Gyrovague. Retrieved 1 January 2024.
  6. "Why did you change the URL back from archive-today to archive-is?". Archive.is Blog. 3 May 2015. Archived from the original on 1 June 2015. Retrieved 6 January 2019.
  7. @archiveis (4 January 2019). "Please do not use archive.IS mirror for linking, use others mirrors [.TODAY .FO .LI .VN .MD .PH]. .IS might stop working soon" (Tweet). Archived from the original on 6 January 2019 via Twitter.
  8. Dascalescu, Dan (18 February 2013). "Web page archiving – Dan Dascalescu's Wiki (review)". Wiki.dandascalescu.com. Archived from the original on 22 September 2013. Retrieved 3 October 2013.
  9. Koebler, Jason (29 October 2014). "Dear GamerGate: Please Stop Stealing Our Shit". Motherboard. Archived from the original on 27 May 2019. Retrieved 22 March 2017. There is no way for a website to protect itself from having an Archive.today user mirror the site.
  10. "Archive.today FAQ". archive.today. Retrieved 15 February 2019.
  11. "Home page of Archive.is in 2013". Archived from the original on 12 January 2013.
  12. "Archive.today blog". Archived from the original on 7 September 2021.
  13. Archiving Websites with the Archive.is, archived from the original on 27 January 2022, retrieved 27 January 2022
  14. "Example snapshot history on archive.is".
  15. JavaScript-generated loading animation of Dailymotion video appearing in a frozen state
  16. "Example: Page saved from Web Archive to Archive.is" (in Spanish). Archived from the original on 20 May 2013. Retrieved 23 October 2019.
  17. For example, the string insite: https://en.wikipedia.org "World Cup" returns the "World+Cup"/ related snapshots
  18. "Some Frequently Asked Question" (blog). archive.is. 24 January 2013. Archived from the original on 26 September 2013. Retrieved 12 November 2018.
  19. "Example user request on the Archive.is blog". Archive.is blog. Archived from the original on 29 April 2022. Retrieved 7 April 2022.
  20. Example of dynamic list: "au:"thomas aquinas"". WorldCat. Archived from the original on 23 March 2019. Retrieved 15 December 2018.
  21. "Just realized that I can search for keywords in the search bar for archive today, was this a recently added feature?". Archive.is blog. 18 January 2022. Archived from the original on 27 January 2022. Retrieved 27 January 2022.
  22. "The "download zip" button has been giving a "Not found" error for quite some time". Archive.is blog. 17 July 2020. Archived from the original on 3 October 2020.
  23. Nelson, Michael L. (9 July 2013). "Archive.is Supports Memento". Research and Teaching Updates. Web Science and Digital Libraries Research Group at Old Dominion University. Archived from the original on 27 July 2013. Retrieved 17 September 2013.
  24. "archive.is". Memento Protocol Information. Memento Development Group. Archived from the original on 15 September 2013. Retrieved 17 September 2013.
  25. "ISPs in AU and NZ start censoring the internet without legal precedent". Private Internet Access. 19 March 2019. Archived from the original on 28 April 2023. Retrieved 20 March 2019.
  26. "New Zealand ISPs Say They're Blocking Sites That Fail To Remove Christchurch Shooting Video". Gizmodo Australia. 19 March 2019. Archived from the original on 18 May 2019. Retrieved 20 March 2019.
  27. "archive.is is 100% blocked in China". GreatFire Analyzer. 12 August 2018. Archived from the original on 12 August 2018.
  28. "archive.li is 100% blocked in China". GreatFire Analyzer. 12 August 2018. Archived from the original on 12 August 2018.
  29. "archive.fo is 100% blocked in China". GreatFire Analyzer. 12 August 2018. Archived from the original on 12 August 2018.
  30. "archive.ph is 100% blocked in China". en.greatfire.org. Archived from the original on 29 April 2022. Retrieved 7 April 2022.
  31. Lapintie, Lassi (22 July 2015). "Suomalaisilta estettiin haktivistien suosimalla verkkosivulla käynti" [Finns' access to website used by hacktivists blocked]. Iltalehti (in Finnish). Archived from the original on 27 May 2019. Retrieved 4 March 2016.
  32. Elistratov, Vladimir (29 January 2016). "Roskomnadzor zablokiroval servis archive.is, khranyashchiy kopii veb-saytov" Роскомнадзор заблокировал сервис archive.is, хранящий копии веб-сайтов. TJournal (in Russian). Archived from the original on 30 August 2017. Retrieved 30 January 2016.
  33. Cushing, Tim (4 February 2016). "Russia Blocks Another Archive Site Because It Might Contain Old Pages About Drugs". Techdirt. Archived from the original on 23 March 2019. Retrieved 26 February 2016.
  34. "Archive.is – Error 1001". Cloudflare Community. 15 May 2018. Archived from the original on 2 December 2021. Retrieved 2 December 2021.
  35. "Archive.today works again on 1.1.1.1 (and archive.{ph,is,li,vn,fo,md})". Cloudflare Community. 22 May 2022. Retrieved 12 March 2023.
  36. @archiveis (16 July 2018). ""Having to do" is not so direct here. Absence of EDNS and massive mismatch (not only on AS/Country, but even on the continent level) of where DNS and related HTTP requests come from causes so many troubles so I consider EDNS-less requests from Cloudflare as invalid" (Tweet) via Twitter.
  37. "Comment by Matthew Prince on Hacker News". Hacker News. 4 May 2019. Archived from the original on 13 May 2022. Retrieved 4 October 2021.