Archive.today


archive.today
Screenshot of archive.today
Type of site: Web archiving
Available in: Multilingual
Commercial: Yes [1]
Registration: No
Launched: May 16, 2012 [2] [3]

archive.today (formerly archive.is) is an archive site that stores snapshots of web pages. [4] Like WebCite, it retrieves one page at a time, and each page must be smaller than 50 MB, but it also supports JavaScript-heavy sites such as Google Maps and progressive web applications such as Twitter.


Archive.today simultaneously records two different snapshots of a web page. One is "Webpage", which keeps any functional live links present in the original. The other is "Screenshot", which provides a static, non-interactive image of the page. [5]

Features

Functionality

Archive.today can capture individual pages in response to explicit user requests. [6] [7] [8] Since its launch, archive.today has supported crawling pages whose URLs contain the now-deprecated hash-bang fragment (#!). [9]
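Hash-bang support matters because everything after `#!` is a fragment, and browsers never send fragments to the server: the server returns the same shell page for every route, and JavaScript in the browser picks the content to show. The following minimal sketch uses the standard library's `urllib.parse` to separate the part of such a URL that reaches the server from the client-side route. The function name `split_hashbang` is hypothetical and this is not archive.today's code; the Twitter URL is the 2013 example quoted in the references.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit


def split_hashbang(url: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: return (url_sent_to_server, hashbang_route).

    The route is None when the URL has no '#!' fragment.
    """
    parts = urlsplit(url)
    # Browsers never transmit the fragment, so the request URL omits it.
    request_url = urlunsplit((parts.scheme, parts.netloc, parts.path, parts.query, ""))
    if parts.fragment.startswith("!"):
        # Everything after '#!' is a route resolved by JavaScript in the browser.
        return request_url, parts.fragment[1:]
    return request_url, None


print(split_hashbang("http://twitter.com/#!/medvedevrussia"))
# ('http://twitter.com/', '/medvedevrussia')
```

A crawler that only fetched `http://twitter.com/` would never see the `/medvedevrussia` content, which is why capturing such pages requires executing the page's JavaScript.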

Archive.today records only text and images, excluding XML, RTF, spreadsheets (XLS or ODS) and other non-static content. However, videos from certain sites, such as Twitter, are saved. [10] It keeps a history of saved snapshots and asks the user for confirmation before adding a new snapshot of an already archived URL. [11]

Pages are captured at a browser width of 1,024 pixels. CSS is converted to inline styles, which removes responsive web design and selectors such as :hover and :active. Content generated by JavaScript during the crawling process appears in a frozen state. [12] HTML class names are preserved in the old-class attribute.
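The class-to-inline conversion can be illustrated with a small rewriter built on the standard library's `html.parser`. This is a hypothetical sketch, not archive.today's implementation: the `class_styles` map stands in for the styles a real browser engine would compute at one fixed width (so media queries and `:hover`/`:active` rules have nothing to contribute), and the names `ClassInliner` and `inline_classes` are invented here. It keeps only the `old-class` behavior described above and the resulting `style` attribute.

```python
from html import escape
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

Attrs = List[Tuple[str, Optional[str]]]


class ClassInliner(HTMLParser):
    """Hypothetical sketch: move `class` into `old-class` and inline the CSS
    declarations that a pre-computed class -> declarations map assigns.

    Simplifications: comments and doctypes are dropped, and <script>/<style>
    bodies are not treated specially.
    """

    def __init__(self, class_styles: Dict[str, str]):
        super().__init__(convert_charrefs=True)
        self.class_styles = class_styles
        self.out: List[str] = []

    def _rewrite(self, attrs: Attrs) -> Attrs:
        kept: Attrs = []
        class_decls: List[str] = []
        inline_decl: Optional[str] = None
        for name, value in attrs:
            if name == "class" and value:
                # Preserve the original class names for reference.
                kept.append(("old-class", value))
                class_decls.extend(
                    self.class_styles[c] for c in value.split() if c in self.class_styles
                )
            elif name == "style" and value:
                inline_decl = value
            else:
                kept.append((name, value))
        decls = class_decls + ([inline_decl] if inline_decl else [])
        if decls:
            # Later declarations win, so an existing inline style overrides class styles.
            kept.append(("style", "; ".join(decls)))
        return kept

    def _emit_tag(self, tag: str, attrs: Attrs, self_closing: bool = False) -> None:
        parts = [tag]
        for name, value in self._rewrite(attrs):
            parts.append(name if value is None else f'{name}="{escape(value)}"')
        self.out.append("<" + " ".join(parts) + (" />" if self_closing else ">"))

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, self_closing=True)

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives unescaped (convert_charrefs=True), so re-escape it.
        self.out.append(escape(data, quote=False))


def inline_classes(html: str, class_styles: Dict[str, str]) -> str:
    parser = ClassInliner(class_styles)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(inline_classes('<p class="note big">Hi &amp; bye</p>',
                     {"note": "color: red", "big": "font-size: 20px"}))
# <p old-class="note big" style="color: red; font-size: 20px">Hi &amp; bye</p>
```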

When text is selected, a JavaScript applet[clarification needed] generates a URL fragment, shown in the browser's address bar, that automatically highlights the selected text when the URL is visited again.
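The idea of a highlight-carrying fragment can be sketched by encoding a selection's start and end positions in the fragment and decoding them on the next visit. The `#hl-<start>-<end>` format, the use of character offsets into the page text, and the names `make_fragment` and `resolve_fragment` are all assumptions for this illustration; archive.today's actual fragment encoding is not described in the text above.

```python
import re
from typing import Optional

# Hypothetical fragment format for this sketch only: "#hl-<start>-<end>",
# where start/end are character offsets into the page's visible text.
_FRAGMENT = re.compile(r"^#?hl-(\d+)-(\d+)$")


def make_fragment(text: str, selected: str) -> Optional[str]:
    """Encode the first occurrence of `selected` in `text` as a fragment."""
    if not selected:
        return None
    start = text.find(selected)
    if start < 0:
        return None
    return f"#hl-{start}-{start + len(selected)}"


def resolve_fragment(text: str, fragment: str) -> Optional[str]:
    """Return the text span a fragment points to, or None if it is invalid."""
    match = _FRAGMENT.match(fragment)
    if not match:
        return None
    start, end = int(match.group(1)), int(match.group(2))
    if not (0 <= start < end <= len(text)):
        return None
    return text[start:end]


page = "Archive.today stores snapshots of web pages."
frag = make_fragment(page, "snapshots")
print(frag)                          # #hl-21-30
print(resolve_fragment(page, frag))  # snapshots
```

Offsets are fragile if the page text changes, which is less of a concern for an archive, where a snapshot's text is fixed once captured.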

Web pages cannot be copied from archive.is to web.archive.org as a second-level backup, because archive.is excludes the Wayback Machine and does not store its snapshots in WARC format. The reverse, copying from web.archive.org to archive.is, is possible, [13] but usually takes longer than a direct capture. Some websites are retroactively removed from the Internet Archive's listings, or blocked from being saved, because of their robots.txt file; archive.today does not honor robots.txt. [14]
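The robots.txt difference can be illustrated with Python's standard `urllib.robotparser`. The sketch shows the check a robots.txt-honoring crawler would run before fetching a URL, and the `honor_robots=False` path models an archiver that skips it. The user-agent names and the rules in `ROBOTS_TXT` are hypothetical examples, and the helper `allowed` is invented here; neither service's actual crawler logic is implied.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one archiver is shut out entirely,
# everyone else is kept out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleArchiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, user_agent: str, url: str, honor_robots: bool = True) -> bool:
    """Decide whether a crawler may fetch `url` (hypothetical helper)."""
    if not honor_robots:
        # An archiver that ignores robots.txt fetches whatever the user submits.
        return True
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed(ROBOTS_TXT, "ExampleArchiver", "https://example.com/page"))         # False
print(allowed(ROBOTS_TXT, "ExampleArchiver", "https://example.com/page", False))  # True
```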

The search bar supports advanced keyword operators, with * as the wildcard character. Enclosing keywords in quotation marks restricts the search to that exact sequence in the title or body of the page, whereas the insite operator restricts it to a specific Internet domain. [15]
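The three operators can be illustrated with a small matcher over an in-memory list of snapshots. This is a hypothetical sketch of the query semantics described above, not archive.today's search implementation; the names `Snapshot`, `parse_query` and `matches` are invented, and treating `*` as "any run of word characters within one keyword" is an assumption. The query string is the example from the references.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple
from urllib.parse import urlsplit


@dataclass
class Snapshot:
    url: str
    title: str
    body: str


# insite:<domain>  |  "exact phrase"  |  bare keyword (may contain '*')
_TOKEN = re.compile(r'insite:\s*(\S+)|"([^"]+)"|(\S+)')


def parse_query(query: str) -> Tuple[Optional[str], List[str], List[str]]:
    site: Optional[str] = None
    phrases: List[str] = []
    words: List[str] = []
    for m in _TOKEN.finditer(query):
        if m.group(1):
            site = m.group(1)
        elif m.group(2):
            phrases.append(m.group(2).lower())
        else:
            words.append(m.group(3).lower())
    return site, phrases, words


def matches(snap: Snapshot, query: str) -> bool:
    site, phrases, words = parse_query(query)
    if site:
        # Accept both "en.wikipedia.org" and "https://en.wikipedia.org".
        host = urlsplit(site).netloc if "://" in site else site
        if urlsplit(snap.url).netloc.lower() != host.lower():
            return False
    text = f"{snap.title} {snap.body}".lower()
    # Quoted phrases must appear verbatim in the title or body.
    if any(phrase not in text for phrase in phrases):
        return False
    tokens = re.findall(r"\w+", text)
    for word in words:
        # '*' matches any run of word characters inside a single keyword.
        pattern = re.escape(word).replace(r"\*", r"\w*")
        if not any(re.fullmatch(pattern, tok) for tok in tokens):
            return False
    return True


snaps = [
    Snapshot("https://en.wikipedia.org/wiki/FIFA_World_Cup", "FIFA World Cup", "The World Cup is..."),
    Snapshot("https://example.com/cups", "World cup history", "..."),
]
query = 'insite: https://en.wikipedia.org "World Cup"'
print([s.url for s in snaps if matches(s, query)])
# ['https://en.wikipedia.org/wiki/FIFA_World_Cup']
```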

Once a web page is archived, it cannot be deleted directly by any Internet user. [16]

When a dynamic list is saved, the archive.today search box shows only a single result, which links to the previous and next sections of the list (for example, 20 links per page). [17] The other saved pages are filtered out and can sometimes be found through one of their occurrences.[citation needed]

The search feature is backed by Google Custom Search. If it returns no results, archive.is falls back to Yandex Search.[citation needed]
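The fallback behavior amounts to "query the primary engine, and consult the secondary only when the primary returns nothing". The sketch below shows that control flow with plain callables. `search_with_fallback` and the two stub backends are hypothetical stand-ins; no real Google or Yandex API is called or implied.

```python
from typing import Callable, List

# A backend takes a query string and returns a list of result URLs.
SearchBackend = Callable[[str], List[str]]


def search_with_fallback(query: str, primary: SearchBackend, fallback: SearchBackend) -> List[str]:
    """Return the primary backend's results, or the fallback's if there are none."""
    results = primary(query)
    if results:
        return results
    # Only consulted when the primary backend came back empty.
    return fallback(query)


# Hypothetical stand-ins for the real search engines.
def empty_primary(query: str) -> List[str]:
    return []


def example_fallback(query: str) -> List[str]:
    return ["https://example.org/result-1"]


print(search_with_fallback("world cup", empty_primary, example_fallback))
# ['https://example.org/result-1']
```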

If a page has already been archived, archive.is asks the user to confirm before archiving a new revision instead of archiving it immediately.[citation needed]
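The confirm-before-recapture behavior can be modeled as a per-URL snapshot history: the first capture goes through directly, while later requests for the same URL return a confirmation prompt unless the user has already confirmed. This is a hypothetical in-memory sketch. `SnapshotHistory` and its return values are invented for illustration, and it compares URLs exactly, whereas a real service may normalize them first.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class SnapshotHistory:
    """Hypothetical in-memory model of per-URL snapshot history."""

    snapshots: Dict[str, List[datetime]] = field(default_factory=dict)

    def request_capture(self, url: str, confirmed: bool = False) -> str:
        history = self.snapshots.setdefault(url, [])
        if history and not confirmed:
            # Already archived: ask the user before adding another revision.
            return "confirm"
        history.append(datetime.now(timezone.utc))
        return "captured"


h = SnapshotHistory()
print(h.request_capture("https://example.com/"))                  # captured
print(h.request_capture("https://example.com/"))                  # confirm
print(h.request_capture("https://example.com/", confirmed=True))  # captured
```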

While a page is loading, a list of URLs for the page's individual elements is shown, along with their content sizes, HTTP status codes and MIME types. This list can be viewed only during the crawling process.[citation needed]

Archived pages can be downloaded as a ZIP file, except for pages archived since 29 November 2019, when archive.today switched its browser engine from PhantomJS to Chromium. [18]

Since July 2013, archive.today has supported the Memento Project application programming interface (API). [19] [20]
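In the Memento protocol (RFC 7089), a client asks a TimeGate for the archived version of a URL closest to a given moment by sending an `Accept-Datetime` header containing an HTTP-date in GMT. The sketch below builds, but does not send, such a request with the standard library. The `TIMEGATE_PREFIX` endpoint pattern is an assumption for illustration and should be checked against the archive's own Memento documentation; `timegate_request` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

# Assumption: illustrative TimeGate prefix; confirm the real endpoint
# in the archive's Memento documentation before use.
TIMEGATE_PREFIX = "https://archive.today/timegate/"


def timegate_request(original_url: str, when: datetime) -> Request:
    """Build (but do not send) a Memento datetime-negotiation request."""
    if when.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    # RFC 7089 expects an HTTP-date in GMT, e.g. "Thu, 31 May 2007 20:35:00 GMT".
    accept_datetime = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(TIMEGATE_PREFIX + original_url,
                   headers={"Accept-Datetime": accept_datetime})


req = timegate_request("https://example.com/",
                       datetime(2007, 5, 31, 20, 35, tzinfo=timezone.utc))
print(req.full_url)
# urllib stores header names via str.capitalize(), hence "Accept-datetime".
print(req.get_header("Accept-datetime"))
```

Sending the request (for example with `urllib.request.urlopen`) would, per the protocol, redirect to the memento closest to the requested datetime.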

History

Archive.today was founded in 2012. The site originally branded itself as archive.today, but in May 2015 it changed its primary mirror to archive.is. [21]

In January 2019, it began to deprecate the archive.is domain in favor of the archive.today mirror. [22]

Worldwide availability

Australia

In March 2019, the site was blocked for six months by several Australian internet providers in the aftermath of the Christchurch mosque shootings in an attempt to limit distribution of the footage of the attack. [23] [24] It is still blocked in Australia as of July 2021.

China

According to GreatFire.org, archive.today has been blocked in China since March 2016, [25] archive.li since September 2017, [26] and archive.fo since July 2018. [27]

Finland

On 21 July 2015, the operators blocked access to the service from all Finnish IP addresses, stating on Twitter that they did so to avoid escalating a dispute they allegedly had with the Finnish government. [28]

Russia

In Russia, only HTTP access is possible; HTTPS connections are blocked. [29] [30]


References

  1. Archive.today page with ads, at the Wayback Machine (archived 7 March 2021).
  2. "When did the Archive-is site originally launch?". Archive.is blog. Archived at archive.today (20 March 2021).
  3. "Archive.is". Викиреальность [Wikireality] (in Russian). Archived at archive.today (29 April 2021).
  4. Brinkmann, Martin (22 April 2015). "Create publicly available web page archives with Archive.is". Ghacks. Archived from the original on 12 April 2019. Retrieved 13 June 2015.
  5. Brunelle, Justin F.; Kelly, Mat; Weigle, Michele C.; Nelson, Michael L. (25 January 2015). "The impact of JavaScript on archivability" (PDF). International Journal on Digital Libraries. 17 (2): 95–117. doi:10.1007/s00799-015-0140-8. S2CID 8433375. Archived (PDF) from the original on 27 May 2019.
  6. Dascalescu, Dan (18 February 2013). "Web page archiving – Dan Dascalescu's Wiki (review)". Wiki.dandascalescu.com. Archived from the original on 22 September 2013. Retrieved 3 October 2013.
  7. Koebler, Jason (29 October 2014). "Dear GamerGate: Please Stop Stealing Our Shit". Motherboard. Archived from the original on 27 May 2019. Retrieved 22 March 2017. "There is no way for a website to protect itself from having an Archive.today user mirror the site."
  8. "archive.is/faq". archive.is. Retrieved 15 February 2019.
  9. "Home page of Archive.is in 2013". Archived from the original on 12 January 2013. "It can save pages from Web 2.0 sites even with hashbang URLs, for example http://twitter.com/#!/medvedevrussia"
  10. "Archive.today blog".
  11. "Example snapshot history on archive.is".
  12. JavaScript-generated loading animation of a Dailymotion video appearing in a frozen state.
  13. "Example: Page saved from Web Archive to Archive.is" (in Spanish). Archived from the original on 20 May 2013. Retrieved 23 October 2019.
  14. "Archive.today FAQ".
  15. For example, the query insite: https://en.wikipedia.org "World Cup" returns the snapshots related to "World Cup" on that domain.
  16. "Some Frequently Asked Question". archive.is blog. 24 January 2013. Archived from the original on 26 September 2013. Retrieved 12 November 2018.
  17. "Example of dynamic list retrieved by Worldcat".
  18. "Archive.is blog". 17 July 2020. Archived from the original on 3 October 2020.
  19. Nelson, Michael L. (9 July 2013). "Archive.is Supports Memento". Research and Teaching Updates. Web Science and Digital Libraries Research Group at Old Dominion University. Archived from the original on 27 July 2013. Retrieved 17 September 2013.
  20. "archive.is". Memento Protocol Information. Memento Development Group. Archived from the original on 15 September 2013. Retrieved 17 September 2013.
  21. "Why did you change the URL back from archive-today to archive-is?". Archive.is Blog. 3 May 2015. Archived from the original on 1 June 2015. Retrieved 6 January 2019.
  22. @archiveis (4 January 2019). "Please do not use archive.IS mirror for linking, use others mirrors [.TODAY .FO .LI .VN .MD .PH]. .IS might stop working soon" (Tweet). Archived from the original on 6 January 2019 via Twitter.
  23. "ISPs in AU and NZ start censoring the internet without legal precedent". Private Internet Access. 19 March 2019. Retrieved 20 March 2019.
  24. "New Zealand ISPs Say They're Blocking Sites That Fail To Remove Christchurch Shooting Video". Gizmodo Australia. 19 March 2019. Archived from the original on 18 May 2019. Retrieved 20 March 2019.
  25. "archive.is is 100% blocked in China". GreatFire Analyzer. 12 August 2018. Archived from the original on 12 August 2018.
  26. "archive.li is 100% blocked in China". GreatFire Analyzer. 12 August 2018. Archived from the original on 12 August 2018.
  27. "archive.fo is 100% blocked in China". GreatFire Analyzer. 12 August 2018. Archived from the original on 12 August 2018.
  28. Lapintie, Lassi (22 July 2015). "Suomalaisilta estettiin haktivistien suosimalla verkkosivulla käynti" [Finns' access to website used by hacktivists blocked]. Iltalehti (in Finnish). Archived from the original on 27 May 2019. Retrieved 4 March 2016.
  29. Elistratov, Vladimir (29 January 2016). "Roskomnadzor zablokiroval servis archive.is, khranyashchiy kopii veb-saytov" Роскомнадзор заблокировал сервис archive.is, хранящий копии веб-сайтов [Roskomnadzor has blocked archive.is, a service that stores copies of websites]. TJournal (in Russian). Archived from the original on 30 August 2017. Retrieved 30 January 2016.
  30. Cushing, Tim (4 February 2016). "Russia Blocks Another Archive Site Because It Might Contain Old Pages About Drugs". Techdirt. Archived from the original on 23 March 2019. Retrieved 26 February 2016.