Archive.today

archive.today
Screenshot of archive.today
Type of site: Web archiving
Available in: Multilingual
Alexa rank: 7,641 (September 2019) [1]
Commercial: No
Registration: No
Launched: 2012

archive.today (formerly archive.is) is an archive site which stores snapshots of web pages. [2] It retrieves one page at a time, similar to WebCite, with each page smaller than 50 MB, but it also supports JavaScript-heavy sites such as Google Maps and Twitter.

Archive.today uses headless browsing to record what embedded resources need to be captured to provide a high-quality memento, and creates a PNG image to provide a static and non-interactive visualization of the representation. [3]

Archive.today can capture individual pages in response to explicit user requests. [4] [5] [6]

Since July 2013, archive.today has supported the Memento Project application programming interface (API). [7] [8]
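
As a rough illustration of what Memento support means for a client, the sketch below prepares a datetime-negotiated lookup as described by the Memento protocol (RFC 7089): the client sends an Accept-Datetime header to a TimeGate, which then points to the snapshot closest to that date. The TimeGate base URL and the function name memento_request are assumptions made for this example and are not taken from the cited sources; no network request is made.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Assumption: illustrative TimeGate base URL, not confirmed by the article.
TIMEGATE_BASE = "https://archive.today/timegate/"


def memento_request(target_url: str, when: datetime) -> tuple[str, dict[str, str]]:
    """Return the TimeGate URL and headers for a datetime-negotiated lookup."""
    # RFC 7089 expects Accept-Datetime as an HTTP-date expressed in GMT.
    when_utc = when.astimezone(timezone.utc)
    headers = {"Accept-Datetime": format_datetime(when_utc, usegmt=True)}
    # The original resource URL is appended to the TimeGate base.
    return TIMEGATE_BASE + target_url, headers


url, headers = memento_request(
    "https://example.com/", datetime(2013, 7, 9, tzinfo=timezone.utc)
)
print(url)
print(headers["Accept-Datetime"])
```

A real client would send these headers with an HTTP GET and follow the redirect returned by the TimeGate.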

History

Archive.today was founded in 2012. The site originally branded itself as archive.today, but in May 2015 it changed the primary mirror to archive.is. [9] In January 2019, it began to deprecate the archive.is domain in favor of the archive.today mirror. [10]

Worldwide availability

Australia

In March 2019, the site was blocked for six months by several Australian internet providers in the aftermath of the Christchurch mosque shootings, in an attempt to limit distribution of the footage of the attack. [11] [12] [13]

China

According to GreatFire.org, archive.today has been blocked in China since March 2016, [14] archive.li since September 2017, [15] and archive.fo since July 2018. [16]

Finland

On 21 July 2015, the operators blocked access to the service from all Finnish IP addresses, stating on Twitter that they did this in order to avoid escalating a dispute they allegedly had with the Finnish government. [17]

Russia

In Russia, only HTTP access is possible; HTTPS connections are blocked. [18] [19]

Worldwide

Archive.today currently blocks requests from Cloudflare's recursive DNS resolver, 1.1.1.1. [20]

Additionally, since late 2018, archive.today has enforced a data cap, presumably to help protect against denial-of-service attacks. Individual users can archive or retrieve only approximately 10 to 20 megabytes of data per day. Once that limit is reached, the web server blocks the user's IP address by no longer responding. [citation needed]

Features

Archive.today records only text and images, excluding video, XML, RTF, spreadsheets (XLS or ODS) and other non-static content. It keeps track of the history of saved snapshots and asks the user for confirmation before adding a new snapshot of an already saved Internet address. [21]

Web pages cannot be copied from archive.is to web.archive.org as a second-level backup, because archive.is excludes the Wayback Machine [why?] [22] and does not save its snapshots in WARC format. The reverse, from web.archive.org to archive.is, is possible, [23] but the copy usually takes more time than a direct capture. Some web sites are retroactively deleted from the Internet Archive's listings, or blocked from being saved, because of their robots.txt file, but archive.today does not honor robots.txt.
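
To make the robots.txt difference concrete, here is a minimal sketch of the check that a robots.txt-honoring archiver performs before capturing a page, which is the step archive.today skips. It uses Python's standard urllib.robotparser module; the user-agent names ExampleArchiveBot and OtherBot, the sample robots.txt and the helper may_capture are hypothetical and chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that excludes one (made-up) archiving crawler.
ROBOTS_TXT = """\
User-agent: ExampleArchiveBot
Disallow: /
"""


def may_capture(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocked = may_capture(ROBOTS_TXT, "ExampleArchiveBot", "https://example.com/page")
allowed = may_capture(ROBOTS_TXT, "OtherBot", "https://example.com/page")
print(blocked, allowed)
```

An archiver that ignores robots.txt simply never runs such a check, so a site cannot opt out of being captured this way.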

The search box supports advanced keyword operators, using * as the wildcard character. A pair of quotation marks restricts the search to an exact sequence of keywords present in the title or in the body of the webpage, whereas the insite operator restricts it to a specific Internet domain. [24]
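
The operators described above can be combined into a single query string. The sketch below is only an illustration: the helper build_query is hypothetical, and the exact token layout (for example, whether a space follows insite:) is an assumption based on the example in note 24 rather than on documented syntax.

```python
def build_query(phrase: str = "", domain: str = "",
                keywords: tuple[str, ...] = ()) -> str:
    """Assemble a search string from the operators described in the text."""
    parts = []
    if domain:
        # insite restricts results to one Internet domain.
        parts.append(f"insite:{domain}")
    if phrase:
        # Quotation marks request an exact sequence of keywords.
        parts.append(f'"{phrase}"')
    # Remaining keywords are passed through; they may contain * wildcards.
    parts.extend(keywords)
    return " ".join(parts)


query = build_query(phrase="World Cup", domain="en.wikipedia.org")
wildcard_query = build_query(keywords=("archiv*",))
print(query)
print(wildcard_query)
```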

Once a web page is archived, it cannot be deleted directly by any Internet user. [25]

When a dynamic list is saved, the archive.today search box shows only one result, which links to the previous and following sections of the list (e.g. 20 links per page). [26] The other saved web pages are filtered out, and can sometimes be found through one of their occurrences.

The search feature is backed by Google Custom Search. If it delivers no results, archive.is attempts to use Yandex Search instead.

If a page has already been archived, archive.is asks the user to confirm archiving a new revision, instead of immediately archiving it.

Archived pages can be downloaded as a ZIP file, except for pages archived since 29 November 2019, when archive.today changed its browser engine from PhantomJS to Chromium.

References

  1. "Archive.is Site Info". Site Info. Alexa Internet. Retrieved 14 July 2015.
  2. Brinkmann, Martin (22 April 2015). "Create publicly available web page archives with Archive.is". Ghacks. Archived from the original on 12 April 2019. Retrieved 13 June 2015.
  3. Brunelle, Justin F.; Kelly, Mat; Weigle, Michele C.; Nelson, Michael L. (25 January 2015). "The impact of JavaScript on archivability" (PDF). International Journal on Digital Libraries. 17 (2): 95–117. doi:10.1007/s00799-015-0140-8. Archived (PDF) from the original on 27 May 2019.
  4. Dascalescu, Dan (18 February 2013). "Web page archiving – Dan Dascalescu's Wiki (review)". Wiki.dandascalescu.com. Retrieved 3 October 2013.
  5. Koebler, Jason (29 October 2014). "Dear GamerGate: Please Stop Stealing Our Shit". Motherboard. Archived from the original on 27 May 2019. Retrieved 22 March 2017. There is no way for a website to protect itself from having an Archive.today user mirror the site.
  6. "archive.is/faq". archive.is. Retrieved 15 February 2019.
  7. Nelson, Michael L. (9 July 2013). "Archive.is Supports Memento". Research and Teaching Updates. Web Science and Digital Libraries Research Group at Old Dominion University. Archived from the original on 27 July 2013. Retrieved 17 September 2013.
  8. "archive.is". Memento Protocol Information. Memento Development Group. Archived from the original on 15 September 2013. Retrieved 17 September 2013.
  9. "Why did you change the URL back from archive-today to archive-is?". Archive.is Blog. 3 May 2015. Archived from the original on 1 June 2015. Retrieved 6 January 2019.
  10. "Please do not use archive.IS mirror for linking". archive.today Twitter account. 4 January 2019. Archived from the original on 6 January 2019.
  11. "ISPs in AU and NZ start censoring the internet without legal precedent". Private Internet Access. 19 March 2019. Retrieved 20 March 2019.
  12. Rankovic, Dee (19 March 2019). "Australia joins New Zealand in eroding the digital civil liberties of its people in the wake of the recent terror attack". Reclaim The Net. Archived from the original on 23 March 2019. Retrieved 20 March 2019.
  13. "New Zealand ISPs Say They're Blocking Sites That Fail To Remove Christchurch Shooting Video". Gizmodo Australia. 19 March 2019. Archived from the original on 18 May 2019. Retrieved 20 March 2019.
  14. "archive.is is 100% blocked in China". GreatFire Analyzer. 12 August 2018. Archived from the original on 12 August 2018.
  15. "archive.li is 100% blocked in China". GreatFire Analyzer. 12 August 2018. Archived from the original on 12 August 2018.
  16. "archive.fo is 100% blocked in China". GreatFire Analyzer. 12 August 2018. Archived from the original on 12 August 2018.
  17. Lapintie, Lassi (22 July 2015). "Suomalaisilta estettiin haktivistien suosimalla verkkosivulla käynti" [Finns' access to website used by hacktivists blocked]. Iltalehti (in Finnish). Archived from the original on 27 May 2019. Retrieved 4 March 2016.
  18. Elistratov, Vladimir (29 January 2016). "Роскомнадзор заблокировал сервис archive..., хранящий копии веб-сайтов". TJournal (in Russian). Retrieved 30 January 2016.
  19. Cushing, Tim (4 February 2016). "Russia Blocks Another Archive Site Because It Might Contain Old Pages About Drugs". Techdirt. Archived from the original on 23 March 2019. Retrieved 26 February 2016.
  20. @archiveis (15 July 2018). "'Having to do' is not so direct here. Absence of EDNS and massive mismatch (not only on AS/Country, but even on the continent level) of where DNS and related HTTP requests come from causes so many troubles so I consider EDNS-less requests from Cloudflare as invalid" (Tweet) via Twitter.
  21. "Example snapshot history on archive.is".
  22. https://web.archive.org/save/http://archive.fo/19981202230410/http://google.com/
  23. "Example: Page saved from Web Archive to Archive.is". Archived from the original on 24 March 2019.
  24. For example, the string insite: https://en.wikipedia.org "World Cup" returns the snapshots related to "World Cup".
  25. "Some Frequently Asked Question". archive.is blog. 24 January 2013. Archived from the original on 26 September 2013. Retrieved 12 November 2018.
  26. "Example of dynamic list retrieved by Worldcat".