WebCite

Available in: English
Owner: University of Toronto [1]
Created by: Gunther Eysenbach
URL: WebCitation.org
Commercial: No
Launched: 1997
Current status: View historical archives only, no new archives

WebCite is an on-demand archive site designed to digitally preserve scientifically and educationally important material on the web by taking snapshots of Internet content as it existed at the time a blogger or scholar cited or quoted from it. The preservation service enabled verifiability of claims supported by the cited sources even when the original web pages were revised, removed, or disappeared for other reasons, an effect known as link rot.

The site no longer accepts new archive requests; old archive snapshots can still be viewed.

Service features

WebCite allowed for preservation of all types of web content, including HTML web pages, PDF files, style sheets, JavaScript and digital images. It also archived metadata about the collected resources such as access time, MIME type, and content length.
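As an illustration only, the metadata named above (access time, MIME type, and content length) could be recorded per snapshot roughly as follows. This is a minimal sketch, not WebCite's actual schema: the `SnapshotMetadata` class, the `describe_snapshot` helper, and their field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class SnapshotMetadata:
    """Hypothetical record for one archived resource (not WebCite's real schema)."""
    url: str
    access_time: datetime
    mime_type: str
    content_length: int


def describe_snapshot(url: str, content_type_header: str, body: bytes,
                      accessed_at: Optional[datetime] = None) -> SnapshotMetadata:
    """Derive the metadata fields from a fetched response.

    `content_type_header` is the raw Content-Type value, for example
    "text/html; charset=utf-8"; only the media type before ';' is kept.
    """
    mime_type = content_type_header.split(";", 1)[0].strip().lower()
    return SnapshotMetadata(
        url=url,
        access_time=accessed_at or datetime.now(timezone.utc),
        mime_type=mime_type,
        content_length=len(body),
    )


if __name__ == "__main__":
    meta = describe_snapshot("http://example.org/", "text/html; charset=utf-8",
                             b"<html></html>")
    print(meta.mime_type, meta.content_length)
```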

WebCite was a non-profit consortium supported by publishers and editors, and individuals could use it without charge. It was one of the first services to offer on-demand archiving of pages, a feature later adopted by many other archiving services, such as archive.today and the Wayback Machine. It did not do web page crawling.

History

Conceived in 1997 by Gunther Eysenbach, WebCite was publicly described the following year when an article on Internet quality control declared that such a service could also measure the citation impact of web pages. [2] In the next year, a pilot service was set up at the address webcite.net. Although it seemed that the need for WebCite decreased when Google's short-term copies of web pages began to be offered by Google Cache and the Internet Archive expanded its crawling (which started in 1996), [3] WebCite was the only one allowing "on-demand" archiving by users. WebCite also offered interfaces to scholarly journals and publishers to automate the archiving of cited links. By 2008, over 200 journals had begun routinely using WebCite. [4]

WebCite was formerly a member of the International Internet Preservation Consortium. [1] In response to a 2012 message on Twitter about WebCite's former membership of the consortium, Eysenbach commented that "WebCite has no funding, and IIPC charges €4000 per year in annual membership fees." [5]

WebCite "feeds its content" to other digital preservation projects, including the Internet Archive. [1] Lawrence Lessig, an American academic who writes extensively on copyright and technology, used WebCite in his amicus brief in the Supreme Court of the United States case of MGM Studios, Inc. v. Grokster, Ltd. [6]

Sometime between July 9 and 17, 2019, WebCite stopped accepting new archiving requests. [7] In a further outage, between about October 29, 2021 and June 24, 2023, no archived content was available; only the main page worked.

Fundraising

WebCite ran a fund-raising campaign using FundRazr from January 2013 with a target of $22,500, a sum which its operators stated was needed to maintain and modernize the service beyond the end of 2013. [8] This included relocating the service to Amazon EC2 cloud hosting and legal support. As of 2013 it remained undecided whether WebCite would continue as a non-profit or as a for-profit entity. [9]

Business model

The term "WebCite" is a registered trademark. [10] WebCite did not charge individual users, journal editors, or publishers [11] any fee to use its service. WebCite earned revenue from publishers who wanted to "have their publications analyzed and cited webreferences archived". [1] Early support was from the University of Toronto. [1]

WebCite maintained the legal position that its archiving activities [4] were allowed by the copyright doctrines of fair use and implied license. [1] To support the fair use argument, WebCite noted that its archived copies were transformative, socially valuable for academic research, and not harmful to the market value of any copyrighted work. [1] WebCite argued that caching and archiving web pages was not considered a copyright infringement when the archiver offers the copyright owner an opportunity to "opt-out" of the archive system, thus creating an implied license. [1] To that end, WebCite would not archive a page in violation of a Web site's "do-not-cache" and "no-archive" metadata or of robot exclusion standards; the absence of such signals creates an "implied license" for web archive services to preserve the content. [1]
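The opt-out check described above can be sketched in a few lines. This is an illustration under stated assumptions, not WebCite's implementation: it assumes the "no-archive" and "do-not-cache" metadata correspond to `noarchive` and `nocache` tokens in a `<meta name="robots">` tag, and that the robot exclusion standard means a site's robots.txt. The function name `may_archive` and the user agent `example-archiver` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class _RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def may_archive(url, html, robots_txt, user_agent="example-archiver"):
    """Return True only if neither robots.txt nor page metadata opts out.

    Assumption: the opt-out tokens are 'noarchive' and 'nocache'; the
    source text does not spell out the exact directives WebCite honored.
    """
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch(user_agent, url):
        return False  # excluded by the robot exclusion standard

    parser = _RobotsMetaParser()
    parser.feed(html)
    return not ({"noarchive", "nocache"} & parser.directives)


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, noarchive"></head></html>'
    print(may_archive("http://example.org/a.html", page, ""))
```

With no robots.txt rules and no opt-out metadata the function returns `True`, which mirrors the "implied license" reasoning: silence from the site owner is treated as permission.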

In a similar case involving Google's web caching activities, on January 19, 2006, the United States District Court for the District of Nevada agreed with that argument in the case of Field v. Google (CV-S-04-0413-RCJ-LRL), holding that fair use and an "implied license" meant that Google's caching of Web pages did not constitute copyright violation. [1] The "implied license" referred to general Internet standards. [1]

DMCA requests

According to its policy, after receiving legitimate DMCA requests from copyright holders, WebCite would remove saved pages from public access rather than delete them, as it held that the archived pages remained under the safe harbor of being citations. The pages were moved to a "dark archive", and in cases of legal controversies or evidence requests, pay-per-view access to the copyrighted content was available for "$200 (up to 5 snapshots) plus $100 for each further 10 snapshots". [12]
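The quoted fee schedule can be worked through as a short calculation. This is a sketch under one explicit assumption: a partial block of "further 10 snapshots" is billed as a full block, which the quoted policy text does not state. The function name `dark_archive_fee` is hypothetical.

```python
import math


def dark_archive_fee(snapshots: int) -> int:
    """Fee in US dollars for pay-per-view access to `snapshots` dark-archive pages.

    Base: $200 covers up to 5 snapshots.
    Extra: $100 per further block of 10 snapshots.
    Assumption: a partly used block of 10 is charged in full.
    """
    if snapshots < 1:
        raise ValueError("at least one snapshot must be requested")
    extra = max(0, snapshots - 5)
    return 200 + 100 * math.ceil(extra / 10)


if __name__ == "__main__":
    for n in (1, 5, 6, 15, 16):
        print(n, dark_archive_fee(n))
```

Under that assumption, 5 snapshots cost $200, 6 to 15 snapshots cost $300, and 16 snapshots cost $400.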

References

  1. "WebCite Consortium FAQ". WebCitation.org. WebCite. Archived from the original on August 11, 2021. Retrieved May 15, 2018, via Internet Archive.
  2. Eysenbach, Gunther; Diepgen, Thomas L. (November 28, 1998). "Towards quality management of medical information on the internet: evaluation, labelling, and filtering of information". The BMJ. 317 (7171): 1496–1502. doi:10.1136/bmj.317.7171.1496. ISSN 0959-8146. OCLC 206118688. PMC 1114339. PMID 9831581. BL Shelfmark 2330.000000.
  3. "Fixing Broken Links on the Internet". Internet Archive blog. October 25, 2013.
  4. Eysenbach, Gunther; Trudel, Mathieu (2005). "Going, Going, Still There: Using the WebCite Service to Permanently Archive Cited Web Pages". Journal of Medical Internet Research. 7 (5): e60. doi:10.2196/jmir.7.5.e60. ISSN 1438-8871. OCLC 107198227. PMC 1550686. PMID 16403724.
  5. Eysenbach, Gunther [@eysenbach] (June 12, 2012). "@ReaderMeter @sennoma WebCite has no funding, and IIPC charges 4000 Euro/yr in membership fees" (Tweet). Archived from the original on January 3, 2022 via Twitter.
  6. Cohen, Norm (January 29, 2007). "Courts Turn to Wikipedia, but Selectively". The New York Times.
  7. "WebCite 17th July 2019". July 17, 2019. Archived from the original on July 17, 2019. Retrieved January 17, 2021.
  8. "Fund WebCite". Wikimedia Foundation. Retrieved December 6, 2013.
  9. "Conversation between GiveWell and WebCite on 4/10/13" (PDF). GiveWell. Retrieved October 18, 2009. Dr. Eysenbach is trying to decide whether WebCite should continue as a non-profit project or a business with revenue streams built into the system.
  10. "WebCite Legal and Copyright Information". WebCitation.org. WebCite. Archived from the original on July 25, 2008. Retrieved June 16, 2009.
  11. "WebCite Member List". WebCitation.org. WebCite Consortium. Archived from the original on July 25, 2008. Retrieved June 16, 2009. Membership is currently free
  12. "WebCite takedown requests policy". WebCitation.org. WebCite. Archived from the original on April 22, 2021. Retrieved May 14, 2017.