WebCite

Available in: English
Owner: University of Toronto [1]
Created by: Gunther Eysenbach
Website: webcitation.org
Alexa rank: 107,988 (December 2018) [2]
Commercial: No
Launched: 1997
Current status: Online

WebCite is an on-demand archive site, designed to digitally preserve scientific and educationally important material on the web by making snapshots of Internet content as it existed at the time a blogger, scholar, or Wikipedia editor cited or quoted it. The preservation service enables verifiability of claims supported by cited sources even after the original web pages have been revised, removed, or have otherwise disappeared, an effect known as link rot. [3]


As of October 2019, WebCite no longer accepts archiving requests.

Service features

All types of web content, including HTML web pages, PDF files, style sheets, JavaScript, and digital images, can be preserved. WebCite also archives metadata about the collected resources, such as access time, MIME type, and content length.


WebCite is a non-profit consortium supported by publishers and editors,[who?] and it can be used by individuals without charge.[clarification needed] It was one of the first services to offer on-demand archiving of web pages, a feature later adopted by many other archiving services. It does not crawl the web.


History

Conceived in 1997 by Gunther Eysenbach, WebCite was publicly described the following year, when an article on Internet quality control suggested that such a service could also measure the citation impact of web pages. [4] The next year, a pilot service was set up at the address webcite.net. Although demand for WebCite seemed to decline once Google Cache began offering short-term copies of web pages and the Internet Archive expanded its crawling (which had started in 1996), [5] WebCite remained the only service allowing "on-demand" archiving by users. WebCite also offered interfaces to scholarly journals and publishers to automate the archiving of cited links. By 2008, over 200 journals had begun routinely using WebCite. [6]


WebCite used to be, but is no longer, a member of the International Internet Preservation Consortium. [1] In a 2012 message on Twitter, Eysenbach commented that "WebCite has no funding, and IIPC charges €4000 per year in annual membership fees." [7]


WebCite "feeds its content" to other digital preservation projects, including the Internet Archive. [1] Lawrence Lessig, an American academic who writes extensively on copyright and technology, used WebCite in his amicus brief in the Supreme Court of the United States case of MGM Studios, Inc. v. Grokster, Ltd. [8]


Fundraising

WebCite ran a fund-raising campaign on FundRazr from January 2013 with a target of $22,500, a sum its operators stated was needed to maintain and modernize the service beyond the end of 2013. [9] The funds were also intended to cover relocating the service to Amazon EC2 cloud hosting and legal support. As of 2013, it remained undecided whether WebCite would continue as a non-profit or a for-profit entity. [10] [11]

Usage

WebCite allows on-demand prospective archiving. It is not crawler-based: pages are archived only when a citing author or publisher requests it, and no cached copy will appear in a WebCite search unless someone has specifically cached the page beforehand.

To initiate the caching and archiving of a page, an author may use WebCite's "archive" menu option or a WebCite bookmarklet, which lets users cache a page by clicking a button in their bookmarks folder. [12]

One can retrieve or cite archived pages through a transparent format such as

http://webcitation.org/query?url=URL&date=DATE

where URL is the URL that was archived, and DATE indicates the caching date. For example,

http://webcitation.org/query?url=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FMain_Page&date=2008-03-04

or the alternate short form http://webcitation.org/5W56XTY5h retrieves an archived copy of the URL http://en.wikipedia.org/wiki/Main_Page that is closest to the date of March 4, 2008. The ID (5W56XTY5h) is the UNIX time in base 62.
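
The query format above can be produced programmatically. A minimal Python sketch, assuming only the url and date parameters shown above; the base-62 digit ordering used for short IDs is an assumption, since the text does not specify the alphabet:

```python
from urllib.parse import quote

# Assumed digit ordering for WebCite's base-62 IDs (not specified
# in the text): 0-9, then a-z, then A-Z.
B62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def webcite_query_url(url: str, date: str) -> str:
    """Build a WebCite query URL; the target URL is percent-encoded
    so its colon and slashes do not collide with the query syntax."""
    return "http://webcitation.org/query?url=%s&date=%s" % (quote(url, safe=""), date)

def b62_decode(snapshot_id: str) -> int:
    """Decode a short snapshot ID such as 5W56XTY5h to an integer."""
    n = 0
    for ch in snapshot_id:
        n = n * 62 + B62.index(ch)
    return n

print(webcite_query_url("http://en.wikipedia.org/wiki/Main_Page", "2008-03-04"))
# → http://webcitation.org/query?url=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FMain_Page&date=2008-03-04
```

Percent-encoding the target URL is what makes the format "transparent": the archived address survives intact inside the query string.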

WebCite does not work for pages that contain a no-cache tag; it respects the author's request not to have the web page cached.
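
Detecting such an opt-out tag can be sketched with Python's standard-library HTML parser. The class name and the exact set of directives checked here are illustrative assumptions, not WebCite's documented implementation:

```python
from html.parser import HTMLParser

class NoArchiveDetector(HTMLParser):
    """Detect <meta name="robots"> directives such as noarchive/no-cache."""

    def __init__(self):
        super().__init__()
        self.opted_out = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        # A robots meta tag carrying an archiving opt-out directive.
        if (a.get("name") or "").lower() == "robots":
            content = (a.get("content") or "").lower()
            if "noarchive" in content or "no-cache" in content:
                self.opted_out = True

page = '<html><head><meta name="robots" content="noarchive"></head></html>'
detector = NoArchiveDetector()
detector.feed(page)
print(detector.opted_out)  # True
```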

One can archive a page by simply navigating in a browser to a link formatted like this:

http://webcitation.org/archive?url=urltoarchive&email=youremail

replacing urltoarchive with the full URL of the page to be archived, and youremail with the requester's e-mail address. This is how the WebCite bookmarklet works. [13]

Compared to Wayback Machine

The Wayback Machine offers a similar on-demand save endpoint:

https://web.archive.org/save/urltoarchive
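
Under the URL patterns above, an archiving request is just an HTTP GET with query parameters. A minimal sketch; the target URL and e-mail address are placeholders:

```python
from urllib.parse import urlencode

def webcite_archive_url(target: str, email: str) -> str:
    """Build a WebCite on-demand archiving URL.

    urlencode percent-encodes both parameters, so target URLs
    containing '&' or '?' pass through the query string safely.
    """
    return "http://webcitation.org/archive?" + urlencode({"url": target, "email": email})

def wayback_save_url(target: str) -> str:
    """The Wayback Machine's comparable save URL is path-based."""
    return "https://web.archive.org/save/" + target

print(webcite_archive_url("http://example.org/page?id=7", "user@example.org"))
```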

Once a page is archived on WebCite, users can create an independent second-level backup by submitting the resulting webcitation.org URL to web.archive.org and archive.is. A browser add-on for archiving makes this more convenient. [14]

Business model

The term "WebCite" is a registered trademark. [15] WebCite does not charge individual users, journal editors and publishers [16] any fee to use their service. WebCite earns revenue from publishers who want to "have their publications analyzed and cited webreferences archived", [1] and accepts donations. Early support was from the University of Toronto. [1]

WebCite maintains the legal position that its archiving activities [6] are allowed by the copyright doctrines of fair use and implied license. [1] To support the fair use argument, WebCite notes that its archived copies are transformative, socially valuable for academic research, and not harmful to the market value of any copyrighted work. [1] WebCite argues that caching and archiving web pages is not a copyright infringement when the archiver offers the copyright owner an opportunity to "opt out" of the archive system, thus creating an implied license. [1] To that end, WebCite will not archive pages in violation of "do-not-cache" and "no-archive" metadata or robots exclusion standards; the absence of such opt-outs, it argues, creates an "implied license" for web archive services to preserve the content. [1]
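
The opt-out check described above can be illustrated with Python's standard-library robots.txt parser; the rules and the user-agent name here are hypothetical examples, not WebCite's actual crawler identity:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules a site might publish to opt out
# of archiving for part of its content.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

rp = RobotFileParser()
rp.parse(rules)

# An archiver honoring the robots exclusion standard would skip
# disallowed paths and treat everything else as implicitly licensed.
print(rp.can_fetch("WebCiteBot", "http://example.org/private/report.html"))  # False
print(rp.can_fetch("WebCiteBot", "http://example.org/public/article.html"))  # True
```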

In a similar case involving Google's web caching activities, on January 19, 2006, the United States District Court for the District of Nevada agreed with that argument in the case of Field v. Google (CV-S-04-0413-RCJ-LRL), holding that fair use and an "implied license" meant that Google's caching of Web pages did not constitute copyright violation. [1] The "implied license" referred to general Internet standards. [1]

DMCA requests

According to its policy, after receiving a legitimate DMCA request from a copyright holder, WebCite removes the saved pages from public access, even though it considers archived pages to fall under the safe harbor for citations. The pages are moved to a "dark archive"; in cases of legal controversy or evidence requests, pay-per-view access to the copyrighted content is available at "$200 (up to 5 snapshots) plus $100 for each further 10 snapshots". [17]


References

  1. "WebCite Consortium FAQ". WebCitation.org. WebCite via archive.org.
  2. "Webcitation.org Traffic, Demographics and Competitors - Alexa". www.alexa.com. Retrieved December 2, 2018.
  3. Habibzadeh, P. (2013). "Decay of References to Web sites in Articles Published in General Medical Journals: Mainstream vs Small Journals". Applied Clinical Informatics. 4 (4): 455–464. doi:10.4338/aci-2013-07-ra-0055. PMC 3885908. PMID 24454575.
  4. Eysenbach, Gunther; Diepgen, Thomas L. (November 28, 1998). "Towards quality management of medical information on the internet: evaluation, labelling, and filtering of information". The BMJ. 317 (7171): 1496–1502. doi:10.1136/bmj.317.7171.1496. ISSN 0959-8146. OCLC 206118688. PMC 1114339. PMID 9831581. BL Shelfmark 2330.000000.
  5. Fixing Broken Links on the Internet , Internet Archive blog, October 25, 2013.
  6. Eysenbach, Gunther; Trudel, Mathieu (2005). "Going, Going, Still There: Using the WebCite Service to Permanently Archive Cited Web Pages". Journal of Medical Internet Research. 7 (5): e60. doi:10.2196/jmir.7.5.e60. ISSN 1438-8871. OCLC 107198227. PMC 1550686. PMID 16403724.
  7. "Twitter post". June 11, 2012. Archived from the original on March 5, 2016. Retrieved March 10, 2013.
  8. Cohen, Norm (January 29, 2007). "Courts Turn to Wikipedia, but Selectively". The New York Times .
  9. "Fund WebCite". Wikimedia Foundation . Retrieved December 6, 2013.
  10. "Conversation between GiveWell and Webcite on 4/10/13" (PDF). GiveWell . Retrieved October 18, 2009. Dr. Eysenbach is trying to decide whether Webcite should continue as a non-profit project or a business with revenue streams built into the system.
  11. Compare: Habibzadeh, Parham (July 30, 2015). "Are current archiving systems reliable enough?". International Urogynecology Journal. 26 (10): 1553. doi:10.1007/s00192-015-2805-7. ISSN   0937-3462. PMID   26224384. Besides Perma, there are many other preserving systems. WebCite is another one[...].
  12. WebCite Best Practices Guide (PDF).
  13. "WebCite Bookmarklet". WebCitation.org. WebCite. Retrieved May 14, 2017.
  14. "GitHub - rahiel/archiveror: Archiveror will help you preserve the webpages you love". GitHub. Retrieved December 12, 2018.
  15. "WebCite Legal and Copyright Information". WebCitation.org. WebCite. Archived from the original on July 25, 2008. Retrieved June 16, 2009.
  16. "WebCite Member List". WebCitation.org. WebCite Consortium. Archived from the original on July 25, 2008. Retrieved June 16, 2009. Membership is currently free
  17. "WebCite takedown requests policy". WebCitation.org. WebCite. Retrieved May 14, 2017.