Pubget

Founded: Cambridge, MA, USA (2007)
Headquarters: Boston, MA, USA
Key people: Ramy Arnaout, Ian Connor, Ryan Jones
Parent: Copyright Clearance Center
Website: www.pubget.com

Pubget Corp was a wholly owned subsidiary of Copyright Clearance Center that developed cloud-based search and content-access tools for scientists. It provided advertising services, enterprise search services, and a public search engine.[1] The company was founded in 2007 by Beth Israel Hospital clinical pathologist Ramy Arnaout out of his own need to find papers.[2][3][4] Pubget moved its headquarters from Cambridge, Massachusetts, to Boston's Innovation District in 2011.[4][5]

Pubget.com was a free service for non-profit institutions and their libraries and researchers. The site provided direct access to full-text content through 450 libraries around the world. In January 2012, Copyright Clearance Center announced that it had acquired Pubget.[6] The service was closed in 2017.

Products and Services

Search Engine
Pubget's search engine retrieved article citations and full-text PDFs from PubMed, arXiv, Karger, the American Society for Microbiology, IEEE, RSS feeds, publisher-supplied XML, and Open Archive sources.[7] The index contained over 28 million scientific documents and grew by roughly 10,000 papers each day. Rather than pointing to a publisher landing page, Pubget linked each citation directly to the paper itself via a continuously updated database of links.[8]
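
The linking model can be illustrated with a short sketch. The following is a hypothetical reconstruction, not Pubget's actual code: it treats the link database as a mapping from PubMed identifiers (PMIDs) to full-text URLs, so that resolving a citation to its PDF is a single lookup.

```python
# Hypothetical sketch of citation-to-full-text link resolution.
# All identifiers and URLs below are invented for illustration;
# Pubget's production database was continuously updated from publisher feeds.

from typing import Optional

# Illustrative link database: PMID -> direct full-text URL.
LINK_DB: dict[str, str] = {
    "19567816": "https://publisher.example.org/articles/19567816/full.pdf",
    "20121045": "https://journals.example.com/content/20121045.pdf",
}

def resolve_full_text(pmid: str) -> Optional[str]:
    """Return the direct PDF link for a citation, or None if no link is known."""
    return LINK_DB.get(pmid)

if __name__ == "__main__":
    url = resolve_full_text("19567816")
    print(url or "No full-text link on record; show the abstract page instead.")
```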

Access to closed full-text PDFs was granted through an institution's subscriptions. Pubget did not bypass copyright law: if the end user lacked institutional access, only the abstract of a restricted paper was displayed.
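
A minimal sketch of that gating logic, under the assumption of a simple institution-to-journal subscription table (the names and data model are invented, not Pubget's):

```python
# Hypothetical access-gating sketch (illustrative data, not Pubget's code).
# Full text is served only when the user's institution subscribes to the
# journal; otherwise only the abstract is shown, respecting the paywall.

SUBSCRIPTIONS = {
    "example-university": {"Journal of Examples", "Annals of Illustration"},
}

def render_article(institution: str, journal: str,
                   abstract: str, pdf_url: str) -> str:
    """Return the PDF link for subscribers, the abstract for everyone else."""
    if journal in SUBSCRIPTIONS.get(institution, set()):
        return pdf_url
    return abstract

print(render_article("example-university", "Journal of Examples",
                     "Abstract only.", "https://example.org/paper.pdf"))
```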

PaperStats
Pubget PaperStats was a usage and spend analysis tool for libraries. It automatically harvested serials usage statistics directly from publishers and consolidated them into usage, cost, and other reports; content performance could then be assessed through cost-per-view analysis. Upon introduction, PaperStats was beta-tested with the USC Norris Medical Library, with positive results for Pubget, USC, and the library community.[7][9]
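
Cost-per-view itself is a simple ratio: a title's subscription cost for a period divided by its full-text downloads in that period. A sketch with invented figures:

```python
# Cost-per-view calculation in the spirit of PaperStats.
# The titles, costs, and download counts are invented; the real tool
# consolidated publisher-supplied usage reports automatically.

journals = [
    # (title, annual subscription cost in USD, full-text downloads per year)
    ("Journal of Examples", 4200.00, 1680),
    ("Annals of Illustration", 950.00, 38),
]

for title, cost, views in journals:
    cost_per_view = cost / views if views else float("inf")
    # Titles with a high cost per view are candidates for cancellation review.
    print(f"{title}: ${cost_per_view:.2f} per view")
```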

PaperStore
The Pubget PaperStore gave Pubget users the option of purchasing full-text papers from thousands of journals directly on the search engine results page. Content rights and delivery were provided by the document delivery vendor Reprints Desk.[7]

Advertising
Pubget provided several advertising solutions; customers included Bio-Rad, Agilent, and other scientific brands. Ads were matched to paper content via contextual targeting: for example, the manufacturer of a piece of scientific equipment could pay to advertise alongside papers that mention that product.[2][10] Pubget did not, however, reveal data on individual users or their searches.[2]

Textmining
Pubget's text-mining technology allowed research and development teams to uncover specific text strings across large groups of papers.[11]
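
At its simplest, this kind of mining scans each document for the query strings and reports where they occur. The sketch below uses an invented two-paper corpus and is not Pubget's engine, which operated over millions of PDFs:

```python
# Minimal text-mining sketch: find which papers mention given strings.
# The corpus is invented for illustration.

corpus = {
    "paper-001": "Expression was measured with a qPCR thermocycler ...",
    "paper-002": "Samples were imaged on a confocal microscope ...",
}

def mine(corpus: dict[str, str], terms: list[str]) -> dict[str, list[str]]:
    """Map each search term to the IDs of documents whose text contains it."""
    return {
        term: [doc_id for doc_id, text in corpus.items()
               if term.lower() in text.lower()]
        for term in terms
    }

print(mine(corpus, ["qPCR", "confocal"]))  # {'qPCR': ['paper-001'], ...}
```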

PaperStream
PaperStream was a web app that allowed lab teams to share, store, and find documents in one place.[12] It organized companies' subscriptions, purchased papers, and internal documents into an automatically maintained library database.[13][14]

API
Pubget's API gave third-party websites access to its search and linking technology.[15][16]
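
Since the service closed, the API is no longer documented, so the endpoint, query parameters, and response shape below are assumptions made for illustration; the sketch only shows the general pattern a third-party site might have used:

```python
# Hypothetical API client sketch. The base URL, query parameters, and JSON
# response fields are invented; the real Pubget API is no longer available.

import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.pubget.example/v1/search"  # illustrative placeholder

def search(query: str, limit: int = 5) -> list[dict]:
    """Query the (hypothetical) search endpoint and return result records."""
    url = f"{API_BASE}?{urlencode({'q': query, 'limit': limit})}"
    with urlopen(url) as resp:
        return json.load(resp)["results"]

# Example usage (would require a live endpoint):
# for record in search("clinical pathology"):
#     print(record["title"], record.get("pdf_url"))
```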

References

  1. "Pubget Everywhere". Pubget. Archived from the original on 16 July 2011. Retrieved 17 June 2011.
  2. Davies, Kevin (10 June 2009). "Got PubMed? Pubget Searches and Delivers Scientific PDFs". Bio-IT World. Archived from the original on 1 June 2011. Retrieved 17 June 2011.
  3. "Founder's Friday: Pubget". Greenhorn Connect. 7 January 2011. Archived from the original on 3 June 2011. Retrieved 21 June 2011.
  4. Goodison, Donna (28 May 2011). "Southie Firm Speeds Up Access to Research Papers". Boston Herald. Archived from the original on 18 June 2011. Retrieved 21 June 2011.
  5. "Welcome home, Pubget". Innovation District. 13 May 2011. Archived from the original on 18 June 2011. Retrieved 16 June 2011.
  6. "Copyright Clearance Center Acquires Pubget". 9 January 2012. Archived from the original on 28 October 2018. Retrieved 11 May 2020.
  7. Featherstone, Robin; Hersey, Denise (4 October 2010). "The quest for full text: an in-depth examination of Pubget for medical searchers". Medical Reference Services Quarterly. 29 (4): 307–319. doi:10.1080/02763869.2010.518911. PMID 21058175. S2CID 36459379.
  8. Murray, P.E. (4 August 2009). "Analysis of Pubget – An Expedited Fulltext Service for Life Science Journal Articles". Disruptive Library Technology Jester. Archived from the original on 8 July 2011. Retrieved 21 June 2011.
  9. Curran, Megan (2 March 2011). "Debating Beta: Considerations for Libraries". Journal of Electronic Resources in Medical Libraries. 8 (2): 117–125. doi:10.1080/15424065.2011.576604. S2CID   62711345.
  10. "Media Kit: Pubget Ads" (PDF). Pubget, Inc. Archived (PDF) from the original on 26 March 2012. Retrieved 24 June 2011.
  11. "Textmining Fact Sheet" (PDF). Pubget, Inc. Archived (PDF) from the original on 26 March 2012. Retrieved 15 June 2011.
  12. "Pubget PaperStream". Pubget, Inc. Archived from the original on 2 October 2011. Retrieved 24 June 2011.
  13. "Pubget PaperStream For Companies". Pubget, Inc. Archived from the original on 24 June 2011. Retrieved 24 June 2011.
  14. "Pubget PaperStream For Researchers". Pubget, Inc. Archived from the original on 7 October 2011. Retrieved 24 June 2011.
  15. "PubgetCloud" (PDF). Pubget, Inc. Archived from the original (PDF) on March 26, 2012. Retrieved 16 June 2011.
  16. Munger, Dave (10 June 2009). "Pubget – Useful, Growing Resource for Anyone Interested in Research". Researchblogging News. Archived from the original on 14 March 2012. Retrieved 29 June 2011.