Dataverse

Dataverse is an open-source web application for sharing, preserving, citing, exploring and analyzing research data. [1] [2] Researchers, data authors, publishers, data distributors, and affiliated institutions all receive appropriate credit via a data citation with a persistent identifier (e.g., a DOI or handle).

A Dataverse repository hosts multiple dataverses. Each dataverse contains dataset(s) or other dataverses, and each dataset contains descriptive metadata and data files (including documentation and code that accompany the data).
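The containment hierarchy described above can be sketched as a small data structure. This is an illustrative model only, not the schema used by the Dataverse software; the class names `DataverseCollection`, `Dataset`, and `DataFile`, the `count_files` helper, and all example values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class DataFile:
    """A single file in a dataset: data, documentation, or code."""
    name: str


@dataclass
class Dataset:
    """Descriptive metadata plus the files it describes."""
    metadata: Dict[str, str]
    files: List[DataFile] = field(default_factory=list)


@dataclass
class DataverseCollection:
    """A dataverse: holds datasets and/or nested dataverses."""
    name: str
    children: List[Union["DataverseCollection", Dataset]] = field(default_factory=list)

    def count_files(self) -> int:
        """Count files in every dataset at any nesting depth."""
        total = 0
        for child in self.children:
            if isinstance(child, Dataset):
                total += len(child.files)
            else:
                total += child.count_files()
        return total


# Hypothetical example: a top-level dataverse containing a nested dataverse.
survey = Dataset({"title": "Example survey"}, [DataFile("data.csv"), DataFile("codebook.pdf")])
lab = DataverseCollection("Example Lab", [survey])
root = DataverseCollection("Example University", [lab, Dataset({"title": "Empty dataset"})])
print(root.count_files())
```

The recursive `children` list mirrors the statement that a dataverse may contain datasets or other dataverses, while files only ever appear inside a dataset.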

In 2019, Dataverse won the Duke's Choice Award for university and higher education. [3]

Background

The Dataverse Project is housed and developed by the Dataverse Team at the Institute for Quantitative Social Science (IQSS) at Harvard University. Coding of the Dataverse software (previously known as the Dataverse Network) began in 2006 under the leadership of Mercè Crosas and Gary King. The earlier Virtual Data Center (VDC) project, which ran from 1999 to 2006, was organized by Micah Altman, Gary King, and Sidney Verba as a collaboration between the Harvard-MIT Data Center (now part of IQSS) and the Harvard University Library. Precursors to the VDC date to 1987 and included a stand-alone software guide to local data, pre-web software, and tools that automatically transferred cataloging information by FTP to other sites across campus at designated times. [4]

Installations

Harvard Dataverse

The Harvard Dataverse, a collaboration between the Institute for Quantitative Social Science (IQSS), the Harvard Library, and Harvard University Information Technology (HUIT), is a repository for sharing, citing, analyzing, and preserving research data. It is open to all scientific data from all disciplines worldwide.

Dataverse in Europe

Dataverse is also installed in countries of the European Union to preserve data collected by research communities in the Netherlands, Germany, France and Finland. The largest of these repositories, DataverseNL, [5] is located in the Netherlands and provides data management services for 11 Dutch universities. A similar service is being developed in Norway (cf. DataverseNO [6]).

Dataverse in Canada

In Canada, Borealis [7] is a national instance of the Dataverse repository hosted by OCUL's Scholars Portal at the University of Toronto. [8] Borealis allows institutions to offer a Dataverse service without operating and maintaining the software themselves. Most academic institutions offering a Dataverse service in Canada subscribe to the Borealis service. The associated community of practice is organized through the Digital Research Alliance of Canada's Network of Experts via the Dataverse North Expert Group, [9] which serves as a forum for coordination, collaboration and communication.

Dataverse installations around the world

Several other Dataverse repositories are installed at universities and organizations around the world.

APIs and interoperability

Dataverse provides multiple open APIs for searching, depositing and accessing data.
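As a rough illustration of the search use case, the sketch below builds and runs a query against a Dataverse installation. The article does not specify any endpoints, so the `/api/search` path, the `q`, `type`, and `per_page` parameters, and the assumed response shape are all based on the publicly documented Dataverse Search API and should be checked against the API guide of the installation in question. The server address is a placeholder, and the helper names `build_search_url` and `search` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(base_url: str, query: str, item_type: str = "dataset", per_page: int = 10) -> str:
    """Build a Search API URL; path and parameter names are assumptions."""
    params = urlencode({"q": query, "type": item_type, "per_page": per_page})
    return f"{base_url.rstrip('/')}/api/search?{params}"


def search(base_url: str, query: str) -> list:
    """Run the search and return the matching items (requires network access)."""
    with urlopen(build_search_url(base_url, query), timeout=30) as response:
        payload = json.load(response)
    # Assumed response shape: {"status": "OK", "data": {"items": [...]}}
    return payload["data"]["items"]


# Placeholder server address; substitute the installation being queried.
url = build_search_url("https://demo.dataverse.org/", "climate change")
print(url)
```

Only the URL is built here. Calling `search(...)` performs a real HTTP request. Depositing data typically requires an API token, which is outside the scope of this sketch.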

Alternatives and similar projects

DSpace is often compared with Dataverse and is used for storing scientific data. CKAN provides similar functions and is widely used for open data.

References

  1. Crosas, M. "The Dataverse Network: An Open-Source Application for Sharing, Discovering and Preserving Data". D-Lib Magazine. Retrieved 2015-05-27.
  2. "About the Project". Dataverse.
  3. Chander, Sharat (2019-09-16). "2019 Duke's Choice Award Winners!". Oracle. Archived from the original on 2021-02-03. Retrieved 2021-02-10.
  4. "History of the Project". About the Project. Retrieved 2015-05-27.
  5. "DataverseNL". dataverse.nl. Retrieved 2023-02-18.
  6. "DataverseNO". dataverse.no. Retrieved 2023-02-18.
  7. "Borealis: A New Name for Laurier's Dataverse Data Repository | Laurier Library". library.wlu.ca. Retrieved 2023-05-08.
  8. "Introducing Borealis, the Canadian Dataverse Repository / le dépôt Dataverse - Dataverse - SPOT-DOCS". spotdocs.scholarsportal.info. Retrieved 2023-05-08.
  9. "Network of Experts". Digital Research Alliance of Canada. Retrieved 2023-05-08.