A data management plan (DMP) is a formal document that outlines how data are to be handled both during a research project and after the project is completed. [1] The goal of a data management plan is to consider the many aspects of data management, metadata generation, data preservation, and analysis before the project begins; [2] this may lead to data being well-managed in the present [citation needed] and prepared for preservation in the future. [2]
DMPs were originally used in 1966 to manage aeronautical and engineering projects' data collection and analysis, and expanded across engineering and scientific disciplines in the 1970s and 1980s. Up until the early 2000s, DMPs were used "for projects of great technical complexity, and for limited mid-study data collection and processing purposes". [3] In the 2000s and later, E-research and economic policies drove the development and uptake of DMPs. [3]
Preparing a data management plan before data are collected is claimed to ensure that data are in the correct format, well organized, and better annotated. [4] This arguably saves time in the long term, because there is no need to re-organize or re-format data, or to try to remember details about them later. It is also claimed to increase research efficiency, since both the data collector and other researchers may be able to understand and use well-annotated data in the future.

One component of a data management plan is data archiving and preservation. By choosing an archive ahead of time, the data collector can format data during collection so that their later submission to a database is easier. Preserved data are more useful because other researchers can re-use them. Preservation also allows the data collector to direct requests for data to the database rather than answer each request individually. A frequent argument in favor of preservation is that preserved data have the potential to lead to new, unanticipated discoveries and can prevent the duplication of studies that have already been conducted. Data archiving also insures the data collector against losing the data.
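As an illustration of formatting data during collection so that they are ready for a chosen archive, the following Python sketch checks each observation against a column template before writing a CSV file. The column names in `ARCHIVE_COLUMNS` and the functions `validate_record` and `write_submission` are hypothetical; a real archive publishes its own submission requirements, which are usually more detailed than a list of required fields.

```python
import csv
import io

# Hypothetical column template assumed to be required by the chosen archive;
# a real archive publishes its own submission requirements.
ARCHIVE_COLUMNS = ["site_id", "date", "temperature_c", "observer"]


def validate_record(record: dict) -> list[str]:
    """Return a list of problems with one observation (empty if it conforms)."""
    problems = []
    missing = [col for col in ARCHIVE_COLUMNS if record.get(col) in (None, "")]
    if missing:
        problems.append(f"missing fields: {', '.join(missing)}")
    extra = set(record) - set(ARCHIVE_COLUMNS)
    if extra:
        problems.append(f"unexpected fields: {', '.join(sorted(extra))}")
    return problems


def write_submission(records: list[dict]) -> str:
    """Validate every record, then serialize them as CSV in template column order."""
    for i, record in enumerate(records):
        problems = validate_record(record)
        if problems:
            raise ValueError(f"record {i}: {'; '.join(problems)}")
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=ARCHIVE_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


rows = [{"site_id": "A1", "date": "2024-05-01", "temperature_c": 12.5, "observer": "JS"}]
print(write_submission(rows))
```

Checking records as they are collected, rather than at submission time, surfaces gaps while the collector can still remember or re-measure the missing details.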
In the 2010s, [3] funding agencies increasingly required data management plans as part of the proposal and evaluation process, [5] despite little or no evidence of their efficacy. [3]
"There is no general and definitive list of topics that should be covered in a DMP for a research project", [6] and researchers are often left to their own devices as to how to fill out a DMP. [2]
Metadata are the contextual details, including any information important for using data. These may include descriptions of temporal and spatial details, instruments, parameters, units, files, etc. Metadata are commonly referred to as "data about data". [10] Issues to be considered include:
Data management and preservation costs may be considerable, depending on the nature of the project. By anticipating costs ahead of time, researchers can help ensure that the data will be properly managed and archived. Potential expenses that should be considered are
The data management plan should include how these costs will be paid.
All grant proposals submitted to National Science Foundation (NSF) must include a Data Management Plan that is no more than two pages. [11] This is a supplement (not part of the 15-page proposal) and should describe how the proposal will conform to the Award and Administration Guide policy (see below). It may include the following:
Policy summarized from the NSF Award and Administration Guide, Section 4 (Dissemination and Sharing of Research Results): [12]
Since 1995, the UK's Economic and Social Research Council (ESRC) have had a research data policy in place. The current ESRC Research Data Policy states that research data created as a result of ESRC-funded research should be openly available to the scientific community to the maximum extent possible, through long-term preservation and high-quality data management. [13]
ESRC requires a data management plan for all research award applications where new data are being created. Such plans are designed to promote a structured approach to data management throughout the data lifecycle, resulting in better quality data that is ready to archive for sharing and re-use. The UK Data Service, the ESRC's flagship data service, provides practical guidance on research data management planning suitable for social science researchers in the UK and around the world. [14] [15]
ESRC has a longstanding arrangement with the UK Data Archive, based at the University of Essex, as a place of deposit for research data, with award holders required to offer data resulting from their research grants via the UK Data Service. [16] The Archive enables data re-use by preserving data and making them available to the research and teaching communities.
Three major themes have been identified in the literature regarding the benefits of DMPs: professional benefits, economic benefits and institutional benefits. [3] It has been argued that DMPs can act as a catalyst for researchers to improve their data literacy and data management practices, often with the help of the library. [3]
In practice, however, DMPs often fall short of their stated goals. A 2012 review of research funders' DMP policies found that the policies were missing several elements from the Digital Curation Centre's list of criteria for a DMP. [17] Researchers shared DMP text. [18] DMPs are often regarded as an "administrative exercise rather than an integral part" of the research process, [19] and it has been acknowledged that DMPs do not guarantee good data management practices. [20] Most funders do not require a DMP after grants are awarded, so stakeholders lose the benefits that an actively maintained DMP could provide. Best practice would be to "require maintenance of the data management plan following award and during the active phase of a study." [6] At present, data sharing plans are more important to funders than data management plans. [6]
National Science Digital Library (NSDL) of the United States is an open-access online digital library and collaborative network of disciplinary and grade-level focused education providers operated by the Institute for the Study of Knowledge Management in Education. NSDL's mission is to provide quality digital learning collections to the science, technology, engineering, and mathematics (STEM) education community, both formal and informal, institutional and individual. NSDL's collections are refined by a network of STEM educational and disciplinary professionals. Their work is based on user data, disciplinary knowledge, and participation in the evolution of digital resources as major elements of effective STEM learning.
Ecoinformatics, or ecological informatics, is the science of information in ecology and environmental science. It integrates environmental and information sciences to define entities and natural processes with language common to both humans and computers. However, this is a rapidly developing area in ecology and there are alternative perspectives on what constitutes ecoinformatics.
E-Science or eScience is computationally intensive science that is carried out in highly distributed network environments, or science that uses immense data sets that require grid computing; the term sometimes includes technologies that enable distributed collaboration, such as the Access Grid. The term was created by John Taylor, the Director General of the United Kingdom's Office of Science and Technology, in 1999 and was used to describe a large funding initiative starting in November 2000. E-science has been more broadly interpreted since then, as "the application of computer technology to the undertaking of modern scientific investigation, including the preparation, experimentation, data collection, results dissemination, and long-term storage and accessibility of all materials generated through the scientific process. These may include data modeling and analysis, electronic/digitized laboratory notebooks, raw and fitted data sets, manuscript production and draft versions, pre-prints, and print and/or electronic publications." In 2014, the IEEE eScience Conference Series condensed the definition to "eScience promotes innovation in collaborative, computationally- or data-intensive research across all disciplines, throughout the research lifecycle" in one of the working definitions used by the organizers. E-science encompasses "what is often referred to as big data [which] has revolutionized science... [such as] the Large Hadron Collider (LHC) at CERN... [that] generates around 780 terabytes per year... highly data intensive modern fields of science...that generate large amounts of E-science data include: computational biology, bioinformatics, genomics" and the human digital footprint for the social sciences.
In library and archival science, digital preservation is a formal process to ensure that digital information of continuing value remains accessible and usable in the long term. It involves planning, resource allocation, and application of preservation methods and technologies, and combines policies, strategies and actions to ensure access to reformatted and "born-digital" content, regardless of the challenges of media failure and technological change. The goal of digital preservation is the accurate rendering of authenticated content over time.
Archival science, or archival studies, is the study and theory of building and curating archives, which are collections of documents, recordings, photographs and various other materials in physical or digital formats.
The California Digital Library (CDL) was founded by the University of California in 1997. Under the leadership of then UC President Richard C. Atkinson, the CDL's original mission was to forge a better system for scholarly information management and improved support for teaching and research. In collaboration with the ten University of California Libraries and other partners, CDL assembled one of the world's largest digital research libraries. CDL facilitates the licensing of online materials and develops shared services used throughout the UC system. Building on the foundations of the Melvyl Catalog, CDL has developed one of the largest online library catalogs in the country and works in partnership with the UC campuses to bring the treasures of California's libraries, museums, and cultural heritage organizations to the world. CDL continues to explore how services such as digital curation, scholarly publishing, archiving and preservation support research throughout the information lifecycle.
The Digital Curation Centre (DCC) was established to help solve the extensive challenges of digital preservation and digital curation and to lead research, development, advice, and support services for higher education institutions in the United Kingdom.
The UK Data Archive is a national centre of expertise in data archiving in the United Kingdom. It houses the largest collection of social sciences and population digital data in the UK. It is certified under CoreTrustSeal as a trusted digital repository. It is also certified under the international ISO 27001 standard for information security. Located in Colchester, the UK Data Archive is a specialist department of the University of Essex, co-located with the Institute for Social and Economic Research (ISER). It is primarily funded by the Economic and Social Research Council (ESRC) and the University of Essex.
Research data archiving is the long-term storage of scholarly research data in fields including the natural sciences, social sciences, and life sciences. Academic journals have differing policies regarding how much of their data and methods researchers are required to store in a public archive, and what is actually archived varies widely between disciplines. Similarly, the major grant-giving institutions have varying attitudes towards public archiving of data. In general, the tradition of science has been for publications to contain sufficient information to allow fellow researchers to replicate and therefore test the research. In recent years this approach has become increasingly strained, as research in some areas depends on large datasets that cannot easily be replicated independently.
DataNet, or Sustainable Digital Data Preservation and Access Network Partner, was a research program of the U.S. National Science Foundation Office of Cyberinfrastructure. The office announced a request for proposals with this title on September 28, 2007. The lead paragraph of its synopsis describes the program as:
Science and engineering research and education are increasingly digital and increasingly data-intensive. Digital data are not only the output of research but provide input to new hypotheses, enabling new scientific insights and driving innovation. Therein lies one of the major challenges of this scientific generation: how to develop the new methods, management structures and technologies to manage the diversity, size, and complexity of current and future data sets and data streams. This solicitation addresses that challenge by creating a set of exemplar national and global data research infrastructure organizations that provide unique opportunities to communities of researchers to advance science and/or engineering research and learning.
Preservation metadata is item-level information that describes the context and structure of a digital object. It provides background details pertaining to a digital object's provenance, authenticity, and environment. Preservation metadata is a specific type of metadata that works to maintain a digital object's viability while ensuring continued access by providing contextual information, usage details, and rights.
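A minimal sketch of an item-level preservation metadata record, assuming a simple in-house layout. The field names and the `PreservationRecord` and `ProvenanceEvent` classes are hypothetical and only loosely echo the categories described above (provenance, authenticity via a checksum, and rights); production systems typically follow a community standard such as the PREMIS Data Dictionary instead.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ProvenanceEvent:
    """One action performed on the object, e.g. ingest or format migration."""
    event_type: str
    date: str
    agent: str


@dataclass
class PreservationRecord:
    """Illustrative item-level preservation metadata for one digital object."""
    identifier: str
    file_format: str
    sha256: str  # checksum supporting later authenticity checks
    rights: str
    events: list[ProvenanceEvent] = field(default_factory=list)

    def add_event(self, event_type: str, date: str, agent: str) -> None:
        """Append a provenance event to the object's history."""
        self.events.append(ProvenanceEvent(event_type, date, agent))

    def to_json(self) -> str:
        """Serialize the record, including nested events, as JSON."""
        return json.dumps(asdict(self), indent=2)


content = b"site_id,date\nA1,2024-05-01\n"
record = PreservationRecord(
    identifier="example-0001",  # hypothetical identifier
    file_format="text/csv",
    sha256=hashlib.sha256(content).hexdigest(),
    rights="CC BY 4.0",
)
record.add_event("ingest", "2024-05-02", "example repository")  # hypothetical agent
print(record.to_json())
```

Keeping provenance as an append-only list of events, rather than overwriting fields, preserves the object's history as it is migrated or re-verified.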
Digital curation is the selection, preservation, maintenance, collection, and archiving of digital assets. Digital curation establishes, maintains, and adds value to repositories of digital data for present and future use. This is often accomplished by archivists, librarians, scientists, historians, and scholars. Enterprises are starting to use digital curation to improve the quality of information and data within their operational and strategic processes. Successful digital curation will mitigate digital obsolescence, keeping the information accessible to users indefinitely. Digital curation includes digital asset management, data curation, digital preservation, and electronic records management.
Data curation is the organization and integration of data collected from various sources. It involves annotation, publication and presentation of the data so that the value of the data is maintained over time, and the data remain available for reuse and preservation. Data curation includes "all the processes needed for principled and controlled data creation, maintenance, and management, together with the capacity to add value to data". In science, data curation may indicate the extraction, by experts, of important information from scientific texts such as research articles, to be converted into an electronic format, such as an entry of a biological database.
DataONE is a network of interoperable data repositories facilitating data sharing, data discovery, and open science. DataONE was originally supported by $21.2 million in funding from the US National Science Foundation as one of the initial DataNet programs in 2009; its funding was renewed in 2014, through 2020, with an additional $15 million. DataONE supports the preservation, access, use, and reuse of multi-discipline scientific data through the construction of primary cyberinfrastructure and an education and outreach program. DataONE provides scientific data archiving for ecological and environmental data produced by scientists. DataONE's goal is to preserve and provide access to multi-scale, multi-discipline, and multi-national data. Users include scientists, ecosystem managers, policy makers, students, educators, librarians, and the public.
Open scientific data or open research data is a type of open data focused on publishing observations and results of scientific activities available for anyone to analyze and reuse. A major purpose of the drive for open data is to allow the verification of scientific claims, by allowing others to look at the reproducibility of results, and to allow data from many sources to be integrated to give new knowledge.
The UK Data Service is the largest digital repository for quantitative and qualitative social science and humanities research data in the United Kingdom. The organisation is funded by the UK government through the Economic and Social Research Council and is led by the UK Data Archive at the University of Essex, in partnership with other universities.
The National Recording Preservation Plan is a strategic guide for the preservation of sound recordings in the United States. It was published in December 2012 by the Council on Library and Information Resources (CLIR) and the National Recording Preservation Board of the Library of Congress. The plan was written by a community of specialists, but is prominently credited to Brenda Nelson-Strauss, Alan Gevinson and Sam Brylawski.
The documentation of cultural property is a critical aspect of collections care. As stewards of cultural property, museums collect and preserve not only objects but the research and documentation connected to those objects, in order to more effectively care for them. Documenting cultural heritage is a collaborative effort. Essentially, registrars, collection managers, conservators, and curators all contribute to the task of recording and preserving information regarding collections. There are two main types of documentation museums are responsible for: records generated in the registration process—accessions, loans, inventories, etc. and information regarding research on objects and their historical significance. Properly maintaining both types of documentation is vital to preserving cultural heritage.
Data preservation is the act of conserving and maintaining both the safety and integrity of data. Preservation is done through formal activities that are governed by policies, regulations and strategies directed towards protecting and prolonging the existence and authenticity of data and its metadata. Data can be described as the elements or units in which knowledge and information is created, and metadata are the summarizing subsets of the elements of data; or the data about the data. The main goal of data preservation is to protect data from being lost or destroyed and to contribute to the reuse and progression of the data.
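Integrity is often checked in practice with fixity information: a checksum recorded when data enter storage and recomputed later to detect corruption or unintended changes. The following Python sketch assumes SHA-256 as the checksum algorithm and a plain in-memory dictionary as the manifest; the function names are illustrative and not part of any particular preservation tool.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by relative path."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose contents changed."""
    problems = []
    for rel_path, expected in manifest.items():
        path = root / rel_path
        if not path.is_file() or sha256_of(path) != expected:
            problems.append(rel_path)
    return problems


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "survey.csv").write_text("id,value\n1,42\n")
    manifest = build_manifest(root)
    print(verify_manifest(root, manifest))  # → []
    (root / "survey.csv").write_text("id,value\n1,43\n")
    print(verify_manifest(root, manifest))  # → ['survey.csv']
```

Hashing in fixed-size chunks keeps memory use constant for large files; a repository would typically also store the manifest itself durably and re-run verification on a schedule.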
In archives, the term "audiovisual" is frequently used generically to denote materials other than written documents. Films, videos, audio recordings, pictures, and other audio and visual media are collected in audiovisual archives. A vast amount of knowledge is included in audiovisual records, which are considered cultural treasures and must be preserved for future use. Print materials would not have the same reach across various audiences as audiovisual resources.
Pryor, Graham (2014). Delivering research data management services. Facet Publishing. ISBN 9781856049337.