Data management plan

A data management plan (DMP) is a formal document that outlines how data are to be handled both during a research project and after the project is completed. [1] The goal of a data management plan is to consider the many aspects of data management, metadata generation, data preservation, and analysis before the project begins; [2] this may lead to data being well managed in the present and prepared for preservation in the future. [2]

History

DMPs were first used in 1966 to manage data collection and analysis for aeronautical and engineering projects, and their use expanded across engineering and scientific disciplines in the 1970s and 1980s. Until the early 2000s, DMPs were used "for projects of great technical complexity, and for limited mid-study data collection and processing purposes". [3] From the 2000s onward, e-research and economic policies drove the development and uptake of DMPs. [3]

Importance

Preparing a data management plan before data are collected is claimed to ensure that data are in the correct format, well organized, and better annotated. [4] This could arguably save time in the long term, because there is no need to reorganize or reformat the data, or to try to remember details about them. It is also claimed to increase research efficiency, since both the data collector and other researchers may be able to understand and use well-annotated data in the future.

One component of a data management plan is data archiving and preservation. By deciding on an archive ahead of time, the data collector can format data during collection to make their future submission to a database easier. Preserved data remain relevant for longer, since they can be re-used by other researchers. Preservation also allows the data collector to direct requests for data to the database rather than address each request individually. A frequent argument in favor of preservation is that preserved data have the potential to lead to new, unanticipated discoveries and can prevent duplication of scientific studies that have already been conducted. Data archiving also provides insurance against loss by the data collector.

In the 2010s, [3] funding agencies increasingly required data management plans as part of the proposal and evaluation process, [5] despite little or no evidence of their efficacy. [3]

Major components

"There is no general and definitive list of topics that should be covered in a DMP for a research project", [6] and researchers are often left to their own devices as to how to fill out a DMP. [2]

Information about data & data format

Metadata content and format

Metadata are the contextual details, including any information important for using data. These may include descriptions of temporal and spatial details, instruments, parameters, units, files, etc. Metadata are commonly referred to as "data about data". [10] Issues to be considered include:

Policies for access, sharing, and re-use

Long-term storage and data management

Budget

Data management and preservation costs may be considerable, depending on the nature of the project. By anticipating costs ahead of time, researchers ensure that the data will be properly managed and archived. Potential expenses that should be considered are

The data management plan should include how these costs will be paid.

NSF Data Management Plan

All grant proposals submitted to the National Science Foundation (NSF) must include a Data Management Plan of no more than two pages. [11] The plan is a supplement (not part of the 15-page proposal) and should describe how the proposal will conform to the Award and Administration Guide policy (see below). It may include the following:

  1. The types of data
  2. The standards to be used for data and metadata format and content
  3. Policies for access and sharing
  4. Policies and provisions for re-use
  5. Plans for archiving data
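As an illustration only, the five topics above can be treated as a simple checklist against which a draft plan is checked for gaps. The sketch below is not an NSF tool or format: the short topic keys, the `DmpDraft` class, and the sample text are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field

# The five topics listed above. The short keys are hypothetical labels
# chosen for this sketch; they are not NSF terminology.
NSF_TOPICS = {
    "data_types": "The types of data",
    "standards": "The standards to be used for data and metadata format and content",
    "access_sharing": "Policies for access and sharing",
    "reuse": "Policies and provisions for re-use",
    "archiving": "Plans for archiving data",
}


@dataclass
class DmpDraft:
    """A draft plan: free-text sections keyed by topic label (assumed structure)."""

    sections: dict = field(default_factory=dict)

    def missing_topics(self):
        """Return the topic keys whose section is absent or only whitespace."""
        return [
            key for key in NSF_TOPICS
            if not self.sections.get(key, "").strip()
        ]


# Sample draft with made-up text; "reuse" is present but blank.
draft = DmpDraft(sections={
    "data_types": "Sensor readings stored as CSV files.",
    "archiving": "Deposit in a public repository at the end of the project.",
    "reuse": "   ",
})

for key in draft.missing_topics():
    print(f"Missing: {NSF_TOPICS[key]}")
```

A check like this only confirms that each topic has some text; it says nothing about whether the text is adequate, which remains a matter for proposal review.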

Policy summarized from the NSF Award and Administration Guide, Section 4 (Dissemination and Sharing of Research Results): [12]

  1. Promptly publish with appropriate authorship
  2. Share data, samples, physical collections, and supporting materials with others, within a reasonable time frame
  3. Share software and inventions
  4. Investigators can keep their legal rights over their intellectual property, but they still have to make their results, data, and collections available to others
  5. Policies will be implemented via
    1. Proposal review
    2. Award negotiations and conditions
    3. Support/incentives

ESRC Data Management Plan

The UK's Economic and Social Research Council (ESRC) has had a research data policy in place since 1995. The current ESRC Research Data Policy states that research data created as a result of ESRC-funded research should be openly available to the scientific community to the maximum extent possible, through long-term preservation and high-quality data management. [13]

ESRC requires a data management plan for all research award applications where new data are being created. Such plans are designed to promote a structured approach to data management throughout the data lifecycle, resulting in better-quality data that are ready to archive for sharing and re-use. The UK Data Service, the ESRC's flagship data service, provides practical guidance on research data management planning suitable for social science researchers in the UK and around the world. [14] [15]

ESRC has a longstanding arrangement with the UK Data Archive, based at the University of Essex, as a place of deposit for research data, with award holders required to offer data resulting from their research grants via the UK Data Service. [16] The Archive enables data re-use by preserving data and making them available to the research and teaching communities.

Benefits

The literature identifies three major themes in the benefits of DMPs: professional benefits, economic benefits, and institutional benefits. [3] It has been argued that DMPs can act as a catalyst for researchers to improve their data literacy and data management practices, often aided by the library. [3]

In practice

In practice, however, DMPs often fall short of their stated goals. A 2012 review of research funders' DMP policies found that the policies were missing several elements from the Digital Curation Centre's list of criteria for a DMP. [17] A content analysis of DMPs submitted to the NSF found that researchers shared DMP text. [18] DMPs are often regarded as an "administrative exercise rather than an integral part" of the research process, [19] and it has been acknowledged that DMPs do not guarantee good data management practices. [20] Most funders do not require a DMP after grants are awarded, which deprives stakeholders of the tool that an actively maintained DMP can be. Best practice would be to "require maintenance of the data management plan following award and during the active phase of a study." [6] At present, data sharing plans are more important than data management plans to funders. [6]

References

  1. "Data Management Plan". University of Virginia Library. Archived from the original on Nov 9, 2012.
  2. Burnette, Margaret; Williams, Sarah; Imker, Heidi (16 September 2016). "From Plan to Action: Successful Data Management Plan Implementation in a Multidisciplinary Project". Journal of eScience Librarianship. 5 (1): e1101. doi:10.7191/jeslib.2016.1101.
  3. Smale, Nicholas; Unsworth, Kathryn; Denyer, Gareth; Barr, Daniel (17 October 2018). "The History, Advocacy and Efficacy of Data Management Plans". bioRxiv: 443499. doi:10.1101/443499. S2CID 91931719.
  4. "Why manage & share your data? - Data management". libraries.mit.edu.
  5. "Data Management & Sharing Frequently Asked Questions (FAQs)". Archived from the original on 2017-07-11. Retrieved 2018-04-06.
  6. Williams, Mary; Bagwell, Jacqueline; Nahm Zozus, Meredith (July 2017). "Data management plans: the missing perspective". Journal of Biomedical Informatics. 71: 130–142. doi:10.1016/j.jbi.2017.05.004. PMC 6697079. PMID 28499952.
  7. "Elements of a Data Management Plan". www.icpsr.umich.edu. Retrieved 2015-09-30.
  8. "Archived copy" (PDF). libraries.mit.edu. Archived from the original (PDF) on 4 May 2018. Retrieved 12 January 2022.
  9. Guns, Raf. "Tools for version control of research data" (PDF). University of Antwerp.
  10. Michener, W. K.; Brunt, J. W. (2000). Ecological Data: Design, Management and Processing. Blackwell Science. 180 pp.
  11. "GPG Chapter II". www.nsf.gov.
  12. "Dissemination and Sharing of Research Results - NSF - National Science Foundation". www.nsf.gov.
  13. ESRC Research Data Policy 2010
  14. Prepare and manage data: Guidance from the UK Data Service
  15. "Managing and Sharing Research Data - SAGE Publications Inc". www.sagepub.com. Archived from the original on 2014-04-07. Retrieved 2014-04-01.
  16. "UK Data Archive - WHO CAN DEPOSIT?". www.data-archive.ac.uk.
  17. Dietrich, Dianne; Adamus, Trisha; Miner, Alison; Steinhart, Gail (2012). "De-Mystifying the Data Management Requirements of Research Funders". Issues in Science and Technology Librarianship. 70 (70). doi:10.5062/F44M92G2.
  18. Parham, Susan Wells; Doty, Chris (October 2012). "NSF DMP content analysis: What are researchers saying?". Bulletin of the American Society for Information Science and Technology. 39 (1): 37–38. doi:10.1002/bult.2012.1720390113. hdl:1853/44391.
  19. Miksa, Tomasz; Simms, Stephanie; Mietchen, Daniel; Jones, Sarah (28 March 2019). "Ten principles for machine-actionable data management plans". PLOS Computational Biology. 15 (3): e1006750. Bibcode:2019PLSCB..15E6750M. doi:10.1371/journal.pcbi.1006750. PMC 6438441. PMID 30921316. S2CID 85563774.
  20. Donnelly, Martin (2012). "Data management plans and planning". In Pryor, Graham (ed.). Managing research data. London: Facet Publishing. pp. 83–104. ISBN 9781856048910.

Further reading

Pryor, Graham (2014). Delivering research data management services. Facet Publishing. ISBN 9781856049337.