Microattribution

Microattribution, a form of data citation, is defined as "giving database accessions the same citation conventions and indices that journal articles currently enjoy". [1] Its purpose is to extend the scholarly convention of giving citation credit: the provenance of a piece of scholarship (an observation or a data deposition) is recognized so that credit and priority go to the earlier author. Microattribution has accordingly also been defined as "a scholarly contribution smaller than a journal article being ascribed to a particular author". [2] Because data accessions can describe contributions that vastly exceed research articles in size and quality, "quantum attribution" or "precise citation" might be better terms.

Origin

The concept was introduced in a February 2007 blog post, "Duke of URL", by Myles Axton, who wrote: "In the interests of giving credit to the resources geneticists find most useful, here are the numbers of papers citing the most frequently cited links."

The term was first used in an April 2007 editorial published in Nature Genetics: "[The Human Variome Project] will need to introduce publishing innovations at both ends of the citation spectrum. It will need to track the citation of each variant's accession code in papers, database entries and across the web. This closing of the online publication loop might be termed microattribution." [3]

Subsequent editorials and blog posts elaborated the idea that the provenance of data accession codes was inseparable from the data and could be used to give credit to the contributors. "Accession numbers to database entries are routinely used for data retrieval. They should now also be used to accrue quantitative credit for their authors in a systematic process of microattribution." [4]
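The idea of accruing quantitative credit from accession numbers can be illustrated with a small tally. The following is a minimal sketch only, not a description of any system used by the journals or databases mentioned here: the accession codes, contributor names, citing documents, and the `tally_credit` function are all hypothetical, and the rule that each citing document counts once per accession is an assumption made for the example.

```python
from collections import Counter

# Hypothetical mapping from accession code to the contributor who deposited it.
DEPOSITORS = {
    "VAR:0001": "Contributor A",
    "VAR:0002": "Contributor B",
    "VAR:0003": "Contributor A",
}

# Hypothetical citing documents (papers, database entries, web pages),
# each listing the accession codes it mentions.
CITING_DOCUMENTS = [
    {"id": "paper-1", "accessions": ["VAR:0001", "VAR:0002"]},
    {"id": "db-entry-7", "accessions": ["VAR:0001"]},
    {"id": "paper-2", "accessions": ["VAR:0003", "VAR:0001", "VAR:0001"]},
]


def tally_credit(documents, depositors):
    """Count citing documents per accession and per contributor."""
    per_accession = Counter()
    per_contributor = Counter()
    for doc in documents:
        # Using a set means a document counts at most once per accession
        # (an assumption of this sketch).
        for accession in set(doc["accessions"]):
            if accession not in depositors:
                continue  # unknown accession: nobody to credit
            per_accession[accession] += 1
            per_contributor[depositors[accession]] += 1
    return per_accession, per_contributor


per_accession, per_contributor = tally_credit(CITING_DOCUMENTS, DEPOSITORS)
for contributor, count in per_contributor.most_common():
    print(contributor, count)
```

In this sketch the credit follows the accession code rather than the citing paper's reference list, which is the distinction the editorials draw between microattribution and conventional article citation.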

The value of microattribution can be seen in the description of genetic variation. A paper published in Nature Genetics in March 2011 [5] concluded that microattribution demonstrably increased the reporting of human variants, leading to a comprehensive online resource for systematically describing human genetic variation. A paper on microattribution and nanopublication as means to incentivize the placement of human genome variation data into the public domain was published in June 2012. [6]

Nanopublications

Barend Mons and Jan Velterop proposed nanopublications for single, attributable and machine-readable assertions in scientific literature. [7] From the technical viewpoint, a nanopublication is a Resource Description Framework (RDF) graph built around an assertion represented as a triple (subject-predicate-object) and usually extracted, manually or automatically, from a scientific publication. The nanopublication enriches the assertion with provenance and publication information. The RDF representation format enables interoperability and thus the re-use of data, whereas provenance and publication information eases authorship recognition, credit distribution, and citation. [8]
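The structure described above (one assertion triple, enriched with provenance and publication information) can be sketched as a small data structure. This is an illustrative sketch under stated assumptions, not a reference implementation: the `Nanopublication` class, the `quads` method, and every `example.org` URI are hypothetical, and placing each part in its own named graph is an assumed layout for keeping the parts separate. It uses only the Python standard library rather than an RDF toolkit.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

Triple = Tuple[str, str, str]      # (subject, predicate, object)
Quad = Tuple[str, str, str, str]   # triple plus the graph it belongs to


@dataclass
class Nanopublication:
    """One assertion plus its provenance and publication information."""
    uri: str
    assertion: Triple
    provenance: List[Triple] = field(default_factory=list)
    pubinfo: List[Triple] = field(default_factory=list)

    def quads(self) -> Iterator[Quad]:
        """Yield every triple tagged with a per-part graph name (assumed layout)."""
        yield (*self.assertion, self.uri + "#assertion")
        for triple in self.provenance:
            yield (*triple, self.uri + "#provenance")
        for triple in self.pubinfo:
            yield (*triple, self.uri + "#pubinfo")


# All URIs below are placeholders invented for this example.
nanopub = Nanopublication(
    uri="http://example.org/np1",
    assertion=(
        "http://example.org/variant/V1",
        "http://example.org/isAssociatedWith",
        "http://example.org/disease/D1",
    ),
    provenance=[(
        "http://example.org/np1#assertion",
        "http://example.org/wasExtractedFrom",
        "http://example.org/paper/P1",
    )],
    pubinfo=[(
        "http://example.org/np1",
        "http://example.org/createdBy",
        "http://example.org/person/curator-1",
    )],
)

quads = list(nanopub.quads())
for quad in quads:
    print(" ".join(quad))
```

Note that the provenance triple takes the assertion graph as its subject, so the statement about where the claim came from is attached to the claim itself, while the publication information is attached to the nanopublication as a whole. That separation is what lets credit be assigned both to the source of the assertion and to whoever published it.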

References

  1. Axton, Myles (24 November 2007). "Towards a hermeneutics of quantum citation". Nature Research. Archived from the original on 15 August 2011. Retrieved 25 October 2011.
  2. "On microattribution". Gobbledygook. Retrieved 3 October 2011.
  3. "What is the Human Variome Project?". Nature Genetics. 39 (4): 423. 2007. doi:10.1038/ng0407-423. PMID 17392793. S2CID 28447607.
  4. "Compete, collaborate, compel". Nature Genetics. 39 (8): 931. 2007. doi:10.1038/ng0807-931. PMID 17660804. S2CID 38002242.
  5. Giardine, Belinda; et al. (April 2011). "Systematic documentation and analysis of human genetic variation in hemoglobinopathies using the microattribution approach". Nature Genetics. 43 (4): 295–301. doi:10.1038/ng.785. PMC 3878152. PMID 21423179. S2CID 733759.
  6. Patrinos, George P.; Cooper, David N.; Van Mulligen, Erik; Gkantouna, Vassiliki; Tzimas, Giannis; Tatum, Zuotian; Schultes, Erik; Roos, Marco; Mons, Barend (2012). "Microattribution and nanopublication as means to incentivize the placement of human genome variation data into the public domain". Human Mutation. 33 (11): 1503–1512. doi:10.1002/humu.22144. PMID 22736453.
  7. "Microattribution and nanopublication as means to incentivize the placement of human genome variation data into the public domain" (PDF). NBIC (Netherlands Bioinformatics Centre). Archived from the original (PDF) on 23 May 2012. Retrieved 29 June 2012.
  8. Giachelle, Fabio; Dosso, Dennis; Silvello, Gianmaria (1 January 2021). "Search, access, and explore life science nanopublications on the Web". PeerJ Computer Science. 7: e335. doi:10.7717/PEERJ-CS.335. PMC 7959622. PMID 33816986. This article incorporates text available under the CC BY 4.0 license.