Minimum information standard


Minimum information standards are sets of guidelines and formats for reporting data derived from specific high-throughput methods. Their purpose is to ensure that the data generated by these methods can be easily verified, analysed and interpreted by the wider scientific community. Ultimately, they facilitate the transfer of data from journal articles (unstructured data) into databases (structured data) in a form that enables data to be mined across multiple data sets. Minimum information standards are available for a wide variety of experiment types, including microarray (MIAME), RNA-seq (MINSEQE), metabolomics (MSI) and proteomics (MIAPE). [1]


Minimum information standards typically have two parts. First, there is a set of reporting requirements, typically presented as a table or a checklist. Second, there is a data format: information about an experiment needs to be converted into the appropriate data format before it can be submitted to the relevant database. In the case of MIAME, the data format is a spreadsheet-based format (MAGE-TAB). Some of the communities that maintain minimum information standards also provide tools to help experimental researchers annotate their data. [1]
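As a rough illustration of how the two parts fit together, the Python sketch below checks each record against a checklist and then converts the records into a tab-delimited table, loosely in the spirit of spreadsheet-based formats such as MAGE-TAB. It is not an implementation of MAGE-TAB or of any published standard: the field names in `REQUIRED_FIELDS` and the functions `missing_fields` and `to_tab_delimited` are hypothetical.

```python
import csv
import io

# Hypothetical checklist (part 1): these field names are illustrative only
# and are NOT taken from any published minimum information standard.
REQUIRED_FIELDS = ["sample_name", "organism", "protocol", "data_file"]


def missing_fields(record: dict) -> list[str]:
    """Return the checklist fields that are absent or blank in a record."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f, "")).strip()]


def to_tab_delimited(records: list[dict]) -> str:
    """Serialize records as a tab-delimited table (part 2), one row per record.

    Every record is checked against the checklist first; a ValueError naming
    the missing fields is raised if any record is incomplete. Keys that are
    not in the checklist are ignored.
    """
    for i, rec in enumerate(records):
        missing = missing_fields(rec)
        if missing:
            raise ValueError(f"record {i} is missing: {', '.join(missing)}")
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=REQUIRED_FIELDS,
        delimiter="\t",
        lineterminator="\n",
        extrasaction="ignore",
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


if __name__ == "__main__":
    rows = [{"sample_name": "S1", "organism": "Homo sapiens",
             "protocol": "P-1", "data_file": "s1.txt"}]
    print(to_tab_delimited(rows))
```

Calling `to_tab_delimited` on a record that lacks, for example, `protocol` raises a `ValueError` naming the missing field, loosely reflecting the idea that data should satisfy the reporting requirements before being converted for submission.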

MI Standards

The individual minimum information standards are developed by communities of cross-disciplinary specialists focused on the issues specific to a particular method used in experimental biology. Each standard specifies which information about the experiments (metadata) is crucial to report together with the resulting data so that the data are comprehensible. [2] [3] The need for this standardization is largely driven by the development of high-throughput experimental methods that produce tremendous amounts of data. Since 2008, the development of minimum information standards for different methods has been harmonized by the "Minimum Information about a Biomedical or Biological Investigation" (MIBBI) project. [4]

MIAPPE, Minimum Information About a Plant Phenotyping Experiment

MIAPPE is an open, community-driven project to harmonize data from plant phenotyping experiments. MIAPPE comprises a conceptual checklist of the metadata required to adequately describe a plant phenotyping experiment.

MIQE, Minimum Information for Publication of Quantitative Real-Time PCR Experiments

Published in 2009, these guidelines form the basis of the requirements that many journals apply to submissions of qPCR data, although they are not always adhered to. [5]

MIAME, gene expression microarray

Minimum Information About a Microarray Experiment (MIAME) [3] describes the minimum information needed to interpret the results of a microarray experiment unambiguously and, potentially, to reproduce the experiment; it aims to facilitate the dissemination of data from microarray experiments. It was published by the FGED Society in 2001 and was the first published minimum information standard for high-throughput experiments in the life sciences.

MIAME has a number of extensions for specific biological domains, including MIAME-env, MIAME-nut and MIAME-tox, covering environmental genomics, nutritional genomics and toxicogenomics, respectively.

MINI: Minimum Information about a Neuroscience Investigation

MINI: Electrophysiology

Electrophysiology is a technology used to study the electrical properties of biological cells and tissues. It typically involves measuring voltage changes or electric current flow on a wide variety of scales, from single ion channel proteins to whole tissues. The MINI: Electrophysiology document is a single module of the Minimum Information about a Neuroscience Investigation (MINI) family of reporting guideline documents, produced by community consultation and continually available for public comment. A MINI module represents the minimum information that should be reported about a dataset to facilitate computational access and analysis, to allow a reader to interpret and critically evaluate the processes performed and the conclusions reached, and to support their experimental corroboration. In practice, a MINI module comprises a checklist of information that should be provided (for example, about the protocols employed) when a data set is described for publication. The full specification of the MINI: Electrophysiology module was made available through Nature Precedings. [6]

MIARE, RNAi experiment

Minimum Information About an RNAi Experiment (MIARE) is a data reporting guideline which describes the minimum information that should be reported about an RNAi experiment to enable the unambiguous interpretation and reproduction of the results.

MIACA, cell based assay

Advances in genomics and functional genomics have enabled large-scale analyses of gene and protein function by means of high-throughput cell biological analyses. In these analyses, cells in culture are perturbed in vitro and the induced effects are recorded and analyzed. Perturbations can be triggered in several ways, for instance with molecules (siRNAs, expression constructs, small chemical compounds, ligands for receptors, etc.), through environmental stresses (such as temperature shift, serum starvation or oxygen deprivation), or by combinations thereof. The cellular responses to such perturbations are analyzed in order to identify molecular events in the biological processes addressed and to understand biological principles. The Minimum Information About a Cellular Assay (MIACA) was proposed for reporting a cellular assay, alongside CA-OM, the modular cellular assay object model. Both are intended to facilitate the exchange of data and accompanying information, to compare and integrate data that originate from different, albeit complementary, approaches, and to elucidate higher-order principles. Documents describing MIACA are available and provide further information as well as the checklist of terms that should be reported.

MIAPE, proteomic experiments

The Minimum Information About a Proteomic Experiment (MIAPE) documents describe the information that should be reported along with a proteomic experiment. The parent document describes the processes and principles underpinning the development of a series of domain-specific documents, which now cover all aspects of an MS-based (mass spectrometry-based) proteomics workflow.

MIMIx, molecular interactions

MIMIx has been developed and is maintained by the Molecular Interaction worktrack of the HUPO-PSI (www.psidev.info); it describes the Minimum Information about a Molecular Interaction experiment.

MIAPAR, protein affinity reagents

The Minimum Information About a Protein Affinity Reagent (MIAPAR) has been developed and is maintained by the Molecular Interaction worktrack of the HUPO-PSI (www.psidev.info), in conjunction with the HUPO Antibody Initiative and a European consortium of binder producers. It seeks to encourage users to improve their description of binding reagents, such as antibodies, used in the process of protein identification.

MIABE, bioactive entities

The Minimum Information About a Bioactive Entity (MIABE) was produced by representatives of both large pharmaceutical companies and academia who aim to improve the description of bioactive entities, usually small molecules, that bind to, and potentially modulate the activity of, specific targets in a living organism. MIABE encompasses drug-like molecules as well as herbicides, pesticides and food additives. It is primarily maintained through the EMBL-EBI Industry program (www.ebi.ac.uk/industry).

MIGS/MIMS, genome/metagenome sequences

The MIGS/MIMS specification is being developed by the Genomic Standards Consortium.

MIFlowCyt, flow cytometry

Minimum Information about a Flow Cytometry Experiment

The Minimum Information about a Flow Cytometry Experiment (MIFlowCyt) is a standard related to flow cytometry which establishes criteria to record information on experimental overview, samples, instrumentation and data analysis. [2] It promotes consistent annotation of clinical, biological and technical issues surrounding a flow cytometry experiment. [2] [7] [8]
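The four areas that MIFlowCyt covers can be pictured as sections of a checklist. The Python sketch below groups hypothetical item names under those four areas and reports how completely each area is annotated. The item names, `coverage_report` and `incomplete_sections` are illustrative assumptions, not the items defined by the MIFlowCyt standard itself.

```python
# The four area names come from the description of MIFlowCyt; the item names
# under each area are HYPOTHETICAL placeholders, not MIFlowCyt's own items.
SECTIONS: dict[str, list[str]] = {
    "experimental overview": ["purpose", "keywords"],
    "samples": ["sample_description", "fluorochrome_reagents"],
    "instrumentation": ["cytometer_model", "optical_filters"],
    "data analysis": ["gating_description", "data_file_format"],
}


def coverage_report(annotation: dict) -> dict[str, float]:
    """Map each area to the fraction of its items that have a non-blank value."""
    report = {}
    for section, items in SECTIONS.items():
        filled = sum(1 for item in items if str(annotation.get(item, "")).strip())
        report[section] = filled / len(items)
    return report


def incomplete_sections(annotation: dict) -> list[str]:
    """List, in checklist order, the areas whose items are not all reported."""
    return [s for s, frac in coverage_report(annotation).items() if frac < 1.0]


if __name__ == "__main__":
    example = {"purpose": "immunophenotyping", "keywords": "T cells",
               "cytometer_model": "example instrument", "optical_filters": "530/30"}
    print(coverage_report(example))
    print(incomplete_sections(example))
```

Reporting coverage per area, rather than a single pass/fail, makes it easy to see which part of an experiment's description (for instance, the data analysis) still needs annotation.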

MISFISHIE, In Situ Hybridization and Immunohistochemistry Experiments

MIAPA, Phylogenetic Analysis

Criteria for Minimum Information About a Phylogenetic Analysis were described in 2006. [9]

MIRAGE, Glycomics

The MIRAGE project is supported and coordinated by the Beilstein-Institut to establish guidelines for data handling and processing in glycomics research. [10] [11]

MIAO, ORF

MIAMET, METabolomics experiment

MIAFGE, Functional Genomics Experiment

MIRIAM, Minimum Information Required in the Annotation of Models

The Minimum Information Required In the Annotation of Models (MIRIAM) is a set of rules for the curation and annotation of quantitative models of biological systems.

MIASE, Minimum Information About a Simulation Experiment

The Minimum Information About a Simulation Experiment (MIASE) is an effort to standardize the description of simulation experiments in the field of systems biology.

CIMR, Core Information for Metabolomics Reporting

STRENDA, Standards for Reporting Enzymology Data

The Standards for Reporting Enzymology Data (STRENDA) is an initiative which specifically focuses on the development of guidelines for reporting (describing metadata) enzymology experiments with the aim to improve the quality of enzymology data published in the scientific literature.


References

  1. EMBL-EBI. "Minimum information standards | Bioinformatics for the terrified". Retrieved 2021-06-28. This article incorporates text available under the CC BY 4.0 license.
  2. Lee, Jamie A.; Spidlen, Josef; Boyce, Keith; Cai, Jennifer; Crosbie, Nicholas; Dalphin, Mark; Furlong, Jeff; Gasparetto, Maura; Goldberg, Michael; Goralczyk, Elizabeth M.; Hyun, Bill; Jansen, Kirstin; Kollmann, Tobias; Kong, Megan; Leif, Robert; McWeeney, Shannon; Moloshok, Thomas D.; Moore, Wayne; Nolan, Garry; Nolan, John; Nikolich-Zugich, Janko; Parrish, David; Purcell, Barclay; Qian, Yu; Selvaraj, Biruntha; Smith, Clayton; Tchuvatkina, Olga; Wertheimer, Anne; Wilkinson, Peter; Wilson, Christopher; Wood, James; Zigon, Robert; Scheuermann, Richard H.; Brinkman, Ryan R. (1 October 2008). "MIFlowCyt: The minimum information about a flow cytometry experiment". Cytometry Part A. 73 (10): 926–930. doi:10.1002/cyto.a.20623. PMC 2773297. PMID 18752282.
  3. Brazma, Alvis; Hingamp, Pascal; Quackenbush, John; Sherlock, Gavin; Spellman, Paul; Stoeckert, Chris; Aach, John; Ansorge, Wilhelm; Ball, Catherine A.; Causton, Helen C.; Gaasterland, Terry; Glenisson, Patrick; Holstege, Frank C.P.; Kim, Irene F.; Markowitz, Victor; Matese, John C.; Parkinson, Helen; Robinson, Alan; Sarkans, Ugis; Schulze-Kremer, Steffen; Stewart, Jason; Taylor, Ronald; Vilo, Jaak; Vingron, Martin (30 November 2001). "Minimum information about a microarray experiment (MIAME)—toward standards for microarray data". Nature Genetics. 29 (4): 365–371. doi:10.1038/ng1201-365. PMID 11726920. S2CID 6994467.
  4. Taylor, Chris F. (2008). "Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project". Nature Biotechnology. 26 (8): 889–896. doi:10.1038/nbt.1411. PMC 2771753. PMID 18688244.
  5. Bustin, Stephen; Benes, Vladimir; Garson, Jeremy; Hellemans, Jan; Huggett, Jim; Kubista, Mikael; Mueller, Reinhold; Nolan, Tania; Pfaffl, Michael; Shipley, Gregory; Vandesompele, Jo; Wittwer, Carl (2009). "The MIQE Guidelines: Minimum Information for Publication of Quantitative Real-Time PCR Experiments". Clinical Chemistry. 55 (4): 611–622. doi:10.1373/clinchem.2008.112797. PMID 19246619.
  6. Gibson, Frank; Overton, Paul; Smulders, Tom; Schultz, Simon; Eglen, Stephen; Ingram, Colin; Panzeri, Stefano; Bream, Phil; Sernagor, Evelyne; Cunningham, Mark; Adams, Christopher; Echtermeyer, Christoph; Simonotto, Jennifer; Kaiser, Marcus; Swan, Daniel; Fletcher, Marty; Lord, Phillip (2008). "Minimum Information about a Neuroscience Investigation (MINI): Electrophysiology". Nature Precedings. http://hdl.handle.net/10101/npre.2008.1720.1
  7. "MIFlowCyt - FICCS Wiki". 2007-05-20. Archived from the original on 2007-05-20. Retrieved 2021-04-21.
  8. "MIFlowCyt Standard - ISAC Recommendation -- Bioinformatics Standards for Flow Cytometry". flowcyt.sourceforge.net. Retrieved 2021-04-21.
  9. Leebens-Mack, J.; Vision, T.; Brenner, E.; Bowers, J. E.; Cannon, S.; Clement, M. J.; Cunningham, C. W.; Depamphilis, C.; Desalle, R.; Doyle, J. J.; Eisen, J. A.; Gu, X.; Harshman, J.; Jansen, R. K.; Kellogg, E. A.; Koonin, E. V.; Mishler, B. D.; Philippe, H.; Pires, J. C.; Qiu, Y. L.; Rhee, S. Y.; Sjölander, K.; Soltis, D. E.; Soltis, P. S.; Stevenson, D. W.; Wall, K.; Warnow, T.; Zmasek, C. (2006). "Taking the First Steps towards a Standard for Reporting on Phylogenies: Minimum Information about a Phylogenetic Analysis (MIAPA)". OMICS: A Journal of Integrative Biology. 10 (2): 231–237. doi:10.1089/omi.2006.10.231. PMC 3167193. PMID 16901231.
  10. Struwe, WB; et al. (2016). "The minimum information required for a glycomics experiment (MIRAGE) project: sample preparation guidelines for reliable reporting of glycomics datasets". Glycobiology. 26 (9): 907–910. doi:10.1093/glycob/cww082. PMC 5045532. PMID 27654115.
  11. York, WS; et al. (2014). "MIRAGE: the minimum information required for a glycomics experiment". Glycobiology. 24 (5): 402–406. doi:10.1093/glycob/cwu018. PMC 3976285. PMID 24653214.