Multi-Omics Profiling Expression Database

MOPED
Content
Description: MOPED enables discoveries through consistently processed multi-omics data
Contact
Research center: Seattle Children's Research Institute
Authors: Roger Higdon
Primary citation: Higdon R, et al. [1]
Release date: 2012
Access
Website: MOPED

The Multi-Omics Profiling Expression Database (MOPED) was an expanding multi-omics resource that supported rapid browsing of transcriptomics and proteomics information from publicly available studies on model organisms and humans. [2] As of 2021, it has ceased activity and is no longer accessible online. [3]

Systematic Protein Investigative Research Environment

MOPED was designed to simplify the comparison and sharing of data for the greater research community. It employed the standardized analysis pipeline SPIRE [4] to provide protein-level absolute and relative expression data, meta-analysis capabilities, and quantitative data. Processed relative expression transcriptomics data were obtained from the Gene Expression Omnibus (GEO). Data could be queried for specific proteins and genes, browsed by organism, tissue, localization, and condition, and sorted by false discovery rate and expression. MOPED also let users visualize their own expression data and compare it with existing studies. Further, MOPED linked to various protein and pathway databases, including GeneCards, Panther, Entrez, UniProt, KEGG, SEED, and Reactome. Protein and gene identifiers were integrated from GeneCards (cross-referenced with MOPED), GenBank, RefSeq, UniProt, WormBase, and the Saccharomyces Genome Database (SGD). The last version described here (MOPED 2.5, 2014) contained approximately 5 million records, including ~260 experiments and ~390 conditions. MOPED was developed and supported by the Kolker team at Seattle Children's Research Institute.
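The browsing and sorting behaviour described above (filter by organism, tissue, and condition, then order by false discovery rate and expression) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `ExpressionRecord` fields, the `browse` function, and the sample values are hypothetical and do not reflect MOPED's actual schema, interface, or data.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ExpressionRecord:
    # Hypothetical fields; MOPED's real schema is not described in this article.
    protein_id: str
    organism: str
    tissue: str
    condition: str
    expression: float  # expression estimate in arbitrary units
    fdr: float         # false discovery rate, between 0 and 1


def browse(
    records: Iterable[ExpressionRecord],
    organism: Optional[str] = None,
    tissue: Optional[str] = None,
    condition: Optional[str] = None,
    max_fdr: float = 1.0,
) -> List[ExpressionRecord]:
    """Filter records, then sort by ascending FDR and descending expression."""
    selected = [
        r for r in records
        if (organism is None or r.organism == organism)
        and (tissue is None or r.tissue == tissue)
        and (condition is None or r.condition == condition)
        and r.fdr <= max_fdr
    ]
    # Most confident identifications first; ties broken by higher expression.
    return sorted(selected, key=lambda r: (r.fdr, -r.expression))


# Made-up sample data, for illustration only.
records = [
    ExpressionRecord("P1", "human", "liver", "control", 120.0, 0.01),
    ExpressionRecord("P2", "human", "liver", "control", 300.0, 0.01),
    ExpressionRecord("P3", "human", "kidney", "control", 80.0, 0.001),
    ExpressionRecord("P4", "mouse", "liver", "control", 50.0, 0.20),
    ExpressionRecord("P5", "human", "liver", "treated", 10.0, 0.08),
]

hits = browse(records, organism="human", tissue="liver", max_fdr=0.05)
print([r.protein_id for r in hits])  # prints ['P2', 'P1']
```

Sorting on the tuple `(fdr, -expression)` is one simple design choice; a real resource might expose each sort key separately.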

Model Organism Protein Expression Database

MOPED was previously known as the Model Organism Protein Expression Database, before changing its name to the Multi-Omics Profiling Expression Database. [5] [6]

References

  1. Higdon R, Stewart E, Stanberry L, Haynes W, Choiniere J, Montague E, Anderson N, Yandl Y, Janko I, Broomall W, Fishilevich S, Lancet D, Kolker N, Kolker E (Jan 2014). "MOPED enables discoveries through consistently processed proteomics data". J Proteome Res. 13 (1): 107–113. doi:10.1021/pr400884c. PMC 4039175. PMID 24350770.
  2. Kolker E, Higdon R, Haynes W, Welch D, Broomall W, Lancet D, Stanberry L, Kolker N (Jan 2012). "MOPED: Model Organism Protein Expression Database". Nucleic Acids Res. 40 (Database issue): D1093–9. doi:10.1093/nar/gkr1177. PMC 3245040. PMID 22139914.
  3. "Start Moped Web Application". www.proteinspire.org. Archived from the original on 2014-04-22.
  4. Kolker E, Higdon R, Morgan P, Sedensky M, Welch D, Bauman A, Stewart E, Haynes W, Broomall W, Kolker N (December 2011). "SPIRE: Systematic protein investigative research environment". J Proteomics. 75 (1): 122–6. doi:10.1016/j.jprot.2011.05.009. PMID 21609792.
  5. Kolker, Eugene; Higdon, Roger; Haynes, Winston; Welch, Dean; Broomall, William; Lancet, Doron; Stanberry, Larissa; Kolker, Natali (2011-12-01). "MOPED: Model Organism Protein Expression Database". Nucleic Acids Research. 40 (D1): D1093–D1099. doi:10.1093/nar/gkr1177. ISSN 1362-4962. PMC 3245040. PMID 22139914.
  6. Montague, Elizabeth; Janko, Imre; Stanberry, Larissa; Lee, Elaine; Choiniere, John; Anderson, Nathaniel; Stewart, Elizabeth; Broomall, William; Higdon, Roger; Kolker, Natali; Kolker, Eugene (2015-01-28). "Beyond protein expression, MOPED goes multi-omics". Nucleic Acids Research. 43 (D1): D1145–D1151. doi:10.1093/nar/gku1175. ISSN 1362-4962. PMC 4383969. PMID 25404128.

Further reading