Content | |
---|---|
Description | orthology inference among 1000 complete genomes. |
Contact | |
Laboratory | ETH Zurich |
Authors | Christophe Dessimoz Adrian Schneider Adrian Altenhoff Gaston H. Gonnet |
Primary citation | Altenhoff et al. [1] |
Release date | 2004 |
Access | |
Website | omabrowser |
Download URL | http://omabrowser.org/All/download.html |
Web service URL | wsdl |
Miscellaneous | |
Data release frequency | 2 releases per year |
OMA (Orthologous MAtrix) is a database of orthologs extracted from available complete genomes. [1] [2] The orthology predictions of OMA are available in several forms:
In bioinformatics, sequence clustering algorithms attempt to group biological sequences that are somehow related. The sequences can be either of genomic, "transcriptomic" (ESTs) or protein origin. For proteins, homologous sequences are typically grouped into families. For EST data, clustering is important to group sequences originating from the same gene before the ESTs are assembled to reconstruct the original mRNA.
Sequence homology is the biological homology between DNA, RNA, or protein sequences, defined in terms of shared ancestry in the evolutionary history of life. Two segments of DNA can have shared ancestry because of three phenomena: either a speciation event (orthologs), or a duplication event (paralogs), or else a horizontal gene transfer event (xenologs).
The Gene Ontology (GO) is a major bioinformatics initiative to unify the representation of gene and gene product attributes across all species. More specifically, the project aims to: 1) maintain and develop its controlled vocabulary of gene and gene product attributes; 2) annotate genes and gene products, and assimilate and disseminate annotation data; and 3) provide tools for easy access to all aspects of the data provided by the project, and to enable functional interpretation of experimental data using the GO, for example via enrichment analysis. GO is part of a larger classification effort, the Open Biomedical Ontologies, being one of the Initial Candidate Members of the OBO Foundry.
KEGG is a collection of databases dealing with genomes, biological pathways, diseases, drugs, and chemical substances. KEGG is utilized for bioinformatics research and education, including data analysis in genomics, metagenomics, metabolomics and other omics studies, modeling and simulation in systems biology, and translational research in drug development.
TreeFam is a database of phylogenetic trees of animal genes. It aims at developing a curated resource that gives reliable information about ortholog and paralog assignments, and evolutionary history of various gene families.
MicrobesOnline is a publicly and freely accessible website that hosts multiple comparative genomic tools for comparing microbial species at the genomic, transcriptomic and functional levels. MicrobesOnline was developed by the Virtual Institute for Microbial Stress and Survival, which is based at the Lawrence Berkeley National Laboratory in Berkeley, California. The site was launched in 2005, with regular updates until 2011.
Inparanoid is an algorithm that finds orthologous genes and paralogous genes that arose—most likely by duplication—after some speciation event. Such protein-coding genes are called in-paralogs, as opposed to out-paralogs.
In molecular biology, STRING is a biological database and web resource of known and predicted protein–protein interactions.
SUPERFAMILY is a database and search platform of structural and functional annotation for all proteins and genomes. It classifies amino acid sequences into known structural domains, especially into SCOP superfamilies. Domains are functional, structural, and evolutionary units that form proteins. Domains of common Ancestry are grouped into superfamilies. The domains and domain superfamilies are defined and described in SCOP. Superfamilies are groups of proteins which have structural evidence to support a common evolutionary ancestor but may not have detectable sequence homology.
Gaston H. Gonnet is a Uruguayan Canadian computer scientist and entrepreneur. He is best known for his contributions to the Maple computer algebra system and the creation of a digital version of the Oxford English Dictionary.
OrthoDB presents a catalog of orthologous protein-coding genes across vertebrates, arthropods, fungi, plants, and bacteria. Orthology refers to the last common ancestor of the species under consideration, and thus OrthoDB explicitly delineates orthologs at each major radiation along the species phylogeny. The database of orthologs presents available protein descriptors, together with Gene Ontology and InterPro attributes, which serve to provide general descriptive annotations of the orthologous groups, and facilitate comprehensive orthology database querying. OrthoDB also provides computed evolutionary traits of orthologs, such as gene duplicability and loss profiles, divergence rates, sibling groups, and gene intron-exon architectures.
PhylomeDB is a public biological database for complete catalogs of gene phylogenies (phylomes). It allows users to interactively explore the evolutionary history of genes through the visualization of phylogenetic trees and multiple sequence alignments. Moreover, phylomeDB provides genome-wide orthology and paralogy predictions which are based on the analysis of the phylogenetic trees. The automated pipeline used to reconstruct trees aims at providing a high-quality phylogenetic analysis of different genomes, including Maximum Likelihood tree inference, alignment trimming and evolutionary model testing.
The human gene Chromosome 3 open reading frame 14 is a gene of uncertain function located at 3p14.2 near fragile site FRBA3—which falls between this gene and the centromere. Its protein is expected to localize to the nucleus and bind DNA. Orthologs have been identified in all of the major animal groups, minus amphibians and insects, tracing as far back as the sea anemone; indicating an origin of over 1000 mya, highlighting its importance in the animal genome.
The eggNOG database is a database of biological information hosted by the EMBL. It is based on the original idea of COGs and expands that idea to non-supervised orthologous groups constructed from numerous organisms. The database was created in 2007 and updated to version 4.5 in 2015. eggNOG stands for evolutionary genealogy of genes: Non-supervised Orthologous Groups.
PomBase is a model organism database that provides online access to the fission yeast Schizosaccharomyces pombe genome sequence and annotated features, together with a wide range of manually curated functional gene-specific data. The PomBase website was redeveloped in 2016 to provide users with a more fully integrated, better-performing service.
Model organism databases (MODs) are biological databases, or knowledgebases, dedicated to the provision of in-depth biological data for intensively studied model organisms. MODs allow researchers to easily find background information on large sets of genes, plan experiments efficiently, combine their data with existing knowledge, and construct novel hypotheses. They allow users to analyse results and interpret datasets, and the data they generate are increasingly used to describe less well studied species. Where possible, MODs share common approaches to collect and represent biological information. For example, all MODs use the Gene Ontology (GO) to describe functions, processes and cellular locations of specific gene products. Projects also exist to enable software sharing for curation, visualization and querying between different MODs. Organismal diversity and varying user requirements however mean that MODs are often required to customize capture, display, and provision of data.
FAM163A, also known as cebelin and neuroblastoma-derived secretory protein (NDSP) is a protein that in humans is encoded by the FAM163A gene. This protein has been implicated in promoting proliferation and anchorage-independent growth of neuroblastoma cancer cells. In addition, this protein has been found to be up-regulated in the lung tissue of chronic smokers. FAM163A is found on human chromosome 1q25.2; its protein product is 167 amino acids long. FAM163A contains a very highly conserved signal peptide sequence, coded for by the first ~37 amino acids in its sequence; albeit only conserved in eukaryotes, the most distant of which being the Japanese Rice Fish.
In molecular phylogenetics, relationships among individuals are determined using character traits, such as DNA, RNA or protein, which may be obtained using a variety of sequencing technologies. High-throughput next-generation sequencing has become a popular technique in transcriptomics, which represent a snapshot of gene expression. In eukaryotes, making phylogenetic inferences using RNA is complicated by alternative splicing, which produces multiple transcripts from a single gene. As such, a variety of approaches may be used to improve phylogenetic inference using transcriptomic data obtained from RNA-Seq and processed using computational phylogenetics.
Christophe Dessimoz is a Swiss National Science Foundation (SNSF) Professor at the University of Lausanne, Associate Professor at University College London and a group leader at the Swiss Institute of Bioinformatics. He was awarded the Overton Prize in 2019 for his contributions to computational biology.