EggNOG (database)

eggNOG
Content
Description: Database of orthologous proteins and functional annotations at multiple taxonomic levels.
Contact
Research center: European Molecular Biology Laboratory
Authors: Huerta-Cepas et al.
Primary citation: Huerta-Cepas et al. (2016) [1]
Release date: 2015
Access
Website: http://eggnogdb.embl.de

The eggNOG database is a database of orthologous groups of genes and their functional annotations, hosted by the European Molecular Biology Laboratory (EMBL). It builds on the original idea of COGs (clusters of orthologous groups) [2] [3] and extends it to non-supervised orthologous groups constructed automatically from a large number of organisms. [4] The database was created in 2007 [5] and updated to version 4.5 in 2015. [1] The name eggNOG stands for evolutionary genealogy of genes: Non-supervised Orthologous Groups.
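The COG idea that eggNOG builds on can be illustrated with a deliberately simplified sketch: genes from different genomes that are each other's best hit (reciprocal best hits) are linked, and linked genes are merged into candidate orthologous groups. This is an assumption-laden illustration, not the actual COG or eggNOG pipeline (the original COG procedure relies on triangles of genome-specific best hits, and eggNOG adds its own clustering and annotation steps). The function names, the (genome, gene) tuple layout and the similarity scores below are all hypothetical.

```python
from collections import defaultdict

# Hypothetical input: (query_gene, target_gene) -> similarity score,
# where each gene is a (genome, gene_id) tuple.
example_scores = {
    (("A", "a1"), ("B", "b1")): 90, (("B", "b1"), ("A", "a1")): 88,
    (("A", "a1"), ("B", "b2")): 40, (("B", "b2"), ("A", "a1")): 45,
    (("A", "a2"), ("B", "b2")): 70, (("B", "b2"), ("A", "a2")): 72,
    (("B", "b1"), ("C", "c1")): 80, (("C", "c1"), ("B", "b1")): 81,
}


def reciprocal_best_hits(scores):
    """Return the set of gene pairs that are each other's best hit in the other genome."""
    best = {}  # (gene, target_genome) -> (score, best_target_gene)
    for (q, t), s in scores.items():
        if q[0] == t[0]:
            continue  # skip within-genome hits (candidate paralogs)
        key = (q, t[0])
        if key not in best or s > best[key][0]:
            best[key] = (s, t)
    pairs = set()
    for (q, _genome), (_s, t) in best.items():
        back = best.get((t, q[0]))  # t's best hit in q's genome
        if back is not None and back[1] == q:
            pairs.add(frozenset((q, t)))
    return pairs


def ortholog_groups(scores):
    """Merge reciprocal-best-hit pairs into groups with a union-find structure.

    Genes without any reciprocal best hit are not reported.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for pair in reciprocal_best_hits(scores):
        a, b = tuple(pair)
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    groups = defaultdict(set)
    for gene in list(parent):
        groups[find(gene)].add(gene)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


print(ortholog_groups(example_scores))
# [[('A', 'a1'), ('B', 'b1'), ('C', 'c1')], [('A', 'a2'), ('B', 'b2')]]
```

In this toy data, a1 and b2 are not linked even though they share a hit, because b2's best match in genome A is a2; requiring reciprocity is what keeps the two groups apart.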


References

  1. Huerta-Cepas J, Szklarczyk D, Forslund K, Cook H, Heller D, Walter MC, Rattei T, Mende DR, Sunagawa S, Kuhn M, Jensen LJ, von Mering C, Bork P (2016). "eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences". Nucleic Acids Res. 44 (D1): D286–93. doi:10.1093/nar/gkv1248. PMC 4702882. PMID 26582926.
  2. Tatusov RL, Koonin EV, Lipman DJ (1997). "A genomic perspective on protein families". Science. 278 (5338): 631–7. Bibcode:1997Sci...278..631T. doi:10.1126/science.278.5338.631. PMID 9381173.
  3. Tatusov RL, Galperin MY, Natale DA, Koonin EV (2000). "The COG database: a tool for genome-scale analysis of protein functions and evolution". Nucleic Acids Res. 28 (1): 33–6. doi:10.1093/nar/28.1.33. PMC 102395. PMID 10592175.
  4. Powell S, Szklarczyk D, Trachana K, Roth A, Kuhn M, Muller J, Arnold R, Rattei T, Letunic I, Doerks T, Jensen LJ, von Mering C, Bork P (2012). "eggNOG v3.0: orthologous groups covering 1133 organisms at 41 different taxonomic ranges". Nucleic Acids Res. 40 (D1): D284–9. doi:10.1093/nar/gkr1060. PMC 3245133. PMID 22096231.
  5. Jensen LJ, Julien P, Kuhn M, von Mering C, Muller J, Doerks T, Bork P (2008). "eggNOG: automated construction and annotation of orthologous groups of genes". Nucleic Acids Res. 36 (Database issue): D250–4. doi:10.1093/nar/gkm796. PMC 2238944. PMID 17942413.