MANET database


The Molecular Ancestry Network (MANET) database is a bioinformatics database that maps evolutionary relationships of protein architectures directly onto biological networks. [1] It was originally developed by Hee Shin Kim, Jay E. Mittenthal and Gustavo Caetano-Anolles in the Department of Crop Sciences of the University of Illinois at Urbana-Champaign. [2]

MANET traces, for example, the ancestry of individual metabolic enzymes using bioinformatic, phylogenetic, and statistical methods. It links information from the Structural Classification of Proteins (SCOP) database, the metabolic pathways database of the Kyoto Encyclopedia of Genes and Genomes (KEGG), and phylogenetic reconstructions describing the evolution of protein fold architecture at a universal level. [3] The database has since been updated to reflect the evolution of metabolism at the level of protein fold families. [4] MANET "paints" the ancestries of enzymes, derived from rooted phylogenetic trees, directly onto more than one hundred metabolic pathway representations; the name pays homage to one of the fathers of Impressionism. It also provides functions for searching for specific protein folds with defined ancestry values, displaying the distribution of painted enzymes, and exploring quantitative details of individual protein folds. These features permit the study of metabolic network architecture, and the extraction of evolutionary patterns, at both global and local levels.
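The painting step described above can be pictured as a chain of lookups: from each enzyme in a pathway to its protein fold, from the fold to its ancestry value, and from the ancestry value to a colour. The following Python sketch illustrates that idea only. It assumes ancestry values normalized to the range 0 (most ancient) to 1 (most recent); the enzyme identifiers, fold labels, ancestry values, and colour bins are all hypothetical and are not taken from MANET, SCOP, or KEGG.

```python
from typing import Dict, List, Optional

# Hypothetical toy data (not real MANET, SCOP, or KEGG records).
ENZYME_TO_FOLD: Dict[str, str] = {
    "enzyme_A": "fold_1",
    "enzyme_B": "fold_2",
    "enzyme_C": "fold_1",
    "enzyme_D": "fold_3",
}

# Assumed scale: 0.0 = most ancient fold, 1.0 = most recent fold.
FOLD_ANCESTRY: Dict[str, float] = {
    "fold_1": 0.00,
    "fold_2": 0.35,
    "fold_3": 0.80,
}

# Illustrative colour bins as (inclusive upper bound, colour name).
COLOUR_BINS = [(0.25, "red"), (0.50, "orange"), (0.75, "green"), (1.00, "blue")]


def colour_for(ancestry: float) -> str:
    """Return the colour of the first bin whose upper bound covers the value."""
    if not 0.0 <= ancestry <= 1.0:
        raise ValueError(f"ancestry value out of range: {ancestry}")
    for upper, colour in COLOUR_BINS:
        if ancestry <= upper:
            return colour
    return COLOUR_BINS[-1][1]


def paint_pathway(enzymes: List[str]) -> Dict[str, Optional[str]]:
    """Map each enzyme to a colour, or None when no fold or ancestry is known."""
    painted: Dict[str, Optional[str]] = {}
    for enzyme in enzymes:
        fold = ENZYME_TO_FOLD.get(enzyme)
        ancestry = FOLD_ANCESTRY.get(fold) if fold is not None else None
        painted[enzyme] = colour_for(ancestry) if ancestry is not None else None
    return painted


def folds_in_range(low: float, high: float) -> List[str]:
    """Return fold labels whose ancestry value lies in [low, high]."""
    return sorted(f for f, a in FOLD_ANCESTRY.items() if low <= a <= high)


if __name__ == "__main__":
    print(paint_pathway(["enzyme_A", "enzyme_B", "enzyme_D", "enzyme_X"]))
    print(folds_in_range(0.0, 0.4))
```

Enzymes without a known fold are left unpainted (`None`) rather than given a default colour, so gaps in the annotation stay visible in the result.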

A statistical analysis of the data in MANET showed, for example, a patchy distribution of ancestry values assigned to protein folds in each subnetwork, indicating that metabolism evolved globally through widespread recruitment of enzymes. [2] MANET has also been used to sort out enzymatic recruitment processes in metabolic networks and to propose that modern metabolism originated in the purine nucleotide metabolic subnetwork. [5] The database is useful for the study of metabolic evolution.
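One informal way to picture a "patchy" distribution: if each subnetwork contained enzymes of a single age, the spread of ancestry values inside a subnetwork would be small compared with the spread across all of metabolism. With widespread recruitment, each subnetwork would instead span much of the full range. The Python sketch below computes that ratio for toy data. It is not the statistical analysis used in the cited study; the function name, the choice of standard deviation as the measure of spread, and the example values are all assumptions made for illustration.

```python
from statistics import pstdev
from typing import Dict, List


def spread_ratio(subnetworks: Dict[str, List[float]]) -> Dict[str, float]:
    """Ratio of within-subnetwork spread to global spread of ancestry values.

    A ratio near 1 means a subnetwork mixes old and young folds about as much
    as metabolism as a whole does (a patchy pattern); a ratio near 0 means its
    folds are all of similar age.
    """
    all_values = [v for values in subnetworks.values() for v in values]
    global_sd = pstdev(all_values)
    if global_sd == 0:
        raise ValueError("all ancestry values are identical; ratio undefined")
    return {name: pstdev(values) / global_sd for name, values in subnetworks.items()}


if __name__ == "__main__":
    # Hypothetical ancestry values on an assumed 0 (ancient) to 1 (recent) scale.
    patchy = {"subnet_a": [0.0, 1.0], "subnet_b": [0.0, 1.0]}
    layered = {"subnet_a": [0.0, 0.0], "subnet_b": [1.0, 1.0]}
    print(spread_ratio(patchy))   # every subnetwork spans the full range
    print(spread_ratio(layered))  # each subnetwork holds folds of one age
```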

References

  1. "Molecular Ancestry Network, University of Illinois". www.manet.uiuc.edu. Archived from the original on 9 July 2017. Retrieved 14 January 2022.
  2. Kim HS, Mittenthal JE, Caetano-Anolles G (2006). "MANET: tracing evolution of protein architecture in metabolic networks". BMC Bioinformatics. 7: 351. doi:10.1186/1471-2105-7-351. PMC 1559654. PMID 16854231.
  3. Caetano-Anolles G, Caetano-Anolles D (2003). "An evolutionarily structured universe of protein architecture". Genome Res. 13 (7): 1563–71. doi:10.1101/gr.1161903. PMC 403752. PMID 12840035.
  4. Mughal F, Caetano-Anolles G (2019). "MANET 3.0: Hierarchy and modularity in evolving metabolic networks". PLOS ONE. 14 (10): e0224201. Bibcode:2019PLoSO..1424201M. doi:10.1371/journal.pone.0224201. PMC 6812854. PMID 31648227.
  5. Caetano-Anolles G, Kim HS, Mittenthal JE (2007). "The origin of modern metabolic networks inferred from phylogenomic analysis of protein architecture". Proc Natl Acad Sci USA. 104 (22): 9358–63. Bibcode:2007PNAS..104.9358C. doi:10.1073/pnas.0701214104. PMC 1890499. PMID 17517598.