ModBase

Content
Description: Database of comparative protein structure models
Contact
Research center: University of California at San Francisco
Laboratory: Department of Bioengineering and Therapeutic Sciences
Authors: Ursula Pieper, Eswar Narayanan, Ben Webb, Andrej Sali
Primary citation: Pieper et al. (2011) [1]
Release date: 1998
Access
Website: http://salilab.org/modbase
Download URL: ftp://salilab.org/databases/modbase
Web service URL: http://salilab.org/modweb

ModBase is a database of annotated comparative protein structure models, containing models for more than 3.8 million unique protein sequences [1]. Models are created by the comparative modeling pipeline ModPipe, which relies on the MODELLER program.

ModBase is developed in the laboratory of Andrej Sali at UCSF. ModBase models are also accessible through the Protein Model Portal.

Related Research Articles

Structural genomics

Structural genomics seeks to describe the 3-dimensional structure of every protein encoded by a given genome. This genome-based approach allows for a high-throughput method of structure determination by a combination of experimental and modeling approaches. The principal difference between structural genomics and traditional structural prediction is that structural genomics attempts to determine the structure of every protein encoded by the genome, rather than focusing on one particular protein. With full-genome sequences available, structure prediction can be done more quickly through a combination of experimental and modeling approaches, especially because the availability of a large number of sequenced genomes and previously solved protein structures allows scientists to model protein structure on the structures of previously solved homologs.

In the field of bioinformatics, a sequence database is a type of biological database composed of a large collection of digital nucleic acid sequences, protein sequences, or other polymer sequences. The UniProt database is an example of a protein sequence database. As of 2013 it contained over 40 million sequences and was growing at an exponential rate. Historically, sequences were published in paper form, but as the number of sequences grew, this storage method became unsustainable.

Structural bioinformatics: Bioinformatics subfield

Structural bioinformatics is the branch of bioinformatics that is related to the analysis and prediction of the three-dimensional structure of biological macromolecules such as proteins, RNA, and DNA. It deals with generalizations about macromolecular 3D structures such as comparisons of overall folds and local motifs, principles of molecular folding, evolution, binding interactions, and structure/function relationships, working both from experimentally solved structures and from computational models. The term structural has the same meaning as in structural biology, and structural bioinformatics can be seen as a part of computational structural biology. The main objective of structural bioinformatics is the creation of new methods of analysing and manipulating biological macromolecular data in order to solve problems in biology and generate new knowledge.

ENCODE: Research consortium investigating functional elements in human and model organism DNA

The Encyclopedia of DNA Elements (ENCODE) is a public research project which aims "to build a comprehensive parts list of functional elements in the human genome."

InterPro is a database of protein families, protein domains and functional sites in which identifiable features found in known proteins can be applied to new protein sequences in order to functionally characterise them.

In biochemistry, a hypothetical protein is a protein whose existence has been predicted, but for which there is a lack of experimental evidence that it is expressed in vivo. Sequencing of several genomes has resulted in numerous predicted open reading frames to which functions cannot be readily assigned. These proteins, either orphan or conserved hypothetical proteins, make up roughly 20% to 40% of the proteins encoded in each newly sequenced genome. The function of a hypothetical protein in the metabolism of the organism can be predicted by comparing its sequence or structure with those of homologues and by conserved domain analysis. Even when there is enough evidence that the product of the gene is expressed, from techniques such as microarrays and mass spectrometry, it is difficult to assign a function to it given its lack of identity to protein sequences with annotated biochemical function. Nowadays, most protein sequences are inferred from computational analysis of genomic DNA sequence. Hypothetical proteins are created by gene prediction software during genome analysis. When the bioinformatic tool used for gene identification finds a large open reading frame without a characterised homologue in the protein database, it returns "hypothetical protein" as an annotation remark.

Tom Blundell: British biochemist

Sir Thomas Leon Blundell is a British biochemist, structural biologist, and science administrator. He was a member of the team of Dorothy Hodgkin that in 1969 solved the first structure of a protein hormone, insulin. Blundell has made contributions to the structural biology of polypeptide hormones, growth factors, receptor activation, signal transduction, and DNA double-strand break repair, subjects important in cancer, tuberculosis, and familial diseases. He has developed software for protein modelling and understanding the effects of mutations on protein function, leading to new approaches to structure-guided and fragment-based lead discovery. In 1999 he co-founded the oncology company Astex Therapeutics, which has moved ten drugs into clinical trials. Blundell has played central roles in restructuring British research councils and, as President of the UK Science Council, in developing professionalism in the practice of science.

DOPE, or Discrete Optimized Protein Energy, is a statistical potential used to assess homology models in protein structure prediction. DOPE is based on an improved reference state that corresponds to noninteracting atoms in a homogeneous sphere with the radius dependent on a sample native structure; it thus accounts for the finite and spherical shape of the native structures. It is implemented in the popular homology modeling program MODELLER and used to assess the energy of the protein models that MODELLER generates over many iterations by the satisfaction of spatial restraints. The models with the lowest molpdf values can be chosen as the most probable structures and then evaluated further with the DOPE score. Like the current version of the MODELLER software, DOPE is implemented in Python and is run within the MODELLER environment. The DOPE method is generally used to assess the quality of a structure model as a whole. Alternatively, DOPE can also generate a residue-by-residue energy profile for the input model, making it possible for the user to spot problematic regions in the structure model.
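The idea of reading a residue-by-residue energy profile to spot problematic regions can be sketched as follows. This is a minimal illustration and not MODELLER's API: the function names, the example energies, the smoothing window, and the threshold are all hypothetical, and the sketch assumes that higher (less negative) values are less favorable.

```python
from typing import List, Tuple


def smooth_profile(energies: List[float], window: int = 5) -> List[float]:
    """Centered moving average; the window is truncated at the chain ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(energies)):
        lo = max(0, i - half)
        hi = min(len(energies), i + half + 1)
        smoothed.append(sum(energies[lo:hi]) / (hi - lo))
    return smoothed


def flag_regions(profile: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) residue ranges where profile > threshold."""
    regions = []
    start = None
    for i, value in enumerate(profile, start=1):
        if value > threshold and start is None:
            start = i  # a new high-energy stretch begins here
        elif value <= threshold and start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        # the stretch runs to the last residue
        regions.append((start, len(profile)))
    return regions


if __name__ == "__main__":
    # Hypothetical per-residue energies (arbitrary units), not real DOPE output.
    energies = [-0.04, -0.05, -0.05, -0.01, 0.02, 0.03, 0.01, -0.04, -0.05]
    profile = smooth_profile(energies, window=3)
    print(flag_regions(profile, threshold=0.0))
```

Smoothing before thresholding is a design choice that keeps a single noisy residue from being reported as a region of its own; a suitable window and threshold would depend on the scoring function actually used.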

Homology modeling: Method of protein structure prediction using other known proteins

Homology modeling, also known as comparative modeling of protein, refers to constructing an atomic-resolution model of the "target" protein from its amino acid sequence and an experimental three-dimensional structure of a related homologous protein. Homology modeling relies on the identification of one or more known protein structures likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. Protein structures are more conserved than protein sequences among homologues, but sequences with less than 20% sequence identity can have very different structures.
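Sequence identity between a target and a template can be computed from their pairwise alignment. The sketch below is one possible definition, assuming identity is counted only over columns where both sequences have a residue; other conventions use a different denominator (for example, the length of the shorter sequence). The function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_target: str, aligned_template: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences contribute a residue.
    pairs = [
        (a, b)
        for a, b in zip(aligned_target, aligned_template)
        if a != gap and b != gap
    ]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    # Made-up aligned sequences for illustration only.
    target = "MKT-AYIAKQR"
    template = "MKSLAYLA-QR"
    print(f"{percent_identity(target, template):.1f}% identity")
```

With this definition, a value below about 20 would place a target-template pair in the range where, as noted above, the two structures may differ substantially.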

Modeller, often stylized as MODELLER, is a computer program used for homology modeling to produce models of protein tertiary structures and, more rarely, quaternary structures. It implements a method inspired by nuclear magnetic resonance spectroscopy of proteins, termed satisfaction of spatial restraints, by which a set of geometrical criteria are used to create a probability density function for the location of each atom in the protein. The method relies on an input sequence alignment between the target amino acid sequence to be modeled and a template protein whose structure has been solved.
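The general idea of scoring a model against spatial restraints can be illustrated with a toy objective. The sketch below assumes each restraint is a Gaussian probability density on the distance between two atoms and sums the negative logarithms of those densities; it is not MODELLER's actual objective function, and every name and number in it is hypothetical.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]
# (atom index i, atom index j, mean distance, standard deviation)
Restraint = Tuple[int, int, float, float]


def neg_log_gaussian(x: float, mean: float, sigma: float) -> float:
    """Negative log of a Gaussian density, computed directly to avoid underflow."""
    return 0.5 * ((x - mean) / sigma) ** 2 + math.log(sigma * math.sqrt(2.0 * math.pi))


def objective(coords: List[Coord], restraints: List[Restraint]) -> float:
    """Negative log of the product of the per-restraint probability densities."""
    total = 0.0
    for i, j, mean, sigma in restraints:
        distance = math.dist(coords[i], coords[j])
        total += neg_log_gaussian(distance, mean, sigma)
    return total


if __name__ == "__main__":
    # One hypothetical restraint: atoms 0 and 1 should be about 3.8 apart.
    restraints = [(0, 1, 3.8, 0.1)]
    satisfied = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
    violated = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
    print(objective(satisfied, restraints) < objective(violated, restraints))
```

A model that satisfies the restraints more closely gets a lower value, so a structure could in principle be obtained by minimizing such an objective over the atomic coordinates.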

Andrej Šali: American biologist (born 1963)

Andrej Šali is a computational structural biologist. Since 2003, he has been Professor in the Department of Bioengineering and Therapeutic Sciences at the University of California, San Francisco. He also serves as an editor of the journal Structure.

Loop modeling is a problem in protein structure prediction requiring the prediction of the conformations of loop regions in proteins with or without the use of a structural template. Computer programs that solve these problems have been used to research a broad range of scientific topics from ADP to breast cancer. Because protein function is determined by its shape and the physicochemical properties of its exposed surface, it is important to create an accurate model for protein/ligand interaction studies. The problem arises often in homology modeling, where the tertiary structure of an amino acid sequence is predicted based on a sequence alignment to a template, or a second sequence whose structure is known. Because loops have highly variable sequences even within a given structural motif or protein fold, they often correspond to unaligned regions in sequence alignments; they also tend to be located at the solvent-exposed surface of globular proteins and thus are more conformationally flexible. Consequently, they often cannot be modeled using standard homology modeling techniques. More constrained versions of loop modeling are also used in the data fitting stages of solving a protein structure by X-ray crystallography, because loops can correspond to regions of low electron density and are therefore difficult to resolve.

In biology, a protein structure database is a database that is modeled around the various experimentally determined protein structures. The aim of most protein structure databases is to organize and annotate the protein structures, providing the biological community access to the experimental data in a useful way. Data included in protein structure databases often include three-dimensional coordinates as well as experimental information, such as unit cell dimensions and angles for structures determined by X-ray crystallography. Though most instances, in this case either proteins or specific structure determinations of a protein, also contain sequence information, and some databases even provide means for performing sequence-based queries, the primary attribute of a structure database is structural information, whereas sequence databases focus on sequence information and contain no structural information for the majority of entries. Protein structure databases are critical for many efforts in computational biology such as structure-based drug design, both in developing the computational methods used and in providing a large experimental dataset used by some methods to provide insights about the function of a protein.

(See also: List of proteins in the human body)

Ram Samudrala

Ram Samudrala is a professor of computational biology and bioinformatics at the University at Buffalo, United States. He researches protein folding, structure, function, interaction, design, and evolution.

DNA annotation: The process of describing the structure and function of a genome

In molecular biology and genetics, DNA annotation or genome annotation is the process of describing the structure and function of the components of a genome, by analyzing and interpreting them in order to extract their biological significance and understand the biological processes in which they participate. Among other things, it identifies the locations of genes and all the coding regions in a genome and determines what those genes do.

Discovery Studio is a suite of software for simulating small molecule and macromolecule systems. It is developed and distributed by Dassault Systemes BIOVIA.

SWISS-MODEL is a structural bioinformatics web-server dedicated to homology modeling of 3D protein structures. Homology modeling is currently the most accurate method to generate reliable three-dimensional protein structure models and is routinely used in many practical applications. Homology modelling methods make use of experimental protein structures ("templates") to build models for evolutionarily related proteins ("targets").

WormBase is an online biological database about the biology and genome of the nematode model organism Caenorhabditis elegans and contains information about other related nematodes. WormBase is used by the C. elegans research community both as an information resource and as a place to publish and distribute their results. The database is regularly updated with new versions being released every two months. WormBase is one of the organizations participating in the Generic Model Organism Database (GMOD) project.

References

  1. Pieper, Ursula; Webb, Benjamin M.; Barkan, David T.; Schneidman-Duhovny, Dina; Schlessinger, Avner; Braberg, Hannes; Yang, Zheng; Meng, Elaine C.; Pettersen, Eric F.; Huang, Conrad C.; Datta, Ruchira S.; Sampathkumar, Parthasarathy; Madhusudhan, Mallur S.; Sjölander, Kimmen; Ferrin, Thomas E.; Burley, Stephen K.; Sali, Andrej (Jan 2011). "ModBase, a database of annotated comparative protein structure models, and associated resources". Nucleic Acids Res. England. 39 (Database issue): D465-74. doi:10.1093/nar/gkq1091. PMC 3013688. PMID 21097780.