Protein structure database

Last updated

In biology, a protein structure database is a database that is modeled around the various experimentally determined protein structures. The aim of most protein structure databases is to organize and annotate the protein structures, providing the biological community access to the experimental data in a useful way. Data included in protein structure databases often includes three-dimensional coordinates as well as experimental information, such as unit cell dimensions and angles for x-ray crystallography determined structures. Though most instances, in this case either proteins or a specific structure determinations of a protein, also contain sequence information and some databases even provide means for performing sequence based queries, the primary attribute of a structure database is structural information, whereas sequence databases focus on sequence information, and contain no structural information for the majority of entries. Protein structure databases are critical for many efforts in computational biology such as structure based drug design, both in developing the computational methods used and in providing a large experimental dataset used by some methods to provide insights about the function of a protein. [1]

Contents

The Protein Data Bank

The Protein Data Bank (PDB) was established in 1971 as the central archive of all experimentally determined protein structure data. Today the PDB is maintained by an international consortia collectively known as the Worldwide Protein Data Bank (wwPDB). The mission of the wwPDB is to maintain a single archive of macromolecular structural data that is freely and publicly available to the global community. [2] [3]

List of other protein structure databases

Because the PDB releases data into the public domain, the data has been used in various other protein structure databases.

Examples of protein structure databases include (in alphabetical order);

Database of Macromolecular Movements
describes the motions that occur in proteins and other macromolecules, particularly using movies
Dynameomics
a data warehouse of molecular dynamics simulations and analyses of proteins representing all known protein fold families
JenaLib
the Jena Library of Biological Macromolecules is aimed at a better dissemination of information on three-dimensional biopolymer structures with an emphasis on visualization and analysis.
ModBase
a database of three-dimensional protein models calculated by comparative modeling
OCA
a browser-database for protein structure/function - The OCA integrates information from KEGG, OMIM, PDBselect, Pfam, PubMed, SCOP, SwissProt, and others.
OPM
provides spatial positions of protein three-dimensional structures with respect to the lipid bilayer.
PDB Lite
derived from OCA, PDB Lite was provided to make it as easy as possible to find and view a macromolecule within the PDB
PDBsum
provides an overview macromolecular structures in the PDB, giving schematic diagrams of the molecules in each structure and of the interactions between them
PDBTM
the Protein Data Bank of Transmembrane Proteins a selection of the PDB.
PDBWiki
a community annotated knowledge base of biological molecular structures
ProtCID
The Protein Common Interface Database (ProtCID) is a database of similar protein–protein interfaces in crystal structures of homologous proteins.
Protein
the NIH protein database, a collection of sequences from several sources, including translations from annotated coding regions in GenBank, RefSeq and Third Party Annotation, as well as records from SwissProt, PIR, PRF, and PDB
Proteopedia
the collaborative, 3D encyclopedia of proteins and other molecules. A wiki that contains a page for every entry in the PDB (>100,000 pages), with a Jmol view that highlights functional sites and ligands. Offers an easy-to-use scene-authoring tool so you don't have to learn Jmol script language to create customized molecular scenes. Custom scenes are easily attached to "green links" in descriptive text that display those scenes in Jmol.
ProteinLounge
a protein databases that includes visuals of protein structure. Also, includes protein pathways and gene sequences including other tools.
SCOP
the Structural Classification of Proteins a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known.
SWISS-MODEL Repository
a database of annotated protein models calculated by homology modeling
TOPSAN
the Open Protein Structure Annotation Network a wiki designed to collect, share and distribute information about protein three-dimensional structures.

Related Research Articles

Structural biology Study of molecular structures in biology

Structural biology is a branch of molecular biology, biochemistry, and biophysics concerned with the molecular structure of biological macromolecules, how they acquire the structures they have, and how alterations in their structures affect their function. This subject is of great interest to biologists because macromolecules carry out most of the functions of cells, and it is only by coiling into specific three-dimensional shapes that they are able to perform these functions. This architecture, the "tertiary structure" of molecules, depends in a complicated way on each molecule's basic composition, or "primary structure."

The Protein Data Bank (PDB) is a database for the three-dimensional structural data of large biological molecules, such as proteins and nucleic acids. The data, typically obtained by X-ray crystallography, NMR spectroscopy, or, increasingly, cryo-electron microscopy, and submitted by biologists and biochemists from around the world, are freely accessible on the Internet via the websites of its member organisations. The PDB is overseen by an organization called the Worldwide Protein Data Bank, wwPDB.

National Center for Biotechnology Information Database branch of the US National Library of Medicine

The National Center for Biotechnology Information (NCBI) is part of the United States National Library of Medicine (NLM), a branch of the National Institutes of Health (NIH). It is approved and funded by the government of the United States. The NCBI is located in Bethesda, Maryland and was founded in 1988 through legislation sponsored by Senator Claude Pepper.

Structural bioinformatics

Structural bioinformatics is the branch of bioinformatics that is related to the analysis and prediction of the three-dimensional structure of biological macromolecules such as proteins, RNA, and DNA. It deals with generalizations about macromolecular 3D structures such as comparisons of overall folds and local motifs, principles of molecular folding, evolution, binding interactions, and structure/function relationships, working both from experimentally solved structures and from computational models. The term structural has the same meaning as in structural biology, and structural bioinformatics can be seen as a part of computational structural biology. The main objective of structural bioinformatics is the creation of new methods of analysing and manipulating biological macromolecular data in order to solve problems in biology and generate new knowledge.

Structural Classification of Proteins database

The Structural Classification of Proteins (SCOP) database is a largely manual classification of protein structural domains based on similarities of their structures and amino acid sequences. A motivation for this classification is to determine the evolutionary relationship between proteins. Proteins with the same shapes but having little sequence or functional similarity are placed in different superfamilies, and are assumed to have only a very distant common ancestor. Proteins having the same shape and some similarity of sequence and/or function are placed in "families", and are assumed to have a closer common ancestor.

Biological small-angle scattering

Biological small-angle scattering is a small-angle scattering method for structure analysis of biological materials. Small-angle scattering is used to study the structure of a variety of objects such as solutions of biological macromolecules, nanocomposites, alloys, and synthetic polymers. Small-angle X-ray scattering (SAXS) and small-angle neutron scattering (SANS) are the two complementary techniques known jointly as small-angle scattering (SAS). SAS is an analogous method to X-ray and neutron diffraction, wide angle X-ray scattering, as well as to static light scattering. In contrast to other X-ray and neutron scattering methods, SAS yields information on the sizes and shapes of both crystalline and non-crystalline particles. When used to study biological materials, which are very often in aqueous solution, the scattering pattern is orientation averaged.

The CATH Protein Structure Classification database is a free, publicly available online resource that provides information on the evolutionary relationships of protein domains. It was created in the mid-1990s by Professor Christine Orengo and colleagues including Janet Thornton and David Jones, and continues to be developed by the Orengo group at University College London. CATH shares many broad features with the SCOP resource, however there are also many areas in which the detailed classification differs greatly.

Protein structure Three-dimensional arrangement of atoms in an amino acid-chain molecule

Protein structure is the three-dimensional arrangement of atoms in an amino acid-chain molecule. Proteins are polymers – specifically polypeptides – formed from sequences of amino acids, the monomers of the polymer. A single amino acid monomer may also be called a residue indicating a repeating unit of a polymer. Proteins form by amino acids undergoing condensation reactions, in which the amino acids lose one water molecule per reaction in order to attach to one another with a peptide bond. By convention, a chain under 30 amino acids is often identified as a peptide, rather than a protein. To be able to perform their biological function, proteins fold into one or more specific spatial conformations driven by a number of non-covalent interactions such as hydrogen bonding, ionic interactions, Van der Waals forces, and hydrophobic packing. To understand the functions of proteins at a molecular level, it is often necessary to determine their three-dimensional structure. This is the topic of the scientific field of structural biology, which employs techniques such as X-ray crystallography, NMR spectroscopy, cryo electron microscopy (cryo-EM) and dual polarisation interferometry to determine the structure of proteins.

This article discusses some common molecular file formats, including usage and converting between them.

RasMol

RasMol is a computer program written for molecular graphics visualization intended and used mainly to depict and explore biological macromolecule structures, such as those found in the Protein Data Bank. It was originally developed by Roger Sayle in the early 1990s.

The Protein Data Bank (pdb) file format is a textual file format describing the three-dimensional structures of molecules held in the Protein Data Bank. The pdb format accordingly provides for description and annotation of protein and nucleic acid structures including atomic coordinates, secondary structure assignments, as well as atomic connectivity. In addition experimental metadata are stored. PDB format is the legacy file format for the Protein Data Bank which now keeps data on biological macromolecules in the newer mmCIF file format.

Homology modeling Method of protein structure prediction using other known proteins

Homology modeling, also known as comparative modeling of protein, refers to constructing an atomic-resolution model of the "target" protein from its amino acid sequence and an experimental three-dimensional structure of a related homologous protein. Homology modeling relies on the identification of one or more known protein structures likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence has been shown that protein structures are more conserved than protein sequences amongst homologues, but sequences falling below a 20% sequence identity can have very different structure.

PDBWiki was a wiki that functioned as a user-contributed database of protein structure annotations, listing all the protein structures available in the Protein Data Bank (PDB). It ran on the MediaWiki wiki application from 2007 to 2013. The website went offline in 2014 and there has not been any way to subsequently access the information that was contributed. PDBWiki contained details of more than 50,000 protein structures and over 50 'user-contributed' annotations, making it a significant resource for the structural biology community.

SUPERFAMILY is a database and search platform of structural and functional annotation for all proteins and genomes. It classifies amino acid sequences into known structural domains, especially into SCOP superfamilies. Domains are functional, structural, and evolutionary units that form proteins. Domains of common Ancestry are grouped into superfamilies. The domains and domain superfamilies are defined and described in SCOP. Superfamilies are groups of proteins which have structural evidence to support a common evolutionary ancestor but may not have detectable sequence homology.

PDBsum is a database that provides an overview of the contents of each 3D macromolecular structure deposited in the Protein Data Bank. The original version of the database was developed around 1995 by Roman Laskowski and collaborators at University College London. As of 2014, PDBsum is maintained by Laskowski and collaborators in the laboratory of Janet Thornton at the European Bioinformatics Institute (EBI).

Phyre and Phyre2 are free web-based services for protein structure prediction. Phyre is among the most popular methods for protein structure prediction having been cited over 1500 times. Like other remote homology recognition techniques, it is able to regularly generate reliable protein models when other widely used methods such as PSI-BLAST cannot. Phyre2 has been designed to ensure a user-friendly interface for users inexpert in protein structure prediction methods. Its development is funded by the Biotechnology and Biological Sciences Research Council.

ProtCID

The Protein Common Interface Database (ProtCID) is a database of similar protein-protein interfaces in crystal structures of homologous proteins.

Structure validation

Macromolecular structure validation is the process of evaluating reliability for 3-dimensional atomic models of large biological molecules such as proteins and nucleic acids. These models, which provide 3D coordinates for each atom in the molecule, come from structural biology experiments such as x-ray crystallography or nuclear magnetic resonance (NMR). The validation has three aspects: 1) checking on the validity of the thousands to millions of measurements in the experiment; 2) checking how consistent the atomic model is with those experimental data; and 3) checking consistency of the model with known physical and chemical properties.

In molecular biology, MobiDB is a curated biological database designed to offer a centralized resource for annotations of intrinsic protein disorder. Protein disorder is a structural feature characterizing a large number of proteins with prominent members known as intrinsically unstructured proteins. The database features three levels of annotation: manually curated, indirect and predicted. By combining different data sources of protein disorder into a consensus annotation, MobiDB aims at giving the best possible picture of the "disorder landscape" of a given protein of interest.

References

  1. Laskowski, RA (2011). "Protein structure databases". Mol Biotechnol. 48 (2): 183–98. doi:10.1007/s12033-010-9372-4. PMID   21225378. S2CID   45184564.
  2. Berman, H. M. (January 2008). "The Protein Data Bank: a historical perspective" (PDF). Acta Crystallographica Section A. A64 (1): 88–95. doi: 10.1107/S0108767307035623 . PMID   18156675.
  3. Laskowski RA, Hutchinson EG, Michie AD, Wallace AC, Jones ML, Thornton JM (December 1997). "PDBsum: a Web-based database of summaries and analyses of all PDB structures". Trends Biochem. Sci. 22 (12): 488–90. doi:10.1016/S0968-0004(97)01140-7. PMID   9433130.