PDBsum

Last updated
PDBSum
Pdbsum logo.gif
Content
DescriptionOverview of the structures contained within the Protein Data Bank.
Data types
captured
Protein structures
Organisms all
Contact
Research center European Bioinformatics Institute (EBI)
AuthorsRoman Laskowski & al. (1997)
Primary citation PMID   9433130
Access
Website www.ebi.ac.uk/pdbsum/
Miscellaneous
Bookmarkable
entities
yes

PDBsum is a database that provides an overview of the contents of each 3D macromolecular structure deposited in the Protein Data Bank (PDB). [1] [2] [3] [4]

Contents

Database

The original version of the database was developed around 1995 by Roman Laskowski and collaborators at University College London. [5] As of 2014, PDBsum is maintained by Laskowski and collaborators in the laboratory of Janet Thornton at the European Bioinformatics Institute (EBI).

Each structure in the PDBsum database includes an image of structure (main view, Bottom view and right view), molecular components contained in the complex(structure), enzyme reaction diagram if appropriate, Gene ontology functional assignments, a 1D sequence annotated by Pfam and InterPro domain assignments, description of bound molecules and graphic showing interactions between protein and secondary structure, schematic diagrams of protein–protein interactions, analysis of clefts contained within the structure and links to external databases. [6] The RasMol and Jmol molecular graphics software are used to provide a 3D view of molecules and their interactions within PDBsum. [5]

Since the release of the 1000 Genomes Project in October 2012, all single amino acid variants identified by the project have been mapped to the corresponding protein sequences in the Protein Data Bank. These variants are also displayed within PDBsum, cross-referenced to the relevant UniProt identifier. [6] PDBsum contains a number of protein structures that may be of interest in structure-based drug design. One branch of PDBsum, known as DrugPort, focuses on these models and is linked with the DrugBank drug target database. [6] [7]

See also

Related Research Articles

The Protein Data Bank (PDB) is a database for the three-dimensional structural data of large biological molecules such as proteins and nucleic acids, which is overseen by the Worldwide Protein Data Bank (wwPDB). These structural data are obtained and deposited by biologists and biochemists worldwide through the use of experimental methodologies such as X-ray crystallography, NMR spectroscopy, and, increasingly, cryo-electron microscopy. All submitted data are reviewed by expert biocurators and, once approved, are made freely available on the Internet under the CC0 Public Domain Dedication. Global access to the data is provided by the websites of the wwPDB member organisations.

<span class="mw-page-title-main">Biological database</span>

Biological databases are libraries of biological sciences, collected from scientific experiments, published literature, high-throughput experiment technology, and computational analysis. They contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics. Information contained in biological databases includes gene function, structure, localization, clinical effects of mutations as well as similarities of biological sequences and structures.

<span class="mw-page-title-main">Structural alignment</span> Aligning molecular sequences using sequence and structural information

Structural alignment attempts to establish homology between two or more polymer structures based on their shape and three-dimensional conformation. This process is usually applied to protein tertiary structures but can also be used for large RNA molecules. In contrast to simple structural superposition, where at least some equivalent residues of the two structures are known, structural alignment requires no a priori knowledge of equivalent positions. Structural alignment is a valuable tool for the comparison of proteins with low sequence similarity, where evolutionary relationships between proteins cannot be easily detected by standard sequence alignment techniques. Structural alignment can therefore be used to imply evolutionary relationships between proteins that share very little common sequence. However, caution should be used in using the results as evidence for shared evolutionary ancestry because of the possible confounding effects of convergent evolution by which multiple unrelated amino acid sequences converge on a common tertiary structure.

<span class="mw-page-title-main">Structural bioinformatics</span> Bioinformatics subfield

Structural bioinformatics is the branch of bioinformatics that is related to the analysis and prediction of the three-dimensional structure of biological macromolecules such as proteins, RNA, and DNA. It deals with generalizations about macromolecular 3D structures such as comparisons of overall folds and local motifs, principles of molecular folding, evolution, binding interactions, and structure/function relationships, working both from experimentally solved structures and from computational models. The term structural has the same meaning as in structural biology, and structural bioinformatics can be seen as a part of computational structural biology. The main objective of structural bioinformatics is the creation of new methods of analysing and manipulating biological macromolecular data in order to solve problems in biology and generate new knowledge.

BioJava is an open-source software project dedicated to provide Java tools to process biological data. BioJava is a set of library functions written in the programming language Java for manipulating sequences, protein structures, file parsers, Common Object Request Broker Architecture (CORBA) interoperability, Distributed Annotation System (DAS), access to AceDB, dynamic programming, and simple statistical routines. BioJava supports a range of data, starting from DNA and protein sequences to the level of 3D protein structures. The BioJava libraries are useful for automating many daily and mundane bioinformatics tasks such as to parsing a Protein Data Bank (PDB) file, interacting with Jmol and many more. This application programming interface (API) provides various file parsers, data models and algorithms to facilitate working with the standard data formats and enables rapid application development and analysis.

The European Bioinformatics Institute (EMBL-EBI) is an intergovernmental organization (IGO) which, as part of the European Molecular Biology Laboratory (EMBL) family, focuses on research and services in bioinformatics. It is located on the Wellcome Genome Campus in Hinxton near Cambridge, and employs over 600 full-time equivalent (FTE) staff. Institute leaders such as Rolf Apweiler, Alex Bateman, Ewan Birney, and Guy Cochrane, an adviser on the National Genomics Data Center Scientific Advisory Board, serve as part of the international research network of the BIG Data Center at the Beijing Institute of Genomics.

In academia, computational immunology is a field of science that encompasses high-throughput genomic and bioinformatics approaches to immunology. The field's main aim is to convert immunological data into computational problems, solve these problems using mathematical and computational approaches and then convert these results into immunologically meaningful interpretations.

The DrugBank database is a comprehensive, freely accessible, online database containing information on drugs and drug targets created and maintained by the University of Alberta and The Metabolomics Innovation Centre located in Alberta, Canada. As both a bioinformatics and a cheminformatics resource, DrugBank combines detailed drug data with comprehensive drug target information. DrugBank has used content from Wikipedia; Wikipedia also often links to Drugbank, posing potential circular reporting issues.

<span class="mw-page-title-main">Pfam</span> Database of protein families

Pfam is a database of protein families that includes their annotations and multiple sequence alignments generated using hidden Markov models. Last version of Pfam, 36.0, was released in September 2023 and contains 20,795 families. It is currently provided through InterPro database.

<span class="mw-page-title-main">KEGG</span> Collection of bioinformatics databases

KEGG is a collection of databases dealing with genomes, biological pathways, diseases, drugs, and chemical substances. KEGG is utilized for bioinformatics research and education, including data analysis in genomics, metagenomics, metabolomics and other omics studies, modeling and simulation in systems biology, and translational research in drug development.

The Database of Macromolecular Motions is a bioinformatics database and software-as-a-service tool that attempts to categorize macromolecular motions, sometimes also known as conformational change. It was originally developed by Mark B. Gerstein, Werner Krebs, and Nat Echols in the Molecular Biophysics & Biochemistry Department at Yale University.

<span class="mw-page-title-main">Protein domain</span> Self-stable region of a proteins chain that folds independently from the rest

In molecular biology, a protein domain is a region of a protein's polypeptide chain that is self-stabilizing and that folds independently from the rest. Each domain forms a compact folded three-dimensional structure. Many proteins consist of several domains, and a domain may appear in a variety of different proteins. Molecular evolution uses domains as building blocks and these may be recombined in different arrangements to create proteins with different functions. In general, domains vary in length from between about 50 amino acids up to 250 amino acids in length. The shortest domains, such as zinc fingers, are stabilized by metal ions or disulfide bridges. Domains often form functional units, such as the calcium-binding EF hand domain of calmodulin. Because they are independently stable, domains can be "swapped" by genetic engineering between one protein and another to make chimeric proteins.

<span class="mw-page-title-main">Helen M. Berman</span> American chemist

Helen Miriam Berman is a Board of Governors Professor of Chemistry and Chemical Biology at Rutgers University and a former director of the RCSB Protein Data Bank. A structural biologist, her work includes structural analysis of protein-nucleic acid complexes, and the role of water in molecular interactions. She is also the founder and director of the Nucleic Acid Database, and led the Protein Structure Initiative Structural Genomics Knowledgebase.

In biology, a protein structure database is a database that is modeled around the various experimentally determined protein structures. The aim of most protein structure databases is to organize and annotate the protein structures, providing the biological community access to the experimental data in a useful way. Data included in protein structure databases often includes three-dimensional coordinates as well as experimental information, such as unit cell dimensions and angles for x-ray crystallography determined structures. Though most instances, in this case either proteins or a specific structure determinations of a protein, also contain sequence information and some databases even provide means for performing sequence based queries, the primary attribute of a structure database is structural information, whereas sequence databases focus on sequence information, and contain no structural information for the majority of entries. Protein structure databases are critical for many efforts in computational biology such as structure based drug design, both in developing the computational methods used and in providing a large experimental dataset used by some methods to provide insights about the function of a protein.

Protein function prediction methods are techniques that bioinformatics researchers use to assign biological or biochemical roles to proteins. These proteins are usually ones that are poorly studied or predicted based on genomic sequence data. These predictions are often driven by data-intensive computational procedures. Information may come from nucleic acid sequence homology, gene expression profiles, protein domain structures, text mining of publications, phylogenetic profiles, phenotypic profiles, and protein-protein interaction. Protein function is a broad term: the roles of proteins range from catalysis of biochemical reactions to transport to signal transduction, and a single protein may play a role in multiple processes or cellular pathways.

SWISS-MODEL is a structural bioinformatics web-server dedicated to homology modeling of 3D protein structures. Homology modeling is currently the most accurate method to generate reliable three-dimensional protein structure models and is routinely used in many practical applications. Homology modelling methods make use of experimental protein structures ("templates") to build models for evolutionary related proteins ("targets").

<span class="mw-page-title-main">Structure validation</span> Process of evaluating 3-dimensional atomic models of biomacromolecules

Macromolecular structure validation is the process of evaluating reliability for 3-dimensional atomic models of large biological molecules such as proteins and nucleic acids. These models, which provide 3D coordinates for each atom in the molecule, come from structural biology experiments such as x-ray crystallography or nuclear magnetic resonance (NMR). The validation has three aspects: 1) checking on the validity of the thousands to millions of measurements in the experiment; 2) checking how consistent the atomic model is with those experimental data; and 3) checking consistency of the model with known physical and chemical properties.

References

  1. Laskowski RA, Hutchinson EG, Michie AD, Wallace AC, Jones ML, Thornton JM (Dec 1997). "PDBsum: a Web-based database of summaries and analyses of all PDB structures". Trends in Biochemical Sciences. 22 (12): 488–90. doi:10.1016/S0968-0004(97)01140-7. PMID   9433130.
  2. Laskowski RA (Jan 2001). "PDBsum: summaries and analyses of PDB structures". Nucleic Acids Research. 29 (1): 221–2. doi:10.1093/nar/29.1.221. PMC   29784 . PMID   11125097.
  3. Laskowski RA, Chistyakov VV, Thornton JM (Jan 2005). "PDBsum more: new summaries and analyses of the known 3D structures of proteins and nucleic acids". Nucleic Acids Research. 33 (Database issue): D266-8. doi:10.1093/nar/gki001. PMC   539955 . PMID   15608193.
  4. Laskowski RA (Jan 2009). "PDBsum new things". Nucleic Acids Research. 37 (Database issue): D355-9. doi:10.1093/nar/gkn860. PMC   2686501 . PMID   18996896.
  5. 1 2 "PDBsum documentation: About PDBsum". European Molecular Biology Laboratory – The European Bioinformatics Institute. Retrieved 9 September 2014.
  6. 1 2 3 de Beer TA, Berka K, Thornton JM, Laskowski RA (Jan 2014). "PDBsum additions". Nucleic Acids Research. 42 (Database issue): D292-6. doi:10.1093/nar/gkt940. PMC   3965036 . PMID   24153109.
  7. Knox C, Law V, Jewison T, Liu P, Ly S, Frolkis A, Pon A, Banco K, Mak C, Neveu V, Djoumbou Y, Eisner R, Guo AC, Wishart DS (Jan 2011). "DrugBank 3.0: a comprehensive resource for 'omics' research on drugs". Nucleic Acids Research. 39 (Database issue): D1035-41. doi:10.1093/nar/gkq1126. PMC   3013709 . PMID   21059682.