PDBsum

PDBsum
Content
Description	Overview of the structures contained within the Protein Data Bank.
Data types captured	Protein structures
Organisms	all
Contact
Research center	European Bioinformatics Institute (EBI)
Authors	Roman Laskowski et al. (1997)
Primary citation	PMID 9433130
Access
Website	www.ebi.ac.uk/pdbsum/
Miscellaneous
Bookmarkable entities	yes

Last updated August 17, 2024

PDBsum is a database that provides an overview of the contents of each 3D macromolecular structure deposited in the Protein Data Bank (PDB).^[1]^[2]^[3]^[4]

Database

The original version of the database was developed around 1995 by Roman Laskowski and collaborators at University College London.^[5] As of 2014, PDBsum is maintained by Laskowski and collaborators in the laboratory of Janet Thornton at the European Bioinformatics Institute (EBI).

Each structure in the PDBsum database includes images of the structure (main view, bottom view and right view), the molecular components contained in the complex, an enzyme reaction diagram where appropriate, Gene Ontology functional assignments, a 1D sequence annotated with Pfam and InterPro domain assignments, a description of bound molecules and a graphic showing interactions between protein and secondary structure, schematic diagrams of protein–protein interactions, an analysis of clefts contained within the structure, and links to external databases.^[6] The RasMol and Jmol molecular graphics programs are used to provide a 3D view of molecules and their interactions within PDBsum.^[5]
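The components listed above can be pictured as the fields of one record per PDB entry. The following Python sketch is purely illustrative: the class `EntrySummary`, the helper `normalize_pdb_id`, the field names, and the placeholder identifier are hypothetical and do not reflect PDBsum's actual data model or any API it offers. The only outside fact assumed is that classic PDB identifiers are four characters long, a digit followed by three alphanumeric characters.

```python
import re
from dataclasses import dataclass, field

# Classic PDB identifiers: one digit followed by three alphanumerics.
# (Assumption from the lead-in; extended identifier formats are not handled.)
PDB_ID_PATTERN = re.compile(r"[0-9][A-Za-z0-9]{3}")


def normalize_pdb_id(raw: str) -> str:
    """Return the identifier in lower case, or raise ValueError if malformed."""
    candidate = raw.strip()
    if not PDB_ID_PATTERN.fullmatch(candidate):
        raise ValueError(f"not a classic 4-character PDB identifier: {raw!r}")
    return candidate.lower()


@dataclass
class EntrySummary:
    """Hypothetical container mirroring the components listed in the text."""
    pdb_id: str
    molecular_components: list[str] = field(default_factory=list)
    go_terms: list[str] = field(default_factory=list)
    # Domain assignments keyed by their source, e.g. "Pfam" or "InterPro".
    domain_assignments: dict[str, list[str]] = field(default_factory=dict)
    bound_molecules: list[str] = field(default_factory=list)
    external_links: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Validate and normalize the identifier as soon as the record is built.
        self.pdb_id = normalize_pdb_id(self.pdb_id)
```

For instance, `EntrySummary(pdb_id=" 1ABC ")` stores the identifier as `1abc`, while a malformed identifier such as `abcd` raises `ValueError`. Keeping each component in its own field makes it easy to see which parts of a summary are present for a given entry.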

Since the release of the 1000 Genomes Project in October 2012, all single amino acid variants identified by the project have been mapped to the corresponding protein sequences in the Protein Data Bank. These variants are also displayed within PDBsum, cross-referenced to the relevant UniProt identifier.^[6] PDBsum contains a number of protein structures that may be of interest in structure-based drug design. One branch of PDBsum, known as DrugPort, focuses on these models and is linked with the DrugBank drug target database.^[6]^[7]
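Mapping a single amino acid variant onto a protein sequence, as described above, amounts to checking the reference residue at a given position and recording the substitution. The sketch below shows that idea in Python. The `Variant` type, the `apply_variant` helper, and the toy sequence are hypothetical and are not taken from PDBsum or from 1000 Genomes data. Real mappings must also reconcile UniProt and PDB residue numbering, which this sketch omits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    position: int   # 1-based position in the protein sequence
    reference: str  # one-letter code expected at that position
    alternate: str  # one-letter code of the substituted residue


def apply_variant(sequence: str, variant: Variant) -> str:
    """Return the sequence with the variant applied, after sanity checks."""
    index = variant.position - 1  # convert to 0-based indexing
    if not 0 <= index < len(sequence):
        raise ValueError("variant position lies outside the sequence")
    if sequence[index] != variant.reference:
        raise ValueError("reference residue does not match the sequence")
    return sequence[:index] + variant.alternate + sequence[index + 1:]


# Toy example: threonine at position 3 replaced by alanine.
mutated = apply_variant("MKTAY", Variant(position=3, reference="T", alternate="A"))
```

Checking the reference residue before substituting guards against off-by-one errors and against applying a variant to the wrong sequence.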

Related Research Articles

The Protein Data Bank (PDB) is a database for the three-dimensional structural data of large biological molecules such as proteins and nucleic acids, which is overseen by the Worldwide Protein Data Bank (wwPDB). These structural data are obtained and deposited by biologists and biochemists worldwide through the use of experimental methodologies such as X-ray crystallography, NMR spectroscopy, and, increasingly, cryo-electron microscopy. All submitted data are reviewed by expert biocurators and, once approved, are made freely available on the Internet under the CC0 Public Domain Dedication. Global access to the data is provided by the websites of the wwPDB member organisations.

Biological databases are libraries of biological sciences, collected from scientific experiments, published literature, high-throughput experiment technology, and computational analysis. They contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics. Information contained in biological databases includes gene function, structure, localization, clinical effects of mutations as well as similarities of biological sequences and structures.

Structural alignment attempts to establish homology between two or more polymer structures based on their shape and three-dimensional conformation. This process is usually applied to protein tertiary structures but can also be used for large RNA molecules. In contrast to simple structural superposition, where at least some equivalent residues of the two structures are known, structural alignment requires no a priori knowledge of equivalent positions. Structural alignment is a valuable tool for the comparison of proteins with low sequence similarity, where evolutionary relationships between proteins cannot be easily detected by standard sequence alignment techniques. Structural alignment can therefore be used to imply evolutionary relationships between proteins that share very little common sequence. However, caution should be used in using the results as evidence for shared evolutionary ancestry because of the possible confounding effects of convergent evolution by which multiple unrelated amino acid sequences converge on a common tertiary structure.

Structural bioinformatics is the branch of bioinformatics that is related to the analysis and prediction of the three-dimensional structure of biological macromolecules such as proteins, RNA, and DNA. It deals with generalizations about macromolecular 3D structures such as comparisons of overall folds and local motifs, principles of molecular folding, evolution, binding interactions, and structure/function relationships, working both from experimentally solved structures and from computational models. The term structural has the same meaning as in structural biology, and structural bioinformatics can be seen as a part of computational structural biology. The main objective of structural bioinformatics is the creation of new methods of analysing and manipulating biological macromolecular data in order to solve problems in biology and generate new knowledge.

BioJava is an open-source software project dedicated to providing Java tools to process biological data. BioJava is a set of library functions written in the programming language Java for manipulating sequences, protein structures, file parsers, Common Object Request Broker Architecture (CORBA) interoperability, Distributed Annotation System (DAS), access to AceDB, dynamic programming, and simple statistical routines. BioJava supports a range of data, from DNA and protein sequences to 3D protein structures. The BioJava libraries are useful for automating many routine bioinformatics tasks, such as parsing a Protein Data Bank (PDB) file and interacting with Jmol. This application programming interface (API) provides various file parsers, data models and algorithms to facilitate working with the standard data formats and enables rapid application development and analysis.

The European Bioinformatics Institute (EMBL-EBI) is an intergovernmental organization (IGO) which, as part of the European Molecular Biology Laboratory (EMBL) family, focuses on research and services in bioinformatics. It is located on the Wellcome Genome Campus in Hinxton near Cambridge, and employs over 600 full-time equivalent (FTE) staff. Institute leaders such as Rolf Apweiler, Alex Bateman, Ewan Birney, and Guy Cochrane, an adviser on the National Genomics Data Center Scientific Advisory Board, serve as part of the international research network of the BIG Data Center at the Beijing Institute of Genomics.

In academia, computational immunology is a field of science that encompasses high-throughput genomic and bioinformatics approaches to immunology. The field's main aim is to convert immunological data into computational problems, solve these problems using mathematical and computational approaches and then convert these results into immunologically meaningful interpretations.

The DrugBank database is a comprehensive, freely accessible, online database containing information on drugs and drug targets, created and maintained by the University of Alberta and The Metabolomics Innovation Centre located in Alberta, Canada. As both a bioinformatics and a cheminformatics resource, DrugBank combines detailed drug data with comprehensive drug target information. DrugBank has used content from Wikipedia; Wikipedia also often links to DrugBank, posing potential circular reporting issues.

Pfam is a database of protein families that includes their annotations and multiple sequence alignments generated using hidden Markov models. The latest version of Pfam, 36.0, was released in September 2023 and contains 20,795 families. It is currently provided through the InterPro database.

KEGG is a collection of databases dealing with genomes, biological pathways, diseases, drugs, and chemical substances. KEGG is utilized for bioinformatics research and education, including data analysis in genomics, metagenomics, metabolomics and other omics studies, modeling and simulation in systems biology, and translational research in drug development.

The Database of Macromolecular Motions is a bioinformatics database and software-as-a-service tool that attempts to categorize macromolecular motions, sometimes also known as conformational change. It was originally developed by Mark B. Gerstein, Werner Krebs, and Nat Echols in the Molecular Biophysics & Biochemistry Department at Yale University.

In molecular biology, a protein domain is a region of a protein's polypeptide chain that is self-stabilizing and that folds independently from the rest. Each domain forms a compact folded three-dimensional structure. Many proteins consist of several domains, and a domain may appear in a variety of different proteins. Molecular evolution uses domains as building blocks and these may be recombined in different arrangements to create proteins with different functions. In general, domains vary in length from about 50 to 250 amino acids. The shortest domains, such as zinc fingers, are stabilized by metal ions or disulfide bridges. Domains often form functional units, such as the calcium-binding EF hand domain of calmodulin. Because they are independently stable, domains can be "swapped" by genetic engineering between one protein and another to make chimeric proteins.

Helen M. Berman, American chemist

Helen Miriam Berman is a Board of Governors Professor of Chemistry and Chemical Biology at Rutgers University and a former director of the RCSB Protein Data Bank. A structural biologist, her work includes structural analysis of protein-nucleic acid complexes, and the role of water in molecular interactions. She is also the founder and director of the Nucleic Acid Database, and led the Protein Structure Initiative Structural Genomics Knowledgebase.

In biology, a protein structure database is a database that is modeled around the various experimentally determined protein structures. The aim of most protein structure databases is to organize and annotate the protein structures, providing the biological community access to the experimental data in a useful way. Data included in protein structure databases often include three-dimensional coordinates as well as experimental information, such as unit cell dimensions and angles for structures determined by X-ray crystallography. Most instances, in this case either proteins or specific structure determinations of a protein, also contain sequence information, and some databases even provide means for performing sequence-based queries. Nevertheless, the primary attribute of a structure database is structural information, whereas sequence databases focus on sequence information and contain no structural information for the majority of entries. Protein structure databases are critical for many efforts in computational biology such as structure-based drug design, both in developing the computational methods used and in providing a large experimental dataset used by some methods to provide insights about the function of a protein.

Protein function prediction methods are techniques that bioinformatics researchers use to assign biological or biochemical roles to proteins. These proteins are usually ones that are poorly studied or predicted based on genomic sequence data. These predictions are often driven by data-intensive computational procedures. Information may come from nucleic acid sequence homology, gene expression profiles, protein domain structures, text mining of publications, phylogenetic profiles, phenotypic profiles, and protein-protein interaction. Protein function is a broad term: the roles of proteins range from catalysis of biochemical reactions to transport to signal transduction, and a single protein may play a role in multiple processes or cellular pathways.

SWISS-MODEL is a structural bioinformatics web-server dedicated to homology modeling of 3D protein structures. Homology modeling is currently the most accurate method to generate reliable three-dimensional protein structure models and is routinely used in many practical applications. Homology modeling methods make use of experimental protein structures ("templates") to build models for evolutionarily related proteins ("targets").

Macromolecular structure validation is the process of evaluating reliability for 3-dimensional atomic models of large biological molecules such as proteins and nucleic acids. These models, which provide 3D coordinates for each atom in the molecule, come from structural biology experiments such as x-ray crystallography or nuclear magnetic resonance (NMR). The validation has three aspects: 1) checking on the validity of the thousands to millions of measurements in the experiment; 2) checking how consistent the atomic model is with those experimental data; and 3) checking consistency of the model with known physical and chemical properties.

References

[1] Laskowski RA, Hutchinson EG, Michie AD, Wallace AC, Jones ML, Thornton JM (Dec 1997). "PDBsum: a Web-based database of summaries and analyses of all PDB structures". Trends in Biochemical Sciences. 22 (12): 488–90. doi:10.1016/S0968-0004(97)01140-7. PMID 9433130.
[2] Laskowski RA (Jan 2001). "PDBsum: summaries and analyses of PDB structures". Nucleic Acids Research. 29 (1): 221–2. doi:10.1093/nar/29.1.221. PMC 29784. PMID 11125097.
[3] Laskowski RA, Chistyakov VV, Thornton JM (Jan 2005). "PDBsum more: new summaries and analyses of the known 3D structures of proteins and nucleic acids". Nucleic Acids Research. 33 (Database issue): D266-8. doi:10.1093/nar/gki001. PMC 539955. PMID 15608193.
[4] Laskowski RA (Jan 2009). "PDBsum new things". Nucleic Acids Research. 37 (Database issue): D355-9. doi:10.1093/nar/gkn860. PMC 2686501. PMID 18996896.
[5] "PDBsum documentation: About PDBsum". European Molecular Biology Laboratory – The European Bioinformatics Institute. Retrieved 9 September 2014.
[6] de Beer TA, Berka K, Thornton JM, Laskowski RA (Jan 2014). "PDBsum additions". Nucleic Acids Research. 42 (Database issue): D292-6. doi:10.1093/nar/gkt940. PMC 3965036. PMID 24153109.
[7] Knox C, Law V, Jewison T, Liu P, Ly S, Frolkis A, Pon A, Banco K, Mak C, Neveu V, Djoumbou Y, Eisner R, Guo AC, Wishart DS (Jan 2011). "DrugBank 3.0: a comprehensive resource for 'omics' research on drugs". Nucleic Acids Research. 39 (Database issue): D1035-41. doi:10.1093/nar/gkq1126. PMC 3013709. PMID 21059682.

External links

Laskowski RA, Thornton JM. "PDBsum home page". European Bioinformatics Institute.


