ProtCID

Content
Description: Similar interactions of homologous proteins in multiple crystal forms
Contact
Research center: Fox Chase Cancer Center
Laboratory: Institute for Cancer Research
Authors: Qifang Xu, Roland Dunbrack
Primary citation: Xu & Dunbrack (2011) [1]
Release date: 2010
Access
Website: http://dunbrack2.fccc.edu/protcid
Example of a cluster of similar interfaces of homologous proteins identified by ProtCID: similar homodimers of ERBB kinases (EGFR, ERBB2, ERBB4) associated with kinase activation. [2] [3] [4] Each monomer is colored from blue to red from the N terminus to the C terminus. ProtCID provides PyMOL scripts for each cluster to produce similar images.

The Protein Common Interface Database (ProtCID) is a database of similar protein-protein interfaces in crystal structures of homologous proteins. [1] [5]

Its main goal is to identify and cluster homodimeric and heterodimeric interfaces observed in multiple crystal forms of homologous proteins. Such interfaces, especially of non-identical proteins or protein complexes, have been associated with biologically relevant interactions. [6]

A common interface in ProtCID indicates chain-chain or domain-domain interactions that occur in different crystal forms. All protein sequences of known structure in the Protein Data Bank (PDB) [7] are assigned a "Pfam chain architecture", which denotes the ordered Pfam [8] assignments for that sequence, e.g. (Pkinase) or (Cyclin_N)_(Cyclin_C). Homodimeric interfaces in all crystals that contain particular domain or chain architectures are compared, regardless of whether there are other protein types in the crystals. All interfaces between two different Pfam domains or Pfam architectures in all PDB entries that contain them are also compared, e.g. (Pkinase) and (Cyclin_N)_(Cyclin_C). For both homodimers and heterodimers, the interfaces are clustered into common interfaces based on a similarity score.
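The two ideas in this paragraph, building an architecture string from ordered Pfam assignments and grouping interfaces by a similarity score, can be illustrated with a minimal Python sketch. It is not ProtCID's implementation. The input formats, the function names, the Jaccard overlap used as a stand-in similarity score, the 0.5 threshold, and the single-linkage grouping are all assumptions made for illustration. The sketch also assumes that contact positions have already been mapped onto a numbering shared by the homologous proteins.

```python
from itertools import combinations


def chain_architecture(domains):
    """Join Pfam family IDs, ordered by start position, into an architecture string.

    `domains` is a list of (start_residue, pfam_id) tuples (hypothetical format).
    """
    ordered = [pfam_id for _, pfam_id in sorted(domains)]
    return "_".join(f"({pfam_id})" for pfam_id in ordered)


def jaccard(contacts_a, contacts_b):
    """Stand-in similarity score: overlap of two sets of residue-pair contacts."""
    if not contacts_a and not contacts_b:
        return 0.0
    return len(contacts_a & contacts_b) / len(contacts_a | contacts_b)


def cluster_interfaces(interfaces, threshold=0.5):
    """Single-linkage grouping: interfaces scoring >= threshold share a cluster.

    `interfaces` maps an interface label to its set of contacts.
    """
    parent = {name: name for name in interfaces}

    def find(name):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(interfaces, 2):
        if jaccard(interfaces[a], interfaces[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in interfaces:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Architecture strings in the notation used above.
cyclin = chain_architecture([(210, "Cyclin_C"), (40, "Cyclin_N")])
kinase = chain_architecture([(5, "Pkinase")])

# Toy interfaces with made-up labels and contacts (not real PDB data).
toy = {
    "entryA_1": {(10, 55), (12, 57), (30, 80)},
    "entryB_1": {(10, 55), (12, 57), (31, 80)},
    "entryC_1": {(100, 200), (101, 205)},
}
clusters = cluster_interfaces(toy)
print(cyclin, kinase, clusters)
```

In this toy input the first two interfaces share two of four distinct contacts, so they meet the assumed threshold and are grouped, while the third stays on its own.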

ProtCID reports the number of crystal forms that contain a common interface, the number of PDB entries, the number of PDB and PISA [9] biological assembly annotations that contain the same interface, the average surface area, and the minimum sequence identity of proteins that contain the interface. ProtCID provides an independent check on publicly available annotations of biological interactions for PDB entries.
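The per-cluster statistics listed above can be sketched as a small aggregation in Python. The record fields, the function name, and the toy values below are hypothetical and do not reflect ProtCID's actual schema or data; the sketch only shows how such counts, an average, and a minimum might be derived from the interfaces in one cluster.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class InterfaceRecord:
    """One interface in a cluster (hypothetical fields, not ProtCID's schema)."""
    pdb_id: str
    crystal_form: str
    surface_area: float      # interface surface area in square angstroms
    in_pdb_assembly: bool    # interface present in the PDB biological assembly
    in_pisa_assembly: bool   # interface present in the PISA assembly


def summarize_cluster(records, pairwise_identities):
    """Aggregate the statistics described in the text for one cluster."""
    return {
        "crystal_forms": len({r.crystal_form for r in records}),
        "pdb_entries": len({r.pdb_id for r in records}),
        "pdb_assemblies": len({r.pdb_id for r in records if r.in_pdb_assembly}),
        "pisa_assemblies": len({r.pdb_id for r in records if r.in_pisa_assembly}),
        "avg_surface_area": mean(r.surface_area for r in records),
        "min_seq_identity": min(pairwise_identities),
    }


# Made-up records and sequence identities (percent) for illustration only.
records = [
    InterfaceRecord("ent1", "formA", 900.0, True, True),
    InterfaceRecord("ent2", "formA", 1000.0, True, False),
    InterfaceRecord("ent3", "formB", 1100.0, False, True),
]
summary = summarize_cluster(records, pairwise_identities=[92.0, 41.5, 38.0])
print(summary)
```

Counting distinct crystal forms separately from distinct entries matters here because several PDB entries can share one crystal form, and an interface seen in many forms is stronger evidence than one seen in many entries of the same form.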

ProtCID also contains interface clusters between protein domains and peptides, nucleic acids, and ligands.

References

  1. Xu, Q.; Dunbrack, R. L. (2010). "The protein common interface database (ProtCID)--a comprehensive database of interactions of homologous proteins in multiple crystal forms". Nucleic Acids Research. 39 (Database issue): D761–70. doi:10.1093/nar/gkq1059. PMC 3013667. PMID 21036862.
  2. Zhang, X.; Gureasko, J.; Shen, K.; Cole, P. A.; Kuriyan, J. (2006). "An Allosteric Mechanism for Activation of the Kinase Domain of Epidermal Growth Factor Receptor". Cell. 125 (6): 1137–1149. doi:10.1016/j.cell.2006.05.013. PMID 16777603.
  3. Aertgeerts, K.; Skene, R.; Yano, J.; Sang, B.-C.; Zou, H.; Snell, G.; Jennings, A.; Iwamoto, K.; Habuka, N.; Hirokawa, A.; Ishikawa, T.; Tanaka, T.; Miki, H.; Ohta, Y.; Sogabe, S. (2011). "Structural Analysis of the Mechanism of Inhibition and Allosteric Activation of the Kinase Domain of HER2 Protein". Journal of Biological Chemistry. 286 (21): 18756–18765. doi:10.1074/jbc.M110.206193. PMC 3099692. PMID 21454582.
  4. Qiu, C.; Tarrant, M. K.; Choi, S. H.; Sathyamurthy, A.; Bose, R.; Banjade, S.; Pal, A.; Bornmann, W. G.; Lemmon, M. A.; Cole, P. A.; Leahy, D. J. (2008). "Mechanism of Activation and Inhibition of the HER4/ErbB4 Kinase". Structure. 16 (3): 460–467. doi:10.1016/j.str.2007.12.016. PMC 2858219. PMID 18334220.
  5. Xu, Q.; Dunbrack, R. L. (5 February 2020). "ProtCID: a data resource for structural information on protein interactions". Nature Communications. 11 (1): 711. Bibcode:2020NatCo..11..711X. doi:10.1038/s41467-020-14301-4. PMC 7002494. PMID 32024829.
  6. Xu, Qifang; Canutescu, Adrian A.; Wang, Guoli; Shapovalov, Maxim; Obradovic, Zoran; Dunbrack, Roland L. (2008). "Statistical Analysis of Interface Similarity in Crystals of Homologous Proteins". Journal of Molecular Biology. 381 (2): 487–507. doi:10.1016/j.jmb.2008.06.002. PMC 2573399. PMID 18599072.
  7. Berman, H. M.; Battistuz, T.; Bhat, T. N.; Bluhm, W. F.; Bourne, P. E.; Burkhardt, K.; Feng, Z.; Gilliland, G. L.; Iype, L.; Jain, S.; Fagan, P.; Marvin, J.; Padilla, D.; Ravichandran, V.; Schneider, B.; Thanki, N.; Weissig, H.; Westbrook, J. D.; Zardecki, C. (2002). "The Protein Data Bank". Acta Crystallographica Section D. 58 (Pt 6 No 1): 899–907. doi:10.1107/S0907444902003451. PMID 12037327.
  8. Punta, M.; Coggill, P. C.; Eberhardt, R. Y.; Mistry, J.; Tate, J.; Boursnell, C.; Pang, N.; Forslund, K.; Ceric, G.; Clements, J.; Heger, A.; Holm, L.; Sonnhammer, E. L. L.; Eddy, S. R.; Bateman, A.; Finn, R. D. (2011). "The Pfam protein families database". Nucleic Acids Research. 40 (Database issue): D290–D301. doi:10.1093/nar/gkr1065. PMC 3245129. PMID 22127870.
  9. Krissinel, E.; Henrick, K. (2007). "Inference of Macromolecular Assemblies from Crystalline State". Journal of Molecular Biology. 372 (3): 774–797. doi:10.1016/j.jmb.2007.05.022. PMID 17681537.