ProtCID

Content
Description Similar interactions of homologous proteins in multiple crystal forms
Contact
Research center Fox Chase Cancer Center
Laboratory Institute for Cancer Research
Authors Qifang Xu, Roland Dunbrack
Primary citation Xu & Dunbrack (2011) [1]
Release date 2010
Access
Website http://dunbrack2.fccc.edu/protcid
Example of a cluster of similar interfaces of homologous proteins identified by ProtCID: similar homodimers of ERBB kinases (EGFR, ERBB2, ERBB4) associated with kinase activation. Each monomer is colored from blue to red from the N to the C terminus. ProtCID provides PyMOL scripts for each cluster to produce similar images.

The Protein Common Interface Database (ProtCID) is a database of similar protein-protein interfaces in crystal structures of homologous proteins. [1] [5]

Its main goal is to identify and cluster homodimeric and heterodimeric interfaces observed in multiple crystal forms of homologous proteins. Such interfaces, especially of non-identical proteins or protein complexes, have been associated with biologically relevant interactions. [6]

A common interface in ProtCID is a chain-chain or domain-domain interaction that occurs in different crystal forms. All protein sequences of known structure in the Protein Data Bank (PDB) [7] are assigned a "Pfam chain architecture", which denotes the ordered Pfam [8] assignments for that sequence, e.g. (Pkinase) or (Cyclin_N)_(Cyclin_C). Homodimeric interfaces in all crystals that contain a particular domain or chain architecture are compared, regardless of whether other protein types are present in the crystals. All interfaces between two different Pfam domains or Pfam architectures, e.g. (Pkinase) and (Cyclin_N)_(Cyclin_C), are likewise compared across all PDB entries that contain them. For both homodimers and heterodimers, the interfaces are clustered into common interfaces based on a similarity score.
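The two ideas in the paragraph above, an ordered architecture string and clustering by a similarity score, can be illustrated with a minimal Python sketch. It is not ProtCID's implementation: the function names, the representation of an interface as a set of aligned residue-residue contacts, the Jaccard score, the single-linkage grouping, and the 0.5 threshold are all assumptions made for illustration.

```python
from itertools import combinations


def chain_architecture(domains):
    """Format (start_residue, pfam_id) pairs as an ordered architecture
    string such as '(Cyclin_N)_(Cyclin_C)'."""
    ordered = sorted(domains)  # order domains by their start residue
    return "_".join(f"({pfam_id})" for _, pfam_id in ordered)


def jaccard(contacts_a, contacts_b):
    """Hypothetical similarity score: fraction of shared contacts.
    ProtCID's actual score is not reproduced here."""
    union = contacts_a | contacts_b
    if not union:
        return 0.0
    return len(contacts_a & contacts_b) / len(union)


def cluster_interfaces(interfaces, threshold=0.5):
    """Group interfaces whose pairwise score reaches the threshold
    (single linkage, implemented with a small union-find)."""
    parent = {name: name for name in interfaces}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path compression
            name = parent[name]
        return name

    for a, b in combinations(interfaces, 2):
        if jaccard(interfaces[a], interfaces[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in interfaces:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Made-up example data: contacts are pairs of aligned residue positions.
example = {
    "crystal_A": {(10, 20), (11, 21), (12, 22)},
    "crystal_B": {(10, 20), (11, 21), (13, 23)},
    "crystal_C": {(50, 60)},
}
print(chain_architecture([(150, "Cyclin_C"), (20, "Cyclin_N")]))
print(cluster_interfaces(example))
```

In this toy data the first two interfaces share two of four distinct contacts and therefore fall into one cluster, while the third stays on its own. A real comparison would first need the homologous chains to be aligned so that contacts refer to equivalent positions.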

ProtCID reports the number of crystal forms that contain a common interface, the number of PDB entries, the number of PDB and PISA [9] biological assembly annotations that contain the same interface, the average surface area, and the minimum sequence identity of proteins that contain the interface. ProtCID provides an independent check on publicly available annotations of biological interactions for PDB entries.
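As a rough illustration of how such per-cluster statistics could be derived, the following Python sketch summarizes a list of interface records. The record fields, the `summarize_cluster` function, and the sample values are hypothetical and do not reflect ProtCID's schema or data.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class InterfaceRecord:
    """One observation of a common interface (hypothetical fields)."""
    entry_id: str           # PDB entry identifier
    crystal_form: str       # label of the crystal form the entry belongs to
    surface_area: float     # interface surface area in square angstroms
    in_pdb_assembly: bool   # interface present in the PDB biological assembly
    in_pisa_assembly: bool  # interface present in the PISA assembly


def summarize_cluster(records, pairwise_identities):
    """Compute the summary values described above for one cluster.

    pairwise_identities holds percent sequence identities between the
    proteins that contain the interface.
    """
    return {
        "crystal_forms": len({r.crystal_form for r in records}),
        "entries": len({r.entry_id for r in records}),
        "pdb_assemblies": len({r.entry_id for r in records if r.in_pdb_assembly}),
        "pisa_assemblies": len({r.entry_id for r in records if r.in_pisa_assembly}),
        "avg_surface_area": mean(r.surface_area for r in records),
        "min_seq_identity": min(pairwise_identities),
    }


# Made-up records; the identifiers are placeholders, not real PDB entries.
records = [
    InterfaceRecord("ent1", "CF1", 900.0, True, True),
    InterfaceRecord("ent2", "CF1", 1000.0, True, False),
    InterfaceRecord("ent3", "CF2", 1100.0, False, False),
]
summary = summarize_cluster(records, [100.0, 85.0, 62.5])
print(summary)
```

An interface seen in many crystal forms but absent from the PDB or PISA assembly annotations is the kind of discrepancy such a summary makes visible.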

ProtCID also contains interface clusters between protein domains and peptides, nucleic acids, and ligands.

References

  1. Xu, Q.; Dunbrack, R. L. (2010). "The protein common interface database (ProtCID)—a comprehensive database of interactions of homologous proteins in multiple crystal forms". Nucleic Acids Research. 39 (Database issue): D761–70. doi:10.1093/nar/gkq1059. PMC 3013667. PMID 21036862.
  2. Zhang, X.; Gureasko, J.; Shen, K.; Cole, P. A.; Kuriyan, J. (2006). "An Allosteric Mechanism for Activation of the Kinase Domain of Epidermal Growth Factor Receptor". Cell. 125 (6): 1137–1149. doi:10.1016/j.cell.2006.05.013. PMID 16777603.
  3. Aertgeerts, K.; Skene, R.; Yano, J.; Sang, B.-C.; Zou, H.; Snell, G.; Jennings, A.; Iwamoto, K.; Habuka, N.; Hirokawa, A.; Ishikawa, T.; Tanaka, T.; Miki, H.; Ohta, Y.; Sogabe, S. (2011). "Structural Analysis of the Mechanism of Inhibition and Allosteric Activation of the Kinase Domain of HER2 Protein". Journal of Biological Chemistry. 286 (21): 18756–18765. doi:10.1074/jbc.M110.206193. PMC 3099692. PMID 21454582.
  4. Qiu, C.; Tarrant, M. K.; Choi, S. H.; Sathyamurthy, A.; Bose, R.; Banjade, S.; Pal, A.; Bornmann, W. G.; Lemmon, M. A.; Cole, P. A.; Leahy, D. J. (2008). "Mechanism of Activation and Inhibition of the HER4/ErbB4 Kinase". Structure. 16 (3): 460–467. doi:10.1016/j.str.2007.12.016. PMC 2858219. PMID 18334220.
  5. Xu, Q.; Dunbrack, R. L. (5 February 2020). "ProtCID: a data resource for structural information on protein interactions". Nature Communications. 11 (1): 711. Bibcode:2020NatCo..11..711X. doi:10.1038/s41467-020-14301-4. PMC 7002494. PMID 32024829.
  6. Xu, Q.; Canutescu, A. A.; Wang, G.; Shapovalov, M.; Obradovic, Z.; Dunbrack, R. L. (2008). "Statistical Analysis of Interface Similarity in Crystals of Homologous Proteins". Journal of Molecular Biology. 381 (2): 487–507. doi:10.1016/j.jmb.2008.06.002. PMC 2573399. PMID 18599072.
  7. Berman, H. M.; Battistuz, T.; Bhat, T. N.; Bluhm, W. F.; Bourne, P. E.; Burkhardt, K.; Feng, Z.; Gilliland, G. L.; Iype, L.; Jain, S.; Fagan, P.; Marvin, J.; Padilla, D.; Ravichandran, V.; Schneider, B.; Thanki, N.; Weissig, H.; Westbrook, J. D.; Zardecki, C. (2002). "The Protein Data Bank". Acta Crystallographica Section D. 58 (Pt 6 No 1): 899–907. doi:10.1107/S0907444902003451. PMID 12037327.
  8. Punta, M.; Coggill, P. C.; Eberhardt, R. Y.; Mistry, J.; Tate, J.; Boursnell, C.; Pang, N.; Forslund, K.; Ceric, G.; Clements, J.; Heger, A.; Holm, L.; Sonnhammer, E. L. L.; Eddy, S. R.; Bateman, A.; Finn, R. D. (2011). "The Pfam protein families database". Nucleic Acids Research. 40 (Database issue): D290–D301. doi:10.1093/nar/gkr1065. PMC 3245129. PMID 22127870.
  9. Krissinel, E.; Henrick, K. (2007). "Inference of Macromolecular Assemblies from Crystalline State". Journal of Molecular Biology. 372 (3): 774–797. doi:10.1016/j.jmb.2007.05.022. PMID 17681537.