Human Protein Reference Database

Last updated August 25, 2024

The Human Protein Reference Database (HPRD) is a protein database accessible through the Internet.^[1] It is closely associated with the Institute of Bioinformatics (IOB), a non-profit research organisation in Bangalore, India. The database is a collaborative output of IOB and the Pandey Lab at Johns Hopkins University.

Overview

The HPRD is the result of an international collaboration between the Institute of Bioinformatics in Bangalore, India and the Pandey lab at Johns Hopkins University in Baltimore, USA. HPRD contains manually curated scientific information on the biology of most human proteins. Information on proteins involved in human diseases is annotated and linked to the Online Mendelian Inheritance in Man (OMIM) database. The National Center for Biotechnology Information links to HPRD through its human gene and protein databases (e.g. Entrez Gene and RefSeq protein).

The resource catalogues information on human protein functions, including protein–protein interactions, post-translational modifications, enzyme-substrate relationships and disease associations. The annotations are derived through manual curation of the published literature by expert biologists and through bioinformatics analyses of the protein sequence. The protein–protein interaction and subcellular localization data from HPRD have been used to develop a human protein interaction network.^[2]

Highlights of HPRD are as follows:

From 10,000 protein–protein interactions (PPIs) annotated for 3,000 proteins in 2003, HPRD grew to over 36,500 unique PPIs annotated for 25,000 proteins, including 6,360 isoforms, by the end of 2007.^[3]
More than 50% of molecules annotated in HPRD have at least one PPI and 10% have more than 10 PPIs.
Experiments for PPIs are broadly grouped into three categories, namely in vitro, in vivo and yeast two-hybrid (Y2H). Sixty percent of PPIs annotated in HPRD are supported by a single experiment, whereas 26% are supported by two of the three experimental methods.
HPRD contains 18,000 manually curated post-translational modification (PTM) entries belonging to 26 different types. Phosphorylation is the leading type of modification, contributing 63% of the PTM data annotated in HPRD. Glycosylation, proteolytic cleavage and disulfide bridge events are the next leading contributors of PTM data.
HPRD data is available for download in tab delimited and XML file formats.^[4]
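
As a rough illustration of how tab-delimited interaction data of this kind could be summarised, the following Python sketch counts unique interactions, per-protein interaction partners, and the share of interactions supported by a single experimental method. The three-column layout, the protein identifiers and the sample rows are hypothetical and do not reproduce the actual HPRD download format.

```python
import csv
import io
from collections import defaultdict

# Hypothetical tab-delimited rows: interactor A, interactor B, and a
# semicolon-separated list of experiment types. The layout and values are
# invented for illustration; they are not the real HPRD file format.
SAMPLE = (
    "PROT_A\tPROT_B\tin vitro;in vivo\n"
    "PROT_A\tPROT_C\tin vivo\n"
    "PROT_B\tPROT_D\tyeast two-hybrid\n"
    "PROT_B\tPROT_A\tin vitro\n"  # same pair as row 1, opposite orientation
)


def load_interactions(handle):
    """Return {unordered protein pair: set of experiment types}."""
    pairs = defaultdict(set)
    for a, b, experiments in csv.reader(handle, delimiter="\t"):
        # frozenset makes A-B and B-A collapse into one unique interaction.
        pairs[frozenset((a, b))].update(
            e.strip() for e in experiments.split(";")
        )
    return pairs


def degree_by_protein(pairs):
    """Count the number of unique interactions each protein takes part in."""
    degree = defaultdict(int)
    for pair in pairs:
        for protein in pair:
            degree[protein] += 1
    return dict(degree)


def single_method_fraction(pairs):
    """Fraction of unique interactions backed by exactly one experiment type."""
    single = sum(1 for methods in pairs.values() if len(methods) == 1)
    return single / len(pairs)


pairs = load_interactions(io.StringIO(SAMPLE))
degrees = degree_by_protein(pairs)
print(len(pairs), degrees, single_method_fraction(pairs))
```

Treating each interaction as an unordered pair is a design choice made here so that the same interaction reported in both orientations is counted once.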

HPRD also integrates data from Human Proteinpedia, a community portal for integrating human protein data. The data from HPRD can be freely accessed and used by academic users while commercial entities are required to obtain a license for use. Human Proteinpedia^[5] content is freely available for anyone to download and use.

PhosphoMotif Finder

PhosphoMotif Finder^[6] contains known kinase and phosphatase substrate motifs, as well as binding motifs, that are curated from the published literature. It reports the presence of any literature-derived motif in the query sequence. PhosphoMotif Finder does not predict motifs in the query protein sequence using any algorithm or other computational strategy.
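
The distinction between reporting motif presence and predicting motifs can be sketched in Python as a plain pattern lookup. This is a minimal sketch and not the tool's implementation: the two patterns, their names and the query sequence are illustrative placeholders and are not taken from the PhosphoMotif Finder motif collection.

```python
import re

# Illustrative placeholder patterns written as regular expressions.
# They are NOT drawn from PhosphoMotif Finder's curated motif list.
MOTIFS = {
    "example basophilic motif (R-x-R-x-x-S/T)": r"R.R..[ST]",
    "example proline-directed motif (S/T-P)": r"[ST]P",
}


def find_motifs(sequence, motifs=MOTIFS):
    """Return sorted (start, end, matched text, motif name) tuples.

    Positions are 1-based and inclusive. No scoring or prediction is done:
    a motif is reported only if its pattern literally occurs in the sequence.
    """
    hits = []
    for name, pattern in motifs.items():
        # Wrapping the pattern in a lookahead reports overlapping matches too.
        for match in re.finditer(f"(?=({pattern}))", sequence):
            start = match.start(1)
            text = match.group(1)
            hits.append((start + 1, start + len(text), text, name))
    return sorted(hits)


hits = find_motifs("MRPRTKSPAA")  # hypothetical query sequence
for start, end, text, name in hits:
    print(f"{start}-{end}\t{text}\t{name}")
```

Because the lookup only matches fixed patterns, a sequence containing none of the listed motifs simply yields an empty result.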

Comparison of protein data

Several databases deal with the human proteome (e.g. BioGRID, BIND, DIP, HPRD, IntAct, MINT, MIPS, PDZBase and Reactome), and each has its own style of presenting data. Comparing the voluminous data in these databases to determine the strengths and weaknesses of each is a difficult task for most investigators. Mathivanan and colleagues^[7] tried to address this issue by analyzing the protein data in these databases with respect to a series of questions. Their analysis can help biologists choose among these databases based on their needs.

References

[1] Peri S, et al. (2003). "Development of human protein reference database as an initial platform for approaching systems biology in humans". Genome Research. 13 (10): 2363–71. doi:10.1101/gr.1680803. PMC 403728. PMID 14525934.
[2] Gandhi TKB, et al. (March 2006). "Analysis of the human protein interactome and comparison with yeast, worm and fly interaction datasets". Nature Genetics. 38 (3): 285–293. doi:10.1038/ng1747. PMID 16501559. S2CID 1446423.
[3] Mathivanan S, et al. (2006). "An evaluation of human protein–protein interaction data in the public domain". BMC Bioinformatics. 7 (Suppl 5): S19. doi:10.1186/1471-2105-7-S5-S19. PMC 1764475. PMID 17254303.
[4] Mishra G, et al. (2006). "Human protein reference database—2006 update". Nucleic Acids Research. 34 (Database issue): D411–D414. doi:10.1093/nar/gkj141. PMC 1347503. PMID 16381900.
[5] Mathivanan S, et al. (2008). "Human Proteinpedia enables sharing of human protein data" (PDF). Nature Biotechnology. 26 (2): 164–167. doi:10.1038/nbt0208-164. hdl:10261/60528. PMID 18259167. S2CID 205265347.
[6] Amanchy R, et al. (2007). "A compendium of curated phosphorylation-based substrate and binding motifs". Nature Biotechnology. 25 (3): 285–286. doi:10.1038/nbt0307-285. PMID 17344875. S2CID 38824337.
[7] Mathivanan S, Periaswamy B, Gandhi TK, et al. (2006). "An evaluation of human protein–protein interaction data in the public domain". BMC Bioinformatics. 7 (Suppl 5): S19. doi:10.1186/1471-2105-7-S5-S19. PMC 1764475. PMID 17254303.

External links

http://www.humanproteinpedia.org Archived 2007-03-14 at the Wayback Machine
http://www.hprd.org Archived 2006-04-24 at the Wayback Machine

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
