Human Protein Reference Database

The Human Protein Reference Database (HPRD) is a protein database accessible through the Internet. [1] It is closely associated with the Institute of Bioinformatics (IOB), a non-profit research organisation in Bangalore, India. The database is a collaborative output of IOB and the Pandey Lab at Johns Hopkins University.

Overview

The HPRD is the result of an international collaborative effort between the Institute of Bioinformatics in Bangalore, India, and the Pandey Lab at Johns Hopkins University in Baltimore, USA. HPRD contains manually curated scientific information pertaining to the biology of most human proteins. Information regarding proteins involved in human diseases is annotated and linked to the Online Mendelian Inheritance in Man (OMIM) database. The National Center for Biotechnology Information provides links to HPRD through its databases pertaining to human genes and proteins (e.g. Entrez Gene and RefSeq protein).

This resource presents information on human protein functions, including protein–protein interactions, post-translational modifications, enzyme-substrate relationships and disease associations. The catalogued protein annotations were derived by expert biologists through manual curation of the published literature and through bioinformatics analyses of the protein sequences. The protein–protein interaction and subcellular localization data from HPRD have been used to develop a human protein interaction network. [2]

HPRD also integrates data from Human Proteinpedia, [5] a community portal for integrating human protein data. HPRD data can be freely accessed and used by academic users, while commercial entities are required to obtain a license. Human Proteinpedia content is freely available for anyone to download and use.

PhosphoMotif Finder

PhosphoMotif Finder [6] contains known kinase/phosphatase substrate motifs as well as binding motifs, all curated from the published literature. It reports the presence of any literature-derived motif in the query sequence. It does not predict motifs in the query protein sequence using any algorithm or other computational strategy.
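The difference between reporting the presence of a known motif and predicting one can be sketched as a plain pattern lookup. The following Python example is a minimal illustration only: the motif table, the function name `find_motifs` and the query sequence are hypothetical, and the two patterns are textbook-style consensus forms rather than entries taken from PhosphoMotif Finder.

```python
import re

# Illustrative motif table: hypothetical entries, NOT taken from PhosphoMotif Finder.
# "X" (any residue) in the usual consensus notation is written as "." in the regex.
MOTIFS = {
    "PKA-like consensus (R-R-X-S/T)": r"RR.[ST]",
    "Proline-directed (S/T-P)": r"[ST]P",
}


def find_motifs(sequence, motifs=MOTIFS):
    """Return (motif name, 1-based start, matched text) for every occurrence.

    This only reports where a listed pattern occurs in the sequence;
    it makes no prediction about whether the site is actually used.
    """
    sequence = sequence.upper()
    hits = []
    for name, pattern in motifs.items():
        # The lookahead lets overlapping occurrences be reported as well.
        for match in re.finditer(f"(?=({pattern}))", sequence):
            hits.append((name, match.start() + 1, match.group(1)))
    return hits


if __name__ == "__main__":
    # Hypothetical query sequence, used only for demonstration.
    for hit in find_motifs("MKRRASPLLTPEE"):
        print(hit)
```

A lookup of this kind returns every position at which a listed pattern occurs, so the quality of the result depends entirely on the curated motif list rather than on any scoring model.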

Comparison of protein data

Several databases deal with the human proteome (e.g. BioGRID, BIND, DIP, HPRD, IntAct, MINT, MIPS, PDZBase and Reactome). Each database has its own style of presenting the data, and it is difficult for most investigators to compare the voluminous data in these databases in order to judge the strengths and weaknesses of each. Mathivanan and colleagues [7] addressed this issue by analyzing the protein data in these databases against a series of questions. Their analysis can help biologists choose among these databases based on their needs.
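One simple question in such a comparison is how many interactions two databases share. The sketch below is an assumption-laden illustration, not the method used by Mathivanan and colleagues: the function names and the toy interaction lists (with placeholder identifiers `P1` to `P7`) are hypothetical, and it presumes that both databases have already been mapped to a common protein identifier scheme.

```python
def normalize(pairs):
    """Treat each interaction as an unordered pair, so A-B equals B-A."""
    return {frozenset(pair) for pair in pairs}


def compare_interactions(db_a, db_b):
    """Count shared and database-specific interactions and their Jaccard index."""
    a, b = normalize(db_a), normalize(db_b)
    shared = a & b
    union = a | b
    return {
        "only_a": len(a - b),
        "only_b": len(b - a),
        "shared": len(shared),
        # Jaccard index: shared interactions divided by all distinct interactions.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    # Toy data with placeholder identifiers, not real database content.
    database_a = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
    database_b = [("P2", "P1"), ("P4", "P5"), ("P6", "P7")]
    print(compare_interactions(database_a, database_b))
```

Storing each interaction as an unordered pair avoids counting the same interaction twice when the two databases list the partners in a different order.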

References

  1. Peri S; et al. (2003). "Development of human protein reference database as an initial platform for approaching systems biology in humans". Genome Research. 13 (10): 2363–71. doi:10.1101/gr.1680803. PMC 403728. PMID 14525934.
  2. Gandhi TKB; et al. (March 2006). "Analysis of the human protein interactome and comparison with yeast, worm and fly interaction datasets". Nature Genetics. 38 (3): 285–293. doi:10.1038/ng1747. PMID 16501559. S2CID 1446423.
  3. Mathivanan S; et al. (2006). "An evaluation of human protein–protein interaction data in the public domain". BMC Bioinformatics. 7 (Suppl 5): S19. doi:10.1186/1471-2105-7-S5-S19. PMC 1764475. PMID 17254303.
  4. Mishra G; et al. (2006). "Human protein reference database—2006 update". Nucleic Acids Research. 34 (Database issue): 411–414. doi:10.1093/nar/gkj141. PMC 1347503. PMID 16381900.
  5. Mathivanan S; et al. (2008). "Human Proteinpedia enables sharing of human protein data". Nature Biotechnology. 26 (2): 164–167. doi:10.1038/nbt0208-164. hdl:10261/60528. PMID 18259167. S2CID 205265347.
  6. Amanchy R; et al. (2007). "A compendium of curated phosphorylation-based substrate and binding motifs". Nature Biotechnology. 25 (3): 285–286. doi:10.1038/nbt0307-285. PMID 17344875. S2CID 38824337.
  7. Mathivanan S; Periaswamy B; Gandhi TK; et al. (2006). "An evaluation of human protein–protein interaction data in the public domain". BMC Bioinformatics. 7 (Suppl 5): S19. doi:10.1186/1471-2105-7-S5-S19. PMC 1764475. PMID 17254303.