PRIDE (the PRoteomics IDEntifications database) is a public data repository of mass spectrometry (MS)-based proteomics data, maintained by the Proteomics Team at the European Bioinformatics Institute. [1]
Originally designed by Lennart Martens in 2003 during a stay at the European Bioinformatics Institute as a Marie Curie fellow of the European Commission in the "Quality of Life" Programme (Contract number: QLRI-1999-50595), PRIDE was established as a production service in 2005. [2] The original grant application document from June 2003 to start construction of PRIDE has since been published in a viewpoint article. [3] [4] Several similar proteomics databases have been built, including the GPMDB, PeptideAtlas, Proteinpedia and the NCBI Peptidome. [1]
The PRIDE database constitutes a structured data repository, and stores the original experimental data from the researchers without editorial control over the submitted data.
In total, PRIDE contains data from about 60 species, the biggest fraction of it coming from human samples (including the data from the two draft human proteomes [5] [6] ) followed by the fruit fly Drosophila melanogaster and mouse. [1]
Since detailed proteomics data currently cannot be curated from the existing literature, the source of PRIDE data is solely submissions by academic researchers.
PRIDE is a standards-compliant public repository, meaning that its own XML-based data exchange format for submissions, PRIDE XML, was built around the Proteomics Standards Initiative mzData standard for mass spectrometry. Recently, PRIDE has been adapted to work with the modern mzML [7] and mzIdentML [8] standards of the Proteomics Standards Initiative. [9] An additional format, dubbed mzTab, can be used as a simplified way to submit quantitative proteomics data. [10]
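As an illustration of why a tab-separated format such as mzTab is simple to work with, the following minimal sketch groups the lines of an mzTab-style text by their line-type prefix (for example MTD for metadata, PRH for the protein header, and PRT for protein rows). The example content is hypothetical and heavily abbreviated: it omits most columns a real file would carry and is not a valid submission file. The function names are illustrative and not part of any PRIDE tool.

```python
from collections import defaultdict


def group_mztab_lines(text):
    """Group tab-separated lines by their leading line-type prefix."""
    sections = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        fields = line.split("\t")
        sections[fields[0]].append(fields[1:])
    return dict(sections)


def rows_as_dicts(sections, header_prefix, row_prefix):
    """Pair each data row with the column names from its header line."""
    headers = sections.get(header_prefix, [])
    if not headers:
        return []
    columns = headers[0]
    return [dict(zip(columns, row)) for row in sections.get(row_prefix, [])]


# Hypothetical, abbreviated example content (not a complete mzTab file).
EXAMPLE = "\n".join([
    "MTD\tmzTab-version\t1.0.0",
    "MTD\tdescription\tHypothetical example file",
    "PRH\taccession\tdescription\tbest_search_engine_score[1]",
    "PRT\tP12345\tExample protein A\t0.001",
    "PRT\tQ67890\tExample protein B\t0.02",
])

sections = group_mztab_lines(EXAMPLE)
metadata = {key: value for key, value in sections["MTD"]}
proteins = rows_as_dicts(sections, "PRH", "PRT")
```

Here `metadata` maps each metadata key to its value and `proteins` holds one dictionary per protein row. A production reader would also need to handle the peptide, PSM and small-molecule sections and validate the mandatory columns.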
Because many different mass spectrometry instruments and software formats are on the market, wet-lab scientists without a strong bioinformatics background or informatics support had problems converting their data to PRIDE XML. The development of PRIDE Converter helped to tackle this situation. [11] PRIDE Converter is a tool, written in the Java programming language, that converts 15 different input mass spectrometry data formats into PRIDE XML via a wizard-like graphical user interface. It is freely available and is open source under the permissive Apache License. A new version, PRIDE Converter 2, was released in 2012. [12] This version was a complete rewrite, focused on easy adaptability to different (and evolving) data sources.
Currently, data can be queried from PRIDE via the PRIDE web interface, through the stand-alone Java client PRIDE Inspector, [13] or coupled directly to several search engines through PeptideShaker. [14] Moreover, a new RESTful API allows convenient programmatic access to the PRIDE archive. [15]
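Programmatic access of this kind might look roughly like the sketch below, which uses only the Python standard library. The base URL, the `projects/{accession}` path and the accession pattern are assumptions made for illustration and may not match the current service, so they should be checked against the official PRIDE Archive API documentation. The function names are hypothetical.

```python
import json
import re
import urllib.request

# Assumed base URL of the PRIDE Archive web service; verify before use.
BASE_URL = "https://www.ebi.ac.uk/pride/ws/archive/v2"

# Assumed accession shape: "PXD" followed by six digits.
ACCESSION_PATTERN = re.compile(r"^PXD\d{6}$")


def project_url(accession, base_url=BASE_URL):
    """Build the (assumed) URL for one project's metadata record."""
    if not ACCESSION_PATTERN.match(accession):
        raise ValueError(f"unexpected accession format: {accession!r}")
    return f"{base_url}/projects/{accession}"


def fetch_project(accession, timeout=30):
    """Request a project's metadata and decode the JSON response."""
    request = urllib.request.Request(
        project_url(accession),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example call (needs network access, so it is left commented out):
# record = fetch_project("PXD000001")
```

Separating URL construction from the network call keeps the first part testable offline and makes it easy to adjust the endpoint if the service changes.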
The extensive use of controlled vocabularies (CVs) and ontologies for flexible yet context-sensitive annotation of data, along with the ability to perform intelligent queries by these annotations, are key features of PRIDE. [16]
The ProteomeXchange consortium has been set up to provide a coordinated submission of MS proteomics data to the main existing proteomics repositories, and to encourage optimal data dissemination. [17] The consortium contains several member databases, including PRIDE and PeptideAtlas. The earliest conception of ProteomeXchange stems from a meeting at the HUPO 2005 conference in Munich, [18] where the main proteomics data repositories at the time agreed in principle to exchange their data, and thus provide a means for the user to find public proteomics data at any of the participating databases. Due to the rapid development of the field, and the need to first develop suitable standards for data exchange, it took almost ten years from that meeting to actually implement this system, an effort that was funded by the 'ProteomeXchange' Coordination Action grant of the European Commission's Seventh Framework Programme. [19]
The NCBI Peptidome database was discontinued in 2011, yet a joint effort by the PRIDE and Peptidome teams resulted in the transfer of all Peptidome data to PRIDE. [20] [21] [22]
The proteome is the entire set of proteins that is, or can be, expressed by a genome, cell, tissue, or organism at a certain time. It is the set of expressed proteins in a given type of cell or organism, at a given time, under defined conditions. Proteomics is the study of the proteome.
Proteomics is the large-scale study of proteins. Proteins are vital macromolecules of all living organisms, with many functions such as the formation of structural fibers of muscle tissue, enzymatic digestion of food, or synthesis and replication of DNA. In addition, other kinds of proteins include antibodies that protect an organism from infection, and hormones that send important signals throughout the body.
Mass spectrometry is a scientific technique for measuring the mass-to-charge ratio of ions. It is often coupled to chromatographic techniques such as gas- or liquid chromatography and has found widespread adoption in the fields of analytical chemistry and biochemistry where it can be used to identify and characterize small molecules and proteins (proteomics). The large volume of data produced in a typical mass spectrometry experiment requires that computers be used for data storage and processing. Over the years, different manufacturers of mass spectrometers have developed various proprietary data formats for handling such data which makes it difficult for academic scientists to directly manipulate their data. To address this limitation, several open, XML-based data formats have recently been developed by the Trans-Proteomic Pipeline at the Institute for Systems Biology to facilitate data manipulation and innovation in the public sector. These data formats are described here.
Amos Bairoch is a Swiss bioinformatician and Professor of Bioinformatics at the Department of Human Protein Sciences of the University of Geneva where he leads the CALIPHO group at the Swiss Institute of Bioinformatics (SIB) combining bioinformatics, curation, and experimental efforts to functionally characterize human proteins.
Rudolf Aebersold is a Swiss biologist, regarded as a pioneer in the fields of proteomics and systems biology. He has primarily researched techniques for measuring proteins in complex samples, in many cases via mass spectrometry. Ruedi Aebersold is a professor of systems biology at the Institute of Molecular Systems Biology (IMSB) at ETH Zurich. He was one of the founders of the Institute for Systems Biology in Seattle, Washington, United States, where he previously had a research group.
Trifunctional enzyme subunit beta, mitochondrial (TP-beta) also known as 3-ketoacyl-CoA thiolase, acetyl-CoA acyltransferase, or beta-ketothiolase is an enzyme that in humans is encoded by the HADHB gene.
A tandem mass tag (TMT) is a chemical label that facilitates sample multiplexing in mass spectrometry (MS)-based quantification and identification of biological macromolecules such as proteins, peptides and nucleic acids. TMT belongs to a family of reagents referred to as isobaric mass tags, a set of molecules that have the same mass but yield reporter ions of differing mass after fragmentation. The relative ratio of the measured reporter ions represents the relative abundance of the tagged molecule, although ion suppression has a detrimental effect on accuracy. Despite these complications, TMT-based proteomics has been shown to afford higher precision than label-free quantification. In addition to aiding in protein quantification, TMT tags can also increase the detection sensitivity of certain highly hydrophilic analytes, such as phosphopeptides, in RPLC-MS analyses.
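The reporter-ion arithmetic can be sketched as follows. This is a simplified illustration that assumes reporter intensities have already been extracted for each channel; it ignores isotope-impurity correction, normalisation across spectra and the ion-suppression effects mentioned above. The channel labels and intensity values are made up for the example.

```python
def relative_abundances(reporter_intensities):
    """Return each channel's share of the summed reporter intensity."""
    total = sum(reporter_intensities.values())
    if total <= 0:
        raise ValueError("summed reporter intensity must be positive")
    return {
        channel: intensity / total
        for channel, intensity in reporter_intensities.items()
    }


def ratios_to_reference(reporter_intensities, reference_channel):
    """Return each channel's intensity relative to a chosen reference."""
    reference = reporter_intensities[reference_channel]
    if reference <= 0:
        raise ValueError("reference channel intensity must be positive")
    return {
        channel: intensity / reference
        for channel, intensity in reporter_intensities.items()
    }


# Made-up reporter intensities for three channels of one spectrum.
spectrum = {"126": 1000.0, "127": 2000.0, "128": 1000.0}

fractions = relative_abundances(spectrum)
ratios = ratios_to_reference(spectrum, "126")
```

With these values the tagged molecule is estimated to be twice as abundant in the sample labelled with channel 127 as in the reference sample labelled with channel 126.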
Protein mass spectrometry refers to the application of mass spectrometry to the study of proteins. Mass spectrometry is an important method for the accurate mass determination and characterization of proteins, and a variety of methods and instrumentations have been developed for its many uses. Its applications include the identification of proteins and their post-translational modifications, the elucidation of protein complexes, their subunits and functional interactions, as well as the global measurement of proteins in proteomics. It can also be used to localize proteins to the various organelles, and determine the interactions between different proteins as well as with membrane lipids.
Shotgun proteomics refers to the use of bottom-up proteomics techniques to identify proteins in complex mixtures using high-performance liquid chromatography combined with mass spectrometry. The name is derived from shotgun sequencing of DNA, which is itself named after the rapidly expanding, quasi-random firing pattern of a shotgun. In the most common method of shotgun proteomics, the proteins in the mixture are digested and the resulting peptides are separated by liquid chromatography. Tandem mass spectrometry is then used to identify the peptides.
Elongation factor Tu, mitochondrial is a protein that in humans is encoded by the TUFM gene. It is an EF-Tu homolog.
F-actin-capping protein subunit beta, also known as CapZβ, is a protein that in humans is encoded by the CAPZB gene. CapZβ functions to cap actin filaments at barbed ends in muscle and other tissues.
The Proteomics Standards Initiative (PSI) is a working group of the Human Proteome Organization. It aims to define data standards for proteomics to facilitate data comparison, exchange and verification.
OpenMS is an open-source project for data analysis and processing in mass spectrometry and is released under the 3-clause BSD licence. It supports most common operating systems, including Microsoft Windows, macOS and Linux.
The Human Proteome Project (HPP) is a collaborative effort coordinated by the Human Proteome Organization. Its stated goal is to experimentally observe all of the proteins produced by the sequences translated from the human genome.
Albert J.R. Heck is a Dutch scientist and professor at Utrecht University, the Netherlands in the field of mass spectrometry and proteomics. He is known for his work on technologies to study proteins in their natural environment, with the aim to understand their biological function. Albert Heck was awarded the Spinoza Prize in 2017, the highest scientific award in the Netherlands.
Ronald Charles Beavis is a Canadian protein biochemist, who has been involved in the application of mass spectrometry to protein primary structure, with applications in the fields of proteomics and analytical biochemistry. He has developed methods for measuring the identity and post-translational modification state of proteins obtained from biological samples using mass spectrometry. He is currently best known for developing new methods for analyzing proteomics data and applying the results of these methods to problems in computational biology.
David Fenyö is a Hungarian-Swedish-American computational biologist, physicist and businessman. He is currently professor in the Department of Biochemistry and Molecular Pharmacology at NYU Langone Medical Center. Fenyö's research focuses on the development of methods to identify, characterize and quantify proteins and in the integration of data from multiple modalities including mass spectrometry, sequencing and microscopy.
Ancient proteins are complex mixtures and the term palaeoproteomics is used to characterise the study of proteomes in the past. Ancient proteins have been recovered from a wide range of archaeological materials, including bones, teeth, eggshells, leathers, parchments, ceramics, painting binders and well-preserved soft tissues like gut intestines. These preserved proteins have provided valuable information about taxonomic identification, evolutionary history (phylogeny), diet, health, disease, technology and social dynamics in the past.
Catherine E. Costello is the William Fairfield Warren Distinguished Professor in the Department of Biochemistry, Cell Biology and Genomics, and the director of the Center for Biomedical Mass Spectrometry at the Boston University School of Medicine.