PRIDE (the PRoteomics IDEntifications database) is a public data repository for mass spectrometry (MS) based proteomics data, maintained by the Proteomics Team at the European Bioinformatics Institute. [1]
Originally designed by Lennart Martens in 2003 during a stay at the European Bioinformatics Institute as a Marie Curie fellow of the European Commission in the "Quality of Life" Programme (Contract number: QLRI-1999-50595), PRIDE was established as a production service in 2005. [2] The original grant application document from June 2003 to start construction of PRIDE has since been published in a viewpoint article. [3] [4] Several similar proteomics databases have been built, including the GPMDB, PeptideAtlas, Proteinpedia and the NCBI Peptidome. [1]
The PRIDE database constitutes a structured data repository, and stores the original experimental data from the researchers without editorial control over the submitted data.
In total, PRIDE contains data from about 60 species, the biggest fraction of it coming from human samples (including the data from the two draft human proteomes [5] [6] ) followed by the fruit fly Drosophila melanogaster and mouse. [1]
Since detailed proteomics data currently cannot be curated from the existing literature, the source of PRIDE data is solely submissions by academic researchers.
PRIDE is a standards-compliant public repository, meaning that its own XML-based data exchange format for submissions, PRIDE XML, was built around the Proteomics Standards Initiative mzData standard for mass spectrometry. Recently, PRIDE has been adapted to work with the modern mzML [7] and mzIdentML [8] standards of the Proteomics Standards Initiative. [9] An additional format, dubbed mzTab, can be used as a simplified way to submit quantitative proteomics data. [10]
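To give a feel for why a tab-delimited format such as mzTab is simpler to handle than the XML formats, the following minimal sketch groups the lines of an mzTab-style text by their leading section code (for example `MTD` for metadata, `PRH` for the protein header and `PRT` for protein rows). The function name and the sample content are hypothetical and serve only as illustration; the sketch does not validate a file against the mzTab specification.

```python
from collections import defaultdict

# Invented sample for illustration only; not taken from a real submission.
SAMPLE = (
    "MTD\tmzTab-version\t1.0.0\n"
    "PRH\taccession\tdescription\n"
    "PRT\tP12345\texample protein\n"
)


def split_mztab_sections(text):
    """Group tab-separated lines by their leading section code."""
    sections = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank separator lines between sections
        code, *fields = line.split("\t")
        sections[code].append(fields)
    return dict(sections)


if __name__ == "__main__":
    for code, rows in split_mztab_sections(SAMPLE).items():
        print(code, rows)
```

Each section can then be processed with ordinary table tooling, with no XML parser required.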
Because many different mass spectrometry instruments and software formats are on the market, wet-lab scientists without a strong bioinformatics background or informatics support had difficulty converting their data to PRIDE XML. PRIDE Converter was developed to address this problem. [11] PRIDE Converter is a tool, written in the Java programming language, that converts 15 different input mass spectrometry data formats into PRIDE XML via a wizard-like graphical user interface. It is freely available and is open source under the permissive Apache License. A new version of PRIDE Converter was released in 2012 as PRIDE Converter 2. [12] This version was a complete rewrite, focused on easy adaptability to different (and evolving) data sources.
Currently, data can be queried from PRIDE via the PRIDE web interface, through the stand-alone Java client PRIDE Inspector, [13] or coupled directly to several search engines through PeptideShaker. [14] Moreover, a new RESTful API allows convenient programmatic access to the PRIDE archive. [15]
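As a rough illustration of what programmatic access could look like, the sketch below builds a lookup URL for a single project and fetches the response as JSON. The base URL, the `projects/{accession}` path and the helper names `project_url` and `fetch_project` are assumptions made for this example and are not taken from the text above; the API documentation should be consulted for the actual endpoints and response fields.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL for the PRIDE Archive web service; verify against the API docs.
BASE_URL = "https://www.ebi.ac.uk/pride/ws/archive/v2"


def project_url(accession: str, base_url: str = BASE_URL) -> str:
    """Build the (assumed) lookup URL for one project accession."""
    quoted = urllib.parse.quote(accession, safe="")
    return f"{base_url.rstrip('/')}/projects/{quoted}"


def fetch_project(accession: str, timeout: float = 30.0) -> dict:
    """Fetch project metadata as a dict; requires network access."""
    request = urllib.request.Request(
        project_url(accession), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call such as `fetch_project("PXD000001")` would then return the parsed JSON document for that accession, provided the assumed endpoint is correct and the service is reachable.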
The extensive use of controlled vocabularies (CVs) and ontologies for flexible yet context-sensitive annotation of data, along with the ability to perform intelligent queries by these annotations, are key features of PRIDE. [16]
The ProteomeXchange consortium has been set up to provide a coordinated submission of MS proteomics data to the main existing proteomics repositories, and to encourage optimal data dissemination. [17] The consortium contains several member databases, including PRIDE and PeptideAtlas. The earliest conception of ProteomeXchange stems from a meeting at the HUPO 2005 conference in Munich, [18] where the main proteomics data repositories at the time agreed in principle to exchange their data, and thus provide a means for the user to find public proteomics data at any of the participating databases. Due to the rapid development of the field, and the need to first develop suitable standards for data exchange, it took almost ten years from that meeting to actually implement this system, an effort that was funded by the 'ProteomeXchange' Coordination Action grant of the European Commission's Seventh Framework Programme. [19]
The NCBI Peptidome database was discontinued in 2011, but a joint effort by the PRIDE and Peptidome teams resulted in the transfer of all Peptidome data to PRIDE. [20] [21] [22]
The proteome is the entire set of proteins that is, or can be, expressed by a genome, cell, tissue, or organism at a certain time. It is the set of expressed proteins in a given type of cell or organism, at a given time, under defined conditions. Proteomics is the study of the proteome.
Proteomics is the large-scale study of proteins. Proteins are vital parts of living organisms, with many functions such as the formation of structural fibers of muscle tissue, enzymatic digestion of food, or synthesis and replication of DNA. In addition, other kinds of proteins include antibodies that protect an organism from infection, and hormones that send important signals throughout the body.
Mass spectrometry is a scientific technique for measuring the mass-to-charge ratio of ions. It is often coupled to chromatographic techniques such as gas- or liquid chromatography and has found widespread adoption in the fields of analytical chemistry and biochemistry where it can be used to identify and characterize small molecules and proteins (proteomics). The large volume of data produced in a typical mass spectrometry experiment requires that computers be used for data storage and processing. Over the years, different manufacturers of mass spectrometers have developed various proprietary data formats for handling such data which makes it difficult for academic scientists to directly manipulate their data. To address this limitation, several open, XML-based data formats have recently been developed by the Trans-Proteomic Pipeline at the Institute for Systems Biology to facilitate data manipulation and innovation in the public sector. These data formats are described here.
Rudolf Aebersold is a Swiss biologist, regarded as a pioneer in the fields of proteomics and systems biology. He has primarily researched techniques for measuring proteins in complex samples, in many cases via mass spectrometry. Ruedi Aebersold is a professor of Systems biology at the Institute of Molecular Systems Biology (IMSB) in ETH Zurich. He was one of the founders of the Institute for Systems Biology in Seattle, Washington, United States where he previously had a research group.
Trifunctional enzyme subunit beta, mitochondrial (TP-beta) also known as 3-ketoacyl-CoA thiolase, acetyl-CoA acyltransferase, or beta-ketothiolase is an enzyme that in humans is encoded by the HADHB gene.
A tandem mass tag (TMT) is a chemical label that facilitates sample multiplexing in mass spectrometry (MS)-based quantification and identification of biological macromolecules such as proteins, peptides and nucleic acids. TMT belongs to a family of reagents referred to as isobaric mass tags which are a set of molecules with the same mass, but yield reporter ions of differing mass after fragmentation. The relative ratio of the measured reporter ions represents the relative abundance of the tagged molecule, although ion suppression has a detrimental effect on accuracy. Despite these complications, TMT-based proteomics has been shown to afford higher precision than Label-free quantification. In addition to aiding in protein quantification, TMT tags can also increase the detection sensitivity of certain highly hydrophilic analytes, such as phosphopeptides, in RPLC-MS analyses.
Shotgun proteomics refers to the use of bottom-up proteomics techniques in identifying proteins in complex mixtures using a combination of high performance liquid chromatography combined with mass spectrometry. The name is derived from shotgun sequencing of DNA which is itself named after the rapidly expanding, quasi-random firing pattern of a shotgun. The most common method of shotgun proteomics starts with the proteins in the mixture being digested and the resulting peptides are separated by liquid chromatography. Tandem mass spectrometry is then used to identify the peptides.
Top-down proteomics is a method of protein identification that either uses an ion trapping mass spectrometer to store an isolated protein ion for mass measurement and tandem mass spectrometry (MS/MS) analysis or other protein purification methods such as two-dimensional gel electrophoresis in conjunction with MS/MS. Top-down proteomics is capable of identifying and quantitating unique proteoforms through the analysis of intact proteins. The name is derived from the similar approach to DNA sequencing. During mass spectrometry intact proteins are typically ionized by electrospray ionization and trapped in a Fourier transform ion cyclotron resonance, quadrupole ion trap or Orbitrap mass spectrometer. Fragmentation for tandem mass spectrometry is accomplished by electron-capture dissociation or electron-transfer dissociation. Effective fractionation is critical for sample handling before mass-spectrometry-based proteomics. Proteome analysis routinely involves digesting intact proteins followed by inferred protein identification using mass spectrometry (MS). Top-down MS (non-gel) proteomics interrogates protein structure through measurement of an intact mass followed by direct ion dissociation in the gas phase.
Elongation factor Tu, mitochondrial is a protein that in humans is encoded by the TUFM gene. It is an EF-Tu homolog.
F-actin-capping protein subunit beta, also known as CapZβ is a protein that in humans is encoded by the CAPZB gene. CapZβ functions to cap actin filaments at barbed ends in muscle and other tissues.
The Proteomics Standards Initiative (PSI) is a working group of the Human Proteome Organization. It aims to define data standards for proteomics to facilitate data comparison, exchange and verification.
OpenMS is an open-source project for data analysis and processing in mass spectrometry and is released under the 3-clause BSD licence. It supports most common operating systems including Microsoft Windows, macOS and Linux.
Proteogenomics is a field of biological research that utilizes a combination of proteomics, genomics, and transcriptomics to aid in the discovery and identification of peptides. Proteogenomics is used to identify new peptides by comparing MS/MS spectra against a protein database that has been derived from genomic and transcriptomic information. Proteogenomics often refers to studies that use proteomic information, often derived from mass spectrometry, to improve gene annotations. The utilization of both proteomics and genomics data alongside advances in the availability and power of spectrographic and chromatographic technology led to the emergence of proteogenomics as its own field in 2004.
The Human Proteome Project (HPP) is a collaborative effort coordinated by the Human Proteome Organization. Its stated goal is to experimentally observe all of the proteins produced by the sequences translated from the human genome.
Albert J.R. Heck is a Dutch scientist and professor at Utrecht University, the Netherlands in the field of mass spectrometry and proteomics. He is known for his work on technologies to study proteins in their natural environment, with the aim to understand their biological function. Albert Heck was awarded the Spinoza Prize in 2017, the highest scientific award in the Netherlands.
Ronald Charles Beavis is a Canadian protein biochemist, who has been involved in the application of mass spectrometry to protein primary structure, with applications in the fields of proteomics and analytical biochemistry. He has developed methods for measuring the identity and post-translational modification state of proteins obtained from biological samples using mass spectrometry. He is currently best known for developing new methods for analyzing proteomics data and applying the results of these methods to problems in computational biology.
David Fenyö is a Swedish-American physicist and mass spectrometrist. He is currently professor in the Department of Biochemistry and Molecular Pharmacology at NYU Langone Medical Center. Fenyö's research focuses on the development of methods to identify, characterize and quantify proteins and in the integration of data from multiple modalities including mass spectrometry, sequencing and microscopy.
Ancient proteins are complex mixtures and the term palaeoproteomics is used to characterise the study of proteomes in the past. Ancient proteins have been recovered from a wide range of archaeological materials, including bones, teeth, eggshells, leathers, parchments, ceramics, painting binders and well-preserved soft tissues like gut intestines. These preserved proteins have provided valuable information about taxonomic identification, evolutionary history (phylogeny), diet, health, disease, technology and social dynamics in the past.
Catherine E. Costello is the William Fairfield Warren Distinguished Professor in the Department of Biochemistry, Cell Biology and Genomics, and the director of the Center for Biomedical Mass Spectrometry at the Boston University School of Medicine.