Proteomics Identifications Database

PRIDE (the PRoteomics IDEntifications database) is a public repository for mass spectrometry (MS)-based proteomics data, maintained by the European Bioinformatics Institute as part of the Proteomics Team. [1]

Originally designed by Lennart Martens in 2003 during a stay at the European Bioinformatics Institute as a Marie Curie fellow of the European Commission in the "Quality of Life" Programme (Contract number: QLRI-1999-50595), PRIDE was established as a production service in 2005. [2] The original grant application document from June 2003 to start construction of PRIDE has since been published in a viewpoint article. [3] [4] Several similar proteomics databases have been built, including the GPMDB, PeptideAtlas, Proteinpedia and the NCBI Peptidome. [1]

The PRIDE database constitutes a structured data repository, and stores the original experimental data from the researchers without editorial control over the submitted data.

In total, PRIDE contains data from about 60 species, the biggest fraction of it coming from human samples (including the data from the two draft human proteomes [5] [6] ) followed by the fruit fly Drosophila melanogaster and mouse. [1]

Formats and the submission process

Since detailed proteomics data currently cannot be curated from the existing literature, the source of PRIDE data is solely submissions by academic researchers.

PRIDE is a standards-compliant public repository, meaning that its own XML-based data exchange format for submissions, PRIDE XML, was built around the Proteomics Standards Initiative mzData standard for mass spectrometry. Recently, PRIDE has been adapted to work with the modern mzML [7] and mzIdentML [8] standards of the Proteomics Standards Initiative. [9] An additional format, dubbed mzTab, can be used as a simplified way to submit quantitative proteomics data. [10]
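To give a feel for why a tab-delimited format such as mzTab is simpler to handle than the XML-based formats, the following is a minimal sketch in Python. It assumes only the general mzTab layout, in which each line starts with a three-letter code (for example MTD for metadata, PRH for the protein table header, PRT for a protein row and COM for a comment). The sample content, accessions and function names are made up for illustration and are not taken from a real submission.

```python
import csv
import io
from collections import defaultdict

# Made-up, heavily truncated mzTab-like content (illustration only).
SAMPLE = (
    "COM\tHypothetical example file\n"
    "MTD\tmzTab-version\t1.0.0\n"
    "MTD\tdescription\tIllustrative data set\n"
    "PRH\taccession\tdescription\n"
    "PRT\tP12345\tExample protein A\n"
    "PRT\tQ67890\tExample protein B\n"
)


def read_mztab_sections(text):
    """Group tab-separated rows by their three-letter line prefix."""
    sections = defaultdict(list)
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0] == "COM":  # skip blank lines and comments
            continue
        sections[row[0]].append(row[1:])
    return dict(sections)


def protein_table(sections):
    """Combine the PRH header row with each PRT row into dictionaries."""
    header = sections["PRH"][0]
    return [dict(zip(header, values)) for values in sections.get("PRT", [])]


sections = read_mztab_sections(SAMPLE)
metadata = {key: value for key, value in sections["MTD"]}
proteins = protein_table(sections)
```

A real mzTab file carries many more columns and sections, so a dedicated mzTab library would normally be a better choice than this sketch.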

Because many different mass spectrometry instruments and software formats are on the market, wet-lab scientists without a strong bioinformatics background or informatics support had problems converting their data to PRIDE XML. PRIDE Converter was developed to tackle this situation. [11] It is a tool, written in the Java programming language, that converts 15 different input mass spectrometry data formats into PRIDE XML via a wizard-like graphical user interface. It is freely available and is open source under the permissive Apache License. A new version was released in 2012 as PRIDE Converter 2. [12] This version was a complete rewrite, focused on easy adaptability to different (and evolving) data sources.

Browsing, searching and data mining PRIDE

Currently, data can be queried from PRIDE via the PRIDE web interface, through the stand-alone Java client PRIDE Inspector, [13] or coupled directly to several search engines through PeptideShaker. [14] Moreover, a new RESTful API allows convenient programmatic access to the PRIDE archive. [15]
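As an illustration of what programmatic access through a RESTful API can look like, the sketch below builds a project URL and fetches the corresponding JSON record using only the Python standard library. The base URL, the "/projects/{accession}" path layout and the accession pattern are assumptions made for this example; the current PRIDE Archive documentation should be consulted for the actual endpoints and response fields.

```python
import json
import re
import urllib.parse
import urllib.request

# Assumption: illustrative base URL and path layout, not confirmed by this article.
BASE_URL = "https://www.ebi.ac.uk/pride/ws/archive/v2"

# Assumption: accessions of the form "PXD" followed by six digits.
ACCESSION_PATTERN = re.compile(r"^PXD\d{6}$")


def project_url(accession, base_url=BASE_URL):
    """Build the URL for a single project record."""
    if not ACCESSION_PATTERN.match(accession):
        raise ValueError(f"unexpected accession format: {accession!r}")
    return f"{base_url}/projects/{urllib.parse.quote(accession)}"


def fetch_project(accession, timeout=30):
    """Download and decode the JSON description of one project."""
    request = urllib.request.Request(
        project_url(accession),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling fetch_project("PXD000001") would issue a single GET request. Keeping URL construction separate from the network call makes the request logic easy to check without contacting the server.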

The extensive use of controlled vocabularies (CVs) and ontologies for flexible yet context-sensitive annotation of data, and the ability to perform intelligent queries on these annotations, are key features of PRIDE. [16]
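The idea of querying by CV annotation can be sketched as follows. The XML snippet is a simplified, namespace-free imitation of the cvParam elements used in Proteomics Standards Initiative formats, where each annotation carries a CV reference, an accession, a name and an optional value. The snippet, the spectrum identifiers and the function name are hypothetical, and this is not PRIDE's own query mechanism.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up snippet; real PSI files use XML namespaces and
# contain far more elements.
SNIPPET = """
<spectrumList>
  <spectrum id="scan=1">
    <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1"/>
  </spectrum>
  <spectrum id="scan=2">
    <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
  </spectrum>
  <spectrum id="scan=3">
    <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
  </spectrum>
</spectrumList>
""".strip()


def find_by_cv(root, accession, value):
    """Return ids of spectra annotated with the given CV accession and value."""
    matches = []
    for spectrum in root.iter("spectrum"):
        for param in spectrum.iter("cvParam"):
            if param.get("accession") == accession and param.get("value") == value:
                matches.append(spectrum.get("id"))
                break  # one matching annotation per spectrum is enough
    return matches


root = ET.fromstring(SNIPPET)
ms2_ids = find_by_cv(root, "MS:1000511", "2")
```

Matching on the accession rather than on the free-text name is what makes such queries robust: the name can be reworded, while the accession stays stable.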

Involvement in ProteomeXchange

The ProteomeXchange consortium has been set up to provide a coordinated submission of MS proteomics data to the main existing proteomics repositories, and to encourage optimal data dissemination. [17] The consortium contains several member databases, including PRIDE and PeptideAtlas. The earliest conception of ProteomeXchange stems from a meeting at the HUPO 2005 conference in Munich, [18] where the main proteomics data repositories at the time agreed in principle to exchange their data, and thus provide a means for the user to find public proteomics data at any of the participating databases. Due to the rapid development of the field, and the need to first develop suitable standards for data exchange, it took almost ten years from that meeting to actually implement this system, an effort that was funded by the 'ProteomeXchange' Coordination Action grant of the European Commission's Seventh Framework Programme. [19]

Data recovery after the discontinuation of Peptidome

The NCBI Peptidome database was discontinued in 2011, yet a joint effort by the PRIDE and Peptidome teams resulted in the transfer of all Peptidome data to PRIDE. [20] [21] [22]

References

  1. Vizcaíno, JA; Côté, R; Reisinger, F; Barsnes, H; Foster, JM; Rameseder, J; Hermjakob, H; Martens, L (2010). "The Proteomics Identifications database: 2010 update". Nucleic Acids Res. 38 (Database): D736–42. doi:10.1093/nar/gkp964. PMC 2808904. PMID 19906717.
  2. Martens, L; Hermjakob, H; Jones, P; Adamski, M; Taylor, C; States, D; Gevaert, K; Vandekerckhove, J; Apweiler, R (Aug 2005). "PRIDE: The PRoteomics IDEntifications database". Proteomics. 5 (13): 3537–45. doi:10.1002/pmic.200401303. PMID 16041671. S2CID 28998489.
  3. "Application for Training at the EMBL-EBI EU Marie Curie Training Site" (PDF).
  4. Martens, Lennart (March 2016). "Public proteomics data: how the field has evolved from sceptical inquiry to the promise of in silico proteomics". EuPA Open Proteomics. 11: 42–44. doi:10.1016/j.euprot.2016.02.005. PMC 5988554. PMID 29900110.
  5. Wilhelm, M; Schlegl, J; Hahne, H; Moghaddas Gholami, A; Lieberenz, M; Savitski, MM; Ziegler, E; Butzmann, L; Gessulat, S; Marx, H; Mathieson, T; Lemeer, S; Schnatbaum, K; Reimer, U; Wenschuh, H; Mollenhauer, M; Slotta-Huspenina, J; Boese, JH; Bantscheff, M; Gerstmair, A; Faerber, F; Kuster, B (29 May 2014). "Mass-spectrometry-based draft of the human proteome". Nature. 509 (7502): 582–7. Bibcode:2014Natur.509..582W. doi:10.1038/nature13319. PMID 24870543. S2CID 4467721.
  6. Kim, MS; Pinto, SM; Getnet, D; Nirujogi, RS; Manda, SS; Chaerkady, R; Madugundu, AK; Kelkar, DS; Isserlin, R; Jain, S; Thomas, JK; Muthusamy, B; Leal-Rojas, P; Kumar, P; Sahasrabuddhe, NA; Balakrishnan, L; Advani, J; George, B; Renuse, S; Selvan, LD; Patil, AH; Nanjappa, V; Radhakrishnan, A; Prasad, S; Subbannayya, T; Raju, R; Kumar, M; Sreenivasamurthy, SK; Marimuthu, A; Sathe, GJ; Chavan, S; Datta, KK; Subbannayya, Y; Sahu, A; Yelamanchi, SD; Jayaram, S; Rajagopalan, P; Sharma, J; Murthy, KR; Syed, N; Goel, R; Khan, AA; Ahmad, S; Dey, G; Mudgal, K; Chatterjee, A; Huang, TC; Zhong, J; Wu, X; Shaw, PG; Freed, D; Zahari, MS; Mukherjee, KK; Shankar, S; Mahadevan, A; Lam, H; Mitchell, CJ; Shankar, SK; Satishchandra, P; Schroeder, JT; Sirdeshmukh, R; Maitra, A; Leach, SD; Drake, CG; Halushka, MK; Prasad, TS; Hruban, RH; Kerr, CL; Bader, GD; Iacobuzio-Donahue, CA; Gowda, H; Pandey, A (29 May 2014). "A draft map of the human proteome". Nature. 509 (7502): 575–81. Bibcode:2014Natur.509..575K. doi:10.1038/nature13302. PMC 4403737. PMID 24870542.
  7. Martens, L; Chambers, M; Sturm, M; Kessner, D; Levander, F; Shofstahl, J; Tang, WH; Römpp, A; Neumann, S; Pizarro, AD; Montecchi-Palazzi, L; Tasman, N; Coleman, M; Reisinger, F; Souda, P; Hermjakob, H; Binz, PA; Deutsch, EW (January 2011). "mzML--a community standard for mass spectrometry data". Molecular & Cellular Proteomics. 10 (1): R110.000133. doi:10.1074/mcp.R110.000133. PMC 3013463. PMID 20716697.
  8. Jones, AR; Eisenacher, M; Mayer, G; Kohlbacher, O; Siepen, J; Hubbard, SJ; Selley, JN; Searle, BC; Shofstahl, J; Seymour, SL; Julian, R; Binz, PA; Deutsch, EW; Hermjakob, H; Reisinger, F; Griss, J; Vizcaíno, JA; Chambers, M; Pizarro, A; Creasy, D (July 2012). "The mzIdentML data standard for mass spectrometry-based proteomics results". Molecular & Cellular Proteomics. 11 (7): M111.014381. doi:10.1074/mcp.M111.014381. PMC 3394945. PMID 22375074.
  9. Deutsch, EW; Albar, JP; Binz, PA; Eisenacher, M; Jones, AR; Mayer, G; Omenn, GS; Orchard, S; Vizcaíno, JA; Hermjakob, H (May 2015). "Development of data representation standards by the human proteome organization proteomics standards initiative". Journal of the American Medical Informatics Association. 22 (3): 495–506. doi:10.1093/jamia/ocv001. PMC 4457114. PMID 25726569.
  10. Griss, J; Jones, AR; Sachsenberg, T; Walzer, M; Gatto, L; Hartler, J; Thallinger, GG; Salek, RM; Steinbeck, C; Neuhauser, N; Cox, J; Neumann, S; Fan, J; Reisinger, F; Xu, QW; Del Toro, N; Pérez-Riverol, Y; Ghali, F; Bandeira, N; Xenarios, I; Kohlbacher, O; Vizcaíno, JA; Hermjakob, H (October 2014). "The mzTab data exchange format: communicating mass-spectrometry-based proteomics and metabolomics experimental results to a wider audience". Molecular & Cellular Proteomics. 13 (10): 2765–75. doi:10.1074/mcp.o113.036681. PMC 4189001. PMID 24980485.
  11. Barsnes, H; Vizcaíno, JA; Eidhammer, I; Martens, L (2009). "PRIDE Converter: making proteomics data-sharing easy". Nat Biotechnol. 27 (7): 598–9. doi:10.1038/nbt0709-598. PMID 19587657. S2CID 205269351.
  12. Côté, RG; Griss, J; Dianes, JA; Wang, R; Wright, JC; van den Toorn, HW; van Breukelen, B; Heck, AJ; Hulstaert, N; Martens, L; Reisinger, F; Csordas, A; Ovelleiro, D; Perez-Riverol, Y; Barsnes, H; Hermjakob, H; Vizcaíno, JA (December 2012). "The PRoteomics IDEntification (PRIDE) Converter 2 framework: an improved suite of tools to facilitate data submission to the PRIDE database and the ProteomeXchange consortium". Molecular & Cellular Proteomics. 11 (12): 1682–9. doi:10.1074/mcp.o112.021543. PMC 3518121. PMID 22949509.
  13. Wang, R; Fabregat, A; Ríos, D; Ovelleiro, D; Foster, JM; Côté, RG; Griss, J; Csordas, A; Perez-Riverol, Y; Reisinger, F; Hermjakob, H; Martens, L; Vizcaíno, JA (Feb 2012). "PRIDE Inspector: a tool to visualize and validate MS proteomics data". Nature Biotechnology. 30 (2): 135–7. doi:10.1038/nbt.2112. PMC 3277942. PMID 22318026.
  14. Vaudel, M; Burkhart, JM; Zahedi, RP; Oveland, E; Berven, FS; Sickmann, A; Martens, L; Barsnes, H (January 2015). "PeptideShaker enables reanalysis of MS-derived proteomics data sets". Nature Biotechnology. 33 (1): 22–4. doi:10.1038/nbt.3109. PMID 25574629. S2CID 27922651.
  15. Reisinger, F; Del-Toro, N; Ternent, T; Hermjakob, H; Vizcaíno, JA (22 April 2015). "Introducing the PRIDE Archive RESTful web services". Nucleic Acids Research. 43 (W1): W599–604. doi:10.1093/nar/gkv382. PMC 4489246. PMID 25904633.
  16. Vizcaíno, JA; Côté, R; Reisinger, F; Mueller, M; Foster, JM; Rameseder, J; Hermjakob, H; Martens, L (2009). "A guide to the Proteomics Identifications Database proteomics data repository". Proteomics. 9 (18): 4276–83. doi:10.1002/pmic.200900402. PMC 2970915. PMID 19662629.
  17. Vizcaíno, JA; Deutsch, EW; Wang, R; Csordas, A; Reisinger, F; Ríos, D; Dianes, JA; Sun, Z; Farrah, T; Bandeira, N; Binz, PA; Xenarios, I; Eisenacher, M; Mayer, G; Gatto, L; Campos, A; Chalkley, RJ; Kraus, HJ; Albar, JP; Martinez-Bartolomé, S; Apweiler, R; Omenn, GS; Martens, L; Jones, AR; Hermjakob, H (March 2014). "ProteomeXchange provides globally coordinated proteomics data submission and dissemination". Nature Biotechnology. 32 (3): 223–6. doi:10.1038/nbt.2839. PMC 3986813. PMID 24727771.
  18. Hermjakob, H; Apweiler, R (February 2006). "The Proteomics Identifications Database (PRIDE) and the ProteomExchange Consortium: making proteomics data accessible". Expert Review of Proteomics. 3 (1): 1–3. doi:10.1586/14789450.3.1.1. PMID 16445344.
  19. "European Commission : CORDIS : Projects and Results : International Data Exchange and Data Representation Standards for Proteomics". cordis.europa.eu. Retrieved 2017-09-22.
  20. Csordas, A; Wang, R; Ríos, D; Reisinger, F; Foster, JM; Slotta, DJ; Vizcaíno, JA; Hermjakob, H (May 2013). "From Peptidome to PRIDE: public proteomics data migration at a large scale". Proteomics. 13 (10–11): 1692–5. doi:10.1002/pmic.201200514. PMC 3717177. PMID 23533138.
  21. Martens, L (May 2013). "Resilience in the proteomics data ecosystem: how the field cares for its data". Proteomics. 13 (10–11): 1548–50. doi:10.1002/pmic.201300118. hdl:1854/LU-4166053. PMID 23596016. S2CID 8041195.
  22. "Peptidome - NCBI Peptide Data Resource". www.ncbi.nlm.nih.gov. Archived from the original on 2009-07-07.