The Proteomics Standards Initiative (PSI) is a working group of the Human Proteome Organization. It aims to define data standards for proteomics to facilitate data comparison, exchange and verification. [1] [2]
The Proteomics Standards Initiative focuses on three subjects: minimum information about a proteomics experiment (MIAPE), which defines the metadata that should be provided along with a proteomics experiment; [3] a data markup language for encoding the data; and metadata ontologies for consistent annotation and representation.
Minimum information about a proteomics experiment (MIAPE) is a minimum information standard, created by the Proteomics Standards Initiative of the Human Proteome Organization, for reporting proteomics experiments. [4] Presenting the results of an analysis alone is not sufficient: MIAPE is intended to specify all the information necessary to interpret the experiment results unambiguously and to potentially reproduce the experiment. [5] While the MIAPE guidelines define the content required for compliant reports, they do not specify the format in which this data should be presented (which is left to the corresponding *ML format, also defined by PSI [6]), nor do they define how to perform experiments. [7]
Several working groups produce documents covering the different areas of proteomics: [8]
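The split between required content and presentation format can be illustrated with a minimal sketch. A checklist only states which items a report must contain and says nothing about how they are serialized. The item names below are hypothetical and are not taken from any MIAPE document; each real MIAPE module defines its own list.

```python
# Hypothetical checklist items, used only for illustration.
# Real MIAPE modules define their own required content.
REQUIRED_ITEMS = (
    "contact",
    "instrument",
    "acquisition_parameters",
    "data_processing",
)


def missing_items(report, required=REQUIRED_ITEMS):
    """Return the required items that are absent or empty in a report."""
    return [item for item in required if not report.get(item)]


# A report with one empty item and one item left out entirely.
report = {
    "contact": "A. Researcher",
    "instrument": "example mass spectrometer",
    "acquisition_parameters": "",
}

print(missing_items(report))
```

The check works on a plain dictionary, so the same content could equally come from an XML file, a spreadsheet or a web form, which mirrors the way the guidelines leave the format to the corresponding exchange standard.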
The gel electrophoresis working group defined reporting requirements for gel electrophoresis experiments. The document is at the stage of a recommendation and has been published. [9] The corresponding data exchange format is called GelML, and a stable version was released in late 2007. [10]
The gel electrophoresis working group also focuses on image analysis. As of April 2009, its gel image informatics recommendation was in the public review phase, while the corresponding exchange format was only a draft. [10]
The sample processing working group defines requirements concerning all the sample pre-processing steps that are carried out before gel electrophoresis or mass spectrometry is applied. As of April 2009, two documents concerning column chromatography and capillary electrophoresis were in the early draft stages, and the sample preparation and handling document was still only a project. The data exchange format (spML) is also under development. [11]
Mass spectrometry [12] and mass spectrometry informatics [13] documents have been published as recommendations by the mass spectrometry working group.
The working group has released several data exchange formats: mzML, for the capture of data generated by a mass spectrometer, which is a merger of the earlier mzData (developed by PSI) and mzXML (developed at the Seattle Proteome Center at the Institute for Systems Biology); mzIdentML, for mass spectrometry informatics, which captures the results of the identification of proteins and peptides from mass spectrometry data; and TraML, for selected reaction monitoring input files. The group also develops MS CV, a controlled vocabulary for use with these file formats. [14]
The molecular interactions working group of PSI works only on PSI MI XML, a data exchange format, and on its corresponding ontologies. It has also published the MIMIx guidelines (minimum information about a molecular interaction experiment).
MIAPE recommendations covering study design and sample generation, and the statistical analysis of data, are also being planned or drafted. [8]
Several standard-compliant proteomics repositories exist, allowing researchers to publish their data while enforcing MIAPE guidelines. Examples include MIAPEGelDB [15] (for gel electrophoresis data), PRIDE [16] (for mass spectrometry data), and the ProteoRed MIAPE Generator tool [17] (for gel electrophoresis and mass spectrometry data).
It is expected that journal editors will eventually ask authors to deposit all their data in such repositories before publication. [citation needed]
Similar initiatives define minimal reporting requirements in other fields. For microarrays, the MGED Society defined the minimum information about a microarray experiment (MIAME). [18] The Standards for Reporting of Diagnostic Accuracy (STARD) are available for studies reporting the accuracy of medical diagnoses. [19]
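How a controlled vocabulary is used inside such a file can be sketched as follows. The fragment is hand-written in the style of mzML and heavily simplified: it is not schema-valid, and the accession number shown should be checked against the current MS CV rather than taken as authoritative. The function name `spectrum_annotations` is an illustrative choice, not part of any PSI tool.

```python
import xml.etree.ElementTree as ET

# Hand-written, heavily simplified fragment in the style of mzML.
# It is NOT schema-valid; real files carry far more required structure.
MZML_SNIPPET = """\
<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <run id="example_run">
    <spectrumList count="2">
      <spectrum index="0" id="scan=1">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1"/>
      </spectrum>
      <spectrum index="1" id="scan=2">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
      </spectrum>
    </spectrumList>
  </run>
</mzML>
"""

# Namespace prefix mapping used in the path expressions below.
NS = {"mz": "http://psi.hupo.org/ms/mzml"}


def spectrum_annotations(xml_text):
    """Return {spectrum id: {cvParam name: value}} for each spectrum."""
    root = ET.fromstring(xml_text)
    result = {}
    for spectrum in root.iterfind(".//mz:spectrum", NS):
        # Only cvParam elements that are direct children of the spectrum.
        params = {
            param.get("name"): param.get("value", "")
            for param in spectrum.iterfind("mz:cvParam", NS)
        }
        result[spectrum.get("id")] = params
    return result


annotations = spectrum_annotations(MZML_SNIPPET)
for spectrum_id, params in annotations.items():
    print(spectrum_id, params)
```

Because each annotation points to a vocabulary term through its accession instead of free text, tools from different vendors can interpret the same metadata consistently. For real files a dedicated mzML reader is a better choice than direct XML parsing, since binary peak arrays and indexing are not handled here.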
The proteome is the entire set of proteins that is, or can be, expressed by a genome, cell, tissue, or organism at a certain time. It is the set of expressed proteins in a given type of cell or organism, at a given time, under defined conditions. Proteomics is the study of the proteome.
Proteomics is the large-scale study of proteins. Proteins are vital parts of living organisms, with many functions such as the formation of structural fibers of muscle tissue, enzymatic digestion of food, or synthesis and replication of DNA. In addition, other kinds of proteins include antibodies that protect an organism from infection, and hormones that send important signals throughout the body.
Two-dimensional gel electrophoresis, abbreviated as 2-DE or 2-D electrophoresis, is a form of gel electrophoresis commonly used to analyze proteins. Mixtures of proteins are separated by two properties in two dimensions on 2D gels. 2-DE was first independently introduced by O'Farrell and Klose in 1975.
Mass spectrometry is a scientific technique for measuring the mass-to-charge ratio of ions. It is often coupled to chromatographic techniques such as gas or liquid chromatography and has found widespread adoption in the fields of analytical chemistry and biochemistry, where it can be used to identify and characterize small molecules and proteins (proteomics). The large volume of data produced in a typical mass spectrometry experiment requires that computers be used for data storage and processing. Over the years, different manufacturers of mass spectrometers have developed various proprietary data formats for handling such data, which makes it difficult for academic scientists to directly manipulate their data. To address this limitation, several open, XML-based data formats have recently been developed by the Trans-Proteomic Pipeline at the Institute for Systems Biology to facilitate data manipulation and innovation in the public sector. These data formats are described here.
Insilicos is a life science software company founded in 2002 by Erik Nilsson, Brian Pratt and Bryan Prazen. Insilicos develops scientific computing software for disease diagnosis.
Immunoproteomics is the study of large sets of proteins (proteomics) involved in the immune response.
Rudolf Aebersold is a Swiss biologist, regarded as a pioneer in the fields of proteomics and systems biology. He has primarily researched techniques for measuring proteins in complex samples, in many cases via mass spectrometry. He is a professor of systems biology at the Institute of Molecular Systems Biology (IMSB) at ETH Zurich. He was one of the founders of the Institute for Systems Biology in Seattle, Washington, where he previously had a research group.
Tetrasodium tris(bathophenanthroline disulfonate)ruthenium(II) (Na4Ru(bps)3) is a sodium salt of a coordination compound. In this form, it is the salt of a sulfonic acid. This compound is an extension of the phenanthroline series of coordination compounds. Ruthenium(II) tris(bathophenanthroline disulfonate), referring to the anionic fragment, is used as a protein dye in biochemistry for differentiating and detecting different proteins in laboratory settings.
Shotgun proteomics refers to the use of bottom-up proteomics techniques to identify proteins in complex mixtures using high-performance liquid chromatography combined with mass spectrometry. The name is derived from shotgun sequencing of DNA, which is itself named after the rapidly expanding, quasi-random firing pattern of a shotgun. In the most common method of shotgun proteomics, the proteins in the mixture are digested and the resulting peptides are separated by liquid chromatography. Tandem mass spectrometry is then used to identify the peptides.
Top-down proteomics is a method of protein identification that either uses an ion trapping mass spectrometer to store an isolated protein ion for mass measurement and tandem mass spectrometry (MS/MS) analysis or other protein purification methods such as two-dimensional gel electrophoresis in conjunction with MS/MS. Top-down proteomics is capable of identifying and quantitating unique proteoforms through the analysis of intact proteins. The name is derived from the similar approach to DNA sequencing. During mass spectrometry intact proteins are typically ionized by electrospray ionization and trapped in a Fourier transform ion cyclotron resonance, quadrupole ion trap or Orbitrap mass spectrometer. Fragmentation for tandem mass spectrometry is accomplished by electron-capture dissociation or electron-transfer dissociation. Effective fractionation is critical for sample handling before mass-spectrometry-based proteomics. Proteome analysis routinely involves digesting intact proteins followed by inferred protein identification using mass spectrometry (MS). Top-down MS (non-gel) proteomics interrogates protein structure through measurement of an intact mass followed by direct ion dissociation in the gas phase.
OpenMS is an open-source project for data analysis and processing in mass spectrometry and is released under the 3-clause BSD licence. It supports most common operating systems, including Microsoft Windows, macOS and Linux.
PRIDE is a public data repository of mass spectrometry (MS) based proteomics data. It is maintained by the European Bioinformatics Institute as part of the Proteomics Team.
The Human Proteome Project (HPP) is a collaborative effort coordinated by the Human Proteome Organization. Its stated goal is to experimentally observe all of the proteins produced by the sequences translated from the human genome.
Ronald Charles Beavis is a Canadian protein biochemist, who has been involved in the application of mass spectrometry to protein primary structure, with applications in the fields of proteomics and analytical biochemistry. He has developed methods for measuring the identity and post-translational modification state of proteins obtained from biological samples using mass spectrometry. He is currently best known for developing new methods for analyzing proteomics data and applying the results of these methods to problems in computational biology.
Degradomics is a sub-discipline of biology encompassing all the genomic and proteomic approaches devoted to the study of proteases, their inhibitors, and their substrates on a system-wide scale. This includes the analysis of the protease and protease-substrate repertoires, also called "protease degradomes". The scope of these degradomes can range from a single cell to a tissue to an entire organism.
The Minimum Information Required About a Glycomics Experiment (MIRAGE) initiative is one of the minimum information standards initiatives and provides guidelines for reporting glycomics experiments. The initiative is supported by the Beilstein Institute for the Advancement of Chemical Sciences. The MIRAGE project focuses on the development of publication guidelines for interaction and structural glycomics data as well as the development of data exchange formats. The project was launched in 2011 in Seattle and began with a description of its aims.
Minimum information standards are sets of guidelines and formats for reporting data derived by specific high-throughput methods. Their purpose is to ensure the data generated by these methods can be easily verified, analysed and interpreted by the wider scientific community. Ultimately, they facilitate the transfer of data from journal articles into databases in a form that enables data to be mined across multiple data sets. Minimum information standards are available for a wide variety of experiment types including microarray (MIAME), RNAseq (MINSEQE), metabolomics (MSI) and proteomics (MIAPE).
Pier Giorgio Righetti is a professor emeritus of chemistry. He worked primarily at the University of Milano (1971-1995) and at the Department of Chemistry of the Politecnico di Milano in Milan, Italy (2005-2011). He has served as the President of the Società Italiana di Proteomica.
Catherine E. Costello is the William Fairfield Warren distinguished professor in the Department of Biochemistry, Cell Biology and Genomics, and the director of the Center for Biomedical Mass Spectrometry at the Boston University School of Medicine.