OpenMS

Developer(s): Over 65 individuals
Initial release: 1 July 2007
Stable release: 3.0.0 / 1 August 2023
Written in: C++ (with bindings to Python)
Operating system: Linux, Windows, macOS
Size: 215 MB [1]
Available in: English
Type: Bioinformatics / Mass spectrometry software
License: 3-clause BSD license
Website: openms.de

OpenMS is an open-source project for data analysis and processing in mass spectrometry, released under the 3-clause BSD licence. It supports most common operating systems, including Microsoft Windows, macOS and Linux. [2]

OpenMS has tools for the analysis of proteomics data, providing algorithms for signal processing, feature finding (including de-isotoping), visualization in 1D (spectra or chromatogram level), 2D and 3D, map mapping and peptide identification. It supports label-free and isotopic-label-based quantification (such as iTRAQ, TMT and SILAC). OpenMS also supports metabolomics workflows and targeted analysis of DIA/SWATH data. [2] Furthermore, OpenMS provides tools for the analysis of cross-linking data, including protein-protein, protein-RNA and protein-DNA cross-linking, as well as tools for the analysis of RNA mass spectrometry data.

History

OpenMS was first released in 2007 as version 1.0 and has seen continuous releases since. It was described in two articles, published in Bioinformatics in 2007 and in BMC Bioinformatics in 2008. [3] [4] In 2009, the visualization tool TOPPView was published, [5] and in 2012 the workflow manager and editor TOPPAS was described. [6] In 2013, a complete high-throughput label-free analysis pipeline using OpenMS 1.8 was described and compared with similar, proprietary software (such as MaxQuant and Progenesis QI). The authors conclude that "[...] all three software solutions produce adequate and largely comparable quantification results; all have some weaknesses, and none can outperform the other two in every aspect that we examined. However, the performance of OpenMS is on par with that of its two tested competitors [...]". [7]

The OpenMS 1.10 release contained several new analysis tools, including OpenSWATH (a tool for targeted DIA data analysis), a metabolomics feature finder and a TMT analysis tool. Furthermore, full support for TraML 1.0.0 and the search engine MyriMatch were added. [8] The OpenMS 1.11 release was the first release to contain fully integrated bindings to the Python programming language (termed pyOpenMS). [9] In addition, new tools were added to support QcML (for quality control) and for metabolomics accurate mass analysis. Multiple tools were significantly improved with regard to memory and CPU performance. [10]

With OpenMS 2.0, released in April 2015, the project provides a new version that has been completely cleared of GPL code and uses git (in combination with GitHub) for its version control and ticketing system. Other changes include support for mzIdentML, mzQuantML and mzTab while improvements in the kernel allow for faster access to data stored in mzML and provide a novel API for accessing mass spectrometric data. [11] In 2016, the new features of OpenMS 2.0 were described in an article in Nature Methods. [2]

OpenMS is currently developed with contributions from the group of Knut Reinert [12] at the Free University of Berlin, the group of Oliver Kohlbacher [13] at the University of Tübingen and the group of Ruedi Aebersold [14] at ETH Zurich.

Features

OpenMS provides a set of over 100 executable tools that can be chained together into pipelines for mass spectrometry data analysis (the TOPP tools). It also provides visualization tools for spectra and chromatograms (1D), mass spectrometric heat maps (2D, m/z vs RT) and a three-dimensional view of a mass spectrometry experiment. Finally, OpenMS provides a C++ library (with Python bindings available since version 1.11) for LC/MS data management and analysis, which developers can use to create new tools and implement their own algorithms. OpenMS is free software available under the 3-clause BSD licence (previously under the LGPL).
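The chaining of command-line tools can be illustrated with a minimal Python sketch. It assumes that the TOPP tools PeakPickerHiRes and FeatureFinderCentroided are installed and on the PATH and that they accept the usual `-in` and `-out` options; the helper names `build_pipeline` and `run_pipeline` and the file names are hypothetical and are not part of OpenMS. Each step reads the file written by the previous step.

```python
import shutil
import subprocess
from pathlib import Path


def build_pipeline(raw_file, workdir="."):
    """Return TOPP command lines in which each step consumes the previous output.

    The tool names and the -in/-out options are assumptions about the local
    OpenMS installation; adjust them to the tools actually available.
    """
    stem = Path(raw_file).stem
    out = Path(workdir)
    picked = out / f"{stem}_picked.mzML"    # centroided spectra
    features = out / f"{stem}.featureXML"   # detected features
    return [
        ["PeakPickerHiRes", "-in", str(raw_file), "-out", str(picked)],
        ["FeatureFinderCentroided", "-in", str(picked), "-out", str(features)],
    ]


def run_pipeline(commands, dry_run=True):
    """Print every command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if dry_run:
            continue
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} was not found on the PATH")
        # check=True stops the chain as soon as one tool fails.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical input file; the dry run only prints the two commands.
    run_pipeline(build_pipeline("sample.mzML", workdir="out"))
```

Keeping command construction separate from execution makes the chain easy to inspect before anything is run. Graphical workflow editors such as TOPPAS serve the same purpose without scripting.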

Among other things, it provides algorithms for signal processing, feature finding (including de-isotoping), visualization, map mapping and peptide identification. It supports label-free and isotopic-label-based quantification (such as iTRAQ, TMT and SILAC).

TOPPView is a viewer that allows visualization of mass spectrometric data at the MS1 and MS2 level as well as in 3D; since version 1.10 it also displays chromatographic data from SRM experiments. OpenMS is compatible with current and upcoming Proteomics Standards Initiative (PSI) formats for mass spectrometric data.

Releases

Version | Date | Features
1.6.0 | November 2009 | New version of TOPPAS, reading of compressed XML files, identification-based alignment
1.7.0 | September 2010 | Protein quantification, protXML support, creation of inclusion/exclusion lists
1.8.0 | March 2011 | Display of identification results, QT clustering-based feature linking
1.9.0 | February 2012 | Metabolomics support, feature detection in raw (profile) data
1.10.0 | March 2013 | KNIME integration, support for targeted SWATH-MS analysis, TraML support, SuperHirn integration, MyriMatch support
1.11.0 | August 2013 | Support for Python bindings, performance improvements, Mascot 2.4 support
2.0 | April 2015 | mzQuantML, mzIdentML, mzTab, indexed mzML, removal of GPL code, switch to git, support for Fido, MSGF+, Percolator
2.0.1 | April 2016 | Faster file reading, improved support for mzIdentML and mzTab, elemental flux analysis, targeted assay generation, support for Comet and Luciphor
2.1.0 | November 2016 | Metabolite SWATH-MS support, lowess transformations for RT alignment, improved metabolic feature finding
2.2.0 | July 2017 | Fast feature linking using a KD tree, RNA cross-linking support, SpectraST support, scanning SWATH support, SQLite file formats
2.3.0 | January 2018 | Protein-protein cross-linking, support for Comet, support for fractions, TMT 11plex, improved build for Python bindings
2.4.0 | October 2018 | Support for MaraCluster, Crux, MSFragger, MSstats, SIRIUS, visualization of ion mobility and DIA, library improvements
2.5.0 | February 2020 | Support for RNA mass spectrometry, QualityControl workflow, extended OpenSWATH support, ProteomicsLFQ
2.6.0 | September 2020 | pyOpenMS wheel builds, database suitability tool, SLIM labelling support
2.7.0 | July 2021 | Improved support for NOVOR, MSFragger and SIRIUS 4.9.0, export of mzQC format in QCCalculator, improved reading and writing of NIST MSP files
3.0.0 | July 2023 | Added FLASHDeconv and the FLASHDeconvWizard GUI, removed obsolete tool adapters, major improvements to documentation

Version 3.0.0 is the current stable version; all earlier versions listed are old versions that are no longer maintained.

References

  1. OpenMS releases
  2. Röst HL, Sachsenberg T, Aiche S, Bielow C, Weisser H, Aicheler F, Andreotti S, Ehrlich HC, Gutenbrunner P, Kenar E, Liang X, Nahnsen S, Nilse L, Pfeuffer J, Rosenberger G, Rurik M, Schmitt U, Veit J, Walzer M, Wojnar D, Wolski WE, Schilling O, Choudhary JS, Malmström L, Aebersold R, Reinert K, Kohlbacher O (2016). "OpenMS: a flexible open-source software platform for mass spectrometry data analysis". Nat. Methods. 13 (9): 741–8. doi:10.1038/nmeth.3959. PMID 27575624. S2CID 873670.
  3. Sturm, M.; Bertsch, A.; Gröpl, C.; Hildebrandt, A.; Hussong, R.; Lange, E.; Pfeifer, N.; Schulz-Trieglaff, O.; Zerck, A.; Reinert, K.; Kohlbacher, O. (2008). "OpenMS – an open-source software framework for mass spectrometry". BMC Bioinformatics. 9: 163. doi:10.1186/1471-2105-9-163. PMC 2311306. PMID 18366760.
  4. Kohlbacher, O.; Reinert, K.; Gropl, C.; Lange, E.; Pfeifer, N.; Schulz-Trieglaff, O.; Sturm, M. (2007). "TOPP--the OpenMS proteomics pipeline". Bioinformatics. 23 (2): e191–e197. doi:10.1093/bioinformatics/btl299. PMID 17237091.
  5. Sturm, M.; Kohlbacher, O. (2009). "TOPPView: An Open-Source Viewer for Mass Spectrometry Data". Journal of Proteome Research. 8 (7): 3760–3763. doi:10.1021/pr900171m. PMID 19425593.
  6. Junker, J.; Bielow, C.; Bertsch, A.; Sturm, M.; Reinert, K.; Kohlbacher, O. (2012). "TOPPAS: A Graphical Workflow Editor for the Analysis of High-Throughput Proteomics Data". Journal of Proteome Research. 11 (7): 3914–3920. doi:10.1021/pr300187f. PMID 22583024.
  7. Weisser, H.; Nahnsen, S.; Grossmann, J.; Nilse, L.; Quandt, A.; Brauer, H.; Sturm, M.; Kenar, E.; Kohlbacher, O.; Aebersold, R.; Malmström, L. (2013). "An Automated Pipeline for High-Throughput Label-Free Quantitative Proteomics". Journal of Proteome Research. 12 (4): 1628–44. doi:10.1021/pr300992u. PMID 23391308.
  8. "OpenMS 1.10 released". Retrieved 4 July 2013.
  9. "pyopenms 1.11 : Python Package Index". Retrieved 27 October 2013.
  10. "OpenMS 1.11 released". Retrieved 27 October 2013.
  11. Röst HL, Schmitt U, Aebersold R, Malmström L (2015). "Fast and Efficient XML Data Access for Next-Generation Mass Spectrometry". PLOS ONE. 10 (4): e0125108. Bibcode:2015PLoSO..1025108R. doi:10.1371/journal.pone.0125108. PMC 4416046. PMID 25927999.
  12. Reinert group
  13. Kohlbacher group
  14. "Aebersold group". Archived from the original on 2011-07-20. Retrieved 2013-07-01.