Developer(s) | Over 65 individuals |
---|---|
Initial release | 1 July 2007 |
Stable release | 3.2.0 / 18 September 2024 |
Repository | |
Written in | C++ (with bindings to Python) |
Operating system | Linux, Windows, macOS |
Platform | x86-64, ARM |
Size | 215 MB [1] |
Available in | English |
Type | Bioinformatics / Mass spectrometry software |
License | 3-clause BSD license |
Website | openms |
OpenMS is an open-source project for data analysis and processing in mass spectrometry and is released under the 3-clause BSD licence. It supports most common operating systems including Microsoft Windows, macOS and Linux. [2] [3]
OpenMS has tools for analysis of proteomics data, providing algorithms for signal processing, feature finding (including de-isotoping), visualization in 1D (spectra or chromatogram level), 2D and 3D, map mapping and peptide identification. It supports label-free and isotopic-label based quantification (such as iTRAQ, TMT and SILAC). OpenMS also supports metabolomics workflows and targeted analysis of DIA/SWATH data. [2] Furthermore, OpenMS provides tools for the analysis of cross-linking data, including protein-protein, protein-RNA and protein-DNA cross-linking. Lastly, OpenMS provides tools for analysis of RNA mass spectrometry data.
OpenMS was first released in 2007 as version 1.0, was described in two articles published in Bioinformatics in 2007 and 2008, and has since seen continuous releases. [4] [5] In 2009, the visualization tool TOPPView was published [6] and in 2012, the workflow manager and editor TOPPAS was described. [7] In 2013, a complete high-throughput label-free analysis pipeline using OpenMS 1.8 was described and compared with similar, proprietary software (such as MaxQuant and Progenesis QI). The authors conclude that "[...] all three software solutions produce adequate and largely comparable quantification results; all have some weaknesses, and none can outperform the other two in every aspect that we examined. However, the performance of OpenMS is on par with that of its two tested competitors [...]". [8]
The OpenMS 1.10 release contained several new analysis tools, including OpenSWATH (a tool for targeted DIA data analysis), a metabolomics feature finder and a TMT analysis tool. Furthermore, full support for TraML 1.0.0 and the search engine MyriMatch was added. [9] The OpenMS 1.11 release was the first release to contain fully integrated bindings to the Python programming language (termed pyOpenMS). [10] In addition, new tools were added for QcML support (quality control) and for accurate mass analysis in metabolomics. Multiple tools were significantly improved with regard to memory and CPU performance. [11]
With OpenMS 2.0, released in April 2015, the project provides a new version that has been completely cleared of GPL code and uses git (in combination with GitHub) for its version control and ticketing system. Other changes include support for mzIdentML, mzQuantML and mzTab while improvements in the kernel allow for faster access to data stored in mzML and provide a novel API for accessing mass spectrometric data. [12] In 2016, the new features of OpenMS 2.0 were described in an article in Nature Methods. [2]
In 2023, OpenMS 3.0 [3] was released, providing support for a wide array of data analysis tasks in proteomics, metabolomics and MS-based transcriptomics.
OpenMS is currently developed with contributions from the group of Knut Reinert [13] at the Free University of Berlin, the group of Oliver Kohlbacher [14] at the University of Tübingen and the group of Hannes Roest [15] at the University of Toronto.
OpenMS provides a set of over 100 different executable tools that can be chained together into pipelines for mass spectrometry data analysis (the TOPP Tools). It also provides visualization tools for spectra and chromatograms (1D), mass spectrometric heat maps (2D m/z vs RT) as well as a three-dimensional visualization of a mass spectrometry experiment. Finally, OpenMS also provides a C++ library (with bindings to Python available since 1.11) for LC/MS data management and analysis, which developers can use to create new tools and implement their own algorithms. OpenMS is free software available under the 3-clause BSD licence (previously under the LGPL).
Among others, it provides algorithms for signal processing, feature finding (including de-isotoping), visualization, map mapping and peptide identification. It supports label-free and isotopic-label based quantification (such as iTRAQ, TMT and SILAC).
The following graphical applications are part of an OpenMS release:
Version | Date | Features |
---|---|---|
1.6.0 | November 2009 | New version of TOPPAS, reading of compressed XML files, identification-based alignment |
1.7.0 | September 2010 | Protein quantification, protXML support, create Inclusion/Exclusion lists |
1.8.0 | March 2011 | Display identification results, QT Clustering-based feature linking |
1.9.0 | February 2012 | Metabolomics support, feature detection in raw (profile) data |
1.10.0 | March 2013 | KNIME integration, support for targeted SWATH-MS analysis, TraML support, SuperHirn integration, MyriMatch support |
1.11.0 | August 2013 | Support for Python bindings, performance improvements, Mascot 2.4 support |
2.0 | April 2015 | mzQuantML, mzIdentML, mzTab, indexed mzML, Removal of GPL code, Switch to git, Support for Fido, MSGF+, Percolator |
2.0.1 | April 2016 | Faster file reading, improved support for mzIdentML and mzTab, elemental flux analysis, targeted assay generation, Support for Comet and Luciphor |
2.1.0 | November 2016 | Metabolite SWATH-MS support, lowess-transformations for RT alignment, improved metabolic feature finding |
2.2.0 | July 2017 | Fast feature linking using a KD tree, RNA cross-linking support, SpectraST support, scanning SWATH support, SQLite file formats |
2.3.0 | January 2018 | Protein-Protein Crosslinking, support for Comet, support for fractions, TMT 11plex, improved build for Python bindings |
2.4.0 | October 2018 | Support MaraCluster, Crux, MSFragger, MSstats, SIRIUS, visualization of ion mobility and DIA, library improvements |
2.5.0 | February 2020 | Support RNA mass spectrometry, QualityControl workflow, extended OpenSWATH support, ProteomicsLFQ |
2.6.0 | September 2020 | PyOpenMS wheel builds, Database suitability tool, SLIM labelling support |
2.7.0 | July 2021 | Improved support of NOVOR and MSFragger and for SIRIUS 4.9.0, export of mzQC format in QCCalculator, improved reading and writing of NIST MSP files |
3.0.0 | July 2023 | Added FLASHDeconv, and FLASHDeconvWizard GUI. Removed obsolete tool adapters. Major improvements to documentation. |
3.1.0 | October 2023 | Added SageAdapter; Require some advanced instruction sets (SSE3, AVX, Neon). Documentation fixes (TOPPAS and developer tutorial). |
3.2.0 | September 2024 | Support SubsetNeighborSearch (SNS). SiriusAdapter reworked. Various improvements to TOPPView and TOPPAS. Export for Common Workflow Language (CWL). |
Proteomics is the large-scale study of proteins. Proteins are vital macromolecules of all living organisms, with many functions such as the formation of structural fibers of muscle tissue, enzymatic digestion of food, or synthesis and replication of DNA. In addition, other kinds of proteins include antibodies that protect an organism from infection, and hormones that send important signals throughout the body.
Lipidomics is the large-scale study of pathways and networks of cellular lipids in biological systems. The word "lipidome" is used to describe the complete lipid profile within a cell, tissue, organism, or ecosystem and is a subset of the "metabolome" which also includes other major classes of biological molecules. Lipidomics is a relatively recent research field that has been driven by rapid advances in technologies such as mass spectrometry (MS), nuclear magnetic resonance (NMR) spectroscopy, fluorescence spectroscopy, dual polarisation interferometry and computational methods, coupled with the recognition of the role of lipids in many metabolic diseases such as obesity, atherosclerosis, stroke, hypertension and diabetes. This rapidly expanding field complements the huge progress made in genomics and proteomics, all of which constitute the family of systems biology.
Metabolomics is the scientific study of chemical processes involving metabolites, the small molecule substrates, intermediates, and products of cell metabolism. Specifically, metabolomics is the "systematic study of the unique chemical fingerprints that specific cellular processes leave behind", the study of their small-molecule metabolite profiles. The metabolome represents the complete set of metabolites in a biological cell, tissue, organ, or organism, which are the end products of cellular processes. Messenger RNA (mRNA), gene expression data, and proteomic analyses reveal the set of gene products being produced in the cell, data that represents one aspect of cellular function. Conversely, metabolic profiling can give an instantaneous snapshot of the physiology of that cell, and thus, metabolomics provides a direct "functional readout of the physiological state" of an organism. There are indeed quantifiable correlations between the metabolome and the other cellular ensembles, which can be used to predict metabolite abundances in biological samples from, for example mRNA abundances. One of the ultimate challenges of systems biology is to integrate metabolomics with all other -omics information to provide a better understanding of cellular biology.
Mass spectrometry is a scientific technique for measuring the mass-to-charge ratio of ions. It is often coupled to chromatographic techniques such as gas- or liquid chromatography and has found widespread adoption in the fields of analytical chemistry and biochemistry where it can be used to identify and characterize small molecules and proteins (proteomics). The large volume of data produced in a typical mass spectrometry experiment requires that computers be used for data storage and processing. Over the years, different manufacturers of mass spectrometers have developed various proprietary data formats for handling such data which makes it difficult for academic scientists to directly manipulate their data. To address this limitation, several open, XML-based data formats have recently been developed by the Trans-Proteomic Pipeline at the Institute for Systems Biology to facilitate data manipulation and innovation in the public sector. These data formats are described here.
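The mass-to-charge relationship can be illustrated with a short calculation. The following is a minimal sketch, assuming positive-mode ions formed by adding one proton per charge; the function names are hypothetical and are not part of any library mentioned in this article.

```python
# Mass of a proton in daltons (Da); assumes ions are formed by protonation.
PROTON_MASS = 1.007276466


def mz_from_mass(neutral_mass: float, charge: int) -> float:
    """Return the m/z of an ion carrying `charge` added protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def mass_from_mz(mz: float, charge: int) -> float:
    """Invert mz_from_mass: recover the neutral mass from an observed m/z."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return charge * (mz - PROTON_MASS)


if __name__ == "__main__":
    # The same molecule appears at different m/z values for different charges.
    for z in (1, 2, 3):
        print(z, round(mz_from_mass(1000.0, z), 4))
```

The same neutral mass therefore produces a series of signals at different m/z values, one per charge state, which is why charge assignment is needed before masses can be compared.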
Insilicos is a life science software company founded in 2002 by Erik Nilsson, Brian Pratt and Bryan Prazen. Insilicos develops scientific computing software for disease diagnosis.
Ion mobility spectrometry (IMS) is an analytical method that separates and identifies ionized molecules in the gas phase based on their mobility in a carrier buffer gas. Although it is used extensively for military or security purposes, such as detecting drugs and explosives, the technology also has many applications in laboratory analysis, including the study of both small and large biomolecules. IMS instruments are extremely sensitive stand-alone devices, but are often coupled with mass spectrometry, gas chromatography or high-performance liquid chromatography in order to achieve a multi-dimensional separation. They come in various sizes, ranging from a few millimetres to several metres depending on the specific application, and are capable of operating under a broad range of conditions. IMS instruments such as microscale high-field asymmetric-waveform ion mobility spectrometry can be palm-portable for use in a range of applications including volatile organic compound (VOC) monitoring, biological sample analysis, medical diagnosis and food quality monitoring. Systems operated at higher pressure are often accompanied by elevated temperature, while lower pressure systems (1–20 hPa) do not require heating.
MALDI mass spectrometry imaging (MALDI-MSI) is the use of matrix-assisted laser desorption ionization as a mass spectrometry imaging technique in which the sample, often a thin tissue section, is moved in two dimensions while the mass spectrum is recorded. Advantages, like measuring the distribution of a large amount of analytes at one time without destroying the sample, make it a useful method in tissue-based study.
Mass spectrometry imaging (MSI) is a technique used in mass spectrometry to visualize the spatial distribution of molecules, such as biomarkers, metabolites, peptides or proteins, by their molecular masses. After collecting a mass spectrum at one spot, the sample is moved to reach another region, and so on, until the entire sample is scanned. By choosing a peak in the resulting spectra that corresponds to the compound of interest, the MS data is used to map its distribution across the sample. This results in pictures of the spatially resolved distribution of a compound pixel by pixel. Each data set contains a veritable gallery of pictures because any peak in each spectrum can be spatially mapped. Although MSI has generally been considered a qualitative method, the signal generated by this technique is proportional to the relative abundance of the analyte. Therefore, quantification is possible when its challenges are overcome. Although widely used traditional methodologies like radiochemistry and immunohistochemistry achieve the same goal as MSI, they are limited in their abilities to analyze multiple samples at once, and can prove to be lacking if researchers do not have prior knowledge of the samples being studied. The most common ionization technologies in the field of MSI are DESI imaging, MALDI imaging, secondary ion mass spectrometry imaging and Nanoscale SIMS (NanoSIMS).
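The peak-to-picture step described above can be sketched in a few lines. This is an illustrative example only, assuming each pixel's spectrum is available as a list of (m/z, intensity) pairs keyed by its (x, y) position; the data layout, the function name `ion_image` and the tolerance value are assumptions made for the example.

```python
def ion_image(spectra, target_mz, tol=0.01):
    """Build a 2D intensity map for one m/z value.

    spectra: dict mapping (x, y) pixel coordinates to a list of
             (mz, intensity) peaks recorded at that position.
    Returns a list of rows (indexed by y), each a list of summed
    intensities (indexed by x) within +/- tol of target_mz.
    """
    width = max(x for x, _ in spectra) + 1
    height = max(y for _, y in spectra) + 1
    image = [[0.0] * width for _ in range(height)]
    for (x, y), peaks in spectra.items():
        image[y][x] = sum(
            intensity for mz, intensity in peaks if abs(mz - target_mz) <= tol
        )
    return image


# Toy 2 x 2 data set (hypothetical values).
EXAMPLE_SPECTRA = {
    (0, 0): [(500.0, 10.0), (600.0, 5.0)],
    (1, 0): [(500.005, 3.0)],
    (0, 1): [(600.0, 7.0)],
    (1, 1): [(499.995, 2.0), (500.0, 1.0)],
}

if __name__ == "__main__":
    for row in ion_image(EXAMPLE_SPECTRA, 500.0):
        print(row)
```

Calling the function again with a different `target_mz` yields a different image from the same data set, which is the sense in which any peak can be spatially mapped.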
The Proteomics Standards Initiative (PSI) is a working group of the Human Proteome Organization. It aims to define data standards for proteomics to facilitate data comparison, exchange and verification.
Capillary electrophoresis–mass spectrometry (CE–MS) is an analytical chemistry technique formed by the combination of the liquid separation process of capillary electrophoresis with mass spectrometry. CE–MS combines advantages of both CE and MS to provide high separation efficiency and molecular mass information in a single analysis. It has high resolving power and sensitivity, requires minimal volume and can analyze at high speed. Ions are typically formed by electrospray ionization, but they can also be formed by matrix-assisted laser desorption/ionization or other ionization techniques. It has applications in basic research in proteomics and quantitative analysis of biomolecules as well as in clinical medicine. Since its introduction in 1987, new developments and applications have made CE–MS a powerful separation and identification technique. Use of CE–MS has increased for the analysis of proteins, peptides and other biomolecules. However, the development of online CE–MS is not without challenges. An understanding of CE, the interface setup, the ionization technique and the mass detection system is important for tackling problems when coupling capillary electrophoresis to mass spectrometry.
The OpenMS Proteomics Pipeline (TOPP) is a set of computational tools that can be chained together to tailor problem-specific analysis pipelines for HPLC-MS data. It transforms most of the OpenMS functionality into small command line tools that are the building blocks for more complex analysis pipelines. The functionality of the tools ranges from data preprocessing over quantitation to identification.
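The chaining idea can be sketched as follows: each tool reads one file and writes another, and the output of one step becomes the input of the next. This is a minimal sketch rather than a recommended workflow. It assumes the tools follow the `-in`/`-out` command-line convention; the choice of steps, the intermediate file names and the helper `build_pipeline` are illustrative assumptions, and real runs typically need further tool-specific parameters.

```python
from pathlib import Path

# Illustrative step list: (tool name, suffix of the file it writes).
EXAMPLE_STEPS = [
    ("FileFilter", ".mzML"),
    ("PeakPickerHiRes", ".mzML"),
    ("FeatureFinderCentroided", ".featureXML"),
]


def build_pipeline(input_file, steps, workdir="."):
    """Return one argument list per step, linking each -out to the next -in."""
    commands = []
    current = Path(input_file)
    for number, (tool, suffix) in enumerate(steps, start=1):
        # Hypothetical naming scheme for intermediate files.
        output = Path(workdir) / f"step{number}_{tool}{suffix}"
        commands.append([tool, "-in", str(current), "-out", str(output)])
        current = output
    return commands


if __name__ == "__main__":
    # Only prints the commands; nothing is executed here.
    for command in build_pipeline("sample.mzML", EXAMPLE_STEPS):
        print(" ".join(command))
```

With the tools installed, each argument list could be passed to `subprocess.run(command, check=True)` in order, so that a failing step stops the pipeline.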
The Netherlands Bioinformatics for Proteomics Platform (NBPP) is a joint initiative of the Netherlands Bioinformatics Centre (NBIC) and the Netherlands Proteomics Centre (NPC).
OpenChrom is open-source software for the analysis and visualization of mass spectrometric and chromatographic data. Its focus is on handling native data files from the mass spectrometry systems of several vendors, such as Agilent Technologies, Varian, Shimadzu, Thermo Fisher, PerkinElmer and others. Data formats from other detector types have recently been supported as well.
PRIDE is a public data repository of mass spectrometry-based proteomics data, and is maintained by the European Bioinformatics Institute as part of the Proteomics Team.
ProteoWizard is a set of open-source, cross-platform tools and libraries for proteomics data analyses. It provides a framework for unified mass spectrometry data file access and performs standard chemistry and LCMS dataset computations. Specifically, it is able to read many of the vendor-specific, proprietary formats and convert the data into an open data format.
David Fenyö is a Hungarian-Swedish-American computational biologist, physicist and businessman. He is currently professor in the Department of Biochemistry and Molecular Pharmacology at NYU Langone Medical Center. Fenyö's research focuses on the development of methods to identify, characterize and quantify proteins and on the integration of data from multiple modalities including mass spectrometry, sequencing and microscopy.
Skyline is open-source software for targeted proteomics and metabolomics data analysis. It runs on Microsoft Windows and supports the raw data formats from multiple mass spectrometric vendors. It contains a graphical user interface to display chromatographic data for individual peptide or small molecule analytes.
The 'German Network for Bioinformatics Infrastructure – de.NBI' is a national, academic and non-profit infrastructure initiated by the Federal Ministry of Education and Research, with funding from 2015 to 2021. The network provides bioinformatics services to users in life sciences research and biomedicine in Germany and Europe. The partners organize training events, courses and summer schools on tools, standards and compute services provided by de.NBI to help researchers exploit their data more effectively. From 2022, the network will be integrated into Forschungszentrum Jülich.
SIRIUS is a Java-based open-source software for the identification of small molecules from fragmentation mass spectrometry data without the use of spectral libraries. It combines the analysis of isotope patterns in MS1 spectra with the analysis of fragmentation patterns in MS2 spectra. SIRIUS is the umbrella application comprising CSI:FingerID, CANOPUS, COSMIC and ZODIAC.