CellProfiler

Developer(s) Anne E. Carpenter, Thouis Jones, Lee Kamentsky, Beth Cimini, Allen Goodman, Claire McQuin, Madison Swain-Bowden, David Stirling, Nodar Gogoberidze, and others (Broad Institute)
Stable release 4.2.1 / July 22, 2021
Operating system Any (Python-based)
Type Image processing & Image analysis
License BSD 3-clause
Website www.cellprofiler.org

CellProfiler [1] [2] is free, open-source software designed to enable biologists without training in computer vision or programming to quantitatively measure phenotypes from thousands of images automatically. Advanced image analysis algorithms are available as individual modules that can be arranged in sequence to form a pipeline; the pipeline is then used to identify and measure biological objects and features in images, particularly those obtained through fluorescence microscopy.
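The pipeline idea can be illustrated with a minimal Python sketch. It does not use CellProfiler's actual API: the function names (`rescale`, `threshold`, `count_foreground`, `run_pipeline`) and the shared `workspace` dictionary are hypothetical, and stand in only for the general pattern of modules run in a fixed order, each reading from and adding to shared state.

```python
from typing import Callable, Dict, List

# Hypothetical types: a workspace is a dict that modules read from and write to.
Workspace = Dict[str, object]
Module = Callable[[Workspace], None]


def rescale(workspace: Workspace) -> None:
    """Scale pixel values to the range 0..1 (illustrative pre-processing step)."""
    image = workspace["image"]
    peak = max(max(row) for row in image) or 1  # avoid dividing by zero
    workspace["rescaled"] = [[pixel / peak for pixel in row] for row in image]


def threshold(workspace: Workspace) -> None:
    """Mark pixels at or above half of the maximum as foreground."""
    workspace["mask"] = [[pixel >= 0.5 for pixel in row]
                         for row in workspace["rescaled"]]


def count_foreground(workspace: Workspace) -> None:
    """Record a single measurement: the number of foreground pixels."""
    workspace["foreground_pixels"] = sum(sum(row) for row in workspace["mask"])


def run_pipeline(modules: List[Module], image: List[List[float]]) -> Workspace:
    """Run each module in order against one shared workspace."""
    workspace: Workspace = {"image": image}
    for module in modules:
        module(workspace)
    return workspace


PIPELINE = [rescale, threshold, count_foreground]
result = run_pipeline(PIPELINE, [[0, 10, 200], [0, 180, 220], [5, 0, 0]])
print(result["foreground_pixels"])  # 3
```

Because each module only touches the shared workspace, modules can be reordered, removed, or swapped without changing the others, which is the property the modular design relies on.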

Distributions are available for Microsoft Windows, macOS, and Linux. The source code for CellProfiler is freely available. [3] CellProfiler is developed by the Broad Institute's Imaging Platform. [4]

Features

CellProfiler can read and analyze most common microscopy image formats. [5] Biologists typically use CellProfiler to identify objects of interest (e.g. cells, colonies, C. elegans worms) and then measure their properties of interest. [6] Specialized modules for illumination correction may be applied as a pre-processing step to remove distortions due to uneven lighting. [7] Object identification (segmentation) is performed through machine learning or image thresholding, recognition and division of clumped objects, and removal or merging of objects on the basis of size or shape. [8] Each of these steps is customizable by the user for their unique image assay.
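As a rough illustration of threshold-based object identification followed by size filtering, the sketch below labels connected foreground regions in a small grid and discards objects outside an area range. It is a simplified stand-in written in plain Python, not CellProfiler's implementation: the helper names `label_objects` and `filter_by_area` are hypothetical, the threshold is fixed by hand rather than computed, and the division of clumped objects is not attempted.

```python
from collections import deque
from typing import Dict, List, Tuple


def label_objects(image: List[List[float]],
                  threshold: float) -> Tuple[List[List[int]], Dict[int, int]]:
    """Label 4-connected regions of pixels >= threshold.

    Returns a label image (0 = background) and a mapping of label -> area.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    areas: Dict[int, int] = {}
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] < threshold or labels[r][c]:
                continue  # background pixel, or already part of an object
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            area = 0
            while queue:  # breadth-first flood fill of one object
                y, x = queue.popleft()
                area += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] >= threshold
                            and not labels[ny][nx]):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            areas[next_label] = area
    return labels, areas


def filter_by_area(areas: Dict[int, int],
                   min_area: int, max_area: int) -> Dict[int, int]:
    """Keep only objects whose pixel area lies within [min_area, max_area]."""
    return {label: area for label, area in areas.items()
            if min_area <= area <= max_area}


image = [
    [0, 9, 9, 0, 0, 0],
    [0, 9, 9, 0, 0, 7],
    [0, 0, 0, 0, 0, 0],
    [8, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]
labels, areas = label_objects(image, threshold=5)
kept = filter_by_area(areas, min_area=2, max_area=10)
print(areas)  # {1: 4, 2: 1, 3: 1, 4: 6}
print(kept)   # {1: 4, 4: 6}
```

Here the two single-pixel regions are treated as noise and dropped, which mirrors the removal of objects on the basis of size described above.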

A wide variety of measurements can be generated for each identified cell or subcellular compartment, including morphology, intensity, and texture, among others. These measurements can be accessed with built-in data viewing and plotting tools, exported in a comma-delimited spreadsheet format, [9] or imported into a MySQL or SQLite database. [10]
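The two export routes can be sketched with Python's standard `csv` and `sqlite3` modules. This is only an illustration of the data flow, not the output of CellProfiler's export modules: the column names (`ObjectNumber`, `Area`, `MeanIntensity`), the table name `per_object`, and the sample values are assumptions made for the example.

```python
import csv
import io
import sqlite3
from typing import Dict, List

# Hypothetical per-object measurements; names and values are illustrative only.
FIELDS = ["ObjectNumber", "Area", "MeanIntensity"]
measurements: List[Dict[str, float]] = [
    {"ObjectNumber": 1, "Area": 4, "MeanIntensity": 9.0},
    {"ObjectNumber": 2, "Area": 6, "MeanIntensity": 9.0},
]


def to_csv(rows: List[Dict[str, float]]) -> str:
    """Serialize measurements as comma-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_sqlite(rows: List[Dict[str, float]]) -> sqlite3.Connection:
    """Load measurements into an in-memory SQLite table for querying."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE per_object "
        "(ObjectNumber INTEGER, Area INTEGER, MeanIntensity REAL)"
    )
    conn.executemany(
        "INSERT INTO per_object VALUES (:ObjectNumber, :Area, :MeanIntensity)",
        rows,
    )
    conn.commit()
    return conn


csv_text = to_csv(measurements)
conn = to_sqlite(measurements)
mean_area = conn.execute("SELECT AVG(Area) FROM per_object").fetchone()[0]
print(csv_text)
print(mean_area)  # 5.0
```

A spreadsheet export suits small experiments that are inspected by hand, while a database makes it practical to query and aggregate measurements across many images.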

CellProfiler interfaces with the high-performance scientific libraries NumPy and SciPy for many mathematical operations, the Open Microscopy Environment [11] Consortium’s Bio-Formats library for reading more than 100 image file formats, ImageJ for use of plugins and macros, and ilastik for pixel-based classification. [12] While designed and optimized for large numbers of two-dimensional images (the most common high-content screening image format), CellProfiler supports analysis of small-scale experiments and time-lapse movies. [13]

History

CellProfiler was released in December 2005 by scientists from the Whitehead Institute for Biomedical Research and Massachusetts Institute of Technology. [14] It is currently developed and maintained by the Cimini Lab at the Imaging Platform of the Broad Institute. [15]

Originally developed in MATLAB, [14] it was rewritten in Python and released as CellProfiler 2.0 in 2010. [2] Version 3.0, supporting volumetric analysis of 3D image stacks and optional deep learning modules, was released in October 2017. [16] CellProfiler 4.0 was released in September 2020 and focused on speed, usability, and utility improvements, most notably the migration to Python 3. [17]

Community

Because CellProfiler is a free, open-source project, anyone can develop their own image processing algorithms as a new module for CellProfiler and contribute it to the project. [18] The CellProfiler website contains a forum for discussion where new users can have their questions answered, usually by the creators of the project. [19]

References

  1. Carpenter AE, Jones TR, Lamprecht MR, Clarke C, Kang IH, Friman O, Guertin DA, Chang JH, Lindquist RA, Moffat J, Golland P, Sabatini DM (2006). "CellProfiler: image analysis software for identifying and quantifying cell phenotypes". Genome Biology. 7 (10): R100. doi:10.1186/gb-2006-7-10-r100. PMC 1794559. PMID 17076895.
  2. Kamentsky L, Jones TR, Fraser A, Bray MA, Logan DJ, Madden KL, Ljosa V, Rueden C, Eliceiri KW, Carpenter AE (April 2011). "Improved structure, function and compatibility for CellProfiler: modular high-throughput image analysis software". Bioinformatics. 27 (8): 1179–80. doi:10.1093/bioinformatics/btr095. PMC 3072555. PMID 21349861.
  3. "CellProfiler wiki". GitHub. December 2016.
  4. "Imaging Platform". Broad Institute. 2018.
  5. "CellProfiler — Bio-Formats 5.2.1 documentation". www.openmicroscopy.org. Retrieved 2016-08-29.
  6. Lamprecht, Michael R.; Sabatini, David M.; Carpenter, Anne E. (2007-01-01). "CellProfiler: free, versatile software for automated biological image analysis". BioTechniques. 42 (1): 71–75. doi:10.2144/000112257. ISSN 0736-6205. PMID 17269487.
  7. Singh, S.; Bray, M.-A.; Jones, T. R.; Carpenter, A. E. (2014-12-01). "Pipeline for illumination correction of images for high-throughput microscopy". Journal of Microscopy. 256 (3): 231–236. doi:10.1111/jmi.12178. ISSN 1365-2818. PMC 4359755. PMID 25228240.
  8. "IdentifyPrimaryObjects". d1zymp9ayga15t.cloudfront.net. Retrieved 2016-08-29.
  9. "ExportToSpreadsheet". d1zymp9ayga15t.cloudfront.net. Retrieved 2016-08-29.
  10. "ExportToDatabase". d1zymp9ayga15t.cloudfront.net. Retrieved 2016-08-29.
  11. "Open Microscopy Environment". Retrieved 2018-05-07.
  12. "CellProfiler/CellProfiler". GitHub. Retrieved 2016-08-29.
  13. Bray, Mark-Anthony; Carpenter, Anne E. (2015-01-01). "CellProfiler Tracer: exploring and validating high-throughput, time-lapse microscopy image data". BMC Bioinformatics. 16: 368. doi:10.1186/s12859-015-0759-x. ISSN 1471-2105. PMC 4634901. PMID 26537300.
  14. "What Is the Key Best Practice for Collaborating with a Computational Biologist?". Cell Systems. 3 (1): 7–11. 2016. doi:10.1016/j.cels.2016.07.006. PMID 27467242.
  15. "Cimini Lab". Retrieved 2022-08-10.
  16. Carpenter, AE (2017-10-16). "CellProfiler 3.0 release: faster, better, and 3D". CellProfiler Blog.
  17. "CellProfiler 4.0 Release: Improvements in speed, utility, and usability". carpenterlab.broadinstitute.org. Retrieved 2020-10-16.
  18. "CellProfiler/CellProfiler". GitHub. Retrieved 2016-08-29.
  19. "CellProfiler". forum.cellprofiler.org. Retrieved 2016-08-29.