Phenotype microarray

The phenotype microarray approach is a technology for high-throughput phenotyping of cells. A phenotype microarray system makes it possible to monitor simultaneously the phenotypic reactions of cells to many environmental challenges or exogenous compounds. The phenotypic reactions are recorded as either end-point measurements or respiration kinetics similar to growth curves.

Usages

High-throughput phenotypic testing is increasingly important for exploring the biology of bacteria, fungi, yeasts, and animal cell lines such as human cancer cells. Just as DNA microarrays and proteomic technologies have made it possible to assay the expression level of thousands of genes or proteins all at once, phenotype microarrays (PMs) make it possible to quantitatively measure thousands of cellular phenotypes simultaneously. [1] The approach also offers potential for testing gene function and improving genome annotation. [2] In contrast to many of the previously available molecular high-throughput technologies, phenotypic testing is performed on living cells, thus providing comprehensive information about the performance of entire cells. The major applications of the PM technology are in the fields of systems biology, microbial cell physiology, microbiology, and taxonomy, [3] and mammalian cell physiology, including clinical research such as on autism. [4] Advantages of PMs over standard growth curves are that cellular respiration can be measured in environmental conditions where cellular replication (growth) may not be possible, [5] and that respiration measurements are more accurate than optical density, which can vary between different cellular morphologies. In addition, respiration reactions are usually detected much earlier than cellular growth. [6]

Technology

A sole carbon source that can be transported into a cell and metabolized to produce NADH engenders a redox potential and flow of electrons to reduce a tetrazolium dye, [7] such as tetrazolium violet, which produces a purple color. The more rapid this metabolic flow, the more quickly purple color forms. The formation of purple color is a positive reaction, interpreted to mean that the sole carbon source is used as an energy source. A microplate reader and incubation facility are needed to provide the appropriate incubation conditions and to automatically read the intensity of color formation during tetrazolium reduction at intervals of, e.g., 15 minutes.

The same principle, inferring an organism's metabolic capabilities from how it uses particular energy sources, can be applied equally to other macronutrients such as nitrogen, sulfur, or phosphorus and their compounds and derivatives. By extension, the impact of auxotrophic supplements, antibiotics, heavy metals, or other inhibitory compounds on the respiration behaviour of the cells can be determined.

Data structure

During a positive reaction, the longitudinal kinetics are expected to appear as sigmoidal curves, analogous to typical bacterial growth curves. As with bacterial growth curves, the respiration kinetic curves may provide valuable information coded in the length of the lag phase λ, the respiration rate μ (corresponding to the steepness of the slope), the maximum cell respiration A (corresponding to the maximum value recorded), and the area under the curve (AUC). In contrast to bacterial growth curves, there is typically no death phase in PMs, as the reduced tetrazolium dye is insoluble.
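
As an illustration of how the four curve parameters relate to the raw readings, the following minimal Python sketch estimates λ, μ, A, and AUC directly from one well's time series. It is not the method of any particular PM software: the function name `curve_parameters`, the tangent-based lag estimate, and the synthetic logistic curve are all assumptions made for this example, and real analyses usually fit a smoothing spline or a parametric growth model instead.

```python
import math


def curve_parameters(times, values):
    """Estimate lag, slope, maximum and AUC from one well's kinetic curve.

    Hypothetical helper for illustration only; it works on raw readings
    without any smoothing or model fitting.
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need two equally long sequences with at least two points")

    baseline = values[0]
    a_max = max(values)  # A: maximum value recorded

    # mu: steepest slope between two consecutive readings
    mu, t_mid, v_mid = 0.0, times[0], values[0]
    for i in range(1, len(times)):
        slope = (values[i] - values[i - 1]) / (times[i] - times[i - 1])
        if slope > mu:
            mu = slope
            t_mid = (times[i] + times[i - 1]) / 2
            v_mid = (values[i] + values[i - 1]) / 2

    # lambda: time at which the tangent through the steepest point
    # crosses the baseline (undefined for a flat, negative reaction)
    lag = t_mid - (v_mid - baseline) / mu if mu > 0 else float("nan")

    # AUC: trapezoidal rule over all readings
    auc = sum(
        (times[i] - times[i - 1]) * (values[i] + values[i - 1]) / 2
        for i in range(1, len(times))
    )
    return {"lag": lag, "mu": mu, "A": a_max, "AUC": auc}


# Synthetic sigmoidal curve (made-up numbers): one reading every
# 15 minutes (0.25 h) for 48 h, plateau 250, midpoint at 20 h.
times = [i * 0.25 for i in range(193)]
values = [250.0 / (1.0 + math.exp(-0.5 * (t - 20.0))) for t in times]
params = curve_parameters(times, values)
print(params)
```

For this synthetic curve the estimated lag should land near 16 hours, which is where the tangent at the logistic midpoint meets the baseline (20 − 2/0.5). On noisy measured data the consecutive-difference slope is sensitive to outliers, which is one reason to smooth or fit a model first.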

Software

Proprietary commercial software exists for the storage, retrieval, and analysis of high-throughput phenotype data. A free and open-source alternative is the "opm" package for R. [8] [9] "opm" contains tools for the management, visualization, and statistical analysis of PM data, covering curve-parameter estimation, dedicated and customizable plots, metadata management, statistical comparison with genome and pathway annotations, automatic generation of taxonomic reports, data discretization for phylogenetic software, and export in the YAML markup language. In conjunction with other R packages, it was used to apply boosting to re-analyse autism PM data and detect more determining factors. [10] The "opm" package has been developed and is maintained at the Deutsche Sammlung von Mikroorganismen und Zellkulturen. Another free and open-source program for analyzing Phenotype Microarray data is "DuctApe", a Unix command-line tool that also correlates genomic data. [11] Other software tools are PheMaDB, [12] which provides a solution for storage, retrieval, and analysis of high-throughput phenotype data, and the PMViewer software, [13] which focuses on graphical display but does not enable further statistical analysis. The latter is not publicly available.
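
To make the idea of data discretization concrete, here is a small Python sketch that turns each well's maximum value A into a negative, ambiguous, or positive state, the kind of character coding that phylogenetic software can consume. It does not reproduce the "opm" package or any other tool named above: the function `discretize`, the two cut-offs, and the well values are hypothetical and chosen only for illustration.

```python
def discretize(a_values, low, high):
    """Map each well's maximum value A to '-', '?' or '+'.

    Hypothetical two-threshold rule: values at or below `low` count as
    negative, values at or above `high` as positive, and everything in
    between as ambiguous.
    """
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    states = {}
    for well, a in a_values.items():
        if a <= low:
            states[well] = "-"
        elif a >= high:
            states[well] = "+"
        else:
            states[well] = "?"
    return states


# Made-up maximum values for three wells of one plate.
a_values = {"A01": 12.0, "A02": 231.5, "A03": 95.0}
states = discretize(a_values, low=50.0, high=150.0)
print(states)
```

Fixed cut-offs are the simplest possible choice; in practice thresholds would more plausibly be derived from the data, for example from negative-control wells or from the distribution of A across the plate.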

References

  1. Bochner, B.R. (2009), "Global phenotypic characterization of bacteria", FEMS Microbiology Reviews, 33 (1): 191–205, doi:10.1111/j.1574-6976.2008.00149.x, PMC 2704929, PMID 19054113
  2. Bochner, B.R.; Gadzinski, P.; Panomitros, E. (2001), "Phenotype MicroArrays for High Throughput Phenotypic Testing and Assay of Gene Function", Genome Research, 11 (7): 1246–1255, doi:10.1101/gr.186501, PMC 311101, PMID 11435407
  3. Montero-Calasanz, M.C.; Göker, M.; Pötter, G.; Rohde, M.; Spröer, C.; Schumann, P.; Gorbushina, A.A.; Klenk, H.-P. (2013), "Geodermatophilus telluris sp. nov., a novel actinomycete isolated from Saharan desert sand in Chad", International Journal of Systematic and Evolutionary Microbiology, 63 (Pt 6): 2254–2259, doi:10.1099/ijs.0.046888-0, hdl:10033/299082, PMID 23159748
  4. Boccuto, L.; Chen, C.-F.; Pittman, A.R.; Skinner, C.D.; McCartney, H.J.; Jones, K.; Bochner, B.R.; Stevenson, R.E.; Schwartz, C.E. (2013), "Decreased tryptophan metabolism in patients with autism spectrum disorders", Molecular Autism, 4 (16): 16, doi:10.1186/2040-2392-4-16, PMC 3680090, PMID 23731516
  5. Omsland, A.; Cockrell, D.C.; Howe, D.; Fischer, E.R.; Virtaneva, K.; Sturdevant, D.E.; Porcella, S.F.; Heinzen, R.A. (2009), "Host cell-free growth of the Q fever bacterium Coxiella burnetii", Proceedings of the National Academy of Sciences of the United States of America, 106 (11): 4430–4434, Bibcode:2009PNAS..106.4430O, doi:10.1073/pnas.0812074106, PMC 2657411, PMID 19246385
  6. Vaas, L.A.I.; Marheine, M.; Sikorski, J.; Göker, M.; Schumacher, M. (2013), "Impacts of pr-10a overexpression at the molecular and the phenotypic level", International Journal of Molecular Sciences, 14 (7): 15141–15166, doi:10.3390/ijms140715141, PMC 3742292, PMID 23880863
  7. Bochner, B.R.; Savageau, M.A. (1977), "Generalized indicator plate for genetic, metabolic, and taxonomic studies with microorganisms", Applied and Environmental Microbiology, 33 (2): 434–444, doi:10.1128/AEM.33.2.434-444.1977, PMC 170700, PMID 322611
  8. Vaas, L.A.I.; Sikorski, J.; Michael, V.; Göker, M.; Klenk, H.-P. (2012), "Visualization and curve-parameter estimation strategies for efficient exploration of Phenotype MicroArray kinetics", PLOS ONE, 7 (4): e34846, Bibcode:2012PLoSO...734846V, doi:10.1371/journal.pone.0034846, PMC 3334903, PMID 22536335
  9. Vaas, L.A.I.; Sikorski, J.; Hofner, B.; Fiebig, A.; Buddruhs, N.; Klenk, H.-P.; Göker, M. (2013), "opm: An R Package for Analysing OmniLog® Phenotype MicroArray Data", Bioinformatics, 29 (14): 1823–4, doi:10.1093/bioinformatics/btt291, PMID 23740744
  10. Hofner, B.; Boccuto, L.; Göker, M. (2015), "Controlling false discoveries in high-dimensional situations: Boosting with stability selection", BMC Bioinformatics, 16: 144, doi:10.1186/s12859-015-0575-3, PMC 4464883, PMID 25943565
  11. Galardini, M.; Mengoni, A.; Biondi, E.G.; Semeraro, R.; Florio, A.; Bazzicalupo, M.; Benedetti, A.; Mocali, S. (2013), "DuctApe: A suite for the analysis and correlation of genomic and OmniLog™ Phenotype Microarray data", Genomics, 103 (1): 1–10, doi:10.1016/j.ygeno.2013.11.005, PMID 24316132
  12. Chang, W.; Sarver, K.; Higgs, B.; Read, T.; Nolan, N.; Chapman, C.; Bishop-Lilly, K.; Sozhamannan, S. (2011), "PheMaDB: A solution for storage, retrieval, and analysis of high throughput phenotype data", BMC Bioinformatics, 12: 109, doi:10.1186/1471-2105-12-109, PMC 3097161, PMID 21507258
  13. Borglin, S.; Joyner, D.; Jacobsen, J.; Mukhopadhyay, A.; Hazen, T.C. (2009), "Overcoming the anaerobic hurdle in phenotypic microarrays: Generation and visualization of growth curve data for Desulfovibrio vulgaris Hildenborough", Journal of Microbiological Methods, 76 (2): 159–168, doi:10.1016/j.mimet.2008.10.003, PMID 18996155