Lumi (software)

Last updated
lumi software
Stable release
2.0 / March 4, 2007
Operating system Linux, UNIX, Mac OS X, Windows
Platform R programming language and Bioconductor
Type Analysis of Illumina microarrays
License GNU General Public License
Website lumi software release website

lumi is a free, open source and open development software project for the analysis and comprehension of Illumina expression and methylation microarray data. The project was started in the summer of 2006 and set out to provide algorithms and data management tools of Illumina in the framework of Bioconductor. It is based on the statistical R programming language.

Contents

Features

The lumi package provides an analysis pipeline for probe-level Illumina expression and methylation microarray data, including probe-identifier management (nuID), updated probe-to-gene mapping and annotation using the latest release of RefSeq (nuIDblast), probe-intensity transformation (VST) and normalization (RSN), quality control (QA/QC) and preprocessing methods specific for Illumina methylation data. By extending the ExprSet object with Illumina-specific features, lumi is designed to work with other Bioconductor packages, such as Limma and GOstats to detect differential genes and conduct Gene Ontology analysis.

History

The lumi project was started in the summer of 2006 at the Bioinformatics Core Facility of the Robert H. Lurie Comprehensive Cancer Center, Northwestern University. Originally lumi was designed for the analysis of Illumina Expression BeadArray data. Starting from 2010 (version > 2.0), functions of analyzing Illumina methylation microarray data was added. The project team consists of Drs. Pan Du, Simon M. Lin, and Warren A. Kibbe. The project was started upon a request for collaboration from Dr. Serdar E. Bulun to analyze a set of new Illumina microarray data acquired at his lab on the study of the effect of retinoic acids on cancers. Dr. Pan Du led the software development of the project. lumi was the first software package to utilize the unique design of redundancy of beadArrays for the data transformation and normalization processes. The first release of lumi was on January 3, 2007 through the Bioconductor website. Before its formal release, it was beta-tested at Norwegian Radiumhospital, Leiden University Medical Center, Universiteit van Amsterdam, Università degli Studi di Brescia, UC Davis, Wayne State University, NIH, M.D. Anderson Cancer Center, Case Western Reserve University, Harvard University, Washington University, and Walter and Eliza Hall Institute of Medical Research.

See also

Related Research Articles

Microarray A small-scale two-dimensional array of samples on a solid support

A microarray is a multiplex lab-on-a-chip. Its purpose is to simultaneously detect the expression of thousands of genes from a sample. It is a two-dimensional array on a solid substrate—usually a glass slide or silicon thin-film cell—that assays (tests) large amounts of biological material using high-throughput screening miniaturized, multiplexed and parallel processing and detection methods. The concept and methodology of microarrays was first introduced and illustrated in antibody microarrays by Tse Wen Chang in 1983 in a scientific publication and a series of patents. The "gene chip" industry started to grow significantly after the 1995 Science Magazine article by the Ron Davis and Pat Brown labs at Stanford University. With the establishment of companies, such as Affymetrix, Agilent, Applied Microarrays, Arrayjet, Illumina, and others, the technology of DNA microarrays has become the most sophisticated and the most widely used, while the use of protein, peptide and carbohydrate microarrays is expanding.

DNA microarray Collection of microscopic DNA spots attached to a solid surface

A DNA microarray is a collection of microscopic DNA spots attached to a solid surface. Scientists use DNA microarrays to measure the expression levels of large numbers of genes simultaneously or to genotype multiple regions of a genome. Each DNA spot contains picomoles of a specific DNA sequence, known as probes. These can be a short section of a gene or other DNA element that are used to hybridize a cDNA or cRNA sample under high-stringency conditions. Probe-target hybridization is usually detected and quantified by detection of fluorophore-, silver-, or chemiluminescence-labeled targets to determine relative abundance of nucleic acid sequences in the target. The original nucleic acid arrays were macro arrays approximately 9 cm × 12 cm and the first computerized image based analysis was published in 1981. It was invented by Patrick O. Brown. An example of its application is in SNPs arrays for polymorphisms in cardiovascular diseases, cancer, pathogens and GWAS analysis. It is also used for the identification of structural variations and the measurement of gene expression.

Bioconductor is a free, open source and open development software project for the analysis and comprehension of genomic data generated by wet lab experiments in molecular biology.

Methylation specific oligonucleotide microarray

Methylation specific oligonucleotide microarray, also known as MSO microarray, was developed as a technique to map epigenetic methylation changes in DNA of cancer cells.

Microarray analysis techniques

Microarray analysis techniques are used in interpreting the data generated from experiments on DNA, RNA, and protein microarrays, which allow researchers to investigate the expression state of a large number of genes - in many cases, an organism's entire genome - in a single experiment. Such experiments can generate very large amounts of data, allowing researchers to assess the overall state of a cell or organism. Data in such large quantities is difficult - if not impossible - to analyze without the help of computer programs.

ChIP-on-chip Molecular biology method

ChIP-on-chip is a technology that combines chromatin immunoprecipitation ('ChIP') with DNA microarray ("chip"). Like regular ChIP, ChIP-on-chip is used to investigate interactions between proteins and DNA in vivo. Specifically, it allows the identification of the cistrome, the sum of binding sites, for DNA-binding proteins on a genome-wide basis. Whole-genome analysis can be performed to determine the locations of binding sites for almost any protein of interest. As the name of the technique suggests, such proteins are generally those operating in the context of chromatin. The most prominent representatives of this class are transcription factors, replication-related proteins, like origin recognition complex protein (ORC), histones, their variants, and histone modifications.

Within computational biology, an MA plot is an application of a Bland–Altman plot for visual representation of genomic data. The plot visualizes the differences between measurements taken in two samples, by transforming the data onto M and A scales, then plotting these values. Though originally applied in the context of two channel DNA microarray gene expression data, MA plots are also used to visualise high-throughput sequencing analysis.

Tiling array

Tiling arrays are a subtype of microarray chips. Like traditional microarrays, they function by hybridizing labeled DNA or RNA target molecules to probes fixed onto a solid surface.

The nucleotide universal IDentifier (nuID) is designed to uniquely and globally identify oligonucleotide microarray probes.

The Illumina Methylation Assay using the Infinium I platform uses 'BeadChip' technology to generate a comprehensive genome-wide profiling of human DNA methylation. Similar to bisulfite sequencing and pyrosequencing, this method quantifies methylation levels at various loci within the genome. This assay is used for methylation probes on the Illumina Infinium HumanMethylation27 BeadChip. Probes on the 27k array target regions of the human genome to measure methylation levels at 27,578 CpG dinucleotides in 14,495 genes. The Infinium HumanMethylation450 BeadChip array targets > 450,000 methylation sites.

Methylated DNA immunoprecipitation is a large-scale purification technique in molecular biology that is used to enrich for methylated DNA sequences. It consists of isolating methylated DNA fragments via an antibody raised against 5-methylcytosine (5mC). This technique was first described by Weber M. et al. in 2005 and has helped pave the way for viable methylome-level assessment efforts, as the purified fraction of methylated DNA can be input to high-throughput DNA detection methods such as high-resolution DNA microarrays (MeDIP-chip) or next-generation sequencing (MeDIP-seq). Nonetheless, understanding of the methylome remains rudimentary; its study is complicated by the fact that, like other epigenetic properties, patterns vary from cell-type to cell-type.

Massive parallel signature sequencing (MPSS) is a procedure that is used to identify and quantify mRNA transcripts, resulting in data similar to serial analysis of gene expression (SAGE), although it employs a series of biochemical and sequencing steps that are substantially different.

Qlucore develops and markets Qlucore Omics Explorer, a D.I.Y next-generation bioinformatics software for research in life-science, plant and biotech industries, as well as academia.

Bayesian tool for methylation analysis

Bayesian tool for methylation analysis, also known as BATMAN, is a statistical tool for analysing methylated DNA immunoprecipitation (MeDIP) profiles. It can be applied to large datasets generated using either oligonucleotide arrays (MeDIP-chip) or next-generation sequencing (MeDIP-seq), providing a quantitative estimation of absolute methylation state in a region of interest.

Alicia Oshlack Australian bioinformatician

Alicia Yinema Kate Nungarai Oshlack is an Australian bioinformatician and is Co-Head of Computational Biology at the Peter MacCallum Cancer Centre in Melbourne, Victoria, Australia. She is best known for her work developing methods for the analysis of transcriptome data as a measure of gene expression. She has characterized the role of gene expression in human evolution by comparisons of humans, chimpanzees, orangutans, and rhesus macaques, and works collaboratively in data analysis to improve the use of clinical sequencing of RNA samples by RNAseq for human disease diagnosis.

Gene set enrichment analysis

Gene set enrichment analysis (GSEA) is a method to identify classes of genes or proteins that are over-represented in a large set of genes or proteins, and may have an association with disease phenotypes. The method uses statistical approaches to identify significantly enriched or depleted groups of genes. Transcriptomics technologies and proteomics results often identify thousands of genes which are used for the analysis.

Epigenome-wide association study

An epigenome-wide association study (EWAS) is an examination of a genome-wide set of quantifiable epigenetic marks, such as DNA methylation, in different individuals to derive associations between epigenetic variation and a particular identifiable phenotype/trait. When patterns change such as DNA methylation at specific loci, discriminating the phenotypically affected cases from control individuals, this is considered an indication that epigenetic perturbation has taken place that is associated, causally or consequentially, with the phenotype.

Transcriptomics technologies are the techniques used to study an organism's transcriptome, the sum of all of its RNA transcripts. The information content of an organism is recorded in the DNA of its genome and expressed through transcription. Here, mRNA serves as a transient intermediary molecule in the information network, whilst non-coding RNAs perform additional diverse functions. A transcriptome captures a snapshot in time of the total transcripts present in a cell. Transcriptomics technologies provide a broad account of which cellular processes are active and which are dormant. A major challenge in molecular biology is to understand how a single genome gives rise to a variety of cells. Another is how gene expression is regulated.

Rafael Irizarry (scientist) American professor of biostatistics

Rafael Irizarry is a professor of biostatistics at the Harvard T.H. Chan School of Public Health and professor of biostatistics and computational biology at the Dana–Farber Cancer Institute. Irizarry is known as one of the founders of the Bioconductor project.