Tiling arrays are a subtype of microarray chips. Like traditional microarrays, they function by hybridizing labeled DNA or RNA target molecules to probes fixed onto a solid surface.
Tiling arrays differ from traditional microarrays in the nature of the probes. Instead of probing for sequences of known or predicted genes that may be dispersed throughout the genome, tiling arrays probe intensively for sequences which are known to exist in a contiguous region. This is useful for characterizing regions that are sequenced, but whose local functions are largely unknown. Tiling arrays aid in transcriptome mapping as well as in discovering sites of DNA/protein interaction (ChIP-chip, DamID), of DNA methylation (MeDIP-chip) and of sensitivity to DNase (DNase Chip); they are also used for array CGH. [1] In addition to detecting previously unidentified genes and regulatory sequences, improved quantification of transcription products is possible. Specific probes are present in millions of copies (as opposed to only several in traditional arrays) within an array unit called a feature, with anywhere from 10,000 to more than 6,000,000 different features per array. [2] Variable mapping resolutions are obtainable by adjusting the amount of sequence overlap between probes, or the number of base pairs between probe sequences, as well as probe length. For smaller genomes such as Arabidopsis, whole genomes can be examined. [3] Tiling arrays are a useful tool in genome-wide association studies.
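The relationship between probe length, probe spacing and mapping resolution can be illustrated with a minimal sketch. The function names `tile_probes` and `describe_tiling`, and the `step` parameter (the distance between the start positions of consecutive probes), are illustrative assumptions and do not correspond to any manufacturer's design software.

```python
def tile_probes(sequence, probe_length=25, step=10):
    """Return (start, probe) pairs that tile `sequence`.

    `step` is the distance in bases between the starts of consecutive
    probes (a hypothetical parameter chosen for this sketch).
    """
    if probe_length <= 0 or step <= 0:
        raise ValueError("probe_length and step must be positive")
    probes = []
    # Stop once a full-length probe no longer fits in the region.
    for start in range(0, len(sequence) - probe_length + 1, step):
        probes.append((start, sequence[start:start + probe_length]))
    return probes


def describe_tiling(probe_length, step):
    """Say whether consecutive probes overlap, abut, or leave a gap."""
    if step < probe_length:
        return f"overlapping by {probe_length - step} bp"
    if step == probe_length:
        return "end-to-end"
    return f"gapped by {step - probe_length} bp"


region = "ACGT" * 25  # a 100-base toy region
dense = tile_probes(region, probe_length=25, step=10)
sparse = tile_probes(region, probe_length=25, step=35)
print(len(dense), describe_tiling(25, 10))
print(len(sparse), describe_tiling(25, 35))
```

A smaller step places more probes on the same region and so gives finer resolution, at the cost of using more features on the array.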
The two main ways of synthesizing tiling arrays are photolithographic manufacturing and mechanical spotting or printing.
The first method involves in situ synthesis where probes, approximately 25bp, are built on the surface of the chip. These arrays can hold up to 6 million discrete features, each of which contains millions of copies of one probe.
The other way of synthesizing tiling array chips is via mechanically printing probes onto the chip. This is done by using automated machines with pins that place the previously synthesized probes onto the surface. Due to the size restriction of the pins, these chips can hold up to nearly 400,000 features. [4] Three manufacturers of tiling arrays are Affymetrix, NimbleGen and Agilent. Their products vary in probe length and spacing. ArrayExplorer.com is a free web-server to compare tiling arrays.
ChIP-chip is one of the most popular usages of tiling arrays. Chromatin immunoprecipitation allows binding sites of proteins to be identified. A genome-wide variation of this is known as ChIP-on-chip. Proteins that bind to chromatin are cross-linked in vivo, usually via fixation with formaldehyde. The chromatin is then fragmented and exposed to antibodies specific to the protein of interest. These complexes are then precipitated. The DNA is then isolated and purified. With traditional DNA microarrays, the immunoprecipitated DNA is hybridized to the chip, which contains probes that are designed to cover representative genome regions. With tiling arrays, by contrast, overlapping probes or probes in very close proximity can be used. This gives an unbiased analysis with high resolution. Besides these advantages, tiling arrays show high reproducibility, and with overlapping probes spanning large segments of the genome, they can interrogate protein binding sites that harbor repeats. ChIP-chip experiments have been able to identify binding sites of transcription factors across the genome in yeast, Drosophila and a few mammalian species. [5]
Another popular use of tiling arrays is in finding expressed genes. Traditional methods of gene prediction for annotation of genomic sequences have had problems when used to map the transcriptome, such as not producing an accurate structure of the genes and also missing transcripts entirely. The method of sequencing cDNA to find transcribed genes also runs into problems, such as failing to detect rare or very short RNA molecules, and so does not detect genes that are active only in response to signals or specific to a time frame. Tiling arrays can solve these issues. Due to the high resolution and sensitivity, even small and rare molecules can be detected. The overlapping nature of the probes also allows detection of non-polyadenylated RNA and can produce a more precise picture of gene structure. [6] Earlier studies on chromosomes 21 and 22 showed the power of tiling arrays for identifying transcription units. [7] [8] [9] The authors used 25-mer probes that were 35bp apart, spanning the entire chromosomes. Labeled targets were made from polyadenylated RNA. They found many more transcripts than predicted and 90% were outside of annotated exons. Another study with Arabidopsis used high-density oligonucleotide arrays that cover the entire genome. More than 10 times more transcripts were found than predicted by ESTs and other prediction tools. [3] [10] Also found were novel transcripts in the centromeric regions where it was thought that no genes are actively expressed. Many noncoding and natural antisense RNAs have been identified using tiling arrays. [9]
Methyl-DNA immunoprecipitation followed by tiling array allows DNA methylation mapping and measurement across the genome. DNA is methylated on cytosine in CG di-nucleotides in many places in the genome. This modification is one of the best-understood inherited epigenetic changes and has been shown to affect gene expression. Mapping these sites can add to the knowledge of expressed genes and also epigenetic regulation on a genome-wide level. Tiling array studies have produced high-resolution methylation maps for the Arabidopsis genome, yielding the first "methylome".
DNase chip is an application of tiling arrays to identify hypersensitive sites, segments of open chromatin that are more readily cleaved by DNase I. DNase I cleaving produces larger fragments of around 1.2kb in size. These hypersensitive sites have been shown to accurately predict regulatory elements such as promoter regions, enhancers and silencers. [11] Historically, the method used Southern blotting to find digested fragments. Tiling arrays have allowed researchers to apply the technique on a genome-wide scale.
Array-based CGH is a technique often used in diagnostics to compare differences between types of DNA, such as normal cells vs. cancer cells. Two types of tiling arrays are commonly used for array CGH, whole genome and fine tiled. The whole genome approach would be useful in identifying copy number variations with high resolution. On the other hand, fine-tiled array CGH would produce ultrahigh resolution to find other abnormalities such as breakpoints. [12]
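The comparison underlying array CGH is commonly expressed as a per-probe log2 ratio of test to reference signal. The sketch below assumes that convention; the function names and the 0.3 calling threshold are hypothetical values chosen for illustration, not figures from the studies cited above.

```python
import math


def log2_ratios(test, reference):
    """Per-probe log2(test / reference) for two equal-length signal lists."""
    if len(test) != len(reference):
        raise ValueError("test and reference must have the same length")
    return [math.log2(t / r) for t, r in zip(test, reference)]


def call_copy_number(ratio, threshold=0.3):
    """Label a log2 ratio as gain, loss or neutral.

    The threshold is an illustrative assumption; real analyses choose it
    from the noise level of the platform and usually segment first.
    """
    if ratio >= threshold:
        return "gain"
    if ratio <= -threshold:
        return "loss"
    return "neutral"


tumor = [200.0, 100.0, 50.0]    # toy probe intensities, test sample
normal = [100.0, 100.0, 100.0]  # toy probe intensities, reference sample
ratios = log2_ratios(tumor, normal)
print([call_copy_number(r) for r in ratios])
```

With densely tiled probes, runs of consecutive gain or loss calls delimit the altered region, which is why finer tiling helps to localize breakpoints.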
Several different methods exist for tiling an array. One protocol for analyzing gene expression involves first isolating total RNA. This is then purified of rRNA molecules. The RNA is copied into double stranded DNA, which is subsequently amplified and in vitro transcribed to cRNA. The product is split into triplicates to produce dsDNA, which is then fragmented and labeled. Finally, the samples are hybridized to the tiling array chip. The signals from the chip are scanned and interpreted by computers.
Various software packages and algorithms are available for data analysis, and their benefits vary depending on the manufacturer of the chip. For Affymetrix chips, the model-based analysis of tiling array (MAT) or hypergeometric analysis of tiling-arrays (HAT [13]) are effective peak-seeking algorithms. For NimbleGen chips, TAMAL is more suitable for locating binding sites. Alternative algorithms include MA2C and TileScope, which are less complicated to operate. The Joint binding deconvolution algorithm is commonly used for Agilent chips. If sequence analysis of binding sites or annotation of the genome is required, then programs like MEME, Gibbs Motif Sampler, Cis-regulatory element annotation system and Galaxy are used. [4]
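The general idea of peak seeking on tiled probes can be sketched with a plain sliding-window average. This is not the method used by MAT, HAT, TAMAL or any of the other tools named above, which apply their own statistical models; the function names, window size and threshold here are assumptions made purely for illustration.

```python
def sliding_window_means(intensities, window=3):
    """Mean signal of every run of `window` consecutive probes."""
    if window <= 0 or window > len(intensities):
        raise ValueError("window must be between 1 and len(intensities)")
    means = []
    for i in range(len(intensities) - window + 1):
        means.append(sum(intensities[i:i + window]) / window)
    return means


def find_enriched_windows(intensities, window=3, threshold=2.0):
    """Indices of window starts whose mean signal reaches `threshold`."""
    means = sliding_window_means(intensities, window)
    return [i for i, m in enumerate(means) if m >= threshold]


# Toy log-ratio signal for eight probes ordered along the genome.
signal = [0.1, 0.2, 0.1, 2.5, 3.0, 2.8, 0.3, 0.1]
print(find_enriched_windows(signal, window=3, threshold=2.0))
```

Averaging over neighbouring probes exploits the fact that a true binding event should raise the signal of several adjacent probes, whereas a single noisy probe is diluted by its neighbours.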
Tiling arrays provide an unbiased tool to investigate protein binding, gene expression and gene structure on a genome-wide scope. They allow a new level of insight in studying the transcriptome and methylome.
Drawbacks include the cost of tiling array kits. Although prices have fallen in the last several years, the price makes it impractical to use genome-wide tiling arrays for mammalian and other large genomes. Another issue is the "transcriptional noise" produced by its ultra-sensitive detection capability. [2] Furthermore, the approach provides no clearly defined start or stop to regions of interest identified by the array. Finally, arrays usually give only chromosome and position numbers, often necessitating sequencing as a separate step (although some modern arrays do give sequence information). [14]
A DNA microarray is a collection of microscopic DNA spots attached to a solid surface. Scientists use DNA microarrays to measure the expression levels of large numbers of genes simultaneously or to genotype multiple regions of a genome. Each DNA spot contains picomoles of a specific DNA sequence, known as a probe. Probes can be a short section of a gene or other DNA element and are used to hybridize a cDNA or cRNA sample under high-stringency conditions. Probe-target hybridization is usually detected and quantified by detection of fluorophore-, silver-, or chemiluminescence-labeled targets to determine relative abundance of nucleic acid sequences in the target. The original nucleic acid arrays were macro arrays approximately 9 cm × 12 cm and the first computerized image based analysis was published in 1981. It was invented by Patrick O. Brown. An example of its application is in SNP arrays for polymorphisms in cardiovascular diseases, cancer, pathogens and GWAS analysis. It is also used for identification of structural variations and measurement of gene expression.
Regulation of gene expression, or gene regulation, includes a wide range of mechanisms that are used by cells to increase or decrease the production of specific gene products. Sophisticated programs of gene expression are widely observed in biology, for example to trigger developmental pathways, respond to environmental stimuli, or adapt to new food sources. Virtually any step of gene expression can be modulated, from transcriptional initiation, to RNA processing, and to the post-translational modification of a protein. Often, one gene regulator controls another, and so on, in a gene regulatory network.
DNA-binding proteins are proteins that have DNA-binding domains and thus have a specific or general affinity for single- or double-stranded DNA. Sequence-specific DNA-binding proteins generally interact with the major groove of B-DNA, because it exposes more functional groups that identify a base pair. However, there are some known minor groove DNA-binding ligands such as netropsin, distamycin, Hoechst 33258, pentamidine, DAPI and others.
Comparative genomic hybridization (CGH) is a molecular cytogenetic method for analysing copy number variations (CNVs) relative to ploidy level in the DNA of a test sample compared to a reference sample, without the need for culturing cells. The aim of this technique is to quickly and efficiently compare two genomic DNA samples arising from two sources, which are most often closely related, because it is suspected that they contain differences in terms of either gains or losses of either whole chromosomes or subchromosomal regions. This technique was originally developed for the evaluation of the differences between the chromosomal complements of solid tumor and normal tissue, and has an improved resolution of 5–10 megabases compared to the more traditional cytogenetic analysis techniques of Giemsa banding and fluorescence in situ hybridization (FISH) which are limited by the resolution of the microscope utilized.
Functional genomics is a field of molecular biology that attempts to describe gene functions and interactions. Functional genomics make use of the vast data generated by genomic and transcriptomic projects. Functional genomics focuses on the dynamic aspects such as gene transcription, translation, regulation of gene expression and protein–protein interactions, as opposed to the static aspects of the genomic information such as DNA sequence or structures. A key characteristic of functional genomics studies is their genome-wide approach to these questions, generally involving high-throughput methods rather than a more traditional “gene-by-gene” approach.
The transcriptome is the set of all RNA transcripts, including coding and non-coding, in an individual or a population of cells. The term can also sometimes be used to refer to all RNAs, or just mRNA, depending on the particular experiment. The term transcriptome is a portmanteau of the words transcript and genome; it is associated with the process of transcript production during the biological process of transcription.
Fluorescence in situ hybridization (FISH) is a molecular cytogenetic technique that uses fluorescent probes that bind to only those parts of a nucleic acid sequence with a high degree of sequence complementarity. It was developed by biomedical researchers in the early 1980s to detect and localize the presence or absence of specific DNA sequences on chromosomes. Fluorescence microscopy can be used to find out where the fluorescent probe is bound to the chromosomes. FISH is often used for finding specific features in DNA for use in genetic counseling, medicine, and species identification. FISH can also be used to detect and localize specific RNA targets in cells, circulating tumor cells, and tissue samples. In this context, it can help define the spatial-temporal patterns of gene expression within cells and tissues.
A DNA-binding domain (DBD) is an independently folded protein domain that contains at least one structural motif that recognizes double- or single-stranded DNA. A DBD can recognize a specific DNA sequence or have a general affinity to DNA. Some DNA-binding domains may also include nucleic acids in their folded structure.
A nuclear run-on assay is conducted to identify the genes that are being transcribed at a certain time point. Approximately one million cell nuclei are isolated and incubated with labeled nucleotides, and genes in the process of being transcribed are detected by hybridization of extracted RNA to gene specific probes on a blot. Garcia-Martinez et al. (2004) developed a protocol for the yeast S. cerevisiae that allows for the calculation of transcription rates (TRs) for all yeast genes to estimate mRNA stabilities for all yeast mRNAs.
An RNA spike-in is an RNA transcript of known sequence and quantity used to calibrate measurements in RNA hybridization assays, such as DNA microarray experiments, RT-qPCR, and RNA-Seq.
Molecular cytogenetics combines two disciplines, molecular biology and cytogenetics, and involves the analysis of chromosome structure to help distinguish normal and cancer-causing cells. Human cytogenetics began in 1956 when it was discovered that normal human cells contain 46 chromosomes. However, the first microscopic observations of chromosomes were reported by Arnold, Flemming, and Hansemann in the late 1800s. Their work was ignored for decades until the actual chromosome number in humans was discovered as 46. In 1879, Arnold examined sarcoma and carcinoma cells having very large nuclei. Today, the study of molecular cytogenetics can be useful in diagnosing and treating various malignancies such as hematological malignancies, brain tumors, and other precursors of cancer. The field is overall focused on studying the evolution of chromosomes, more specifically the number, structure, function, and origin of chromosome abnormalities. It includes a series of techniques referred to as fluorescence in situ hybridization, or FISH, in which DNA probes are labeled with different colored fluorescent tags to visualize one or more specific regions of the genome. Introduced in the 1980s, FISH uses probes with complementary base sequences to locate the presence or absence of specific DNA regions of interest. FISH can be performed as a direct approach on either metaphase chromosomes or interphase nuclei. Alternatively, an indirect approach can be taken in which the entire genome can be assessed for copy number changes using virtual karyotyping. Virtual karyotypes are generated from arrays made of thousands to millions of probes, and computational tools are used to recreate the genome in silico.
ChIP-on-chip is a technology that combines chromatin immunoprecipitation ('ChIP') with DNA microarray ("chip"). Like regular ChIP, ChIP-on-chip is used to investigate interactions between proteins and DNA in vivo. Specifically, it allows the identification of the cistrome, the sum of binding sites, for DNA-binding proteins on a genome-wide basis. Whole-genome analysis can be performed to determine the locations of binding sites for almost any protein of interest. As the name of the technique suggests, such proteins are generally those operating in the context of chromatin. The most prominent representatives of this class are transcription factors, replication-related proteins, like origin recognition complex protein (ORC), histones, their variants, and histone modifications.
ChIP-sequencing, also known as ChIP-seq, is a method used to analyze protein interactions with DNA. ChIP-seq combines chromatin immunoprecipitation (ChIP) with massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins. It can be used to map global binding sites precisely for any protein of interest. Previously, ChIP-on-chip was the most common technique utilized to study these protein–DNA relations.
RIP-chip is a molecular biology technique which combines RNA immunoprecipitation with a microarray. The purpose of this technique is to identify which RNA sequences interact with a particular RNA binding protein of interest in vivo. It can also be used to determine relative levels of gene expression, to identify subsets of RNAs which may be co-regulated, or to identify RNAs that may have related functions. This technique provides insight into the post-transcriptional gene regulation which occurs between RNA and RNA binding proteins.
Epigenomics is the study of the complete set of epigenetic modifications on the genetic material of a cell, known as the epigenome. The field is analogous to genomics and proteomics, which are the study of the genome and proteome of a cell. Epigenetic modifications are reversible modifications on a cell's DNA or histones that affect gene expression without altering the DNA sequence. Epigenomic maintenance is a continuous process and plays an important role in stability of eukaryotic genomes by taking part in crucial biological mechanisms like DNA repair. Plant flavones are said to be inhibiting epigenomic marks that cause cancers. Two of the most characterized epigenetic modifications are DNA methylation and histone modification. Epigenetic modifications play an important role in gene expression and regulation, and are involved in numerous cellular processes such as in differentiation/development and tumorigenesis. The study of epigenetics on a global level has been made possible only recently through the adaptation of genomic high-throughput assays.
Methylated DNA immunoprecipitation is a large-scale purification technique in molecular biology that is used to enrich for methylated DNA sequences. It consists of isolating methylated DNA fragments via an antibody raised against 5-methylcytosine (5mC). This technique was first described by Weber M. et al. in 2005 and has helped pave the way for viable methylome-level assessment efforts, as the purified fraction of methylated DNA can be input to high-throughput DNA detection methods such as high-resolution DNA microarrays (MeDIP-chip) or next-generation sequencing (MeDIP-seq). Nonetheless, understanding of the methylome remains rudimentary; its study is complicated by the fact that, like other epigenetic properties, patterns vary from cell-type to cell-type.
Copy number analysis usually refers to the process of analyzing data produced by a test for DNA copy number variation in a patient's sample. Such analysis helps detect chromosomal copy number variation that may cause or may increase risks of various critical disorders. Copy number variation can be detected with various types of tests such as fluorescent in situ hybridization, comparative genomic hybridization and with high-resolution array-based tests based on array comparative genomic hybridization, SNP array technologies and high resolution microarrays that include copy number probes as well as SNPs. Array-based methods have been accepted as the most efficient in terms of their resolution, high-throughput nature and coverage, and they are also referred to as virtual karyotyping. Data analysis for an array-based DNA copy number test can be very challenging, though, due to the very high volume of data that comes out of an array platform.
In genetics, DNase I hypersensitive sites (DHSs) are regions of chromatin that are sensitive to cleavage by the DNase I enzyme. In these specific regions of the genome, chromatin has lost its condensed structure, exposing the DNA and making it accessible. This raises the availability of DNA to degradation by enzymes, such as DNase I. These accessible chromatin zones are functionally related to transcriptional activity, since this remodeled state is necessary for the binding of proteins such as transcription factors.
Transcriptomics technologies are the techniques used to study an organism's transcriptome, the sum of all of its RNA transcripts. The information content of an organism is recorded in the DNA of its genome and expressed through transcription. Here, mRNA serves as a transient intermediary molecule in the information network, whilst non-coding RNAs perform additional diverse functions. A transcriptome captures a snapshot in time of the total transcripts present in a cell. Transcriptomics technologies provide a broad account of which cellular processes are active and which are dormant. A major challenge in molecular biology lies in understanding how the same genome can give rise to different cell types and how gene expression is regulated.
MNase-seq, short for micrococcal nuclease digestion with deep sequencing, is a molecular biological technique that was pioneered in 2006 to measure nucleosome occupancy in the C. elegans genome, and was subsequently applied to the human genome in 2008. The term ‘MNase-seq’, however, was not coined until a year later, in 2009. Briefly, this technique relies on the use of the non-specific endo-exonuclease micrococcal nuclease, an enzyme derived from the bacterium Staphylococcus aureus, to bind and cleave protein-unbound regions of DNA on chromatin. DNA bound to histones or other chromatin-bound proteins may remain undigested. The uncut DNA is then purified from the proteins and sequenced through one or more of the various Next-Generation sequencing methods.