SNP array

In molecular biology, an SNP array is a type of DNA microarray used to detect polymorphisms within a population. A single nucleotide polymorphism (SNP), a variation at a single site in DNA, is the most frequent type of variation in the genome. Around 335 million SNPs have been identified in the human genome, [1] 15 million of which are present at frequencies of 1% or higher across different populations worldwide. [2]

Principles

The basic principles of the SNP array are the same as those of the DNA microarray: the convergence of DNA hybridization, fluorescence microscopy, and solid-surface DNA capture. The three mandatory components of an SNP array are: [3]

  1. An array containing immobilized allele-specific oligonucleotide (ASO) probes.
  2. Fragmented nucleic acid sequences of target, labelled with fluorescent dyes.
  3. A detection system that records and interprets the hybridization signal.

The ASO probes are often chosen based on sequencing of a representative panel of individuals: positions found to vary in the panel at a specified frequency are used as the basis for probes. SNP chips are generally described by the number of SNP positions they assay. Two probes must be used for each SNP position to detect both alleles; if only one probe were used, experimental failure would be indistinguishable from homozygosity of the non-probed allele. [4]
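
The two-probe argument above can be illustrated with a minimal sketch. The function names, the intensity scale, and every threshold below are hypothetical and chosen only for illustration; real platforms use their own calibrated calling algorithms.

```python
from typing import Optional


def call_genotype(signal_a: float, signal_b: float,
                  min_total: float = 200.0,
                  het_band: float = 0.25) -> Optional[str]:
    """Call a genotype from the intensities of two allele-specific probes.

    min_total and het_band are illustrative assumptions, not vendor values.
    Returns "AA", "AB", "BB", or None when the assay is treated as failed.
    """
    total = signal_a + signal_b
    # With two probes, a weak combined signal flags an experimental failure.
    if total <= 0 or total < min_total:
        return None
    # Fraction of the total signal contributed by the B-allele probe.
    b_fraction = signal_b / total
    if b_fraction < 0.5 - het_band:
        return "AA"
    if b_fraction > 0.5 + het_band:
        return "BB"
    return "AB"


def call_with_single_probe(signal_a: float, min_signal: float = 100.0) -> str:
    """With only the A probe, a low signal cannot be interpreted."""
    if signal_a >= min_signal:
        return "A allele present"
    # Homozygous BB and a failed assay both give a low A signal.
    return "BB or failed assay (indistinguishable)"
```

With both probes, a sample that gives almost no signal on either probe is reported as a failure (`None`), whereas a strong B signal with a weak A signal is reported as `"BB"`. The single-probe function has no way to separate those two cases.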

Applications

Figure: DNA copy number profile for the T47D breast cancer cell line (Affymetrix SNP Array)
Figure: LOH profile for the T47D breast cancer cell line (Affymetrix SNP Array)

An SNP array is a useful tool for studying slight variations between whole genomes. The most important clinical applications of SNP arrays are determining disease susceptibility [5] and measuring the efficacy of drug therapies designed specifically for individuals. [6] In research, SNP arrays are most frequently used for genome-wide association studies. [7] Each individual has many SNPs. SNP-based genetic linkage analysis can be used to map disease loci and to identify disease susceptibility genes in individuals. The combination of SNP maps and high-density SNP arrays allows SNPs to be used as markers for genetic diseases that have complex traits. For example, genome-wide association studies have identified SNPs associated with diseases such as rheumatoid arthritis [8] and prostate cancer. [9] An SNP array can also be used to generate a virtual karyotype: software determines the copy number of each SNP on the array and then aligns the SNPs in chromosomal order. [10]
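
The virtual karyotype step described above (per-SNP copy number, then chromosomal ordering) can be sketched as follows. This is only an illustration under stated assumptions: the `SnpCall` record, the use of a per-chromosome median, and the human chromosome list are all hypothetical choices, and real software segments the signal far more carefully.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, NamedTuple


class SnpCall(NamedTuple):
    """Hypothetical record for one SNP on the array."""
    snp_id: str
    chromosome: str     # "1".."22", "X" or "Y"
    position: int       # base-pair coordinate on the chromosome
    copy_number: float  # copy-number estimate from upstream software


# Assumed human chromosome order used for sorting.
CHROMOSOME_ORDER = [str(n) for n in range(1, 23)] + ["X", "Y"]


def virtual_karyotype(calls: List[SnpCall]) -> Dict[str, float]:
    """Align SNPs in chromosomal order and summarise each chromosome.

    Returns a dict, in chromosomal order, mapping each chromosome to the
    median copy number of the SNPs located on it.
    """
    rank = {name: i for i, name in enumerate(CHROMOSOME_ORDER)}
    # Sort by chromosome first, then by position within the chromosome.
    ordered = sorted(calls, key=lambda c: (rank[c.chromosome], c.position))
    per_chromosome = defaultdict(list)
    for call in ordered:
        per_chromosome[call.chromosome].append(call.copy_number)
    return {chrom: median(values) for chrom, values in per_chromosome.items()}
```

A whole-chromosome median hides focal gains and losses; it is used here only to keep the sketch short.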

SNPs can also be used to study genetic abnormalities in cancer. For example, SNP arrays can be used to study loss of heterozygosity (LOH). LOH occurs when one allele of a gene is mutated in a deleterious way and the normally functioning allele is lost. LOH occurs commonly in oncogenesis. For example, tumor suppressor genes help keep cancer from developing. If a person has one mutated and dysfunctional copy of a tumor suppressor gene and their second, functional copy of the gene gets damaged, they may become more likely to develop cancer. [11]

Other chip-based methods such as comparative genomic hybridization can detect genomic gains or deletions leading to LOH. SNP arrays, however, have the additional advantage of being able to detect copy-neutral LOH (also called uniparental disomy or gene conversion). Copy-neutral LOH is a form of allelic imbalance. In copy-neutral LOH, one allele or whole chromosome from a parent is missing. The loss is accompanied by duplication of the other parental allele, so the total copy number is unchanged. Copy-neutral LOH may be pathological. For example, say that the mother's allele is wild-type and fully functional, and the father's allele is mutated. If the mother's allele is missing and the child has two copies of the father's mutant allele, disease can occur.
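
The distinction between LOH caused by a deletion and copy-neutral LOH can be sketched as a simple rule over two measurements per genomic segment: an estimated copy number and the B-allele frequencies of the SNPs in that segment. The function names, the heterozygosity window, and the cut-offs below are assumptions made for illustration and are not taken from any particular analysis package.

```python
from typing import List


def heterozygous_fraction(b_allele_freqs: List[float],
                          het_low: float = 0.35,
                          het_high: float = 0.65) -> float:
    """Fraction of SNPs whose B-allele frequency looks heterozygous.

    The window [het_low, het_high] is an illustrative assumption.
    """
    if not b_allele_freqs:
        raise ValueError("segment contains no SNPs")
    het = sum(1 for b in b_allele_freqs if het_low <= b <= het_high)
    return het / len(b_allele_freqs)


def classify_segment(copy_number: float,
                     b_allele_freqs: List[float],
                     max_het_fraction: float = 0.05) -> str:
    """Label a segment using copy number plus loss of heterozygous calls."""
    has_loh = heterozygous_fraction(b_allele_freqs) <= max_het_fraction
    if not has_loh:
        return "heterozygosity retained"
    copies = round(copy_number)
    if copies < 2:
        return "LOH with copy loss"
    if copies == 2:
        # Two copies but no heterozygous SNPs: invisible to methods that
        # measure copy number alone.
        return "copy-neutral LOH"
    return "LOH with copy gain"
```

A segment with two copies and no heterozygous SNPs is labelled copy-neutral LOH, which is the case that copy-number-only methods miss. A short segment can be homozygous by chance, so a rule like this is only meaningful over runs of many SNPs.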

High-density SNP arrays help scientists identify patterns of allelic imbalance. These studies have potential prognostic and diagnostic uses. Because LOH is so common in many human cancers, SNP arrays have great potential in cancer diagnostics. For example, SNP array studies have shown that solid tumors such as gastric cancer and liver cancer show LOH, as do hematologic malignancies such as ALL, MDS and CML. These studies may provide insights into how these diseases develop, as well as information about how to create therapies for them. [12]

Breeding in a number of animal and plant species has been revolutionized by the emergence of SNP arrays. The method is based on the prediction of genetic merit by incorporating relationships among individuals based on SNP array data. [13] This process is known as genomic selection. Crop-specific arrays find use in agriculture. [14] [15]
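
The "relationships among individuals based on SNP array data" can be made concrete with a small sketch. The source does not say which formulation is used; the code below assumes one common choice, the first genomic relationship matrix of VanRaden, with genotypes coded as 0, 1 or 2 copies of one allele. The function name and input layout are hypothetical, and the prediction step that would use this matrix is not shown.

```python
from typing import List


def genomic_relationship_matrix(genotypes: List[List[int]]) -> List[List[float]]:
    """Compute G = Z Z' / (2 * sum_j p_j (1 - p_j)).

    genotypes[i][j] is the count (0, 1 or 2) of the counted allele for
    individual i at SNP j; p_j is that allele's frequency in the sample.
    """
    n_individuals = len(genotypes)
    n_snps = len(genotypes[0])
    # Allele frequency at each SNP, estimated from the sample itself.
    freqs = [sum(row[j] for row in genotypes) / (2 * n_individuals)
             for j in range(n_snps)]
    # Centre every genotype by its expected value, 2 * p_j.
    centred = [[row[j] - 2 * freqs[j] for j in range(n_snps)]
               for row in genotypes]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic in this sample")
    return [[sum(a * b for a, b in zip(centred[i], centred[k])) / scale
             for k in range(n_individuals)]
            for i in range(n_individuals)]
```

Individuals with identical genotypes get the same relationship to each other as to themselves, and individuals carrying opposite alleles get a negative value. In genomic selection, a matrix of this kind takes the place of pedigree-based relationships when predicting genetic merit.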

Related Research Articles

An allele is a variation of the same sequence of nucleotides at the same place on a long DNA molecule, as described in leading textbooks on genetics and evolution. The word is a short form of "allelomorph".

Genomic imprinting is an epigenetic phenomenon that causes genes to be expressed or not, depending on whether they are inherited from the mother or the father. Genes can also be partially imprinted. Partial imprinting occurs when alleles from both parents are differently expressed rather than complete expression and complete suppression of one parent's allele. Forms of genomic imprinting have been demonstrated in fungi, plants and animals. In 2014, there were about 150 imprinted genes known in mice and about half that in humans. As of 2019, 260 imprinted genes have been reported in mice and 228 in humans.

A microsatellite is a tract of repetitive DNA in which certain DNA motifs are repeated, typically 5–50 times. Microsatellites occur at thousands of locations within an organism's genome. They have a higher mutation rate than other areas of DNA leading to high genetic diversity. Microsatellites are often referred to as short tandem repeats (STRs) by forensic geneticists and in genetic genealogy, or as simple sequence repeats (SSRs) by plant geneticists.

Single-nucleotide polymorphism: Single nucleotide in genomic DNA at which different sequence alternatives exist

In genetics and bioinformatics, a single-nucleotide polymorphism is a germline substitution of a single nucleotide at a specific position in the genome that is present in a sufficiently large fraction of the considered population.

Loss of heterozygosity

Loss of heterozygosity (LOH) is a type of genetic abnormality in diploid organisms in which one copy of an entire gene and its surrounding chromosomal region are lost. Since diploid cells have two copies of their genes, one from each parent, a single copy of the lost gene still remains when this happens, but any heterozygosity is no longer present.

Ancestry-informative marker

In population genetics, an ancestry-informative marker (AIM) is a single-nucleotide polymorphism that exhibits substantially different frequencies between different populations. A set of many AIMs can be used to estimate the proportion of ancestry of an individual derived from each population.

Genotyping is the process of determining differences in the genetic make-up (genotype) of an individual by examining the individual's DNA sequence using biological assays and comparing it to another individual's sequence or a reference sequence. It reveals the alleles an individual has inherited from their parents. Traditionally genotyping is the use of DNA sequences to define biological populations by use of molecular tools. It does not usually involve defining the genes of an individual.

SNP genotyping is the measurement of genetic variations of single nucleotide polymorphisms (SNPs) between members of a species. It is a form of genotyping, which is the measurement of more general genetic variation. SNPs are one of the most common types of genetic variation. An SNP is a single base pair mutation at a specific locus, usually consisting of two alleles. SNPs are found to be involved in the etiology of many human diseases and are becoming of particular interest in pharmacogenetics. Because SNPs are conserved during evolution, they have been proposed as markers for use in quantitative trait loci (QTL) analysis and in association studies in place of microsatellites. The use of SNPs is being extended in the HapMap project, which aims to provide the minimal set of SNPs needed to genotype the human genome. SNPs can also provide a genetic fingerprint for use in identity testing. The increase of interest in SNPs has been reflected by the rapid development of a diverse range of SNP genotyping methods.

Bisulfite sequencing: Lab procedure detecting 5-methylcytosines in DNA

Bisulfite sequencing (also known as bisulphite sequencing) is the use of bisulfite treatment of DNA before routine sequencing to determine the pattern of methylation. DNA methylation was the first discovered epigenetic mark, and remains the most studied. In animals it predominantly involves the addition of a methyl group to the carbon-5 position of cytosine residues of the dinucleotide CpG, and is implicated in repression of transcriptional activity.

Genome-wide association study: Study of genetic variants in different individuals

In genomics, a genome-wide association study is an observational study of a genome-wide set of genetic variants in different individuals to see if any variant is associated with a trait. GWA studies typically focus on associations between single-nucleotide polymorphisms (SNPs) and traits like major human diseases, but can equally be applied to any other genetic variants and any other organisms.

1000 Genomes Project: International research effort on genetic variation

The 1000 Genomes Project, launched in January 2008, was an international research effort to establish by far the most detailed catalogue of human genetic variation. Scientists planned to sequence the genomes of at least one thousand anonymous participants from a number of different ethnic groups within the following three years, using newly developed technologies which were faster and less expensive. In 2010, the project finished its pilot phase, which was described in detail in a publication in the journal Nature. In 2012, the sequencing of 1092 genomes was announced in a Nature publication. In 2015, two papers in Nature reported results and the completion of the project and opportunities for future research.

Virtual karyotype is the digital information reflecting a karyotype, resulting from the analysis of short sequences of DNA from specific loci all over the genome, which are isolated and enumerated. It detects genomic copy number variations at a higher resolution than conventional karyotyping or chromosome-based comparative genomic hybridization (CGH). The main methods used for creating virtual karyotypes are array-comparative genomic hybridization and SNP arrays.

Diversity Arrays Technology (DArT) is a high-throughput genetic marker technique that can detect allelic variations to provide comprehensive genome coverage without any DNA sequence information for genotyping and other genetic analysis. The general steps involve reducing the complexity of the genomic DNA with specific restriction enzymes, choosing diverse fragments to serve as representations for the parent genomes, amplifying them via polymerase chain reaction (PCR), and inserting the fragments into a vector to be placed as probes within a microarray; fluorescent targets from a reference sequence are then allowed to hybridize with the probes and are put through an imaging system. The objective is to identify and quantify various forms of DNA polymorphism within genomic DNA of sampled species.

Molecular Inversion Probe (MIP) belongs to the class of Capture by Circularization molecular techniques for performing genomic partitioning, a process through which one captures and enriches specific regions of the genome. Probes used in this technique are single stranded DNA molecules and, similar to other genomic partitioning techniques, contain sequences that are complementary to the target in the genome; these probes hybridize to and capture the genomic target. MIP differs from other genomic partitioning strategies in that MIP probes share the common design of two genomic target complementary segments separated by a linker region. With this design, when the probe hybridizes to the target, it undergoes an inversion in configuration and circularizes. Specifically, the two target complementary regions at the 5’ and 3’ ends of the probe become adjacent to one another while the internal linker region forms a free hanging loop. The technology has been used extensively in the HapMap project for large-scale SNP genotyping as well as for studying gene copy alterations and characteristics of specific genomic loci to identify biomarkers for different diseases such as cancer. Key strengths of the MIP technology include its high specificity to the target and its scalability for high-throughput, multiplexed analyses where tens of thousands of genomic loci are assayed simultaneously.

Exome sequencing: Sequencing of all the exons of a genome

Exome sequencing, also known as whole exome sequencing (WES), is a genomic technique for sequencing all of the protein-coding regions of genes in a genome. It consists of two steps: the first step is to select only the subset of DNA that encodes proteins. These regions are known as exons—humans have about 180,000 exons, constituting about 1% of the human genome, or approximately 30 million base pairs. The second step is to sequence the exonic DNA using any high-throughput DNA sequencing technology.

Restriction site associated DNA markers: Type of genetic marker

Restriction site associated DNA (RAD) markers are a type of genetic marker useful for association mapping, QTL-mapping, population genetics, ecological genetics and evolutionary genetics. The use of RAD markers for genetic mapping is often called RAD mapping. An important aspect of RAD markers and mapping is the process of isolating RAD tags, which are the DNA sequences that immediately flank each instance of a particular restriction site of a restriction enzyme throughout the genome. Once RAD tags have been isolated, they can be used to identify and genotype DNA sequence polymorphisms, mainly in the form of single nucleotide polymorphisms (SNPs). Polymorphisms that are identified and genotyped by isolating and analyzing RAD tags are referred to as RAD markers. Although genotyping by sequencing presents an approach similar to the RAD-seq method, they differ in some substantial ways.

The Center for Applied Genomics is a research center at the Children's Hospital of Philadelphia that focuses on genomics research and the utilization of basic research findings in the development of new medical treatments.

Gene polymorphism: Occurrence in an interbreeding population of two or more discontinuous genotypes

A gene is said to be polymorphic if more than one allele occupies that gene's locus within a population. In addition to having more than one allele at a specific locus, each allele must also occur in the population at a rate of at least 1% to generally be considered polymorphic.

Disease gene identification is a process by which scientists identify the mutant genotypes responsible for an inherited genetic disorder. Mutations in these genes can include single nucleotide substitutions, single nucleotide additions/deletions, deletion of the entire gene, and other genetic abnormalities.

Single nucleotide polymorphism annotation is the process of predicting the effect or function of an individual SNP using SNP annotation tools. In SNP annotation the biological information is extracted, collected and displayed in a clear form amenable to query. SNP functional annotation is typically performed based on the available information on nucleic acid and protein sequences.

References

  1. "dbSNP Summary". www.ncbi.nlm.nih.gov. Retrieved 4 October 2017.
  2. The 1000 Genomes Project Consortium (2010). "A map of human genome variation from population-scale sequencing". Nature. 467 (7319): 1061–1073. Bibcode:2010Natur.467.1061T. doi:10.1038/nature09534. ISSN 0028-0836. PMC 3042601. PMID 20981092.
  3. LaFramboise, T. (1 July 2009). "Single nucleotide polymorphism arrays: a decade of biological, computational and technological advances". Nucleic Acids Research. 37 (13): 4181–4193. doi:10.1093/nar/gkp552. PMC 2715261. PMID 19570852.
  4. Rapley, Ralph; Harbron, Stuart (2004). Molecular analysis and genome discovery. Chichester [u.a.]: Wiley. ISBN 978-0-471-49919-0.
  5. Schaaf, Christian P.; Wiszniewska, Joanna; Beaudet, Arthur L. (22 September 2011). "Copy Number and SNP Arrays in Clinical Diagnostics". Annual Review of Genomics and Human Genetics. 12 (1): 25–51. doi:10.1146/annurev-genom-092010-110715. PMID 21801020.
  6. Alwi, Zilfalil Bin (2005). "The Use of SNPs in Pharmacogenomics Studies". The Malaysian Journal of Medical Sciences. 12 (2): 4–12. ISSN 1394-195X. PMC 3349395. PMID 22605952.
  7. The International HapMap Consortium (2003). "The International HapMap Project" (PDF). Nature. 426 (6968): 789–796. Bibcode:2003Natur.426..789G. doi:10.1038/nature02168. hdl:2027.42/62838. ISSN 0028-0836. PMID 14685227. S2CID 4387110.
  8. Walsh, Alice M.; Whitaker, John W.; Huang, C. Chris; Cherkas, Yauheniya; Lamberth, Sarah L.; Brodmerkel, Carrie; Curran, Mark E.; Dobrin, Radu (30 April 2016). "Integrative genomic deconvolution of rheumatoid arthritis GWAS loci into gene and cell type associations". Genome Biology. 17 (1): 79. doi:10.1186/s13059-016-0948-6. PMC 4853861. PMID 27140173.
  9. Amin Al Olama, A.; et al. (November 2010). "The genetics of type 2 diabetes: what have we learned from GWAS?". Annals of the New York Academy of Sciences. 1212 (1): 59–77. Bibcode:2010NYASA1212...59B. doi:10.1111/j.1749-6632.2010.05838.x. PMC 3057517. PMID 21091714.
  10. Sato-Otsubo, Aiko; Sanada, Masashi; Ogawa, Seishi (February 2012). "Single-Nucleotide Polymorphism Array Karyotyping in Clinical Practice: Where, When, and How?". Seminars in Oncology. 39 (1): 13–25. doi:10.1053/j.seminoncol.2011.11.010. PMID 22289488.
  11. Zheng, Hai-Tao (2005). "Loss of heterozygosity analyzed by single nucleotide polymorphism array in cancer". World Journal of Gastroenterology. 11 (43): 6740–4. doi:10.3748/wjg.v11.i43.6740. PMC 4725022. PMID 16425377.
  12. Mao, Xueying; Young, Bryan D; Lu, Yong-Jie (2007). "The Application of Single Nucleotide Polymorphism Microarrays in Cancer Research". Current Genomics. 8 (4): 219–228. doi:10.2174/138920207781386924. ISSN 1389-2029. PMC 2430687. PMID 18645599.
  13. Meuwissen TH, Hayes BJ, Goddard ME (2001). "Prediction of total genetic value using genome-wide dense marker maps". Genetics. 157 (4): 1819–29. doi:10.1093/genetics/157.4.1819. PMC 1461589. PMID 11290733.
  14. Hulse-Kemp, Amanda M; Lemm, Jana; Plieske, Joerg; Ashrafi, Hamid; Buyyarapu, Ramesh; Fang, David D; Frelichowski, James; Giband, Marc; Hague, Steve; Hinze, Lori L; Kochan, Kelli J; Riggs, Penny K; Scheffler, Jodi A; Udall, Joshua A; Ulloa, Mauricio; Wang, Shirley S; Zhu, Qian-Hao; Bag, Sumit K; Bhardwaj, Archana; Burke, John J; Byers, Robert L; Claverie, Michel; Gore, Michael A; Harker, David B; Islam, Mohammad Sariful; Jenkins, Johnie N; Jones, Don C; Lacape, Jean-Marc; Llewellyn, Danny J; Percy, Richard G; Pepper, Alan E; Poland, Jesse A; Mohan Rai, Krishan; Sawant, Samir V; Singh, Sunil Kumar; Spriggs, Andrew; Taylor, Jen M; Wang, Fei; Yourstone, Scott M; Zheng, Xiuting; Lawley, Cindy T; Ganal, Martin W; Van Deynze, Allen; Wilson, Iain W; Stelly, David M (1 June 2015). "Development of a 63K SNP Array for Cotton and High-Density Mapping of Intraspecific and Interspecific Populations of Gossypium spp". G3: Genes, Genomes, Genetics. Genetics Society of America (OUP). 5 (6): 1187–1209. doi:10.1534/g3.115.018416. ISSN 2160-1836. PMC 4478548. PMID 25908569. S2CID 11590488.
  15. Rasheed, Awais; Hao, Yuanfeng; Xia, Xianchun; Khan, Awais; Xu, Yunbi; Varshney, Rajeev K.; He, Zhonghu (2017). "Crop Breeding Chips and Genotyping Platforms: Progress, Challenges, and Perspectives". Molecular Plant. Chin Acad Sci+Chin Soc Plant Bio+Shanghai Inst Bio Sci (Elsevier). 10 (8): 1047–1064. doi:10.1016/j.molp.2017.06.008. ISSN 1674-2052. PMID 28669791. S2CID 33780984.

Further reading