Nested association mapping

Nested association mapping (NAM) is a technique designed by the labs of Edward Buckler, James Holland, and Michael McMullen for identifying and dissecting the genetic architecture of complex traits in corn (Zea mays). Unlike association mapping in general, nested association mapping is a specific technique that cannot be performed outside of a specifically designed population such as the Maize NAM population, [1] the details of which are described below.

Theory behind NAM

NAM was created as a means of combining the advantages and eliminating the disadvantages of two traditional methods for identifying quantitative trait loci: linkage analysis and association mapping. Linkage analysis depends upon recent genetic recombination between two different plant lines (as the result of a genetic cross) to identify general regions of interest, with the advantage of requiring few genetic markers to ensure genome-wide coverage and high statistical power per allele. Linkage analysis, however, has the disadvantages of low mapping resolution and low allele richness. Association mapping, by contrast, takes advantage of historic recombination, and is performed by scanning a genome for SNPs in linkage disequilibrium with a trait of interest. Association mapping has advantages over linkage analysis in that it can map with high resolution and has high allelic richness. However, it also requires extensive knowledge of SNPs within the genome and is thus only now becoming possible in diverse species such as maize.
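Linkage disequilibrium between two loci is commonly summarized by the r² statistic. The following is a minimal sketch of that calculation, not code from the NAM studies; the function name `ld_r_squared` and the 0/1 allele coding of phased haplotypes are assumptions made for illustration.

```python
def ld_r_squared(haplotypes):
    """Return r^2 between two biallelic loci.

    haplotypes: list of (a, b) pairs, one per phased haplotype, with each
    allele coded 0 or 1 (an illustrative coding, not a standard file format).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n           # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n           # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                              # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


# Alleles that always travel together give r^2 = 1; independent alleles give 0.
linked = ld_r_squared([(0, 0), (0, 0), (1, 1), (1, 1)])      # 1.0
unlinked = ld_r_squared([(0, 0), (0, 1), (1, 0), (1, 1)])    # 0.0
```

A marker with high r² to a causal variant can stand in for it in an association scan, which is why resolution depends on how quickly historic recombination has eroded linkage disequilibrium.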

NAM takes advantage of both historic and recent recombination events in order to have the advantages of low marker density requirements, high allele richness, high mapping resolution, and high statistical power, with none of the disadvantages of either linkage analysis or association mapping. [1] [2] In these regards, the NAM approach is similar in principle to the MAGIC lines and AMPRILs in Arabidopsis and the Collaborative Cross in mouse.

Creation of the maize NAM population

Twenty-five diverse corn lines were chosen as the parental lines for the NAM population in order to encompass the remarkable diversity of maize and preserve historic linkage disequilibrium. Each parental line was crossed to the B73 maize inbred (chosen as a reference line due to its use in the public maize sequencing project and wide deployment as one of the most successful commercial inbred lines) to create the F1 population. The F1 plants were then self-fertilized for six generations in order to create 200 homozygous recombinant inbred lines (RILs) per family, for a total of 5000 RILs within the NAM population. The lines are publicly available through the USDA-ARS Maize Stock Center.
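The arithmetic behind this design can be sketched briefly. The example below assumes idealized Mendelian selfing with no selection, under which each generation of selfing halves the expected fraction of heterozygous loci; the function and constant names are illustrative only.

```python
def expected_heterozygosity(generations_of_selfing: int) -> float:
    """Expected fraction of loci still heterozygous after selfing an F1.

    An F1 from two inbred parents is heterozygous wherever the parents
    differ; each round of selfing is expected to halve that fraction.
    """
    return 0.5 ** generations_of_selfing


FAMILIES = 25           # diverse parents, each crossed to B73
RILS_PER_FAMILY = 200   # lines derived from each cross

total_rils = FAMILIES * RILS_PER_FAMILY
residual_het = expected_heterozygosity(6)   # 1/64, about 1.6% of segregating loci
```

Under these assumptions, six generations of selfing leave roughly 1.6% of initially heterozygous loci unfixed, which is why the resulting lines can be treated as essentially homozygous.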

Each RIL was then genotyped with the same 1106 molecular markers (for this to be possible, the researchers selected markers for which B73 had a rare allele), in order to identify recombination blocks. After genotyping with the 1106 markers, each of the parental lines was either sequenced or high-density genotyped, and the results of that sequencing/genotyping were overlaid on the recombination blocks identified for each RIL. The result was 5000 RILs that were either fully sequenced or high-density genotyped and that, due to genotyping with the common 1106 markers, could all be compared to each other and analyzed together (Figure 1). [1] [2]

Figure 1. Creation of the NAM population.
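Overlaying dense parental genotypes on the recombination blocks of a RIL can be sketched as follows. This is a simplified illustration under stated assumptions, not the pipeline used by the NAM researchers: the function name, the 'B'/'D' marker coding (B73 versus donor parent), and the rule of leaving a SNP uncalled when its flanking markers disagree are all hypothetical choices.

```python
from bisect import bisect_right


def project_parent_snps(marker_pos, marker_calls, snp_pos, b73_alleles, donor_alleles):
    """Impute dense SNP alleles for one RIL from its sparse marker calls.

    marker_pos    : sorted positions of the shared markers on one chromosome
    marker_calls  : 'B' (B73) or 'D' (donor parent) at each marker
    snp_pos       : positions of the densely genotyped parental SNPs
    b73_alleles   : B73 allele at each SNP
    donor_alleles : donor-parent allele at each SNP
    Returns one allele per SNP, or None where the flanking markers disagree,
    because a crossover lies somewhere between them.
    """
    projected = []
    last = len(marker_pos) - 1
    for pos, b73, donor in zip(snp_pos, b73_alleles, donor_alleles):
        right = bisect_right(marker_pos, pos)
        # A SNP exactly at a marker is treated as lying in the interval to its right.
        left = max(right - 1, 0)      # clamp at the start of the chromosome
        right = min(right, last)      # clamp at the end of the chromosome
        if marker_calls[left] != marker_calls[right]:
            projected.append(None)    # ambiguous block boundary
        else:
            projected.append(b73 if marker_calls[left] == "B" else donor)
    return projected


example = project_parent_snps(
    marker_pos=[10, 50, 90],
    marker_calls=["B", "B", "D"],
    snp_pos=[5, 30, 70, 95],
    b73_alleles=["A", "C", "G", "T"],
    donor_alleles=["G", "T", "A", "C"],
)
# example == ["A", "C", None, "C"]
```

Because every RIL carries calls at the same shared markers, the same projection can be applied to each family with its own donor parent, which is what makes the lines comparable across the whole population.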

The second aspect of the NAM population characterization is the sequencing of the parental lines. This captures information on the natural variation that went into the population and a record of the extensive recombination captured in the history of maize variation. The first phase of this sequencing was reduced-representation sequencing using next-generation sequencing technology, as reported by Gore, Chia et al. in 2009. [3] This initial sequencing discovered 1.6 million variable regions in maize, which is now facilitating analysis of a wide range of traits.

Process

As with traditional QTL mapping strategies, the general goal in nested association mapping is to correlate a phenotype of interest with specific genotypes. One of the creators' stated goals for the NAM population was to be able to perform genome-wide association studies in maize by looking for associations between SNPs within the NAM population and quantitative traits of interest (e.g. flowering time, plant height, carotene content). [1] As of 2009, however, the sequencing of the original parental lines was not yet complete enough to perform these analyses. The NAM population has nonetheless been used successfully for linkage analysis. In the linkage study that has been released, the unique structure of the NAM population, described in the previous section, allowed for joint stepwise regression and joint inclusive composite interval mapping of the combined NAM families to identify QTLs for flowering time. [4]
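The idea of analyzing all families jointly while letting a marker act differently in each one can be illustrated with a single-marker sketch. It is not the published method: the actual study used stepwise selection over many markers, whereas the example below only fits a family mean plus a family-specific effect for one marker. The function names and the genotype coding (0 for the B73 allele, 1 for the donor allele) are assumptions.

```python
from collections import defaultdict
from statistics import mean


def group_by_family(records):
    """Split (family, genotype, phenotype) records into per-family allele groups."""
    grouped = defaultdict(lambda: {0: [], 1: []})
    for family, genotype, phenotype in records:
        grouped[family][genotype].append(phenotype)
    return grouped


def nested_marker_effects(records):
    """Estimate the donor-allele effect relative to B73 separately in each family."""
    effects = {}
    for family, groups in group_by_family(records).items():
        if groups[0] and groups[1]:  # both alleles are needed for a contrast
            effects[family] = mean(groups[1]) - mean(groups[0])
    return effects


def joint_variance_explained(records):
    """Share of within-family variance captured by the family-specific marker effects."""
    ss_family_only = 0.0
    ss_with_marker = 0.0
    for groups in group_by_family(records).values():
        everyone = groups[0] + groups[1]
        family_mean = mean(everyone)
        ss_family_only += sum((y - family_mean) ** 2 for y in everyone)
        for values in groups.values():
            if values:
                allele_mean = mean(values)
                ss_with_marker += sum((y - allele_mean) ** 2 for y in values)
    if ss_family_only == 0:
        raise ValueError("phenotype does not vary within any family")
    return 1.0 - ss_with_marker / ss_family_only


# Toy data: days to silking for RILs from two hypothetical families.
records = [
    ("fam1", 0, 70.0), ("fam1", 0, 72.0), ("fam1", 1, 73.0), ("fam1", 1, 75.0),
    ("fam2", 0, 80.0), ("fam2", 0, 82.0), ("fam2", 1, 80.0), ("fam2", 1, 82.0),
]
effects = nested_marker_effects(records)      # {"fam1": 3.0, "fam2": 0.0}
explained = joint_variance_explained(records)  # 9/17, about 0.53
```

In this toy data the donor allele delays silking by three days in one family and has no effect in the other, the kind of family-specific allelic effect that a nested model can detect and a single pooled effect would blur.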

Current use

The first publication in which NAM was used to identify QTLs was authored by the Buckler lab on the genetic architecture of maize flowering time, and published in the summer of 2009. [4] In this groundbreaking study, the authors scored days to silking, days to anthesis, and the silking-anthesis interval for nearly one million plants, then performed single and joint stepwise regression and inclusive composite interval mapping (ICIM) to identify 39 QTLs explaining 89% of the variance in days to silking and days to anthesis and 29 QTLs explaining 64% of the variance in the silking-anthesis interval. [4]

Ninety-eight percent of the flowering time QTLs identified in this paper were found to affect flowering time by less than one day (as compared to the B73 reference). These relatively small QTL effects, however, were also shown to sum within each family to large differences in days to silking. Furthermore, it was observed that while most QTLs were shared between families, each family appears to have functionally distinct alleles for most QTLs. These observations led the authors to propose a model of "Common genes with uncommon variants" [4] to explain flowering time diversity in maize. They tested their model by documenting an allelic series in the previously studied maize flowering time QTL Vgt1 (Vegetative to generative transition 1) [5] by controlling for genetic background and estimating the effects of Vgt1 in each family. They then went on to identify specific sequence variants that corresponded to the allelic series, including one allele containing a miniature transposon strongly associated with early flowering, and other alleles containing SNPs associated with later flowering. [4]

The maize NAM population has also helped to map otherwise difficult traits conferring resistance to fungal diseases, including southern leaf blight (Kump et al. 2011) and northern leaf blight (Poland et al. 2011). [6]

Implications

Nested association mapping has tremendous potential for the investigation of agronomic traits in maize and other species. As the initial flowering time study demonstrates, NAM has the power to identify QTLs for agriculturally relevant traits and to relate those QTLs to homologs and candidate genes in non-maize species. Furthermore, the NAM lines become a powerful public resource for the maize community, and an opportunity for the sharing of maize germplasm as well as the results of maize studies via common databases (see external links), further facilitating future research into maize agricultural traits. Given that maize is one of the most important agricultural crops worldwide, such research has powerful implications for the genetic improvement of crops, and subsequently, worldwide food security. [4]

Similar designs are also being created for wheat, barley, sorghum, and Arabidopsis thaliana.

Related Research Articles

Genetic linkage is the tendency of DNA sequences that are close together on a chromosome to be inherited together during the meiosis phase of sexual reproduction. Two genetic markers that are physically near to each other are unlikely to be separated onto different chromatids during chromosomal crossover, and are therefore said to be more linked than markers that are far apart. In other words, the nearer two genes are on a chromosome, the lower the chance of recombination between them, and the more likely they are to be inherited together. Markers on different chromosomes are perfectly unlinked, although the penetrance of potentially deleterious alleles may be influenced by the presence of other alleles, and these other alleles may be located on other chromosomes than that on which a particular potentially deleterious allele is located.

In genetics and bioinformatics, a single-nucleotide polymorphism is a germline substitution of a single nucleotide at a specific position in the genome. Although certain definitions require the substitution to be present in a sufficiently large fraction of the population, many publications do not apply such a frequency threshold.

A quantitative trait locus (QTL) is a locus that correlates with variation of a quantitative trait in the phenotype of a population of organisms. QTLs are mapped by identifying which molecular markers correlate with an observed trait. This is often an early step in identifying the actual genes that cause the trait variation.

Gene mapping or genome mapping describes the methods used to identify the location of a gene on a chromosome and the distances between genes. Gene mapping can also describe the distances between different sites within a gene.

Genetic association is when one or more genotypes within a population co-occur with a phenotypic trait more often than would be expected by chance occurrence.

In population genetics, an ancestry-informative marker (AIM) is a single-nucleotide polymorphism that exhibits substantially different frequencies between different populations. A set of many AIMs can be used to estimate the proportion of ancestry of an individual derived from each population.

In molecular biology and other fields, a molecular marker is a molecule, sampled from some source, that gives information about its source. For example, DNA is a molecular marker that gives information about the organism from which it was taken. As another example, some proteins can be molecular markers of Alzheimer's disease in the person from whom they are taken. Molecular markers may be non-biological. Non-biological markers are often used in environmental studies.

A tag SNP is a representative single nucleotide polymorphism (SNP) in a region of the genome with high linkage disequilibrium that represents a group of SNPs called a haplotype. It is possible to identify genetic variation and association to phenotypes without genotyping every SNP in a chromosomal region. This reduces the expense and time of mapping genome areas associated with disease, since it eliminates the need to study every individual SNP. Tag SNPs are useful in whole-genome SNP association studies in which hundreds of thousands of SNPs across the entire genome are genotyped.

In genetics, complete linkage is defined as the state in which two loci are so close together that alleles of these loci are virtually never separated by crossing over. The closer the physical location of two genes on the DNA, the less likely they are to be separated by a crossing-over event. In male Drosophila there is a complete absence of recombinant types due to the absence of crossing over. This means that all of the genes that start out on a single chromosome will end up on that same chromosome in their original configuration. In the absence of recombination, only parental phenotypes are expected.

Marker assisted selection or marker aided selection (MAS) is an indirect selection process in which a trait of interest is selected based on a marker linked to it, rather than on the trait itself. This process has been extensively researched and proposed for plant and animal breeding.

In genomics, a genome-wide association study (GWAS) is an observational study of a genome-wide set of genetic variants in different individuals to see if any variant is associated with a trait. GWA studies typically focus on associations between single-nucleotide polymorphisms (SNPs) and traits like major human diseases, but can equally be applied to any other genetic variants and any other organisms.

A doubled haploid (DH) is a genotype formed when haploid cells undergo chromosome doubling. Artificial production of doubled haploids is important in plant breeding.

In genetics, association mapping, also known as "linkage disequilibrium mapping", is a method of mapping quantitative trait loci (QTLs) that takes advantage of historic linkage disequilibrium to link phenotypes to genotypes, uncovering genetic associations.

In statistical genetics, inclusive composite interval mapping (ICIM) has been proposed as an approach to QTL mapping for populations derived from bi-parental crosses. QTL mapping uses a genetic linkage map and phenotypic data to locate individual genetic factors on chromosomes and to estimate their genetic effects.

Restriction site associated DNA (RAD) markers are a type of genetic marker that is useful for association mapping, QTL mapping, population genetics, ecological genetics and evolutionary genetics. The use of RAD markers for genetic mapping is often called RAD mapping. An important aspect of RAD markers and mapping is the process of isolating RAD tags, which are the DNA sequences that immediately flank each instance of a particular restriction site of a restriction enzyme throughout the genome. Once RAD tags have been isolated, they can be used to identify and genotype DNA sequence polymorphisms, mainly in the form of single nucleotide polymorphisms (SNPs). Polymorphisms that are identified and genotyped by isolating and analyzing RAD tags are referred to as RAD markers. Although genotyping by sequencing presents an approach similar to the RAD-seq method, they differ in some substantial ways.

A recombinant inbred strain or recombinant inbred line (RIL) is an organism with chromosomes that incorporate an essentially permanent set of recombination events between chromosomes inherited from two or more inbred strains. F1 and F2 generations are produced by intercrossing the inbred strains; pairs of the F2 progeny are then mated to establish inbred strains through long-term inbreeding.

Quantitative trait loci mapping or QTL mapping is the process of identifying genomic regions that potentially contain genes responsible for important economic, health or environmental characters. Mapping QTLs is an important activity that plant breeders and geneticists routinely use to associate potential causal genes with phenotypes of interest. Family-based QTL mapping is a variant of QTL mapping in which multiple families are used.

Kompetitive allele specific PCR (KASP) is a homogeneous, fluorescence-based genotyping variant of polymerase chain reaction. It is based on allele-specific oligo extension and fluorescence resonance energy transfer for signal generation.

In the field of genetic sequencing, genotyping by sequencing, also called GBS, is a method to discover single nucleotide polymorphisms (SNPs) in order to perform genotyping studies, such as genome-wide association studies (GWAS). GBS uses restriction enzymes to reduce genome complexity and genotype multiple DNA samples. After digestion, PCR is performed to enlarge the fragment pool, and the GBS libraries are then sequenced using next-generation sequencing technologies, usually resulting in single-end reads of about 100 bp. It is relatively inexpensive and has been used in plant breeding. Although GBS presents an approach similar to the restriction-site-associated DNA sequencing (RAD-seq) method, they differ in some substantial ways.

Edward S. Buckler is a plant geneticist with the USDA Agricultural Research Service and holds an adjunct appointment at Cornell University. His work focuses on both quantitative and statistical genetics in maize as well as other crops such as cassava. He originated the concept of nested association mapping and created the first population designed for this type of quantitative genetic analysis. Buckler was elected an American Association for the Advancement of Science Fellow in 2012. In 2014, he was elected to the National Academy of Sciences. In 2017, he received the NAS Prize in Food and Agriculture Sciences for his work using natural genetic diversity to develop varieties of maize with fifteen times more vitamin A than existing varieties.

References

  1. Yu, J., Holland, J.B., McMullen, M.D., Buckler, E.S. (2008). "Genetic design and statistical power of nested association mapping in maize". Genetics. 178 (1): 539–551. doi:10.1534/genetics.107.074245. PMC 2206100. PMID 18202393.
  2. Michael D. McMullen; Stephen Kresovich; Hector Sanchez Villeda; Peter Bradbury; Huihui Li; Qi Sun; Sherry Flint-Garcia; Jeffry Thornsberry; Charlotte Acharya; Christopher Bottoms; Patrick Brown; Chris Browne; Magen Eller; Kate Guill; Carlos Harjes; Dallas Kroon; Nick Lepak; Sharon E. Mitchell; Brooke Peterson; Gael Pressoir; Susan Romero; Marco Oropeza Rosas; Stella Salvo; Heather Yates; Mark Hanson; Elizabeth Jones; Stephen Smith; Jeffrey C. Glaubitz; Major Goodman; Doreen Ware; James B. Holland; Edward S. Buckler (2009). "Genetic Properties of the Maize Nested Association Mapping Population". Science. 325 (5941): 737–740. Bibcode:2009Sci...325..737M. doi:10.1126/science.1174320. PMID 19661427. S2CID 14667346.
  3. Gore MA, Chia JM, Elshire RJ, et al. (November 2009). "A first-generation haplotype map of maize". Science. 326 (5956): 1115–7. Bibcode:2009Sci...326.1115G. doi:10.1126/science.1177837. PMID 19965431. S2CID 206521881.
  4. Edward S. Buckler; James B. Holland; Peter J. Bradbury; Charlotte B. Acharya; Patrick J. Brown; Chris Browne; Elhan Ersoz; Sherry Flint-Garcia; Arturo Garcia; Jeffrey C. Glaubitz; Major M. Goodman; Carlos Harjes; Kate Guill; Dallas E. Kroon; Sara Larsson; Nicholas K. Lepak; Huihui Li; Sharon E. Mitchell; Gael Pressoir; Jason A. Peiffer; Marco Oropeza Rosas; Torbert R. Rocheford; M. Cinta Romay; Susan Romero; Stella Salvo; Hector Sanchez Villeda; H. Sofia da Silva; Qi Sun; Feng Tian; Narasimham Upadyayula; Doreen Ware; Heather Yates; Jianming Yu; Zhiwu Zhang; Stephen Kresovich; Michael D. McMullen (2009). "The Genetic Architecture of Maize Flowering Time". Science. 325 (5941): 714–718. Bibcode:2009Sci...325..714B. doi:10.1126/science.1174276. PMID 19661422. S2CID 8297435.
  5. Salvi S, Sponza G, Morgante M, et al. (July 2007). "Conserved noncoding genomic sequences associated with a flowering-time quantitative trait locus in maize". Proc. Natl. Acad. Sci. U.S.A. 104 (27): 11376–81. Bibcode:2007PNAS..10411376S. doi:10.1073/pnas.0704145104. PMC 2040906. PMID 17595297.
  6. Huang, Xuehui; Han, Bin (2014-04-29). "Natural Variations and Genome-Wide Association Studies in Crop Plants". Annual Review of Plant Biology. 65 (1). Annual Reviews: 531–551. doi:10.1146/annurev-arplant-050213-035715. ISSN 1543-5008. PMID 24274033.

Maize Databases: