Family-based QTL mapping


Quantitative trait loci (QTL) mapping is the process of identifying genomic regions that potentially contain genes responsible for traits of economic, health or environmental importance. Plant breeders and geneticists routinely map QTLs to associate potential causal genes with phenotypes of interest. Family-based QTL mapping is a variant of QTL mapping in which multiple families are used.


Pedigree in humans and wheat

Pedigree information includes information about ancestry. Keeping pedigree records is a centuries-old tradition. Pedigrees can also be verified using gene-marker data.

A complex five-generation plant pedigree drawn with Pedimap

In plants

The method has been discussed in the context of plant breeding populations. [1] Pedigree records are kept by plant breeders, and pedigree-based selection is popular in several plant species. Plant pedigrees differ from human pedigrees, particularly because plants are often hermaphroditic: an individual can serve as the male or the female parent, and matings can be performed in arbitrary combinations, including inbreeding loops. Plant pedigrees may also contain "selfs", i.e. offspring resulting from self-pollination of a plant.
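Selfs and inbreeding loops can be handled with the standard recursive kinship calculation. The following is a minimal sketch rather than a method taken from the cited work: the pedigree layout (a dictionary mapping each individual to its two parents, with parents listed before offspring) and the names `kinship_function`, `inbreeding` and the example individuals are assumptions made for illustration. A self is encoded by naming the same plant as both parents.

```python
from functools import lru_cache


def kinship_function(pedigree):
    """Return a function phi(a, b) giving the kinship coefficient of a and b.

    `pedigree` (hypothetical layout) maps each name to (parent1, parent2);
    founders use (None, None). Parents must be listed before offspring.
    Founders are assumed unrelated and non-inbred.
    """
    order = {name: i for i, name in enumerate(pedigree)}

    @lru_cache(maxsize=None)
    def phi(a, b):
        if a is None or b is None:
            return 0.0  # unknown parent: treated as unrelated
        if order[a] < order[b]:
            a, b = b, a  # recurse through the parents of the younger one
        p1, p2 = pedigree[a]
        if a == b:
            # kinship of an individual with itself
            return 0.5 * (1.0 + phi(p1, p2))
        return 0.5 * (phi(p1, b) + phi(p2, b))

    return phi


def inbreeding(phi, name):
    """Inbreeding coefficient F, from phi(x, x) = (1 + F) / 2."""
    return 2.0 * phi(name, name) - 1.0


# Illustrative pedigree: C is an outcross of A and B, S is a self of A.
example = {
    "A": (None, None),
    "B": (None, None),
    "C": ("A", "B"),
    "S": ("A", "A"),
}
phi = kinship_function(example)
print(phi("C", "A"), inbreeding(phi, "S"))
```

In this sketch the selfed plant `S` has an inbreeding coefficient of 0.5, while the outcrossed plant `C` is not inbred.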

Pedigree denotation

SIMPLE CROSS
  Symbol   Meaning               Example
  /        first order cross     SON 64/KLRE
  //       second order cross    IR 64/KLRE // CIAN0
  /3/      third order cross     TOBS /3/ SON 64/KLRE // CIAN0
  /4/      fourth order cross    TOBS /3/ SON 64/KLRE // CIAN0 /4/ SEE
  /n/      nth order cross

BACK CROSS
  Symbol   Meaning
  *n       n is the number of times the backcross parent was used.
           When the symbol is on the left side of the simple cross symbol,
           the backcross parent is the female; on the right side, the male.
  Examples: SEE/3*ANE, TOBS*6/CIAN0
Example pedigree of Sonalika (SONALIKA = I53.388/AN//YT54/N10B/3/Lerma Rojo/4/B4946.A.4.18.2.IY/Y53//3*Y50), drawn using Pedimap
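The cross notation can be read mechanically by splitting a pedigree string at its highest-order cross symbol and repeating on each side. The sketch below is only an illustration of that idea: the function names and the nested-tuple output are assumptions, backcross markers such as `3*Y50` are left untouched inside the parent name, and a parent name that itself looks like `/3/` would be misread.

```python
import re

# Cross symbols: /n/ (nth order), // (second order), / (first order).
# The numbered form is tried first so that "/3/" is not read as two "/".
CROSS_SYMBOL = re.compile(r"/(\d+)/|//|/")


def cross_order(symbol):
    """Return the order of a cross symbol such as '/', '//' or '/3/'."""
    if symbol == "/":
        return 1
    if symbol == "//":
        return 2
    return int(symbol.strip("/"))


def parse_cross(pedigree):
    """Parse a pedigree string into nested (female, order, male) tuples.

    A string without any cross symbol is returned as a stripped name.
    """
    symbols = list(CROSS_SYMBOL.finditer(pedigree))
    if not symbols:
        return pedigree.strip()
    # The highest-order symbol marks the most recent cross.
    top = max(symbols, key=lambda m: cross_order(m.group(0)))
    left = pedigree[:top.start()]
    right = pedigree[top.end():]
    return (parse_cross(left), cross_order(top.group(0)), parse_cross(right))


sonalika = ("I53.388/AN//YT54/N10B/3/Lerma Rojo/4/"
            "B4946.A.4.18.2.IY/Y53//3*Y50")
tree = parse_cross(sonalika)
print(tree)
```

For the Sonalika string, the outermost tuple is the fourth order cross, with the third order cross ending in Lerma Rojo on its left side.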

The idea of family-based QTL mapping comes from the inheritance of marker alleles and their association with the trait of interest. Rosyara et al. [1] demonstrated how to use family-based association in plant breeding families.

Limitation of conventional methods

Traditional mapping populations consist of a single family derived from a cross between two or three parents, which are often distantly related. Traditional mapping methods have some important limitations, including limited polymorphism rates and no indication of marker effectiveness in multiple genetic backgrounds. Often, by the time a QTL mapping population has been developed and mapped, breeders have already introgressed the new QTL using traditional breeding and selection methods. This can reduce the usefulness of marker-assisted selection (MAS) within breeding programs at the time when MAS could be most useful (i.e., shortly after new QTL are identified). [2] Family-based QTL mapping removes this limitation by using existing plant breeding families.


Common study population mapping

Broadly, there are three classes of study design: designs in which large sets of relatives from extended or nuclear families are sampled, designs in which pairs of relatives are sampled (e.g., sibling pairs), and designs in which unrelated individuals are sampled.

Unrelated individuals

A natural collection of individuals with unknown pedigree, considered unrelated, constitutes the mapping population. Population-based association mapping techniques are based on this type of population. In plants, such populations are hard to find because most individuals are related in some way. Another disadvantage is that, even when such a population can be found, the allele of interest (usually a mutant) rarely occurs at high frequency in it. To balance allele frequencies, case-control studies are usually used.


Sibpairs

This design includes a pair of sibs from each of multiple independent families. The members of each sib pair are not chosen randomly: often both siblings are chosen from one tail (upper or lower) of the distribution of the quantitative trait (concordant siblings), or one sibling is chosen from the upper tail and the other from the lower tail (discordant siblings). Another sampling design includes a pair of siblings in which one is chosen from the upper or lower tail of the distribution and the other is chosen randomly from among the remaining siblings.
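As a rough illustration of the tail-based sampling described above, a sib pair can be labelled by comparing each sibling's trait value with two cut-offs. The function name `classify_sib_pair` and the cut-off arguments `lower` and `upper` are hypothetical, and how the cut-offs are chosen (for example, as trait quantiles) is left open. The third design, with one extreme and one randomly chosen sibling, is not covered by this sketch.

```python
def classify_sib_pair(trait1, trait2, lower, upper):
    """Label a sib pair as 'concordant', 'discordant' or 'not selected'.

    A sibling is in the lower tail if its trait value is <= lower and in
    the upper tail if it is >= upper (both cut-offs are assumptions).
    """
    if lower >= upper:
        raise ValueError("lower cut-off must be below upper cut-off")

    def tail(value):
        if value <= lower:
            return "low"
        if value >= upper:
            return "high"
        return None  # middle of the distribution

    tail1, tail2 = tail(trait1), tail(trait2)
    if tail1 is None or tail2 is None:
        return "not selected"
    return "concordant" if tail1 == tail2 else "discordant"


print(classify_sib_pair(9.1, 9.4, lower=2.0, upper=8.0))
```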


Trios

Trios include both parents and one offspring (usually the most affected). Trios are more commonly used in association studies. Association mapping with trios assumes that the trios are unrelated to one another, although the individuals within each trio are related.


Nuclear family

A nuclear family is a simple two-generation pedigree.


Extended pedigrees

Extended pedigrees span multiple generations. They can be as deep or as wide as the available pedigree information allows. Extended pedigrees are attractive for linkage-based analysis.


Linkage vs association analysis

Linkage and association analysis are primary tools for gene discovery, localization and functional analysis. [3] [4] While the conceptual underpinnings of these approaches have long been known, advances in molecular genetics in recent decades, the development of efficient algorithms, and increased computing power have enabled the large-scale application of these methods. Linkage studies seek to identify loci that cosegregate with the trait within families, whereas association studies seek to identify particular variants that are associated with the phenotype at the population level. These are complementary methods that together provide a means to probe the genome and describe the etiology of complex traits. In linkage studies, we ask whether the trait cosegregates with a specific genomic region, tagged by polymorphic markers, within families. In contrast, in association studies, we seek a correlation between a specific genetic variant and trait variation in a sample of individuals, implicating a causal role of the variant.

Family-based linkage analysis

Genetic linkage is the phenomenon whereby alleles at different loci cosegregate in families. The strength of cosegregation is measured by the recombination fraction θ, the probability of an odd number of recombination events between the two loci. More complex pedigrees provide higher power. Estimation of the identity-by-descent (IBD) matrix is a central component of QTL mapping with variance component models. Alleles have identity by type (IBT) when they have the same phenotypic effect. Alleles that are identical by type fall into two groups: those that are identical by descent (IBD), because they arose from the same allele in an earlier generation, and those that are non-identical by descent (NIBD), or identical by state (IBS), because they arose from separate mutations. Parent-offspring pairs share 50% of their genes IBD, and monozygotic twins share 100% IBD. What is relevant in linkage analysis is the inheritance (or coinheritance) of alleles at adjacent loci; it is therefore critically important to determine whether alleles are identical by descent (i.e. copies of the same parental allele) or only identical by state (i.e. appearing the same, but derived from two different copies of the allele). There are three categories of family-based linkage analysis: strongly model-based (the traditional lod score method), weakly model-based (variance component methods), and model-free. Variance component methods may be viewed as hybrids.
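To make the lod score idea concrete, the simplest textbook case can be written down directly: with n fully informative, phase-known meioses of which k are recombinant, the lod score at recombination fraction θ is log10 of θ^k (1 − θ)^(n − k) divided by 0.5^n. The sketch below covers only that simplified case; real pedigree likelihoods are more involved, and the function name `lod_score` and the example counts are assumptions.

```python
import math


def _count_times_log10(count, prob):
    """count * log10(prob), with 0 * log10(0) taken as 0."""
    if count == 0:
        return 0.0
    if prob == 0.0:
        return float("-inf")  # impossible observation under this theta
    return count * math.log10(prob)


def lod_score(recombinants, meioses, theta):
    """Lod score for phase-known, fully informative meioses (assumed case).

    Compares the likelihood at `theta` with the likelihood at free
    recombination (theta = 0.5).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    non_recombinants = meioses - recombinants
    log_linked = (_count_times_log10(recombinants, theta)
                  + _count_times_log10(non_recombinants, 1.0 - theta))
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


# Illustrative counts: 2 recombinants among 20 meioses.
# In this simple case the maximum-likelihood estimate of theta is k / n.
theta_hat = min(2 / 20, 0.5)
print(round(lod_score(2, 20, theta_hat), 2))
```

By construction the lod score is 0 at θ = 0.5, so positive values indicate support for linkage relative to free recombination.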


Family-based association analysis

Linkage disequilibrium (LD) and association mapping are receiving considerable attention in the plant genetics community for their potential to use existing genetic resource collections to fine-map quantitative trait loci (QTL), validate candidate genes, and identify alleles of interest (Yu and Buckler, 2006). Three elements are of particular importance for conducting association mapping or interpreting its results:

  1. the analysis of population structure into subgroups,
  2. its use to control for spurious associations and consequences in the specific case of differential selection among subgroups, and
  3. the analysis of the local structure of LD into haplotypes and its consequences on the resolution and the application of LD mapping (Flint-Garcia et al. 2003).

In contrast to population-based association, family-based association tests are becoming more popular.

The family-based transmission disequilibrium test (TDT) has gained wide popularity. This method also focuses on alleles transmitted to affected offspring, but it is formulated to take account of both the linkage and the disequilibrium that underlie the association. The test requires genotype information on trios, namely an affected child and both biological parents, and at least one parent must be heterozygous for the test to be informative. The test statistic is McNemar's chi-square statistic; it tests the null hypothesis that the putative disease-associated allele is transmitted 50% of the time from heterozygous parents against the alternative hypothesis that the trait-associated allele is transmitted more often. The TDT is not affected by population stratification and admixture. The concept of a family-based test of association has been extended to quantitative traits.
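McNemar's statistic for the TDT uses only the transmissions from heterozygous parents: if the candidate allele is transmitted b times and not transmitted c times, the statistic is (b − c)² / (b + c), compared with a chi-square distribution with one degree of freedom. The sketch below assumes the counts have already been tallied from trio genotypes; the function name `tdt` and the example counts are illustrative. The p-value returned is the usual two-sided chi-square tail probability.

```python
import math


def tdt(transmitted, not_transmitted):
    """McNemar chi-square statistic and p-value for the TDT.

    `transmitted` and `not_transmitted` count how often heterozygous
    parents passed, or did not pass, the candidate allele to an
    affected offspring.
    """
    b, c = transmitted, not_transmitted
    if b < 0 or c < 0 or b + c == 0:
        raise ValueError("need at least one informative transmission")
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Illustrative counts: allele transmitted 60 times, not transmitted 40 times.
chi2, p_value = tdt(60, 40)
print(chi2, round(p_value, 4))
```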


Quantitative transmission disequilibrium test (QTDT)

The TDT has been extended to quantitative traits and to nuclear or extended pedigrees. The generalized test allows any type of family to be used. The QTDT has also been extended to haplotype-based association mapping. Haplotypes are combinations of marker alleles that are located close together on the same chromosome and tend to be inherited together. With the availability of high-density SNP markers, haplotypes play an important role in association studies. First, haplotypes are critical to understanding the LD pattern across the genome, which is essential for association studies; there is no better way to understand the LD pattern than to know the haplotypes themselves. Haplotypes tell us how alleles are organized along the chromosome and reflect the pattern of inheritance over evolution. Second, methods based on haplotypes can be more powerful than those based on single markers in association studies that map complex trait genes.
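Once haplotypes are known, pairwise LD between two biallelic markers follows directly from the haplotype counts. The sketch below computes the standard measures D = p(AB) − p(A)p(B) and r² = D² / [p(A)(1 − p(A))p(B)(1 − p(B))]; the allele labels, the function name `ld_from_haplotypes` and the example counts are assumptions made for illustration.

```python
def ld_from_haplotypes(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r_squared) for two biallelic loci from haplotype counts.

    The four arguments count haplotypes carrying alleles A/a at the
    first locus and B/b at the second (labels are hypothetical).
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    denominator = p_A * (1.0 - p_A) * p_B * (1.0 - p_B)
    if denominator == 0.0:
        raise ValueError("both loci must be polymorphic")
    d = p_AB - p_A * p_B
    return d, d * d / denominator


# Illustrative counts: 40 AB, 10 Ab, 10 aB, 40 ab haplotypes.
d, r_squared = ld_from_haplotypes(40, 10, 10, 40)
print(round(d, 3), round(r_squared, 3))
```

A value of r² near 1 means one marker almost fully predicts the other, while a value near 0 means the two markers are close to independent.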

Drawing family pedigrees

Several pedigree-drawing programs are available for human genetics, such as COPE (COllaborative Pedigree drawing Environment), CYRILLIC, FTM (Family Tree Maker), FTREE, KINDRED, PED (PEdigree Drawing software), PEDHUNTER, PEDIGRAPH, PEDIGREE/DRAW, PEDIGREE-VISUALIZER, PEDPLOT, PEDRAW/WPEDRAW (Pedigree Drawing/Window Pedigree Drawing, the MS-Windows and X-Window versions of PEDRAW) and PROGENY (Progeny Software, LLC). However, pedigree drawing in plants requires additional features such as inbreeding, selfing, mutation and polyploidy, which are supported in Pedimap. Pedimap can be used to visualize pedigrees together with phenotypic, genotypic and IBD probability data for every type of plant pedigree, in both diploids and tetraploids.


References

  1. Rosyara, U. R.; Gonzalez-Hernandez, J. L.; Glover, K. D.; Gedye, K. R.; Stein, J. M. (2009). "Family-based mapping of quantitative trait loci in plant breeding populations with resistance to Fusarium head blight in wheat as an illustration". Theoretical and Applied Genetics. 118 (8): 1617–1631. doi:10.1007/s00122-009-1010-9. PMID 19322557. S2CID 2882803.
  2. Beavis, W. D. (1998). "QTL analyses: power, precision, and accuracy". In: Paterson, A. H. (ed.) Molecular Analysis of Complex Traits. CRC Press, Boca Raton, pp. 145–161.
  3. Lander, E. S.; Green, P. (1987). "Construction of multilocus genetic linkage maps in humans". Proceedings of the National Academy of Sciences of the United States of America. 84 (8): 2363–2367. Bibcode:1987PNAS...84.2363L. doi:10.1073/pnas.84.8.2363. PMC 304651. PMID 3470801.
  4. Glazier, A. M.; Nadeau, J. H.; Aitman, T. J. (2002). "Finding genes that underlie complex traits". Science. 298: 2345–2349.