A recombinant inbred strain or recombinant inbred line (RIL) is an organism with chromosomes that incorporate an essentially permanent set of recombination events between chromosomes inherited from two or more inbred strains. The inbred parental strains are crossed to produce an F1 generation, and F1 individuals are intercrossed to produce an F2 generation; pairs of F2 progeny are then mated to establish inbred strains through long-term inbreeding (in animals, usually repeated brother × sister mating). [1]
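How quickly long-term inbreeding fixes a line can be illustrated with the textbook recursion for the inbreeding coefficient under repeated full-sib mating, F_t = (1 + 2F_{t-1} + F_{t-2}) / 4. The sketch below assumes a non-inbred, unrelated base population (F_{-1} = F_0 = 0); the function name is hypothetical, and generation counting in a real RI breeding program may be offset from this idealized base.

```python
from fractions import Fraction


def inbreeding_full_sib(generations: int) -> list[Fraction]:
    """Return F_0..F_generations under repeated brother x sister mating.

    Uses the classic recursion F_t = (1 + 2*F_{t-1} + F_{t-2}) / 4 and
    assumes a non-inbred, unrelated base population (F_{-1} = F_0 = 0).
    Exact fractions avoid floating-point drift over many generations.
    """
    prev2, prev1 = Fraction(0), Fraction(0)  # F_{t-2}, F_{t-1}
    history = [prev1]  # F_0
    for _ in range(generations):
        current = (1 + 2 * prev1 + prev2) / 4
        history.append(current)
        prev2, prev1 = prev1, current
    return history


if __name__ == "__main__":
    for t, f in enumerate(inbreeding_full_sib(20)):
        print(f"generation {t:2d}: F = {float(f):.4f}")
```

Under these assumptions F reaches about 0.986 after 20 generations, which is consistent with the common rule of thumb that roughly 20 generations of sib mating yield an effectively inbred strain.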
Families of recombinant inbred strains numbering from 25 to 5000 are often used to map the locations of DNA sequence differences (quantitative trait loci) that contribute to differences in phenotype in model organisms. Recombinant inbred strains or lines were first developed using inbred strains of mice but are now used to study a wide range of organisms: Saccharomyces cerevisiae (yeast), Zea mays (maize), barley, Drosophila melanogaster, Caenorhabditis elegans, and rats.
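A minimal sketch of the simplest way such a family can be used for mapping: single-marker regression. Because every strain is homozygous, each marker splits the strains into two allele groups, and a LOD score compares a model with one mean per allele against a model with a single overall mean. The function name, the "B"/"C" letter coding, and the use of strain means are illustrative assumptions; dedicated QTL software uses more elaborate methods such as interval mapping and permutation thresholds.

```python
import math
from statistics import fmean


def marker_lod_scores(genotypes, phenotypes):
    """Single-marker regression scan across a panel of RI strains.

    genotypes: one string per strain, one letter per marker (e.g. "BBCCB"),
               each letter a homozygous parental allele.
    phenotypes: one strain-mean phenotype per strain, same order.
    Returns one LOD score per marker: (n/2) * log10(RSS0 / RSS1), where RSS0
    is the residual sum of squares around the overall mean and RSS1 the
    residual sum of squares around the per-allele means.
    """
    n = len(phenotypes)
    overall = fmean(phenotypes)
    rss0 = sum((y - overall) ** 2 for y in phenotypes)
    lods = []
    for m in range(len(genotypes[0])):
        # Group strain phenotypes by the allele they carry at marker m.
        groups = {}
        for geno, y in zip(genotypes, phenotypes):
            groups.setdefault(geno[m], []).append(y)
        rss1 = sum(
            (y - fmean(vals)) ** 2 for vals in groups.values() for y in vals
        )
        if rss1 == 0:
            # Perfect fit: infinite LOD unless there is no variation at all.
            lods.append(math.inf if rss0 > 0 else 0.0)
        else:
            lods.append((n / 2) * math.log10(rss0 / rss1))
    return lods


if __name__ == "__main__":
    strains = ["BBC", "BCC", "BBB", "CCB", "CBC", "CCB"]
    values = [10.0, 11.0, 10.5, 20.0, 19.0, 21.0]
    print(marker_lod_scores(strains, values))
```

In this toy panel the first marker cleanly separates low and high strains, so it receives by far the highest LOD score.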
The origins and history of recombinant inbred strains are described by Crow. [1] While the potential utility of recombinant inbred strains in mapping complex polygenic traits was obvious from the outset, the small number of strains initially made it feasible to map only quantitative traits with very large effects (quasi-Mendelian loci). One of the initial motivations for using recombinant inbred strains was that expensive genotype data could be accumulated once and reused, greatly simplifying mapping studies. [2] Another advantage is the greater mapping precision that can be achieved with these strains compared with typical F2 intercross progeny. [3]
As genotyping became progressively less expensive and more accurate, the main advantage of recombinant inbred strains and other genetic reference panels shifted to the ability to assemble massive and coherent phenotype databases (e.g., the GeneNetwork web service) and to use these coherent open-source data sets for large-scale collaborative research projects in predictive medicine and in plant and animal research.
Recombinant inbred strains are now widely used in systems genetics and to study gene–environment interactions. [4] [5] [6] [7] Extensive genetic and phenotype data can be accumulated for each member of a family of recombinant inbred strains under several different conditions (e.g., a baseline environment versus a stressful environment). Because each strain has a single fixed genome, a given genotype can be resampled many times and in multiple environments, yielding highly accurate estimates of genetic and environmental effects and of their interactions.
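Repeated measurements of the same strains in several environments fit naturally into a two-way analysis of variance, where the strain × environment interaction term captures gene–environment interaction. The sketch below assumes a balanced design (the same number of replicates in every strain/environment cell); the function name and data layout are hypothetical, and real analyses would typically use a statistics package and test the terms formally.

```python
from statistics import fmean


def gxe_sums_of_squares(data):
    """Two-way ANOVA sums of squares for a balanced strain x environment design.

    data[strain][env] is a list of replicate measurements; every cell must
    have the same number of replicates r (balanced design assumption).
    Returns sums of squares for strain (G), environment (E), their
    interaction (GxE), and within-cell error.
    """
    strains = list(data)
    envs = list(data[strains[0]])
    r = len(data[strains[0]][envs[0]])
    if any(len(data[s][e]) != r for s in strains for e in envs):
        raise ValueError("design must be balanced")

    cell = {(s, e): fmean(data[s][e]) for s in strains for e in envs}
    grand = fmean(cell.values())
    strain_mean = {s: fmean(cell[s, e] for e in envs) for s in strains}
    env_mean = {e: fmean(cell[s, e] for s in strains) for e in envs}

    ss_g = r * len(envs) * sum((strain_mean[s] - grand) ** 2 for s in strains)
    ss_e = r * len(strains) * sum((env_mean[e] - grand) ** 2 for e in envs)
    # Interaction: how far each cell mean departs from a purely additive model.
    ss_gxe = r * sum(
        (cell[s, e] - strain_mean[s] - env_mean[e] + grand) ** 2
        for s in strains
        for e in envs
    )
    ss_err = sum(
        (y - cell[s, e]) ** 2 for s in strains for e in envs for y in data[s][e]
    )
    return {"G": ss_g, "E": ss_e, "GxE": ss_gxe, "error": ss_err}
```

If every strain responds to the stressful environment by the same amount, the GxE term is zero; strains that respond differently produce a positive GxE sum of squares.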
Chromosomes of recombinant inbred strains typically consist of alternating haplotypes of highly variable length that are inherited intact from the parental strains. Consider a typical mouse CXB recombinant inbred strain, made by crossing the maternal strain BALB/cBy (C) with the paternal strain C57BL/6By (B). One of its chromosomes will typically incorporate 2 to 5 alternating haplotype blocks, with underlying genotypes such as BBBBBCCCCBBBCCCCCCCC. Here each letter represents the genotype at a single marker (e.g., a SNP), each run of identical genotypes represents a haplotype, and each transition between haplotypes represents a recombination event between the parental genomes. Both chromosomes of any given pair carry the same alternating pattern of haplotypes, so all markers are homozygous. Each chromosome (Chr 1, Chr 2, etc.) has its own pattern of haplotypes and recombinations. The only exceptions are the Y chromosome and the mitochondrial genome, which are inherited intact from the paternal and maternal strain, respectively. For an RI strain to be useful for mapping, the approximate position of each recombination along each chromosome needs to be well defined, either in centimorgans or in DNA base pairs. The precision with which these recombinations are mapped depends on the number and positions of the markers used to genotype the chromosomes (20 in the example above).
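The reading of a genotype string described above can be sketched in a few lines: runs of identical letters become haplotype blocks, and each recombination is only known to lie somewhere between the last marker of one block and the first marker of the next. The function name and the optional `positions` argument (marker positions in, say, Mb or cM) are illustrative assumptions; without positions, marker indices are used.

```python
def haplotype_blocks(genotypes, positions=None):
    """Split a string of homozygous marker genotypes into haplotype blocks.

    genotypes: e.g. "BBBBBCCCCBBBCCCCCCCC", one letter per marker.
    positions: optional marker positions (e.g. Mb or cM), same length as
               genotypes; defaults to the marker indices 0, 1, 2, ...
    Returns (blocks, breakpoints):
      blocks      -- (allele, first_index, last_index) for each haplotype
      breakpoints -- (left, right) positions of the two flanking markers,
                     the interval within which each recombination must lie
    """
    if positions is None:
        positions = list(range(len(genotypes)))
    if len(positions) != len(genotypes):
        raise ValueError("need exactly one position per marker")

    blocks = []
    start = 0
    for i in range(1, len(genotypes) + 1):
        # A block ends at the last marker or wherever the allele changes.
        if i == len(genotypes) or genotypes[i] != genotypes[start]:
            blocks.append((genotypes[start], start, i - 1))
            start = i

    # Every boundary between consecutive blocks marks one recombination.
    breakpoints = [
        (positions[last], positions[last + 1]) for _, _, last in blocks[:-1]
    ]
    return blocks, breakpoints


if __name__ == "__main__":
    blocks, breaks = haplotype_blocks("BBBBBCCCCBBBCCCCCCCC")
    print(blocks)
    print(breaks)
```

For the example chromosome this yields four haplotype blocks and three recombinations. Supplying denser marker positions narrows each breakpoint interval, which is the sense in which mapping precision depends on the number and placement of genotyped markers.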
All else being equal, the larger the family of recombinant inbred strains, the greater the power and resolution with which phenotypes can be mapped to chromosomal locations. The first set of eight strains, the CXB family, was generated in the 1960s by Donald Bailey at the Jackson Laboratory from a cross between a female BALB/cBy mouse (abbreviated C) and a male C57BL/6By mouse (abbreviated B). This small panel of eight CXB strains was originally used to determine whether the major histocompatibility complex (MHC) on proximal chromosome 17 was a key factor in different immune responses such as tissue rejection. The locations of recombinations were determined using visible markers (coat color phenotypes such as those controlled by the C and B loci) and the electrophoretic mobility of proteins. Somewhat larger families of recombinant inbred strains were generated concurrently by Benjamin Taylor to map Mendelian and other major-effect loci. In the 1990s, the utility of recombinant inbred strains for mapping improved significantly thanks to the higher-density genotyping made possible by microsatellite markers. Between 2005 and 2007, virtually all extant mouse and rat recombinant inbred strains were regenotyped at many thousands of SNP markers, providing highly accurate maps of recombinations.