A DNA segment is identical by state (IBS) in two or more individuals if they have identical nucleotide sequences in this segment. An IBS segment is identical by descent (IBD) in two or more individuals if they have inherited it from a common ancestor without recombination, that is, the segment has the same ancestral origin in these individuals. DNA segments that are IBD are IBS per definition, but segments that are not IBD can still be IBS due to the same mutations in different individuals or recombinations that do not alter the segment.[ citation needed ]
All individuals in a finite population are related if their ancestry is traced back far enough and will, therefore, share segments of their genomes IBD. During meiosis, IBD segments are broken up by recombination. Therefore, the expected length of an IBD segment depends on the number of generations since the most recent common ancestor at the locus of the segment. The length of IBD segments that result from a common ancestor n generations in the past (therefore involving 2n meioses) is exponentially distributed with mean 1/(2n) Morgans (M). [1] The expected number of IBD segments decreases as the number of generations since the common ancestor at this locus increases. For a specific DNA segment, the probability of being IBD decreases as 2^(−2n), since in each meiosis the probability of transmitting this segment is 1/2. [2]
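The quantities in the paragraph above can be made concrete with a small Python sketch. It assumes the simple model stated in the text (segment lengths exponentially distributed with mean 1/(2n) Morgans, and a transmission probability of 1/2 per meiosis). The function names are hypothetical and chosen only for illustration.

```python
import math


def expected_ibd_length_cm(n_generations: int) -> float:
    """Mean IBD segment length in centimorgans (1 Morgan = 100 cM).

    Assumes a common ancestor n generations back, i.e. 2n meioses,
    so the mean length is 1/(2n) Morgans.
    """
    if n_generations < 1:
        raise ValueError("n_generations must be at least 1")
    return 100.0 / (2 * n_generations)


def prob_locus_ibd(n_generations: int) -> float:
    """Probability that a specific segment is IBD: (1/2) per meiosis over 2n meioses."""
    if n_generations < 1:
        raise ValueError("n_generations must be at least 1")
    return 0.5 ** (2 * n_generations)


def prob_segment_longer_than(n_generations: int, min_length_cm: float) -> float:
    """Probability that an IBD segment exceeds min_length_cm.

    This is the survival function of the exponential distribution
    with rate 2n per Morgan (2n/100 per centimorgan).
    """
    rate_per_cm = 2 * n_generations / 100.0
    return math.exp(-rate_per_cm * min_length_cm)


if __name__ == "__main__":
    for n in (1, 5, 10, 50):
        print(n, expected_ibd_length_cm(n), prob_locus_ibd(n),
              prob_segment_longer_than(n, 1.0))
```

Under this model, segments from an ancestor 5 generations back average 10 cM, while those from an ancestor 50 generations back average 1 cM, which is why older shared ancestry is harder to detect from segment data.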
Identified IBD segments can be used for a wide range of purposes. As noted above, the amount (length and number) of IBD sharing depends on the familial relationships between the tested individuals. Therefore, one application of IBD segment detection is to quantify relatedness. [3] [4] [5] [6] Measurement of relatedness can be used in forensic genetics, [7] but can also increase information in genetic linkage mapping [3] [8] and help to decrease bias caused by undocumented relationships in standard association studies. [6] [9] Another application of IBD is genotype imputation and haplotype phase inference. [10] [11] [12] Long shared IBD segments that are broken up by short regions may indicate phasing errors. [5] [13] : SI
IBD mapping [3] is similar to linkage analysis, but can be performed without a known pedigree on a cohort of unrelated individuals. IBD mapping can be seen as a new form of association analysis that increases the power to map genes or genomic regions containing multiple rare disease susceptibility variants. [6] [14]
Using simulated data, Browning and Thompson showed that IBD mapping has higher power than association testing when multiple rare variants within a gene contribute to disease susceptibility. [14] IBD mapping has identified genome-wide significant regions in isolated as well as outbred populations where standard association tests failed. [11] [15] Houwen et al. used IBD sharing to identify the chromosomal location of a gene responsible for benign recurrent intrahepatic cholestasis in an isolated fishing population. [16] Kenny et al. also used an isolated population to fine-map a signal found by a genome-wide association study (GWAS) of plasma plant sterol (PPS) levels, a surrogate measure of cholesterol absorption from the intestine. [17] Francks et al. were able to identify a potential susceptibility locus for schizophrenia and bipolar disorder with genotype data of case-control samples. [18] Lin et al. found a genome-wide significant linkage signal in a dataset of multiple sclerosis patients. [19] Letouzé et al. used IBD mapping to look for founder mutations in cancer samples. [20]
Detection of natural selection in the human genome is also possible via detected IBD segments. Selection will usually tend to increase the number of IBD segments among individuals in a population. By scanning for regions with excess IBD sharing, regions in the human genome that have been under strong, very recent selection can be identified. [21] [22]
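A scan for excess IBD sharing can be sketched roughly as follows. This is a minimal illustration, not the method used in the cited studies: it assumes that IBD segments have already been detected and are given as half-open (start, end) base-pair intervals on one chromosome, and it flags windows whose segment count lies far above the chromosome-wide mean. The function names, the fixed window size, and the z-score cutoff are all assumptions made for the example.

```python
from statistics import mean, pstdev


def ibd_coverage_per_window(segments, chrom_length, window_size):
    """Count how many IBD segments overlap each fixed-size window.

    segments: iterable of (start, end) base-pair intervals, half-open [start, end).
    """
    n_windows = -(-chrom_length // window_size)  # ceiling division
    counts = [0] * n_windows
    for start, end in segments:
        first = start // window_size
        last = min((end - 1) // window_size, n_windows - 1)
        for w in range(first, last + 1):
            counts[w] += 1
    return counts


def flag_excess_sharing(counts, z_threshold=3.0):
    """Return indices of windows whose count exceeds the mean by z_threshold SDs."""
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        return []  # uniform sharing: nothing stands out
    return [i for i, c in enumerate(counts) if (c - mu) / sigma > z_threshold]


if __name__ == "__main__":
    # Toy data: one segment spanning the chromosome plus 20 segments piled up in one window.
    toy_segments = [(0, 100)] + [(40, 50)] * 20
    counts = ibd_coverage_per_window(toy_segments, chrom_length=100, window_size=10)
    print(counts)
    print(flag_excess_sharing(counts, z_threshold=2.5))
```

A real analysis would also need to account for factors such as variation in recombination rate and marker density along the genome, which this sketch ignores.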
In addition, IBD segments can be useful for measuring and identifying other influences on population structure. [6] [23] [24] [25] [26] Gusev et al. showed that IBD segments can be used with additional modeling to estimate demographic history including bottlenecks and admixture. [24] Using similar models, Palamara et al. and Carmi et al. reconstructed the demographic history of Ashkenazi Jewish and Kenyan Maasai individuals. [25] [26] [27] Botigué et al. investigated differences in African ancestry among European populations. [28] Ralph and Coop used IBD detection to quantify the common ancestry of different European populations, [29] and Gravel et al. similarly tried to draw conclusions about the genetic history of populations in the Americas. [30] Ringbauer et al. utilized the geographic structure of IBD segments to estimate dispersal within Eastern Europe during the last centuries. [31] Using the 1000 Genomes data, Hochreiter found differences in IBD sharing between African, Asian and European populations as well as IBD segments that are shared with ancient genomes such as the Neanderthal or Denisova. [13]
Programs for the detection of IBD segments in unrelated individuals:
Genetic linkage is the tendency of DNA sequences that are close together on a chromosome to be inherited together during the meiosis phase of sexual reproduction. Two genetic markers that are physically near to each other are unlikely to be separated onto different chromatids during chromosomal crossover, and are therefore said to be more linked than markers that are far apart. In other words, the nearer two genes are on a chromosome, the lower the chance of recombination between them, and the more likely they are to be inherited together. Markers on different chromosomes are perfectly unlinked, although the penetrance of potentially deleterious alleles may be influenced by the presence of other alleles, and these other alleles may be located on other chromosomes than that on which a particular potentially deleterious allele is located.
In genetics and bioinformatics, a single-nucleotide polymorphism is a germline substitution of a single nucleotide at a specific position in the genome. Although certain definitions require the substitution to be present in a sufficiently large fraction of the population, many publications do not apply such a frequency threshold.
Linkage disequilibrium, often abbreviated to LD, is a term in population genetics referring to the association of genes, usually linked genes, in a population. It has become an important tool in medical genetics as well as other fields.
The International HapMap Project was an organization that aimed to develop a haplotype map (HapMap) of the human genome, to describe the common patterns of human genetic variation. HapMap is used to find genetic variants affecting health, disease and responses to drugs and environmental factors. The information produced by the project is made freely available for research.
Gene mapping or genome mapping describes the methods used to identify the location of a gene on a chromosome and the distances between genes. Gene mapping can also describe the distances between different sites within a gene.
A tag SNP is a representative single nucleotide polymorphism (SNP) in a region of the genome with high linkage disequilibrium that represents a group of SNPs called a haplotype. It is possible to identify genetic variation and association to phenotypes without genotyping every SNP in a chromosomal region. This reduces the expense and time of mapping genome areas associated with disease, since it eliminates the need to study every individual SNP. Tag SNPs are useful in whole-genome SNP association studies in which hundreds of thousands of SNPs across the entire genome are genotyped.
In genomics, a genome-wide association study (GWA study) is an observational study of a genome-wide set of genetic variants in different individuals to see if any variant is associated with a trait. GWA studies typically focus on associations between single-nucleotide polymorphisms (SNPs) and traits like major human diseases, but can equally be applied to any other genetic variants and any other organisms.
In genetics, association mapping, also known as "linkage disequilibrium mapping", is a method of mapping quantitative trait loci (QTLs) that takes advantage of historic linkage disequilibrium to link phenotypes to genotypes, uncovering genetic associations.
Genetic studies of Jews are part of the population genetics discipline and are used to analyze the ancestry of Jewish populations, complementing research in other fields such as history, linguistics, archaeology, and paleontology. These studies investigate the origins of various Jewish ethnic divisions. In particular, they examine whether there is a common genetic heritage among them. The medical genetics of Jews are studied for population-specific diseases.
Disease gene identification is a process by which scientists identify the mutant genotypes responsible for an inherited genetic disorder. Mutations in these genes can include single nucleotide substitutions, single nucleotide additions/deletions, deletion of the entire gene, and other genetic abnormalities.
Quantitative trait loci mapping or QTL mapping is the process of identifying genomic regions that potentially contain genes responsible for important economic, health or environmental characters. Mapping QTLs is an important activity that plant breeders and geneticists routinely use to associate potential causal genes with phenotypes of interest. Family-based QTL mapping is a variant of QTL mapping in which multiple families are used.
A sequence related amplified polymorphism (SRAP) is a molecular technique, developed by G. Li and C. F. Quiros in 2001, for detecting genetic variation in the open reading frames (ORFs) of genomes of plants and related organisms.
In genetics, haplotype estimation refers to the process of statistical estimation of haplotypes from genotype data. The most common situation arises when genotypes are collected at a set of polymorphic sites from a group of individuals. For example, in human genetics, genome-wide association studies collect genotypes in thousands of individuals at between 200,000 and 5,000,000 SNPs using microarrays. Haplotype estimation methods are used in the analysis of these datasets and allow genotype imputation of alleles from reference databases such as the HapMap Project and the 1000 Genomes Project.
In genetics, imputation is the statistical inference of unobserved genotypes. It is achieved by using known haplotypes in a population, for instance from the HapMap or the 1000 Genomes Project in humans. This makes it possible to test for association between a trait of interest and genetic variants that were not experimentally typed but whose genotypes have been statistically inferred ("imputed"). Genotype imputation is usually performed on SNPs, the most common kind of genetic variation.
SNV calling from NGS data is any of a range of methods for identifying the existence of single nucleotide variants (SNVs) from the results of next generation sequencing (NGS) experiments. These are computational techniques, and are in contrast to special experimental methods based on known population-wide single nucleotide polymorphisms. Due to the increasing abundance of NGS data, these techniques are becoming increasingly popular for performing SNP genotyping, with a wide variety of algorithms designed for specific experimental designs and applications. In addition to the usual application domain of SNP genotyping, these techniques have been successfully adapted to identify rare SNPs within a population, as well as detecting somatic SNVs within an individual using multiple tissue samples.
Mega2 is data manipulation software for applied statistical genetics. Mega is an acronym for Manipulation Environment for Genetic Analysis.
PLINK is a free, commonly used, open-source whole-genome association analysis toolset designed by Shaun Purcell. The software is designed flexibly to perform a wide range of basic, large-scale genetic analyses.
Genome-wide complex trait analysis (GCTA), also known as genome-based restricted maximum likelihood (GREML), is a statistical method for heritability estimation in genetics, which quantifies the total additive contribution of a set of genetic variants to a trait. GCTA is typically applied to common single nucleotide polymorphisms (SNPs) on a genotyping array, and the resulting estimate is thus termed "chip" or "SNP" heritability.
Sharon Ruth Browning is a statistical geneticist at the University of Washington, and a research professor with its Department of Biostatistics. Her research has various implications for the field of biogenetics.
In statistical genetics, Haseman–Elston (HE) regression is a statistical regression originally proposed for linkage analysis of quantitative traits in sibling pairs. It was first developed by Joseph K. Haseman and Robert C. Elston in 1972. A much earlier implementation of sib-pair linkage analysis was proposed by Lionel S. Penrose in 1935 and 1938. Notably, Penrose was the father of Nobel laureate theoretical physicist Roger Penrose.