A DNA segment is identical by state (IBS) in two or more individuals if they have identical nucleotide sequences in this segment. An IBS segment is identical by descent (IBD) in two or more individuals if they have inherited it from a common ancestor without recombination, that is, the segment has the same ancestral origin in these individuals. DNA segments that are IBD are IBS by definition, but segments that are not IBD can still be IBS, either because the same mutations arose in different individuals or because recombination did not alter the segment.[ citation needed ]
All individuals in a finite population are related if traced back long enough and will, therefore, share segments of their genomes IBD. During meiosis, IBD segments are broken up by recombination. Therefore, the expected length of an IBD segment depends on the number of generations since the most recent common ancestor at the locus of the segment. The length of IBD segments that result from a common ancestor n generations in the past (therefore involving 2n meioses) is exponentially distributed with mean 1/(2n) Morgans (M). [1] The expected number of IBD segments decreases with the number of generations since the common ancestor at this locus. For a specific DNA segment, the probability of being IBD decreases as 2^(−2n), since in each of the 2n meioses the probability of transmitting this segment is 1/2. [2]
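As a worked illustration of the two quantities above, the following minimal Python sketch evaluates the mean segment length 1/(2n) Morgans and the per-segment IBD probability 2^(−2n) for a common ancestor n generations in the past. The function names are illustrative and not taken from any IBD detection program; lengths are converted to centimorgans using 1 M = 100 cM.

```python
def mean_ibd_length_morgans(n: int) -> float:
    """Mean length in Morgans of an IBD segment from an ancestor n generations back."""
    if n < 1:
        raise ValueError("n must be a positive number of generations")
    # 2n meioses separate the two descendants, giving a mean of 1/(2n) Morgans.
    return 1.0 / (2 * n)


def ibd_probability(n: int) -> float:
    """Probability that a specific segment is IBD through that ancestor."""
    if n < 1:
        raise ValueError("n must be a positive number of generations")
    # One factor of 1/2 per meiosis, over 2n meioses.
    return 0.5 ** (2 * n)


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        length_cm = 100 * mean_ibd_length_morgans(n)
        print(f"n={n:2d}  mean length = {length_cm:5.1f} cM  "
              f"P(IBD) = {ibd_probability(n):.2e}")
```

Both quantities shrink as n grows, but the probability falls off exponentially while the mean length falls only as 1/n, which is why segments inherited from distant ancestors are both short and rare.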
Identified IBD segments can be used for a wide range of purposes. As noted above, the amount (length and number) of IBD sharing depends on the familial relationships between the tested individuals. Therefore, one application of IBD segment detection is to quantify relatedness. [3] [4] [5] [6] Measurement of relatedness can be used in forensic genetics, [7] but can also increase information in genetic linkage mapping [3] [8] and help to reduce bias caused by undocumented relationships in standard association studies. [6] [9] Another application of IBD is genotype imputation and haplotype phase inference. [10] [11] [12] Long shared IBD segments that are broken up by short regions may indicate phasing errors. [5] [13] : SI
IBD mapping [3] is similar to linkage analysis, but can be performed without a known pedigree on a cohort of unrelated individuals. IBD mapping can be seen as a new form of association analysis that increases the power to map genes or genomic regions containing multiple rare disease susceptibility variants. [6] [14]
Using simulated data, Browning and Thompson showed that IBD mapping has higher power than association testing when multiple rare variants within a gene contribute to disease susceptibility. [14] IBD mapping has identified genome-wide significant regions in both isolated and outbred populations where standard association tests failed. [11] [15] Houwen et al. used IBD sharing to identify the chromosomal location of a gene responsible for benign recurrent intrahepatic cholestasis in an isolated fishing population. [16] Kenny et al. also used an isolated population to fine-map a signal found by a genome-wide association study (GWAS) of plasma plant sterol (PPS) levels, a surrogate measure of cholesterol absorption from the intestine. [17] Francks et al. were able to identify a potential susceptibility locus for schizophrenia and bipolar disorder using genotype data from case-control samples. [18] Lin et al. found a genome-wide significant linkage signal in a dataset of multiple sclerosis patients. [19] Letouzé et al. used IBD mapping to look for founder mutations in cancer samples. [20]
Detection of natural selection in the human genome is also possible via detected IBD segments. Selection will usually tend to increase the number of IBD segments among individuals in a population. By scanning for regions with excess IBD sharing, regions in the human genome that have been under strong, very recent selection can be identified. [21] [22]
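The scan described above can be sketched in a few lines of Python. This is only a schematic illustration under stated assumptions, not the procedure used in the cited studies: segments are taken as half-open `(start, end)` intervals in base pairs on a single chromosome, sharing is counted in fixed-size windows, and a window is flagged when its count exceeds the chromosome-wide mean by `k` standard deviations. The function names, the window scheme, and the threshold rule are all hypothetical choices made for this example.

```python
from statistics import mean, pstdev


def ibd_sharing_per_window(segments, chrom_length, window):
    """Count how many IBD segments overlap each fixed-size window."""
    n_windows = -(-chrom_length // window)  # ceiling division
    counts = [0] * n_windows
    for start, end in segments:
        first = start // window
        last = min((end - 1) // window, n_windows - 1)
        for w in range(first, last + 1):
            counts[w] += 1
    return counts


def excess_sharing_windows(counts, k=2.0):
    """Return indices of windows whose count exceeds mean + k * standard deviation."""
    mu = mean(counts)
    sd = pstdev(counts)
    if sd == 0:
        return []  # uniform sharing: nothing stands out
    return [i for i, c in enumerate(counts) if c > mu + k * sd]


if __name__ == "__main__":
    # Toy data: one segment in every 10 bp window, plus 20 extra in window 5.
    background = [(i * 10, (i + 1) * 10) for i in range(10)]
    hotspot = [(50, 60)] * 20
    counts = ibd_sharing_per_window(background + hotspot, chrom_length=100, window=10)
    print(counts)
    print(excess_sharing_windows(counts, k=2.0))
```

A real analysis would also need to account for variation in recombination rate, marker density, and detection power along the genome, all of which affect how many IBD segments are found in a region independently of selection.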
In addition, IBD segments can be useful for measuring and identifying other influences on population structure. [6] [23] [24] [25] [26] Gusev et al. showed that IBD segments can be used with additional modeling to estimate demographic history, including bottlenecks and admixture. [24] Using similar models, Palamara et al. and Carmi et al. reconstructed the demographic history of Ashkenazi Jewish and Kenyan Maasai individuals. [25] [26] [27] Botigué et al. investigated differences in African ancestry among European populations. [28] Ralph and Coop used IBD detection to quantify the common ancestry of different European populations, [29] and Gravel et al. similarly tried to draw conclusions about the genetic history of populations in the Americas. [30] Ringbauer et al. used the geographic structure of IBD segments to estimate dispersal within Eastern Europe during the last centuries. [31] Using the 1000 Genomes data, Hochreiter found differences in IBD sharing between African, Asian and European populations, as well as IBD segments that are shared with ancient genomes such as the Neanderthal or Denisova. [13]
Programs for the detection of IBD segments in unrelated individuals: