Haplotype

DNA molecule 1 differs from DNA molecule 2 at a single base-pair location (a C/A polymorphism).

A haplotype (haploid genotype) is a group of alleles in an organism that are inherited together from a single parent. [1] [2]

Many organisms contain genetic material (DNA) which is inherited from two parents. Normally these organisms have their DNA organized in two sets of pairwise similar chromosomes. The offspring gets one chromosome in each pair from each parent. A set of pairs of chromosomes is called diploid, and a set containing only one chromosome of each pair is called haploid. The haploid genotype (haplotype) is a genotype that considers the single chromosomes rather than the pairs of chromosomes. It can be all the chromosomes from one of the parents or a small part of a chromosome, for example a sequence of 9000 base pairs or a small set of alleles.

Specific contiguous parts of the chromosome are likely to be inherited together and not be split by chromosomal crossover, a phenomenon called genetic linkage. [3] [4] As a result, identifying these statistical associations and a few alleles of a specific haplotype sequence can facilitate identifying all other such polymorphic sites that are nearby on the chromosome (imputation). [5] Such information is critical for investigating the genetics of common diseases; in humans, these associations have been investigated by the International HapMap Project. [6] [7]

Other parts of the genome are almost always haploid and do not undergo crossover: for example, human mitochondrial DNA is passed down through the maternal line and the Y chromosome is passed down the paternal line. In these cases, the entire sequence can be grouped into a simple evolutionary tree, with each branch founded by a unique-event polymorphism mutation (often, but not always, a single-nucleotide polymorphism (SNP)). Each clade under a branch, containing haplotypes with a single shared ancestor, is called a haplogroup. [8] [9] [10]

Haplotype resolution

An organism's genotype may not define its haplotype uniquely. For example, consider a diploid organism and two bi-allelic loci (such as SNPs) on the same chromosome. Assume the first locus has alleles A or T and the second locus G or C. Each locus then has three possible genotypes: (AA, AT, and TT) and (GG, GC, and CC), respectively. For a given individual, there are nine possible two-locus genotypes (shown in the Punnett square below, together with the haplotype pairs each one implies). For individuals who are homozygous at one or both loci, the haplotypes are unambiguous: if the two copies of an allele at a homozygous locus are labeled, say, T1 and T2, then swapping them yields the same pair of haplotypes, so the order in which they are considered does not matter. For individuals heterozygous at both loci, the gametic phase is ambiguous: an observer cannot tell from the genotype alone which pair of haplotypes the individual carries, e.g., AG and TC versus AC and TG.

Locus 2 \ Locus 1    AA       AT                TT
GG                   AG AG    AG TG             TG TG
GC                   AG AC    AG TC or AC TG    TG TC
CC                   AC AC    AC TC             TC TC
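
The phase ambiguity described above can be illustrated with a short sketch. The function name `compatible_haplotype_pairs` and the representation of a genotype as two-character strings are assumptions made for this example only; the code simply enumerates every way of assigning the alleles at each locus to the two chromosome copies.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Return the unordered haplotype pairs consistent with an unphased genotype.

    `genotype` is a list of two-character strings, one per locus,
    for example ["AT", "GC"] (hypothetical encoding for this sketch).
    """
    pairs = set()
    # At each locus, either allele may sit on the first chromosome copy.
    for choice in product(*[(g, g[::-1]) for g in genotype]):
        hap1 = "".join(locus[0] for locus in choice)
        hap2 = "".join(locus[1] for locus in choice)
        # Sort so that (hap1, hap2) and (hap2, hap1) count as the same pair.
        pairs.add(tuple(sorted((hap1, hap2))))
    return pairs


# Homozygous at locus 1: a single resolution.
print(compatible_haplotype_pairs(["AA", "GC"]))
# Heterozygous at both loci: two possible resolutions.
print(compatible_haplotype_pairs(["AT", "GC"]))
```

Only the doubly heterozygous genotype yields more than one haplotype pair, which matches the single ambiguous cell in the Punnett square.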

The only unequivocal method of resolving phase ambiguity is by sequencing. However, it is possible to estimate the probability of a particular haplotype when phase is ambiguous using a sample of individuals.

Given the genotypes for a number of individuals, the haplotypes can be inferred by haplotype resolution or haplotype phasing techniques. These methods work by applying the observation that certain haplotypes are common in certain genomic regions. Therefore, given a set of possible haplotype resolutions, these methods choose those that use fewer different haplotypes overall. The specifics of these methods vary - some are based on combinatorial approaches (e.g., parsimony), whereas others use likelihood functions based on different models and assumptions such as the Hardy–Weinberg principle, the coalescent theory model, or perfect phylogeny. The parameters in these models are then estimated using algorithms such as the expectation-maximization algorithm (EM), Markov chain Monte Carlo (MCMC), or hidden Markov models (HMM).
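
As a rough illustration of the likelihood-based approach, the following is a minimal sketch of an expectation-maximization estimate of haplotype frequencies, assuming random mating (Hardy–Weinberg proportions) and unphased genotypes encoded as tuples of two-character strings. The function names, the encoding, and the fixed iteration count are assumptions of this sketch, not a description of any particular phasing program.

```python
from collections import Counter
from itertools import product


def phasings(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype."""
    out = set()
    for choice in product(*[(g, g[::-1]) for g in genotype]):
        h1 = "".join(locus[0] for locus in choice)
        h2 = "".join(locus[1] for locus in choice)
        out.add(tuple(sorted((h1, h2))))
    return sorted(out)


def em_haplotype_frequencies(genotypes, n_iter=100):
    """Estimate haplotype frequencies from unphased genotypes by EM."""
    resolutions = [phasings(g) for g in genotypes]
    haps = sorted({h for res in resolutions for pair in res for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting point
    for _ in range(n_iter):
        counts = Counter()
        for res in resolutions:
            # E-step: weight each candidate phasing by the product of its
            # haplotype frequencies (constant factors cancel within a genotype).
            weights = [freq[h1] * freq[h2] for h1, h2 in res]
            total = sum(weights)
            for (h1, h2), w in zip(res, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts become the new frequencies.
        n_chromosomes = 2 * len(genotypes)
        freq = {h: counts[h] / n_chromosomes for h in haps}
    return freq


# Hypothetical sample: many AG/AG and TC/TC individuals, a few double heterozygotes.
sample = [("AA", "GG")] * 10 + [("TT", "CC")] * 10 + [("AT", "GC")] * 5
print(em_haplotype_frequencies(sample))
```

In this made-up sample the double heterozygotes are resolved almost entirely as AG and TC, because those haplotypes are already common among the unambiguous individuals. This is the same preference for fewer distinct haplotypes that the text describes.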

Microfluidic whole genome haplotyping is a technique for the physical separation of individual chromosomes from a metaphase cell followed by direct resolution of the haplotype for each allele.

Gametic phase

In genetics, a gametic phase represents the original allelic combinations that a diploid individual inherits from both parents. [11] It is therefore a particular association of alleles at different loci on the same chromosome. Gametic phase is influenced by genetic linkage. [12]

Y-DNA haplotypes from genealogical DNA tests

Unlike other chromosomes, Y chromosomes generally do not come in pairs. Every human male (excepting those with XYY syndrome) has only one copy of that chromosome. This means that there is no chance variation in which copy is inherited, and also (for most of the chromosome) no shuffling between copies by recombination; so, unlike autosomal haplotypes, there is effectively no randomisation of the Y-chromosome haplotype between generations. A human male should largely share the same Y chromosome as his father, give or take a few mutations; thus Y chromosomes tend to pass largely intact from father to son, with a small but accumulating number of mutations that can serve to differentiate male lineages. In particular, the Y-DNA represented as the numbered results of a Y-DNA genealogical DNA test should match, except for mutations.

UEP results (SNP results)

Unique-event polymorphisms (UEPs) such as SNPs represent haplogroups. STRs represent haplotypes. The results that comprise the full Y-DNA haplotype from the Y chromosome DNA test can be divided into two parts: the results for UEPs, sometimes loosely called the SNP results as most UEPs are single-nucleotide polymorphisms, and the results for microsatellite short tandem repeat sequences (Y-STRs).

The UEP results represent the inheritance of events that are believed to have happened only once in all human history. These can be used to identify the individual's Y-DNA haplogroup, his place in the "family tree" of the whole of humanity. Different Y-DNA haplogroups identify genetic populations that are often distinctly associated with particular geographic regions; their appearance in more recent populations located in different regions represents the migrations tens of thousands of years ago of the direct patrilineal ancestors of current individuals.

Y-STR haplotypes

Genetic results also include the Y-STR haplotype, the set of results from the Y-STR markers tested.

Unlike the UEPs, the Y-STRs mutate much more easily, which allows them to be used to distinguish recent genealogy. But it also means that, rather than the population of descendants of a genetic event all sharing the same result, the Y-STR haplotypes are likely to have spread apart, to form a cluster of more or less similar results. Typically, this cluster will have a definite most probable center, the modal haplotype (presumably similar to the haplotype of the original founding event), and also a haplotype diversity — the degree to which it has become spread out. The further in the past the defining event occurred, and the more that subsequent population growth occurred early, the greater the haplotype diversity will be for a particular number of descendants. However, if the haplotype diversity is smaller for a particular number of descendants, this may indicate a more recent common ancestor, or a recent population expansion.
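
A modal haplotype can be sketched as the most common repeat count at each marker across a cluster. In the example below the repeat counts are invented for illustration, the function names are hypothetical, and each haplotype is assumed to be a dictionary mapping a marker name to its repeat count.

```python
from collections import Counter


def modal_haplotype(haplotypes):
    """Return the most common repeat count at each marker across the cluster."""
    markers = haplotypes[0].keys()
    return {
        m: Counter(h[m] for h in haplotypes).most_common(1)[0][0]
        for m in markers
    }


def mismatches(haplotype, modal):
    """Count the markers at which a haplotype differs from the modal haplotype."""
    return sum(1 for m in modal if haplotype[m] != modal[m])


# Invented repeat counts for three Y-STR markers in four hypothetical men.
cluster = [
    {"DYS393": 13, "DYS390": 24, "DYS19": 14},
    {"DYS393": 13, "DYS390": 23, "DYS19": 14},
    {"DYS393": 13, "DYS390": 24, "DYS19": 15},
    {"DYS393": 12, "DYS390": 24, "DYS19": 14},
]
modal = modal_haplotype(cluster)
print(modal)
print([mismatches(h, modal) for h in cluster])
```

The number of mismatches from the modal haplotype gives a crude sense of how far the cluster has spread; it ignores how many repeat units a marker differs by, which a more careful analysis would take into account.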

It is important to note that, unlike for UEPs, two individuals with a similar Y-STR haplotype may not necessarily share a similar ancestry. Y-STR events are not unique. Instead, the clusters of Y-STR haplotype results inherited from different events and different histories tend to overlap.

In most cases, it is a long time since the haplogroups' defining events, so typically the cluster of Y-STR haplotype results associated with descendants of that event has become rather broad. These results will tend to significantly overlap the (similarly broad) clusters of Y-STR haplotypes associated with other haplogroups. This makes it impossible for researchers to predict with absolute certainty to which Y-DNA haplogroup a Y-STR haplotype would point. If the UEPs are not tested, the Y-STRs may be used only to predict probabilities for haplogroup ancestry, but not certainties.

A similar scenario exists in trying to evaluate whether shared surnames indicate shared genetic ancestry. A cluster of similar Y-STR haplotypes may indicate a shared common ancestor, with an identifiable modal haplotype, but only if the cluster is sufficiently distinct from what may have happened by chance from different individuals who historically adopted the same name independently. Many names were adopted from common occupations, for instance, or were associated with habitation of particular sites. More extensive haplotype typing is needed to establish genetic genealogy. Commercial DNA-testing companies now offer their customers testing of larger sets of markers to improve definition of their genetic ancestry. The number of markers tested has increased from 12 during the early years to 111 more recently.

Establishing plausible relatedness between different surnames data-mined from a database is significantly more difficult. The researcher must establish that the very nearest member of the population in question, chosen purposely from the population for that reason, would be unlikely to match by accident. This is more than establishing that a randomly selected member of the population is unlikely to have such a close match by accident. Because of the difficulty, establishing relatedness between different surnames as in such a scenario is likely to be impossible, except in special cases where there is specific information to drastically limit the size of the population of candidates under consideration.

Diversity

Haplotype diversity is a measure of the uniqueness of a particular haplotype in a given population. The haplotype diversity (H) is computed as: [13]

$$H = \frac{N}{N-1}\left(1 - \sum_{i} x_i^2\right)$$

where $x_i$ is the (relative) haplotype frequency of each haplotype in the sample and $N$ is the sample size. Haplotype diversity is given for each sample.
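
A minimal sketch of this calculation follows, assuming the commonly cited form of the estimator, H = N/(N-1) * (1 - sum of squared relative haplotype frequencies). The function name and the representation of a sample as a list of haplotype strings are assumptions of the example.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Compute H = N/(N-1) * (1 - sum(x_i**2)) for a sample of haplotypes.

    `haplotypes` is a list with one entry per sampled chromosome;
    x_i is the relative frequency of the i-th distinct haplotype.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two sampled haplotypes are required")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


print(haplotype_diversity(["AG", "AG", "AG", "AG"]))  # one haplotype only
print(haplotype_diversity(["AG", "AG", "TC", "TC"]))  # two equally common haplotypes
print(haplotype_diversity(["AG", "AC", "TG", "TC"]))  # every haplotype distinct
```

The value is 0 when every sampled chromosome carries the same haplotype and reaches 1 when every sampled haplotype is different.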

References

    1. Cox, C. Barry; Moore, Peter D.; Ladle, Richard (2016). Biogeography: An Ecological and Evolutionary Approach. Wiley-Blackwell. p. 106. ISBN 978-1-118-96858-1.
    2. Editorial Board (2012). Concise Dictionary of Science. V&S Publishers. p. 137. ISBN 9381588643.
    3. BiologyPages/H/Haplotypes.html Kimball's Biology Pages (Creative Commons Attribution 3.0)
    4. "haplotype / haplotypes | Learn Science at Scitable". www.nature.com.
    5. Yoosefzadeh-Najafabadi, Mohsen; Rajcan, Istvan; Eskandari, Milad (2022). "Optimizing genomic selection in soybean: An important improvement in agricultural genomics". Heliyon. 8 (11): e11873. Bibcode:2022Heliy...811873Y. doi:10.1016/j.heliyon.2022.e11873. PMC 9713349. PMID 36468106.
    6. The International HapMap Consortium (2003). "The International HapMap Project" (PDF). Nature. 426 (6968): 789–796. Bibcode:2003Natur.426..789G. doi:10.1038/nature02168. hdl:2027.42/62838. PMID 14685227. S2CID 4387110.
    7. The International HapMap Consortium (2005). "A haplotype map of the human genome". Nature. 437 (7063): 1299–1320. Bibcode:2005Natur.437.1299T. doi:10.1038/nature04226. PMC 1880871. PMID 16255080. This article speaks of a haplotype length, which is the length of a contiguous run of the chromosome inherited from a single parent.
    8. Arora, Devender; Singh, Ajeet; Sharma, Vikrant; Bhaduria, Harvendra Singh; Patel, Ram Bahadur (2015). "Hgs Db: Haplogroups Database to understand migration and molecular risk assessment". Bioinformation. 11 (6): 272–5. doi:10.6026/97320630011272. PMC 4512000. PMID 26229286.
    9. International Society of Genetic Genealogy (2015). Genetics Glossary, Haplogroup.
    10. "Facts & Genes. Volume 7, Issue 3". Archived from the original on May 9, 2008.
    11. Taylor, Duncan; Bright, Jo-Anne; Buckleton, John S. (2016). "Biological basis for DNA evidence". In Buckleton, John S.; Bright, Jo-Anne; Taylor, Duncan (eds.). Forensic DNA Evidence Interpretation (2nd ed.). Boca Raton, FL: CRC Press. pp. 1–36. ISBN 9781482258899.
    12. Excoffier, Laurent (1 November 2003). "Gametic phase estimation over large genomic regions using an adaptive window approach". Human Genomics. 1 (1): 7–19. doi:10.1186/1479-7364-1-1-7. PMC 3525008. PMID 15601529.
    13. Nei, Masatoshi; Tajima, Fumio (1981). "DNA polymorphism detectable by restriction endonucleases". Genetics. 97: 145.