Y Chromosome Haplotype Reference Database

Logo of the Y Chromosome Haplotype Reference Database (YHRD) version 4.0

The Y Chromosome Haplotype Reference Database (YHRD) is an open-access, annotated collection of population samples typed for Y chromosomal sequence variants. [1] It pursues two objectives: (1) the generation of reliable frequency estimates for Y-STR haplotypes and Y-SNP haplotypes to be used in the quantitative assessment of matches in forensic and kinship cases, and (2) the characterization of male lineages to draw conclusions about the origins and history of human populations. The database is endorsed by the International Society for Forensic Genetics (ISFG). By May 2023, about 350,000 Y chromosomes typed for 9 to 29 STR loci had been directly submitted by forensic institutions and universities worldwide. In geographic terms, about 53% of the YHRD samples stem from Asia, 21% from Europe, 12% from North America, 10% from Latin America, 3% from Africa, 0.8% from Oceania/Australia and 0.2% from the Arctic. The 1,406 individual sampling projects are described in more than 800 peer-reviewed publications. [2]


Submission and registration

YHRD is built from direct submissions of population data by individual laboratories. Upon receipt of a submission, the YHRD staff examines the originality, relevance, plausibility and validity of the data and, if these criteria are met, assigns an accession number to the population sample. The submission is then registered in the public database, where entries can be retrieved by searching for haplotypes, contributors or accession numbers. All population data published in forensic journals such as Forensic Science International: Genetics or the International Journal of Legal Medicine [3] must be validated by the YHRD custodians and are subsequently included in the YHRD. [4]

Database structure

The database supports the most frequently used haplotype formats, e.g. Minimal (minHt), PowerPlex Y12, [5] Yfiler, [6] PowerPlex Y23, [7] Yfiler Plus and Maximal (maxHt), for which differently sized databases exist.
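The following minimal Python sketch illustrates why the size of the searchable database depends on the chosen format: a reference sample can only be compared in a format if it has been typed for every locus of that format. The format names and the shortened locus lists are placeholders chosen for illustration and are not the real kit definitions; the function names are hypothetical and do not describe how the YHRD is implemented.

```python
# Illustrative formats only: shortened placeholder locus lists, not real kit definitions.
FORMATS = {
    "small": ("DYS19", "DYS390", "DYS391"),
    "large": ("DYS19", "DYS390", "DYS391", "DYS438", "DYS439"),
}


def project(profile, loci):
    """Return the alleles of `profile` at `loci`, or None if any locus is untyped."""
    try:
        return tuple(profile[locus] for locus in loci)
    except KeyError:
        return None


def count_matches(query, database, loci):
    """Return (number of matches, number of comparable samples) for one format."""
    target = project(query, loci)
    if target is None:
        raise ValueError("query is not typed for every locus of this format")
    # Only samples typed for all loci of the format take part in the search.
    comparable = [p for p in (project(s, loci) for s in database) if p is not None]
    matches = sum(1 for p in comparable if p == target)
    return matches, len(comparable)


# Toy data: two samples typed in the small format only, one in the large format.
database = [
    {"DYS19": 14, "DYS390": 24, "DYS391": 10},
    {"DYS19": 14, "DYS390": 24, "DYS391": 10, "DYS438": 12, "DYS439": 11},
    {"DYS19": 15, "DYS390": 23, "DYS391": 11},
]
query = {"DYS19": 14, "DYS390": 24, "DYS391": 10, "DYS438": 12, "DYS439": 11}

small_result = count_matches(query, database, FORMATS["small"])
large_result = count_matches(query, database, FORMATS["large"])
```

In this toy example the small format can be searched against all three samples, while the large format can be searched against only one.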

Because correlations exist between geographic areas, language groups and Y chromosomal variants, the YHRD population database was structured to display the geographic, linguistic and phylogenetic relationships of searched haplotype profiles. Currently the YHRD recognizes four separate "metapopulation" structures, each with several categories: national, continental, linguistic/ethnic and phylogenetic affiliation. In population genetics the term metapopulation describes discrete, spatially distributed population groups which are interconnected by gene flow and migration. [8] By analogy, the term is used in forensic genetics to describe a set of geographically dispersed populations with shared ancestry and continuing gene flow. Thus, the population groups within a metapopulation are more similar to each other than to groups outside it. [9]

National

The concept of pooling data to build "national databases" has a straightforward explanation: law enforcement agencies and forensic services rely on their national population to build reference databases. In most instances offenders and victims stem from the national population, and their genetic profiles should thus be represented in the database. In countries such as the US, Brazil, the UK or China, which are characterized by strong population substructure, national reference databases are often built on the basis of a historical concept of ethnic affiliation. For example, the US population is sub-structured into Caucasian, African, Hispanic, Asian and Native American populations, and the UK differentiates between English, Afro-Caribbean, Indo-Pakistani and Chinese populations. Because of their importance in national legislation, national databases are searchable in the YHRD. Each national metapopulation in the YHRD comprises all individuals sampled in a particular country, regardless of their ancestry.

Continental

Each continental metapopulation in the YHRD comprises all individuals sampled on a particular continent, regardless of their ancestry. The YHRD defines seven continental metapopulations following the United Nations classification of geographical regions: Africa, Arctic, Asia, Europe, Latin America, North America and Oceania/Australia.

Linguistic/ethnic

The metapopulation structure built on the basis of "ethnicity/linguistic affiliation" takes the ancestry of sampled individuals into account to a larger extent. "Ancestry" is a term collating historical, cultural, geographical and linguistic categories. A metapopulation concept based on "ethnicity" is by no means ideal, fully rational or fully translatable. It simply takes into account that, on a global level, categories other than "nation" or "geography" describe the observed genetic clustering and inhomogeneity of Y chromosome patterns far better.

For a global reference database, the "major language group" criterion seems most appropriate for grouping data in a way that takes ancestry into account and produces subdatabases that reflect genetic similarity. The reasoning is twofold. First, language is an inherited cultural trait, and language phyla therefore often correlate with genetic traits, not least with Y chromosome polymorphisms. Second, because languages are well studied and, owing to the long tradition of language research, widely understood by the public, linguistic terminology is in principle more understandable and easier to translate into practice than its genetic counterpart. Aside from purely linguistic categories (e.g. the Altaic language family, comprising speakers of Turkic and Mongolic languages), unifying geographic criteria are also used (e.g. Sub-Saharan Africa, comprising speakers of different African language groups who live south of the Sahara).

The current metapopulation structure is an a priori categorization which needs continuous evaluation and verification by statistical methods that quantify the genetic similarity or dissimilarity between the samples. While the current categorization of eight large metapopulations gains some support from genetic distance analysis based on ~41,000 haplotypes, [9] a further subdivision of the "Eurasian – European Metapopulation" was implemented solely on the basis of Y-STR haplotypes. The analysis of ~12,000 European haplotypes by AMOVA demonstrates that three larger pools of European haplotypes exist: the western, eastern and southeastern metapopulations. [10]

Currently the YHRD has seven non-overlapping broadly defined metapopulations: African, Afro-Asiatic, Native American, Australian Aboriginal, East Asian, Eskimo-Aleut, and Eurasian. Some of these metapopulations are further subdivided, e.g. Eurasian into six subcategories, from which the European subgroup splits further into three groups of Western, Eastern and Southeastern Europeans.

Phylogenetic

The DNA profiles of Y chromosomes submitted to the YHRD are now continuously extended by binary Y-SNP polymorphisms. The phylogeny of the Y chromosome defined by binary polymorphisms is well established and stable. [11] [12] [13] [14] All Y chromosomes sharing a mutation are related by descent, until a further mutation splits the branch. Haplotypes within a haplogroup can be highly similar or even "identical by descent" (IBD). Thus, the haplogroup can be used as a criterion to substructure the database according to the phylogenetic descent of samples. Even though the chronology of the SNP mutations is far less certain than the structure of the tree, many haplogroups can be equated with events in human prehistory. The worldwide distribution of human Y-chromosome diversity has revealed clear geographically associated haplogroups. [11]

Database tools

AMOVA

Analysis of molecular variance (AMOVA) is a method for analyzing population variation using molecular data, e.g. Y-STR haplotypes. [15] With AMOVA it is possible to evaluate and quantify the extent of differentiation between two or more population samples. AMOVA is implemented as an online tool in the YHRD and provides a way of estimating ΦST and FST values. The online tool accepts Excel files and creates input files from them. Up to nine reference populations selected from the YHRD, as well as population sets, can be added to the AMOVA analysis. The online calculation returns a *.csv table with pairwise FST or ΦST (RST) values plus p-values as a test for significance (10,000 permutations). In addition, an MDS plot is generated to illustrate the genetic distances between the analyzed populations graphically. The program also lists the references for the selected population studies, which facilitates correct citation.
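The quantities returned by such an analysis can be illustrated with a short Python sketch of the standard AMOVA variance decomposition for a single level of population structure. It is not the code of the YHRD tool. It assumes that each haplotype is a tuple of repeat numbers and that the distance between two haplotypes is the sum of squared repeat differences (an RST-type distance); the function names, the seed and the default number of permutations are illustrative choices.

```python
import random


def squared_distance(h1, h2):
    """Sum of squared repeat-number differences between two haplotypes."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2))


def _ssd(group):
    """Sum of squared deviations: sum of pairwise distances divided by group size."""
    total = 0.0
    for i in range(len(group)):
        for j in range(i + 1, len(group)):
            total += squared_distance(group[i], group[j])
    return total / len(group)


def phi_st(populations):
    """Phi-ST for a list of populations, each a list of haplotype tuples."""
    sizes = [len(p) for p in populations]
    n_total, n_pops = sum(sizes), len(populations)
    pooled = [h for p in populations for h in p]

    ssd_within = sum(_ssd(p) for p in populations)
    ssd_among = _ssd(pooled) - ssd_within

    msd_among = ssd_among / (n_pops - 1)
    msd_within = ssd_within / (n_total - n_pops)
    # Average sample size corrected for unequal population sizes.
    n0 = (n_total - sum(s * s for s in sizes) / n_total) / (n_pops - 1)

    var_among = (msd_among - msd_within) / n0
    var_within = msd_within
    total_var = var_among + var_within
    return var_among / total_var if total_var else 0.0


def permutation_p_value(populations, permutations=1000, seed=0):
    """Share of random reassignments of haplotypes with Phi-ST >= the observed value."""
    rng = random.Random(seed)
    observed = phi_st(populations)
    sizes = [len(p) for p in populations]
    pooled = [h for p in populations for h in p]
    hits = 0
    for _ in range(permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if phi_st(shuffled) >= observed:
            hits += 1
    return hits / permutations
```

Two samples that share no haplotypes and show no internal variation give a ΦST of 1, while samples drawn from the same pool give values near zero, which can be slightly negative because of sampling noise.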

Mixture

The tool can be applied in forensic cases in which a mixed trace (two or more male contributors) is to be analyzed. The result is a likelihood ratio of donorship vs. non-donorship of the putative contributor to the trace.
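A strongly simplified Python sketch of such a likelihood ratio for a two-person mixture is given below. It is an illustration under stated assumptions and not the YHRD implementation: only single-copy loci are considered, haplotype frequencies are estimated by plain counting in a reference sample, unknown contributors are treated as independent draws from that sample, and drop-in and drop-out are ignored. All names are hypothetical.

```python
from collections import Counter
from itertools import product


def explains(h1, h2, mixture):
    """True if two haplotypes together produce exactly the alleles seen at every locus."""
    return all({a, b} == alleles for a, b, alleles in zip(h1, h2, mixture))


def mixture_lr(suspect, mixture, reference_haplotypes):
    """LR of 'suspect + one unknown' versus 'two unknowns' for a two-person mixture."""
    counts = Counter(reference_haplotypes)
    n = len(reference_haplotypes)
    freq = {h: c / n for h, c in counts.items()}

    # Numerator: the unknown contributor must complement the suspect's haplotype.
    p_donor = sum(f for h, f in freq.items() if explains(suspect, h, mixture))
    # Denominator: any ordered pair of reference haplotypes that explains the mixture.
    p_non_donor = sum(
        freq[h1] * freq[h2]
        for h1, h2 in product(freq, repeat=2)
        if explains(h1, h2, mixture)
    )
    if p_non_donor == 0:
        raise ValueError("no pair of reference haplotypes explains the mixture")
    return p_donor / p_non_donor


# Toy reference sample of ten two-locus haplotypes (placeholder values).
reference = (
    [(14, 13)] + [(15, 12)] * 2 + [(15, 13)] + [(14, 12)] + [(16, 14)] * 5
)
mixture = [{14, 15}, {12, 13}]  # alleles observed per locus in the trace
lr = mixture_lr((14, 13), mixture, reference)
```

A value above 1 supports donorship of the putative contributor, and a value below 1 supports non-donorship.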

Kinship

The tool can be applied in kinship cases in which a relationship between upstream and downstream relatives (e.g. father-son or grandfather-grandson) is to be analyzed. The result is a likelihood ratio (or kinship index) of patrilineal relationship vs. patrilineal non-relationship of the analyzed persons.
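For the simplest case, in which both persons carry the same haplotype at every locus, the idea behind such a likelihood ratio can be sketched in Python as follows. This is an assumption-laden illustration rather than the YHRD method: mutations are treated as independent across loci and generations, the haplotype frequency is supplied by the user, and the mutation rates in the example are placeholder values.

```python
def patrilineal_lr(haplotype_frequency, mutation_rates, meioses=1):
    """LR of patrilineal relationship vs. non-relationship for fully matching haplotypes.

    Related:   P = f * P(no mutation at any locus over all meioses)
    Unrelated: P = f * f
    so the ratio is P(no mutation) / f.
    """
    if not 0.0 < haplotype_frequency <= 1.0:
        raise ValueError("haplotype_frequency must be in (0, 1]")
    p_no_mutation = 1.0
    for rate in mutation_rates:
        p_no_mutation *= (1.0 - rate) ** meioses
    return p_no_mutation / haplotype_frequency


# Father-son (one meiosis) and grandfather-grandson (two meioses), placeholder inputs.
rates = [0.002, 0.003, 0.002]
lr_father_son = patrilineal_lr(0.001, rates, meioses=1)
lr_grandfather_grandson = patrilineal_lr(0.001, rates, meioses=2)
```

Under these assumptions, each additional meiosis lowers the likelihood ratio slightly, because more opportunities for mutation separate the two persons.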

Match statistics

A search of the YHRD returns either a match or a non-match between the searched haplotype and the reference samples in the database. The relative number of matches is described as the profile frequency. In forensic casework, the probability of a match, which is based on the profile frequency, is evaluated using different methods. Some of these are recommended by national guidelines, e.g. the augmented counting method with confidence intervals and/or theta subpopulation correction (SWGDAM Interpretation Guidelines for Y-Chromosome STR typing by Forensic Laboratories in the US, 2014) or the Discrete Laplace (DL) method (Andersen et al. 2013) as recommended in Germany (Willuweit et al. 2018). Both augmented counting and DL values are provided by the YHRD for different metapopulations.
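The counting-based estimates can be sketched in Python as follows. The sketch assumes that augmented counting means adding the searched profile to the database, i.e. (x + 1)/(N + 1) for x matches among N samples, and it uses an exact binomial (Clopper-Pearson) one-sided upper bound as one possible way of attaching a confidence limit. The theta correction and the Discrete Laplace method are not covered, and the function names are illustrative.

```python
import math


def counting_frequency(matches, database_size):
    """Plain profile frequency: x / N."""
    return matches / database_size


def augmented_counting_frequency(matches, database_size):
    """Augmented counting: the searched profile is added to the database."""
    return (matches + 1) / (database_size + 1)


def _binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    total = 0.0
    for i in range(k + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_term)
    return min(total, 1.0)


def upper_confidence_bound(matches, database_size, confidence=0.95):
    """One-sided exact upper bound: the p at which P(X <= matches) = 1 - confidence."""
    if matches >= database_size:
        return 1.0
    alpha = 1.0 - confidence
    low, high = 0.0, 1.0
    for _ in range(100):  # bisection; the CDF decreases as p grows
        mid = (low + high) / 2
        if _binomial_cdf(matches, database_size, mid) > alpha:
            low = mid
        else:
            high = mid
    return high
```

For a haplotype that is not observed in a database of N samples, the plain counting estimate is zero, whereas the augmented estimate is 1/(N + 1) and the upper bound equals 1 − (1 − confidence)^(1/N).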

References

  1. Roewer, L.; Krawczak, M.; Willuweit, S.; Nagy, M.; Alves, C.; Amorim, A.; Anslinger, K.; Augustin, C.; Betz, A.; Bosch, E.; Cagliá, A.; Carracedo, A.; Corach, D.; Dekairelle, A.-F.; Dobosz, T.; Dupuy, B.M.; Füredi, S.; Gehrig, C.; Gusmão, L.; Henke, J.; Henke, L.; Hidding, M.; Hohoff, C.; Hoste, B.; Jobling, M.A.; Kärgel, H.J.; de Knijff, P.; Lessig, R.; Liebeherr, E.; Lorente, M.; Martínez-Jarreta, B.; Nievas, P.; Nowak, M.; Parson, W.; Pascali, V.L.; Penacino, G.; Ploski, R.; Rolf, B.; Sala, A.; Schmitt, C.; Schmidt, U.; Schneider, P.M.; Szibor, R.; Teifel-Greding, J.; Kayser, M. (2001). "Online reference database of European Y-chromosomal short tandem repeat (STR) haplotypes". Forensic Science International. 118 (2–3): 106–113. doi:10.1016/S0379-0738(00)00478-3. ISSN 0379-0738. PMID 11311820.
  2. "YHRD Homepage". Retrieved November 1, 2022.
  3. "International Journal of Legal Medicine". Springer Publishers. Retrieved 6 April 2021.
  4. "FSIGEN Publishing Guidelines" (PDF). Retrieved 25 September 2013.
  5. "Promega PowerPlex Y". Retrieved 25 September 2013.
  6. "Applied Biosystem Yfiler". Retrieved 25 September 2013.
  7. "Promega PowerPlex Y23". Retrieved 25 September 2013.
  8. Hanski, I.; Gilpin, M. (1997). Metapopulation Biology: Ecology, Genetics, and Evolution. Academic Press, San Diego.
  9. Willuweit, Sascha; Roewer, Lutz (2007). "Y chromosome haplotype reference database (YHRD): Update". Forensic Science International: Genetics. 1 (2): 83–87. doi:10.1016/j.fsigen.2007.01.017. ISSN 1872-4973. PMID 19083734.
  10. Roewer, Lutz; Croucher, Peter J. P.; Willuweit, Sascha; Lu, Tim T.; Kayser, Manfred; Lessig, Rüdiger; de Knijff, Peter; Jobling, Mark A.; Tyler-Smith, Chris; Krawczak, Michael (2005). "Signature of recent historical events in the European Y-chromosomal STR haplotype distribution". Human Genetics. 116 (4): 279–291. doi:10.1007/s00439-004-1201-z. ISSN 0340-6717. PMID 15660227. S2CID 23253257.
  11. Underhill, Peter A.; Shen, Peidong; Lin, Alice A.; Jin, Li; Passarino, Giuseppe; Yang, Wei H.; Kauffman, Erin; Bonné-Tamir, Batsheva; Bertranpetit, Jaume; Francalacci, Paolo; Ibrahim, Muntaser; Jenkins, Trefor; Kidd, Judith R.; Mehdi, S. Qasim; Seielstad, Mark T.; Wells, R. Spencer; Piazza, Alberto; Davis, Ronald W.; Feldman, Marcus W.; Cavalli-Sforza, L. Luca; Oefner, Peter J. (2000). "Y chromosome sequence variation and the history of human populations". Nature Genetics. 26 (3): 358–361. doi:10.1038/81685. PMID 11062480. S2CID 12893406.
  12. Hammer, Michael F.; Karafet, Tatiana M.; Redd, Alan J.; Jarjanazi, Hamdi; Santachiara-Benerecetti, Silvana; Soodyall, Himla; Zegura, Stephen L. (2001). "Hierarchical Patterns of Global Human Y-Chromosome Diversity". Molecular Biology and Evolution. 18 (7): 1189–1203. doi:10.1093/oxfordjournals.molbev.a003906. PMID 11420360.
  13. Jobling, Mark A.; Tyler-Smith, Chris (2003). "The human Y chromosome: An evolutionary marker comes of age". Nature Reviews Genetics. 4 (8): 598–612. doi:10.1038/nrg1124. PMID 12897772. S2CID 13508130.
  14. Karafet, Tatiana M.; Mendez, Fernando L.; Meilerman, Monica B.; Underhill, Peter A.; Zegura, Stephen L.; Hammer, Michael F. (2008). "New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree". Genome Research. 18 (5): 830–838. doi:10.1101/gr.7172008. PMC 2336805. PMID 18385274.
  15. Roewer, L. (1996). "Analysis of molecular variance (AMOVA) of Y-chromosome-specific microsatellites in two closely related human populations [published erratum appears in Hum Mol Genet 1997 May;6(5):828]". Human Molecular Genetics. 5 (7): 1029–1033. doi:10.1093/hmg/5.7.1029. ISSN 1460-2083. PMID 8817342.