Human Genome Diversity Project


The Human Genome Diversity Project (HGDP) was started in the 1990s by Stanford University's Morrison Institute in collaboration with scientists around the world. [1] It is the result of many years of work by Luigi Cavalli-Sforza, one of the most cited scientists in the world, who published extensively on the use of genetics to understand human migration and evolution. The HGDP data sets have often been cited in papers on such topics as population genetics, anthropology, and heritable disease research. [2] [3]


The project has noted the need to record the genetic profiles of indigenous populations, because isolated populations offer the best means of studying the gene frequencies that hold clues to humanity's distant past. Knowing the relationships between such populations makes it possible to infer the journey of humankind, from the humans who left Africa and populated the world to the humans of today. The HGDP-CEPH Human Genome Diversity Cell Line Panel is a resource of 1,063 cultured lymphoblastoid cell lines (LCLs) from 1,050 individuals in 52 world populations, banked at the Fondation Jean Dausset-CEPH in Paris.

The HGDP is not related to the Human Genome Project (HGP) and has attempted to maintain a distinct identity. [4] Whole-genome sequencing and analysis of the HGDP samples were published in 2020. This created a comprehensive resource of genetic variation from underrepresented human populations and illuminated patterns of genetic variation, demographic history, and introgression from Neanderthals and Denisovans into modern humans. [5] [6]

Studied populations

The HGDP includes 51 populations from around the world. [7] A description of the populations studied can be found in a 2005 review paper by Cavalli-Sforza. [8]

Genetic clustering of all 51 HGDP populations. Analysis results are mapped onto the source location of each sample.
Africa: Bantu, Biaka, Mandenka, Mbuti Pygmy, Mozabite, San, and Yoruba
Western Asia: Bedouin, Druze, and Palestinian
Central and South Asia: Balochi, Brahui, Burusho, Hazara, Kalash, Makrani, Pashtun, Sindhi, and Uyghur
Eastern Asia: Khmer, Dai, Daur, Han (North China), Han (South China), Hezhen, Japanese, Lahu, Miao, Mongola, Naxi, Oroqen, She, Tu, Tujia, Xibo, Yakut, and Yi
Native America: Colombian, Karitiana, Maya, Pima, and Surui
Europe: Adygei, Basque, French, North Italian, Orcadian, Russian, Sardinian, and Tuscan
Oceania: Melanesian and Papuan

One of the most important issues in the HGDP debate has been the social and ethical implications for indigenous populations, specifically the methods and ethics of informed consent.

These issues are addressed specifically by the HGDP's "Model Ethical Protocol for Collecting DNA Samples". [10]

Potential benefits

The scientific community has used HGDP data to study human migration, mutation rates, relationships between populations, genes involved in height, and selective pressure. The HGDP has been instrumental in assessing human diversity and in providing information about similarities and differences among human populations. Among the various human diversity databases available, the HGDP has the largest scope.

So far 148 papers have been published using the HGDP database. Authors using HGDP data work in the US, Russia, Brazil, Ireland, Portugal, France, and other countries.

More specifically, HGDP data have been used in studies of the evolution and expansion of modern human populations. [11]

Diversity research is relevant to fields ranging from disease surveillance to anthropology. Genome-wide association studies (GWAS) try to associate genetic variants with a disease. It is becoming clear that these associations are population-dependent, and that understanding human diversity will be a major step toward increasing the power to find genes associated with disease.

To gain a full picture of human evolutionary history, scientists must engage in diversity research. This research needs to be conducted as quickly as possible, before small native populations, such as those in South America, disappear.

Another benefit of genomic diversity mapping would be in disease research. Diversity research could help explain why certain ethnic populations are vulnerable to or resistant to certain diseases and how populations have adapted to vulnerabilities (see race in biomedicine).

The study of human populations has been at the forefront of genomic and clinical research since the HGP was completed. Projects similar to the HGDP include the 1000 Genomes Project and the HapMap Project. Each has its own design and scope, and scientists have used them extensively for overlapping purposes.

Potential problems

Since the project's outset, some indigenous communities, NGOs, and human rights organizations have denounced the HGDP's goals. They cite perceived issues of scientific racism, colonialism, biocolonialism, and informed consent.[citation needed]

Racism

The Action Group on Erosion, Technology and Concentration (ETC Group) has been a major critic of the HGDP, speculating that racism and stigmatization could result should the HGDP be completed.[citation needed] One major concern with the research project has been the potential, in certain countries, for racism resulting from the use of HGDP data. Critics fear that governments armed with genetic data linked to particular racial groups might deny human rights on the basis of that data. For example, a country could define races in purely genetic terms and deny a person's rights for failing to conform to a race's genetic model.

Uneven application

Eight of the nine populations in the Central and South Asia category are from Pakistan. India is in the same region, yet it has about seven times the population of Pakistan and far greater ethnic diversity. However, Rosenberg et al. found that the sampled Pakistani populations are more genetically diverse than the 15 Indian populations they explicitly compared. [12]

Use of genetic data for non-medical purposes

Use of HGDP genetic materials for non-medical purposes not agreed to by indigenous donors has been a matter of concern, especially purposes that create possibilities for human rights violations.[citation needed] For example, Kidd et al. described the use of DNA samples from indigenous populations to explore a forensic identification capability based on ethnic origins. [13]

Creating artificial genetic distinctions

Anthropologist Jonathan M. Marks stated: "As any anthropologist knows, ethnic groups are categories of human invention, not given by nature." [14] Some indigenous peoples have refused to take part in the HGDP due to concerns about misuse of the data: "In December [1993], a World Council of Indigenous Peoples in Guatemala repudiated the HGDP." [14]

Alternative approaches

In 1995, the National Research Council (NRC) issued its recommendations on the HGDP. The NRC endorsed the concept of diversity research while raising some concerns about the HGDP's procedures. The NRC report suggested alternatives such as keeping sample sources anonymous (i.e., collecting genetic data without tying it to specific racial groups). Such approaches would eliminate the concerns discussed above, such as racism and non-medical uses of the data. However, they would also prevent researchers from achieving many of the benefits the project was meant to deliver.

Some members of the Human Genome Project (HGP) argued in favor of diversity research using data from the Human Genome Diversity Project. However, most agreed that such research should be carried out within the HGP rather than as a separate project.

A number of the principal collaborators in the HGDP have been involved in the privately funded Genographic Project launched in April 2005 with similar aims.

References

  1. "Proposed Human Genome Diversity Project Still Plagued By Controversy And Questions". The Scientist. Retrieved 11 September 2019.
  2. Cann, HM; De Toma, C; Cazes, L; Legrand, MF; Morel, V; Piouffre, L; Bodmer, J; Bodmer, WF; et al. (2002). "A human genome diversity cell line panel". Science. 296 (5566): 261–2. doi:10.1126/science.296.5566.261b. PMID 11954565. S2CID 41595131.
  3. Li, J. Z.; Absher, D. M.; Tang, H.; Southwick, A. M.; Casto, A. M.; Ramachandran, S.; Cann, H. M.; Barsh, G. S.; et al. (2008). "Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation". Science. 319 (5866): 1100–4. Bibcode:2008Sci...319.1100L. doi:10.1126/science.1153717. PMID 18292342. S2CID 53541133.
  4. "HGDP FAQ". www.verslo.is. Retrieved 7 October 2017.
  5. Bergström, Anders; McCarthy, Shane A.; Hui, Ruoyun; Almarri, Mohamed A.; Ayub, Qasim; Danecek, Petr; Chen, Yuan; Felkel, Sabine; Hallast, Pille; Kamm, Jack; Blanché, Hélène (20 March 2020). "Insights into human genetic variation and population history from 929 diverse genomes". Science. 367 (6484): eaay5012. doi:10.1126/science.aay5012. ISSN 0036-8075. PMC 7115999. PMID 32193295.
  6. Almarri, Mohamed A.; Bergström, Anders; Prado-Martinez, Javier; Yang, Fengtang; Fu, Beiyuan; Dunham, Alistair S.; Chen, Yuan; Hurles, Matthew E.; Tyler-Smith, Chris; Xue, Yali (9 July 2020). "Population Structure, Stratification, and Introgression of Human Structural Variation". Cell. 182 (1): 189–199.e15. doi:10.1016/j.cell.2020.05.024. ISSN 0092-8674. PMC 7369638. PMID 32531199.
  7. Li, J. Z.; Absher, D. M.; Tang, H.; Southwick, A. M.; Casto, A. M.; Ramachandran, S.; et al. (2008). "Worldwide human relationships inferred from genome-wide patterns of variation". Science. 319 (5866): 1100–4. Bibcode:2008Sci...319.1100L. doi:10.1126/science.1153717. PMID 18292342. S2CID 53541133.
  8. Cavalli-Sforza, L. Luca (2005). "Opinion: The Human Genome Diversity Project: past, present and future". Nature Reviews Genetics. 6 (4): 333–40. doi:10.1038/nrg1596. PMID 15803201. S2CID 37122309.
  9. López Herráez, D.; Bauchet, M.; Tang, K.; Theunert, C.; Pugach, I.; Li, J.; et al. (2009). "Genetic variation and recent positive selection in worldwide human populations: evidence from nearly 1 million SNPs". PLOS ONE. 4 (11): e7888. Bibcode:2009PLoSO...4.7888L. doi:10.1371/journal.pone.0007888. PMC 2775638. PMID 19924308.
  10. Weiss, KM; Cavalli-Sforza, LL; Dunston, GM; Feldman, M; Greely, HT; Kidd, KK; King, M; Moore, JA; et al. (1997). "Proposed model ethical protocol for collecting DNA samples". Houston Law Review. 33 (5): 1431–74. PMID 12627556.
  11. Zhivotovsky, Lev A.; Rosenberg, Noah A.; Feldman, Marcus W. (2003). "Features of Evolution and Expansion of Modern Humans, Inferred from Genomewide Microsatellite Markers". The American Journal of Human Genetics. 72 (5): 1171–86. doi:10.1086/375120. PMC 1180270. PMID 12690579.
  12. Rosenberg, Noah A.; Mahajan, Saurabh; Gonzalez-Quevedo, Catalina; Blum, Michael G. B.; Nino-Rosales, Laura; Ninis, Vasiliki; Das, Parimal; Hegde, Madhuri; et al. (2006). "Low Levels of Genetic Divergence across Geographically and Linguistically Diverse Populations from India". PLOS Genetics. 2 (12): e215. doi:10.1371/journal.pgen.0020215. PMC 1713257. PMID 17194221.
  13. Kidd, Kenneth K.; Pakstis, Andrew J.; Speed, William C.; Grigorenko, Elena L.; Kajuna, Sylvester L.B.; Karoma, Nganyirwa J.; Kungulilo, Selemani; Kim, Jong-Jin; et al. (2006). "Developing a SNP panel for forensic identification of individuals". Forensic Science International. 164 (1): 20–32. doi:10.1016/j.forsciint.2005.11.017. PMID 16360294.
  14. Marks, Jonathan (2002). What It Means to Be 98% Chimpanzee. Berkeley: University of California Press. pp. 202–7. ISBN 978-0-520-22615-9.