Human Genome Diversity Project

Last updated January 30, 2026

The Human Genome Diversity Project (HGDP) was started by Stanford University's Morrison Institute in the 1990s in collaboration with scientists around the world.^[1] It is the result of many years of work by Luigi Cavalli-Sforza, one of the most cited scientists in the world, who published extensively on the use of genetics to understand human migration and evolution. The HGDP data sets have often been cited in papers on such topics as population genetics, anthropology, and heritable disease research.^[2]^[3]

The project has stressed the need to record the genetic profiles of indigenous populations, because isolated populations offer the best way to understand the genetic frequencies that hold clues to our distant past. Knowing how such populations are related makes it possible to infer the journey of humankind, from the humans who left Africa and populated the world to the humans of today. The HGDP-CEPH Human Genome Diversity Cell Line Panel is a resource of 1,063 cultured lymphoblastoid cell lines (LCLs) from 1,050 individuals in 52 world populations, banked at the Fondation Jean Dausset-CEPH in Paris.
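As a rough illustration of how genetic frequencies can be compared between two populations, the following Python sketch computes allele frequencies and a single-locus fixation index (Wright's F_ST) for one biallelic marker. The allele counts, population labels, and function names are hypothetical and chosen only for illustration; they are not HGDP data, and this simple estimator is not necessarily the one used in the studies cited here.

```python
def allele_frequency(alt_count: int, total_alleles: int) -> float:
    """Frequency of the alternate allele among all sampled alleles."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    return alt_count / total_alleles


def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1-p) at a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def fst(p1: float, p2: float) -> float:
    """Single-locus F_ST = (H_T - H_S) / H_T for two equally weighted populations."""
    p_bar = (p1 + p2) / 2.0
    h_total = expected_heterozygosity(p_bar)  # heterozygosity of the pooled population
    if h_total == 0.0:
        return 0.0  # locus is fixed for the same allele in both populations
    h_within = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2.0
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    # Hypothetical counts: alternate alleles observed out of 100 sampled alleles each.
    p_a = allele_frequency(30, 100)  # hypothetical population A
    p_b = allele_frequency(70, 100)  # hypothetical population B
    print("frequency in A:", p_a)
    print("frequency in B:", p_b)
    print("F_ST between A and B:", fst(p_a, p_b))
```

Values near 0 indicate that the two populations have similar frequencies at the marker, and values near 1 indicate that they are nearly fixed for different alleles. Real analyses average over many markers and correct for sample size.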

The HGDP is not related to the Human Genome Project (HGP) and has attempted to maintain a distinct identity.^[4] Whole-genome sequencing and analysis of the HGDP was published in 2020, creating a comprehensive resource of genetic variation from underrepresented human populations and illuminating patterns of genetic variation, demographic history, and introgression from Neanderthals and Denisovans into modern humans.^[5]^[6]

Studied populations

The HGDP includes 51 populations from around the world.^[7] A description of the populations studied can be found in a 2005 review paper by Cavalli-Sforza:^[8]

Region	Subregion	Populations
Africa		Bantu, Biaka, Mandenka, Mbuti pygmy, Mozabite, San, Yoruba
Asia	Western Asia	Bedouin, Druze, Jews
	Central & South Asia	Balochi, Brahui, Burusho, Hazara, Kalash, Makrani, Pashtun, Sindhi, Uyghur
	Eastern Asia	Khmer, Dai, Daur, Han (North China), Han (South China), Hezhen, Japanese, Lahu, Miao, Mongola, Naxi, Oroqen, She, Tu, Tujia, Xibo, Yakut, Yi
Native America		Colombian, Karitiana, Maya, Pima, Surui
Europe		Adygei, Basque, French, North Italian, Orcadian, Russian, Sardinian, Tuscan
Oceania		Melanesian, Papuan

Informed consent

One of the central issues in the HGDP debate has been the social and ethical implications for indigenous populations, specifically the methods and ethics of informed consent. Some questions include:

How would consent be obtained?
Would individuals or groups fully understand the project's intentions, particularly with regard to language barriers and differing cultural views?
What is 'informed' in a cross-cultural context?
Who would be authorized to actually give consent?
How would individuals know what happened to their DNA?
For how long would their information be kept in DNA databases?

These questions are specifically addressed by the HGDP's "Model Ethical Protocol for Collecting DNA Samples".^[10]

Potential benefits

The scientific community has used the HGDP data to study human migration, mutation rates, relationships between different populations, genes involved in height, and selective pressure. HGDP has been instrumental in assessing human diversity and in providing information about similarities and differences in human populations. The HGDP is the project with the largest scope among the various human diversity databases available.

So far 148 papers have been published using the HGDP database. Authors using HGDP data work in the US, Russia, Brazil, Ireland, Portugal, France, and other countries.

More specifically, HGDP data has been used in studies of evolution and expansion of modern human populations.^[11]

Diversity research is relevant in fields ranging from disease surveillance to anthropology. Genome-wide association studies (GWAS) try to associate genetic variants with disease; it is becoming clear that these associations are population-dependent and that understanding human diversity will be a major step toward increasing the power to find genes associated with disease.

To gain a full assessment of human development, scientists must engage in diversity research. This research needs to be conducted as quickly as possible before small native populations such as those in South America become extinct.

Another benefit of genomic diversity mapping would be in disease research. Diversity research could help explain why certain ethnic populations are vulnerable to or resistant to certain diseases and how populations have adapted to vulnerabilities (see race in biomedicine).

The study of human populations has been at the forefront of genomic and clinical research since the Human Genome Project (HGP) was completed. Projects similar to HGDP are the 1000 Genomes Project and the HapMap Project. Each has its own specificities and each has been used by scientists to a large extent for overlapping purposes.

Potential problems

Since its outset, some indigenous communities, NGOs, and human rights organizations have denounced the project, objecting to the HGDP's goals over issues of scientific racism, colonialism, biocolonialism, and informed consent.^[citation needed]

Racism

The Action Group on Erosion, Technology and Concentration (ETC Group) has been a major critic of the HGDP, speculating that issues of racism and stigmatization could occur should the HGDP be completed.^[citation needed] One major concern with the research project has been the potential, in certain countries, for racism resulting from use of HGDP data. Critics feel that when governments are armed with genetic data linked to certain racial groups, those governments might deny human rights based on this genetic data. For example, countries could define races purely in genetic terms and deny a person's rights based on lack of conformity to a certain race's genetic model.

Uneven application

Eight of the nine populations in the Central & South Asia category are from Pakistan, even though India, which falls in the same region, has about seven times the population of Pakistan and far greater racial diversity. However, it is noteworthy that Rosenberg et al. found that the sampled Pakistani populations are more genetically diverse than 15 Indian populations that were explicitly compared.^[12]

Use of genetic data for non-medical purposes

Use of HGDP genetic materials for non-medical purposes not agreed to by indigenous donors, especially purposes that create possibilities for human rights violations, has been a matter of concern.^[citation needed] For example, Kidd et al. described the use of DNA samples from indigenous populations to explore a forensic identification capability based on ethnic origins.^[13]

Creating artificial genetic distinctions

Anthropologist Jonathan M. Marks stated: "As any anthropologist knows, ethnic groups are categories of human invention, not given by nature."^[14] Some indigenous peoples have refused to take part in the HGDP due to concerns about misuse of the data: "In December [1993], a World Council of Indigenous Peoples in Guatemala repudiated the HGDP."^[14]

Alternative approaches

In 1995, the National Research Council (NRC) issued its recommendations on the HGDP. The NRC endorsed the concept of diversity research, also pointing out some concerns with the HGDP procedure. The NRC report suggested alternatives such as keeping sample sources anonymous (i.e., sampling genetic data without tying it to specific racial groups). While such approaches would eliminate the concerns discussed above (regarding racism, weapons development, etc.), they would also prevent researchers from achieving many of the benefits that were to be gained from the project.

Some members of the Human Genome Project (HGP) argued in favor of engaging in diversity research on data gleaned from the Human Genome Diversity Project, although most agreed that diversity research should be done by the HGP and not as a separate project.

A number of the principal collaborators in the HGDP have been involved in the privately funded Genographic Project, launched in April 2005 with similar aims.

References

[1] "Proposed Human Genome Diversity Project Still Plagued By Controversy And Questions". The Scientist Magazine®. Archived from the original on 26 March 2019. Retrieved 11 September 2019.
[2] Cann, HM; De Toma, C; Cazes, L; Legrand, MF; Morel, V; Piouffre, L; Bodmer, J; Bodmer, WF; et al. (2002). "A human genome diversity cell line panel". Science. 296 (5566): 261–2. doi:10.1126/science.296.5566.261b. PMID 11954565. S2CID 41595131.
[3] Li, J. Z.; Absher, D. M.; Tang, H.; Southwick, A. M.; Casto, A. M.; Ramachandran, S.; Cann, H. M.; Barsh, G. S.; et al. (2008). "Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation". Science. 319 (5866): 1100–4. Bibcode:2008Sci...319.1100L. doi:10.1126/science.1153717. PMID 18292342. S2CID 53541133.
[4] "HGDP FAQ". www.verslo.is. Retrieved 7 October 2017. [dead link]
[5] Bergström, Anders; McCarthy, Shane A.; Hui, Ruoyun; Almarri, Mohamed A.; Ayub, Qasim; Danecek, Petr; Chen, Yuan; Felkel, Sabine; Hallast, Pille; Kamm, Jack; Blanché, Hélène (20 March 2020). "Insights into human genetic variation and population history from 929 diverse genomes". Science. 367 (6484): eaay5012. doi:10.1126/science.aay5012. ISSN 0036-8075. PMC 7115999. PMID 32193295.
[6] Almarri, Mohamed A.; Bergström, Anders; Prado-Martinez, Javier; Yang, Fengtang; Fu, Beiyuan; Dunham, Alistair S.; Chen, Yuan; Hurles, Matthew E.; Tyler-Smith, Chris; Xue, Yali (9 July 2020). "Population Structure, Stratification, and Introgression of Human Structural Variation". Cell. 182 (1): 189–199.e15. doi:10.1016/j.cell.2020.05.024. ISSN 0092-8674. PMC 7369638. PMID 32531199.
[7] Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, et al. (2008). "Worldwide human relationships inferred from genome-wide patterns of variation". Science. 319 (5866): 1100–4. Bibcode:2008Sci...319.1100L. doi:10.1126/science.1153717. PMID 18292342. S2CID 53541133.
[8] Cavalli-Sforza, L. Luca (2005). "Opinion: The Human Genome Diversity Project: past, present and future". Nature Reviews Genetics. 6 (4): 333–40. doi:10.1038/nrg1596. PMID 15803201. S2CID 37122309.
[9] López Herráez D, Bauchet M, Tang K, Theunert C, Pugach I, Li J, et al. (2009). "Genetic variation and recent positive selection in worldwide human populations: evidence from nearly 1 million SNPs". PLOS ONE. 4 (11): e7888. Bibcode:2009PLoSO...4.7888L. doi:10.1371/journal.pone.0007888. PMC 2775638. PMID 19924308.
[10] Weiss, KM; Cavalli-Sforza, LL; Dunston, GM; Feldman, M; Greely, HT; Kidd, KK; King, M; Moore, JA; et al. (1997). "Proposed model ethical protocol for collecting DNA samples". Houston Law Review. 33 (5): 1431–74. PMID 12627556.
[11] Zhivotovsky, Lev A.; Rosenberg, Noah A.; Feldman, Marcus W. (2003). "Features of Evolution and Expansion of Modern Humans, Inferred from Genomewide Microsatellite Markers". The American Journal of Human Genetics. 72 (5): 1171–86. doi:10.1086/375120. PMC 1180270. PMID 12690579.
[12] Rosenberg, Noah A.; Mahajan, Saurabh; Gonzalez-Quevedo, Catalina; Blum, Michael G. B.; Nino-Rosales, Laura; Ninis, Vasiliki; Das, Parimal; Hegde, Madhuri; et al. (2006). "Low Levels of Genetic Divergence across Geographically and Linguistically Diverse Populations from India". PLOS Genetics. 2 (12): e215. doi:10.1371/journal.pgen.0020215. PMC 1713257. PMID 17194221.
[13] Kidd, Kenneth K.; Pakstis, Andrew J.; Speed, William C.; Grigorenko, Elena L.; Kajuna, Sylvester L.B.; Karoma, Nganyirwa J.; Kungulilo, Selemani; Kim, Jong-Jin; et al. (2006). "Developing a SNP panel for forensic identification of individuals". Forensic Science International. 164 (1): 20–32. doi:10.1016/j.forsciint.2005.11.017. PMID 16360294.
[14] Marks, Jonathan (2002). What it means to be 98% chimpanzee. Berkeley: University of California Press. pp. 202–7. ISBN 978-0-520-22615-9.

External links

Morrison Institute
Fondation Jean Dausset-CEPH
ETC Group
National Research Council Archived 29 October 2004 at the Wayback Machine
A critical page about the HGDP from physical anthropologist Jonathan Marks
The Human Genome Controversy, Wired Magazine, Issue 5.07 (July 1997) via Internet Archive
Not to be confused with National Cancer Institute, Laboratory of Genomic Diversity

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
