The Human Genome Diversity Project (HGDP) was started by Stanford University's Morrison Institute in the 1990s in collaboration with scientists around the world. [1] It is the result of many years of work by Luigi Cavalli-Sforza, one of the most cited scientists in the world, who has published extensively on the use of genetics to understand human migration and evolution. The HGDP data sets have often been cited in papers on such topics as population genetics, anthropology, and heritable disease research. [2] [3]
The project has noted the need to record the genetic profiles of indigenous populations, as isolated populations offer the best way to understand the genetic frequencies that hold clues to our distant past. Knowing the relationships between such populations makes it possible to infer the journey of humankind from the humans who left Africa and populated the world to the humans of today. The HGDP-CEPH Human Genome Diversity Cell Line Panel is a resource of 1,063 cultured lymphoblastoid cell lines (LCLs) from 1,050 individuals in 52 world populations, banked at the Fondation Jean Dausset-CEPH in Paris.
The HGDP is not related to the Human Genome Project (HGP) and has attempted to maintain a distinct identity. [4] The whole genome sequencing and analysis of the HGDP was published in 2020, creating a comprehensive resource of genetic variation from underrepresented human populations and illuminating patterns of genetic variation, demographic history and introgression of modern humans with Neanderthals and Denisovans. [5] [6]
The HGDP includes 51 populations from around the world. [7] A description of the populations that were studied can be found in a 2005 review paper by Cavalli-Sforza: [8]
Region | Subregion | Populations |
---|---|---|
Africa | | Bantu, Biaka, Mandenka, Mbuti pygmy, Mozabite, San, and Yoruba |
Asia | Western Asia | Bedouin, Druze, and Jews |
Asia | Central & South Asia | Balochi, Brahui, Burusho, Hazara, Kalash, Makrani, Pashtun, Sindhi, and Uyghur |
Asia | Eastern Asia | Khmer, Dai, Daur, Han (North China), Han (South China), Hezhen, Japanese, Lahu, Miao, Mongola, Naxi, Oroqen, She, Tu, Tujia, Xibo, Yakut, and Yi |
Native America | | Colombian, Karitiana, Maya, Pima, and Surui |
Europe | | Adygei, Basque, French, North Italian, Orcadian, Russian, Sardinian, and Tuscan |
Oceania | | Melanesian and Papuan |
One of the central issues in the HGDP debate has been the social and ethical implications for indigenous populations, specifically the methods and ethics of informed consent. Questions on these points are specifically addressed by the HGDP's "Model Ethical Protocol for Collecting DNA Samples". [10]
The scientific community has used the HGDP data to study human migration, mutation rates, relationships between different populations, genes involved in height, and selective pressure. HGDP has been instrumental in assessing human diversity and in providing information about similarities and differences in human populations. The HGDP is the project with the largest scope among the various human diversity databases available.
So far 148 papers have been published using the HGDP database. Authors using HGDP data work in the US, Russia, Brazil, Ireland, Portugal, France, and other countries.
More specifically, HGDP data has been used in studies of evolution and expansion of modern human populations. [11]
Diversity research is relevant in various fields of study ranging from disease surveillance to anthropology. Genome-wide association studies (GWAS) try to associate a genetic mutation with a disease; it is becoming clear that these associations are population-dependent and that understanding human diversity will be a major step toward increasing the power to find genes associated with disease.
To gain a full assessment of human development, scientists must engage in diversity research. This research needs to be conducted as quickly as possible before small native populations such as those in South America become extinct.
Another benefit of genomic diversity mapping would be in disease research. Diversity research could help explain why certain ethnic populations are vulnerable to or resistant to certain diseases and how populations have adapted to vulnerabilities (see race in biomedicine).
The study of human populations has been at the forefront of genomic and clinical research since the Human Genome Project (HGP) was completed. Projects similar to the HGDP are the 1000 Genomes Project and the HapMap Project. Each has its own particular focus, yet scientists have used them to a large extent for overlapping purposes.
Denouncing the project since its outset, some indigenous communities, NGOs, and human rights organizations have objected to the HGDP's goals based on perceived issues of scientific racism, colonialism, biocolonialism, and informed consent. [citation needed]
The Action Group on Erosion, Technology and Concentration (ETC Group) has been a major critic of the HGDP, speculating that issues of racism and stigmatization could occur should the HGDP be completed. [citation needed] One major concern with the research project has been the potential, in certain countries, for racism resulting from use of HGDP data. Critics feel that when governments are armed with genetic data linked to certain racial groups, those governments might deny human rights based on this genetic data. For example, countries could define races purely in genetic terms and deny a certain person's right(s) based on lack of conformity to a certain race's genetic model.
Eight of the nine populations in the Central & South Asia category are from Pakistan, even though India falls in the same category, has about seven times the population of Pakistan, and has far greater racial diversity. However, it is noteworthy that Rosenberg et al. found that the sampled Pakistani populations are more genetically diverse than 15 Indian populations that were explicitly compared. [12]
Use of HGDP genetic materials for nonmedical purposes not agreed to by indigenous donors, especially purposes that create possibilities for human rights violations, has been a matter of concern. [citation needed] For example, Kidd et al. described the use of DNA samples from indigenous populations to explore a forensic identification capability based on ethnic origins. [13]
Anthropologist Jonathan M. Marks stated: "As any anthropologist knows, ethnic groups are categories of human invention, not given by nature." [14] Some indigenous peoples have refused to take part in the HGDP due to concerns about misuse of the data: "In December [1993], a World Council of Indigenous Peoples in Guatemala repudiated the HGDP." [14]
In 1995, the National Research Council (NRC) issued its recommendations on the HGDP. The NRC endorsed the concept of diversity research, also pointing out some concerns with the HGDP procedure. The NRC report suggested alternatives such as keeping sample sources anonymous (i.e., sampling genetic data without tying it to specific racial groups). While such approaches would eliminate the concerns discussed above (regarding racism, weapons development, etc.), they would also prevent researchers from achieving many of the benefits that were to be gained from the project.
Some members of the Human Genome Project (HGP) argued in favor of engaging in diversity research on data gleaned from the Human Genome Diversity Project, although most agreed that diversity research should be done by the HGP and not as a separate project.
A number of the principal collaborators in the HGDP have been involved in the privately funded Genographic Project launched in April 2005 with similar aims.
The human genome is a complete set of nucleic acid sequences for humans, encoded as the DNA within each of the 24 distinct chromosomes in the cell nucleus. A small DNA molecule is found within individual mitochondria. These are usually treated separately as the nuclear genome and the mitochondrial genome. Human genomes include both protein-coding DNA sequences and various types of DNA that does not encode proteins. The latter is a diverse category that includes DNA coding for non-translated RNA, such as that for ribosomal RNA, transfer RNA, ribozymes, small nuclear RNAs, and several types of regulatory RNAs. It also includes promoters and their associated gene-regulatory elements, DNA playing structural and replicatory roles, such as scaffolding regions, telomeres, centromeres, and origins of replication, plus large numbers of transposable elements, inserted viral DNA, non-functional pseudogenes and simple, highly repetitive sequences. Introns make up a large percentage of non-coding DNA. Some of this non-coding DNA is non-functional junk DNA, such as pseudogenes, but there is no firm consensus on the total amount of junk DNA.
The International HapMap Project was an organization that aimed to develop a haplotype map (HapMap) of the human genome, to describe the common patterns of human genetic variation. HapMap is used to find genetic variants affecting health, disease and responses to drugs and environmental factors. The information produced by the project is made freely available for research.
Researchers have investigated the relationship between race and genetics as part of efforts to understand how biology may or may not contribute to human racial categorization. Today, the consensus among scientists is that race is a social construct, and that using it as a proxy for genetic differences among populations is misleading.
Haplogroup E-M96 is a human Y-chromosome DNA haplogroup. It is one of the two main branches of the older and ancestral haplogroup DE, the other main branch being haplogroup D. The E-M96 clade is divided into two main subclades: the more common E-P147, and the less common E-M75.
Haplogroup R, or R-M207, is a Y-chromosome DNA haplogroup. It is both numerous and widespread among modern populations.
The fixation index (FST) is a measure of population differentiation due to genetic structure. It is frequently estimated from genetic polymorphism data, such as single-nucleotide polymorphisms (SNP) or microsatellites. Developed as a special case of Wright's F-statistics, it is one of the most commonly used statistics in population genetics. Its values range from 0 to 1, with 0 being no differentiation and 1 being complete differentiation.
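As a rough illustration of how such a value can be computed, the sketch below uses the textbook heterozygosity-based form F_ST = (H_T - H_S) / H_T for a single biallelic locus, where H_T is the expected heterozygosity of the pooled population and H_S is the average expected heterozygosity within subpopulations. The function names and the equal-weight default are assumptions made for this example; published analyses of data sets such as the HGDP panel generally rely on more refined estimators that correct for sample size.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst(freqs, sizes=None):
    """Illustrative F_ST = (H_T - H_S) / H_T for one biallelic locus.

    freqs: frequency of the same allele in each subpopulation
    sizes: optional relative weights (hypothetical parameter);
           equal weights are assumed when omitted
    """
    if sizes is None:
        sizes = [1.0] * len(freqs)
    total = sum(sizes)
    weights = [s / total for s in sizes]

    # Allele frequency in the pooled (total) population.
    p_bar = sum(w * p for w, p in zip(weights, freqs))

    h_t = expected_heterozygosity(p_bar)
    h_s = sum(w * expected_heterozygosity(p) for w, p in zip(weights, freqs))

    if h_t == 0.0:
        # The locus is fixed everywhere, so no differentiation can be measured.
        return 0.0
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    print(fst([0.5, 0.5]))  # identical frequencies: no differentiation
    print(fst([0.0, 1.0]))  # alternate alleles fixed: complete differentiation
    print(fst([0.2, 0.8]))  # intermediate case
```

Identical allele frequencies give 0 and alternately fixed alleles give 1, matching the range described above.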
Human genetic variation is the genetic differences in and among populations. There may be multiple variants of any given gene in the human population (alleles), a situation called polymorphism.
The Human Genome Project (HGP) was an international scientific research project with the goal of determining the base pairs that make up human DNA, and of identifying, mapping and sequencing all of the genes of the human genome from both a physical and a functional standpoint. It remains the world's largest collaborative biological project. Planning for the project was begun by the US government in 1984, and it officially launched in 1990. It was declared complete on April 14, 2003, and included about 92% of the genome. The "complete genome" level was achieved in May 2021, with only 0.3% of the bases covered by potential issues. The final gapless assembly was finished in January 2022.
The genetic history of Europe includes information about the formation, ethnogenesis, and other DNA-specific characteristics of populations indigenous to, or living in, Europe.
Guido Barbujani is an Italian population geneticist, evolutionary biologist and literary author born in Adria, who has worked with the State University of New York at Stony Brook (NY), University of Padua, and University of Bologna. He has taught at the University of Ferrara since 1996.
The various ethnolinguistic groups found in the Caucasus, Central Asia, Europe, the Middle East, North Africa and/or South Asia demonstrate differing rates of particular Y-DNA haplogroups.
The 1000 Genomes Project (1KGP), which took place from January 2008 to 2015, was an international research effort to establish the most detailed catalogue of human genetic variation at the time. Scientists planned to sequence the genomes of at least one thousand anonymous healthy participants from a number of different ethnic groups within the following three years, using newly developed technologies. In 2010, the project finished its pilot phase, which was described in detail in a publication in the journal Nature. In 2012, the sequencing of 1092 genomes was announced in a Nature publication. In 2015, two papers in Nature reported results, the completion of the project, and opportunities for future research.
Haplogroup E-P2, also known as E1b1, is a human Y-chromosome DNA haplogroup. E-P2 has two basal branches, E-V38 and E-M215. E-P2 had an ancient presence in East Africa and the Levant; presently, it is primarily distributed in Africa where it may have originated, and occurs at lower frequencies in the Middle East and Europe.
E-Z827, also known as E1b1b1b, is a major human Y-chromosome DNA haplogroup. It is the parent lineage to the E-Z830 and E-V257 subclades, and defines their common phylogeny. The former is predominantly found in the Middle East; the latter is most frequently observed in North Africa, with its E-M81 subclade observed among the ancient Guanche natives of the Canary Islands. E-Z827 is also found at lower frequencies in Europe, and in isolated parts of Southeast Africa.
Haplogroup Q-M120, also known as Q1a1a1, is a Y-DNA haplogroup. It is the only primary branch of haplogroup Q1a1a (F746/NWT01). The lineage is most common amongst modern populations in eastern Eurasia.
The genetic history of Egypt reflects its geographical location at the crossroads of several major biocultural areas: North Africa, the Sahara, the Middle East, the Mediterranean and sub-Saharan Africa.
Haplogroup A-L1085, also known as haplogroup A0-T, is a human Y-DNA haplogroup. It is part of the paternal lineage of almost all humans alive today. The SNP L1085 has played two roles in population genetics: firstly, most Y-DNA haplogroups have diverged from it; and secondly, it defines the undiverged basal clade A-L1085*.
Genetic studies on Arabs refer to analyses of the genetics of ethnic Arab people in the Middle East and North Africa. Arabs are genetically diverse as a result of their intermarriage and mixing with indigenous people of the pre-Islamic Middle East and North Africa following the Arab and Islamic expansion. Genetic ancestry components related to the Arabian Peninsula display an increasing frequency pattern from west to east over North Africa. A similar frequency pattern exists across northeastern Africa, with genetic affinities to groups of the Arabian Peninsula decreasing southward along the Nile river valley across Sudan. This genetic cline of admixture is dated to the time of Arab migrations to the Maghreb and northeast Africa.
Haplogroup E-M2, also known as E1b1a1-M2, is a human Y-chromosome DNA haplogroup. E-M2 is primarily distributed within Africa followed by West Asia. More specifically, E-M2 is the predominant subclade in West Africa, Central Africa, Southern Africa, and the region of the African Great Lakes; it also occurs at moderate frequencies in North Africa, and the Middle East. E-M2 has several subclades, but many of these subhaplogroups are included in either E-L485 or E-U175. E-M2 is especially common among indigenous Africans who speak Niger-Congo languages, and was spread to Southern Africa and East Africa through the Bantu expansion.
Human genetic clustering refers to patterns of relative genetic similarity among human individuals and populations, as well as the wide range of scientific and statistical methods used to study this aspect of human genetic variation.