Ancestry-informative marker

AIMs can be used to identify five European "clusters"

In population genetics, an ancestry-informative marker (AIM) is a single-nucleotide polymorphism that exhibits substantially different frequencies between different populations. A set of many AIMs can be used to estimate the proportion of ancestry of an individual derived from each population.
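As a rough illustration of how a set of AIMs might yield an ancestry proportion, the sketch below estimates the fraction of an individual's ancestry derived from one of two source populations by maximum likelihood. The allele frequencies, genotypes, and function names are hypothetical; the markers are assumed to be independent and in Hardy-Weinberg proportions, and a simple grid search stands in for the more elaborate methods used in practice.

```python
import math

def log_likelihood(m, genotypes, freqs_a, freqs_b):
    """Log-likelihood of the genotypes given ancestry fraction m from population A."""
    total = 0.0
    for g, pa, pb in zip(genotypes, freqs_a, freqs_b):
        # Expected allele frequency for an individual with admixture fraction m.
        p = m * pa + (1.0 - m) * pb
        # Clamp away from 0 and 1 so the logarithms stay finite.
        p = min(max(p, 1e-9), 1.0 - 1e-9)
        # Binomial probability of carrying g copies (0, 1, or 2) of the allele.
        total += math.log(math.comb(2, g)) + g * math.log(p) + (2 - g) * math.log(1.0 - p)
    return total

def estimate_ancestry(genotypes, freqs_a, freqs_b, steps=1000):
    """Grid-search the ancestry fraction in [0, 1] that maximizes the likelihood."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda m: log_likelihood(m, genotypes, freqs_a, freqs_b))

# Hypothetical allele frequencies at four AIMs in populations A and B.
freqs_a = [0.90, 0.80, 0.95, 0.85]
freqs_b = [0.10, 0.20, 0.05, 0.15]

# Hypothetical individual: heterozygous at every marker.
estimate = estimate_ancestry([1, 1, 1, 1], freqs_a, freqs_b)
print(f"Estimated fraction of ancestry from population A: {estimate:.2f}")
```

With only four markers the estimate is very coarse; panels of many AIMs are used precisely because each additional marker sharpens the likelihood.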


A single-nucleotide polymorphism (SNP) is a variation of a single nucleotide base at a specific position in a DNA sequence. [1] There are an estimated 15 million SNP sites (out of roughly 3 billion base pairs, or about 0.5%) from among which AIMs may potentially be selected. [2] The SNPs that relate to ancestry are often traced to the Y chromosome and mitochondrial DNA because each is inherited from only one parent, which avoids the complications of recombination between parental genomes. [3] New SNP mutations are rare, so sequences carrying SNPs tend to be passed down unchanged through generations rather than altered each generation. However, because any given SNP is relatively common in a population, a single marker is rarely decisive, and analysts must examine groups of SNPs (sets of AIMs) to determine someone's ancestry. Using statistical methods such as apparent error rate and Improved Bayesian Estimate, the set of SNPs with the highest accuracy for predicting a specific ancestry can be found. [4]
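To make the idea of selecting informative SNPs concrete, the sketch below ranks candidate markers by the absolute difference in allele frequency between two populations (often called delta). This is a deliberately simpler criterion than the apparent error rate or Improved Bayesian Estimate mentioned above, and the SNP names, frequencies, and function name are hypothetical.

```python
def rank_by_delta(freqs_a, freqs_b):
    """Return (snp_id, delta) pairs sorted from most to least informative."""
    # Delta is the absolute allele-frequency difference between the two populations.
    deltas = {
        snp: abs(freqs_a[snp] - freqs_b[snp])
        for snp in freqs_a
        if snp in freqs_b
    }
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)

# Hypothetical allele frequencies in two reference populations.
pop_a = {"snp1": 0.95, "snp2": 0.50, "snp3": 0.20}
pop_b = {"snp1": 0.05, "snp2": 0.45, "snp3": 0.70}

ranking = rank_by_delta(pop_a, pop_b)
top_two = [snp for snp, _ in ranking[:2]]
print(top_two)
```

A marker such as the hypothetical snp2, which is about equally common in both populations, carries little ancestry information and would be left out of a panel.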

Examining a suite of these markers more or less evenly spaced across the genome is also a cost-effective way to discover novel genes underlying complex diseases in a technique called admixture mapping or mapping by admixture linkage disequilibrium.

As one example, the Duffy Null allele (FY*0) has a frequency of almost 100% in Sub-Saharan Africans, but occurs very infrequently in populations outside of this region. A person having this allele is thus more likely to have Sub-Saharan African ancestors. North and South Han Chinese ancestry can be distinguished unambiguously using a set of 140 AIMs. [5]
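The reasoning behind "more likely" can be sketched with Bayes' rule. The numbers below are illustrative assumptions only (an allele frequency of 0.99 inside the region, 0.01 outside it, and arbitrary prior probabilities); they are not measured values, and the function name is hypothetical.

```python
def posterior_origin(freq_in_pop, freq_elsewhere, prior):
    """Posterior probability that an observed allele copy derives from the population."""
    # Total probability of observing the allele under either origin.
    evidence = freq_in_pop * prior + freq_elsewhere * (1.0 - prior)
    return freq_in_pop * prior / evidence

# Assumed frequencies for illustration: 0.99 within the region, 0.01 outside it.
even_prior = posterior_origin(0.99, 0.01, prior=0.5)
low_prior = posterior_origin(0.99, 0.01, prior=0.1)
print(f"Posterior with prior 0.5: {even_prior:.3f}")
print(f"Posterior with prior 0.1: {low_prior:.3f}")
```

Even a low prior probability is shifted strongly by a single highly differentiated marker, which is what makes such alleles useful as AIMs.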

Collections of AIMs have been developed that can estimate the geographical origins of ancestors from within Europe. [6]

Following the development of ancient DNA databases, ancient ancestry-informative markers (aAIMs) were similarly defined as single-nucleotide polymorphisms that exhibit substantially different frequencies between different ancient populations. A set of aAIMs can be used to identify the ancestry of ancient populations and eventually quantify their genetic similarity to modern-day individuals. [7]

Discovery and development

The discovery of ancestry-informative markers was made possible by the development of next-generation sequencing (NGS). NGS enables the study of genetic markers by isolating specific gene sequences. [8] One method for sequence extraction is the use of restriction enzymes, specifically endonucleases, which cut DNA at specific recognition sequences. These enzymes can be used together with DNA ligase, which joins two DNA fragments, to modify DNA by inserting DNA from another organism. [9] Another method, cDNA sequencing (RNA-seq), can provide information on the transcriptomes of a broad range of organisms and help find SNPs within a DNA sequence.

Applications

Ancestry-informative markers have a number of applications in genetic research, forensics, and private industry. AIMs that indicate a predisposition for diseases such as type 2 diabetes mellitus and renal disease have been shown to reduce the effects of genetic admixture in ancestral mapping when using admixture mapping software. [10] The discriminating power of ancestry-informative markers allows scientists and researchers to narrow down the geographical populations of concern; for example, illegal organ trafficking can be traced to certain areas by comparing samples taken from organ recipients and identifying the foreign markers in their bodies. [11] An array of private companies, such as 23andMe and AncestryDNA, provide cost-effective direct-to-consumer (DTC) genetic testing by analyzing ancestry-informative markers to determine geographic origins. These private companies collect massive quantities of data, such as biological samples and self-reported information from consumers, a practice known as biobanking, which enables their researchers to gain further insights into AIMs. [12]

Though AIM panels can be useful for disease screening, the Genetic Information Nondiscrimination Act (GINA) prevents the use of genetic information for insurance and workplace discrimination. [13]

Medical research

Different ancestral traits and their association with diseases can help scientists determine appropriate treatment approaches for a specific population. [14] Medical researchers have revealed links between ancestry and some common diseases; for example, individuals of African descent have been found to be at higher risk of asthma than those of European ancestry. [15]

AIM panels can be used for detecting disease risk factors. One such panel was created for African American ancestry based on subsets of commercially available SNP arrays. These types of arrays can help reduce the cost of identifying risk factors, since they allow researchers to screen for ancestry markers instead of the entire genome: the arrays narrow the scope of the necessary screening from hundreds of thousands of SNP markers to a panel of a few thousand AIMs. [16]

While some believe that structured populations should be used in studies to better ascertain genetic associations with diseases, the social implications of the potential racial stigma that may result from such studies are a major concern. However, the study by Yang et al. (2005) suggests that the technology to conduct deeper research into ancestry-associated variations in human disease, and to identify them, already exists. [14]


References

  1. "Polymorphism (genetics)". AccessScience. doi:10.1036/1097-8542.535500.
  2. Pennisi, Elizabeth (2007). "Human Genetic Variation". Science. 318 (5858): 1842–1843. doi:10.1126/science.318.5858.1842. PMID 18096770.
  3. Houck, Max M. (2015). Forensic Biology. Oxford, England; San Diego, California: Academic Press. ISBN 9780128007112.
  4. Sampson, Joshua N.; Kidd, Kenneth K.; Kidd, Judith R.; Zhao, Hongyu (2011-06-14). "Selecting SNPs to Identify Ancestry". Annals of Human Genetics. 75 (4): 539–553. doi:10.1111/j.1469-1809.2011.00656.x. ISSN 0003-4800. PMC 3141729. PMID 21668909.
  5. Qu, Hui-Qi; Li, Quan; Xu, Shuhua; McCormick, Joseph B.; Fisher-Hoch, Susan P.; Xiong, Momiao; Qian, Ji; Jin, Li (2012). "Ancestry Informative Marker Set for Han Chinese Population". G3: Genes, Genomes, Genetics. 2 (3): 339–341. doi:10.1534/g3.112.001941. PMC 3291503. PMID 22413087.
  6. Bauchet, Marc; McEvoy, Brian; Pearson, Laurel N.; Quillen, Ellen E.; Sarkisian, Tamara; Hovhannesyan, Kristine; Deka, Ranjan; Bradley, Daniel G.; Shriver, Mark D. (2007). "Measuring European Population Stratification with Microarray Genotype Data". The American Journal of Human Genetics. 80 (5): 948–956. doi:10.1086/513477. PMC 1852743. PMID 17436249.
  7. Elhaik, Eran; Pirooznia, Mehdi; Syed, Syakir; Das, Ranajit; Esposito, Umberto (2018-12-12). "Ancient Ancestry Informative Markers for Identifying Fine-Scale Ancient Population Structure in Eurasians". Genes. 9 (12): 625. doi:10.3390/genes9120625. PMC 6316245. PMID 30545160.
  8. Davey, John W.; Hohenlohe, Paul A.; Etter, Paul D.; Boone, Jason Q.; Catchen, Julian M.; Blaxter, Mark L. (July 2011). "Genome-wide genetic marker discovery and genotyping using next-generation sequencing". Nature Reviews Genetics. 12 (7): 499–510. doi:10.1038/nrg3012. ISSN 1471-0056. PMID 21681211. S2CID 15080731.
  9. Loenen, Wil A. M.; Dryden, David T. F.; Raleigh, Elisabeth A.; Wilson, Geoffrey G.; Murray, Noreen E. (2013-10-18). "Highlights of the DNA cutters: a short history of the restriction enzymes". Nucleic Acids Research. 42 (1): 3–19. doi:10.1093/nar/gkt990. ISSN 1362-4962. PMC 3874209. PMID 24141096.
  10. Keene, Keith L.; Mychaleckyj, Josyf C.; Leak, Tennille S.; Smith, Shelly G.; Perlegas, Peter S.; Divers, Jasmin; Langefeld, Carl D.; Freedman, Barry I.; Bowden, Donald W. (2008-07-25). "Exploration of the utility of ancestry informative markers for genetic association studies of African Americans with type 2 diabetes and end stage renal disease". Human Genetics. 124 (2): 147–154. doi:10.1007/s00439-008-0532-6. ISSN 0340-6717. PMC 2786006. PMID 18654799.
  11. Severini, S.; Carnevali, E.; Margiotta, G.; Garcia-González, M.A.; Carracedo, Á. (2015-12-01). "Use of ancestry-informative markers as a scientific tool to combat the illegal traffic in human kidneys". Forensic Science International: Genetics Supplement Series. 5: e302–e304. doi:10.1016/j.fsigss.2015.09.120. ISSN 1875-1768.
  12. Stoeklé, Henri-Corto; Mamzer-Bruneel, Marie-France; Vogt, Guillaume; Hervé, Christian (2016-03-31). "23andMe: a new two-sided data-banking market model". BMC Medical Ethics. 17 (1): 19. doi:10.1186/s12910-016-0101-9. ISSN 1472-6939. PMC 4826522. PMID 27059184.
  13. Slaughter (April 25, 2007). "Statement of Administration Policy: Genetic Information Nondiscrimination Act (2007)" (PDF).
  14. Yang, Nan; Li, Hongzhe; Criswell, Lindsey A.; Gregersen, Peter K.; Alarcon-Riquelme, Marta E.; Kittles, Rick; Shigeta, Russell; Silva, Gabriel; Patel, Pragna I. (2005-09-29). "Examination of ancestry and ethnic affiliation using highly informative diallelic DNA markers: application to diverse and admixed populations and implications for clinical epidemiology and forensic medicine". Human Genetics. 118 (3–4): 382–392. doi:10.1007/s00439-005-0012-1. ISSN 0340-6717. PMID 16193326. S2CID 20152083.
  15. Vergara, Candelaria; Caraballo, Luis; Mercado, Dilia; Jimenez, Silvia; Rojas, Winston; Rafaels, Nicholas; Hand, Tracey; Campbell, Monica; Tsai, Yuhjung J. (2009-03-17). "African ancestry is associated with risk of asthma and high total serum IgE in a population from the Caribbean Coast of Colombia". Human Genetics. 125 (5–6): 565–579. doi:10.1007/s00439-009-0649-2. ISSN 0340-6717. PMID 19290544. S2CID 21141741.
  16. Tandon, Arti; Patterson, Nick; Reich, David (2010-12-22). "Ancestry informative marker panels for African Americans based on subsets of commercially available SNP arrays". Genetic Epidemiology. 35 (1): 80–83. doi:10.1002/gepi.20550. ISSN 0741-0395. PMC 4386999. PMID 21181899.