Population genomics

Population genomics is the large-scale comparison of DNA sequences of populations. The term is a neologism associated with population genetics. Population genomics studies genome-wide effects to improve our understanding of microevolution, so that the phylogenetic history and demography of a population can be inferred. [1]

History

Population genomics has been of interest to scientists since Darwin. Some of the first methods used for studying genetic variability at multiple loci included gel electrophoresis and restriction enzyme mapping. [2] Previously, genomics was restricted to the study of a small number of loci. However, recent advances in sequencing and in computer storage and processing power have allowed the study of hundreds of thousands of loci from populations. [3] Analysis of these data requires identifying non-neutral, or outlier, loci that indicate selection in that region of the genome. The researcher can then either remove these loci to study genome-wide effects or focus on them if they are of interest.
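The separation of outlier loci from the genomic background can be illustrated with a minimal Python sketch. It assumes that each locus already has a differentiation score (for example, a per-locus fixation index) and flags the loci in the upper tail of the empirical distribution. The function name `split_outlier_loci` and the 0.99 default cutoff are hypothetical choices made for illustration. Dedicated outlier-detection software generally uses model-based tests rather than a fixed quantile.

```python
from typing import List, Sequence, Tuple


def split_outlier_loci(
    scores: Sequence[float], quantile: float = 0.99
) -> Tuple[List[int], List[int]]:
    """Split locus indices into (outliers, background).

    scores:   one differentiation score per locus (hypothetical input)
    quantile: fraction of the sorted scores lying below the cutoff
    """
    if not 0.0 < quantile < 1.0:
        raise ValueError("quantile must be strictly between 0 and 1")
    if not scores:
        return [], []

    ranked = sorted(scores)
    # Position of the cutoff value within the sorted scores.
    cutoff_index = min(len(ranked) - 1, int(quantile * len(ranked)))
    cutoff = ranked[cutoff_index]

    # Loci at or above the cutoff are treated as candidate outliers;
    # everything else is kept as putatively neutral background.
    outliers = [i for i, s in enumerate(scores) if s >= cutoff]
    background = [i for i, s in enumerate(scores) if s < cutoff]
    return outliers, background
```

The background list can then be used for genome-wide analyses, while the outlier list marks candidate regions for closer study. A simple quantile cutoff always flags some loci, even when no selection is present, so it should be read as a screening step only.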

Research applications

In the study of Schizosaccharomyces pombe (more commonly known as fission yeast), a popular model organism, population genomics has been used to understand the reason for the phenotypic variation within the species. Because the genetic variation within this species was previously poorly understood owing to technological restrictions, population genomics has made it possible to learn about the species' genetic differences. [4] In the human population, population genomics has been used to study the genetic change since humans began to migrate away from Africa approximately 50,000–100,000 years ago. It has been shown that not only were genes related to fertility and reproduction highly selected for, but also that the further humans moved away from Africa, the greater the presence of lactase. [5]

A 2007 study by Begun et al. compared the whole genome sequence of multiple lines of Drosophila simulans to the assemblies of D. melanogaster and D. yakuba. This was done by aligning DNA from whole-genome shotgun sequences of D. simulans to a standard reference sequence before carrying out whole-genome analysis of polymorphism and divergence. The analysis revealed a large number of proteins that had experienced directional selection. The authors discovered previously unknown, large-scale fluctuations in both polymorphism and divergence along chromosome arms. They found that the X chromosome had faster divergence and significantly less polymorphism than previously expected. They also found regions of the genome (e.g. UTRs) that signaled adaptive evolution. [6]

In 2014, Jacquot et al. studied the diversification and epidemiology of endemic bacterial pathogens by using the Borrelia burgdorferi species complex (the bacteria responsible for Lyme disease) as a model. They also wished to compare the genetic structure between B. burgdorferi and the closely related species B. garinii and B. afzelii. They began by sequencing samples from a culture and then mapping the raw reads onto reference sequences. SNP-based and phylogenetic analyses were used on both intraspecific and interspecific levels. When looking at the degree of genetic isolation, they found that the intraspecific recombination rate was ~50 times higher than the interspecific rate. They also found that, when most of the genome was used, conspecific strains did not cluster in clades, raising questions about previous strategies used when investigating pathogen epidemiology. [7]

In 2014, Moore et al. conducted a study in which a group of Atlantic salmon populations was reassessed with genome-wide data. These populations had previously been analyzed with traditional population genetic analyses (microsatellites, SNP-array genotyping, and BayeScan, which uses the Dirichlet-multinomial distribution) to place them into defined conservation units. The genomic assessment mostly agreed with previous results, but did identify more differences between regionally and genetically discrete groups, suggesting that there was potentially an even greater number of conservation units of salmon in those regions. These results verified the usefulness of genome-wide analysis for improving the accuracy of future designation of conservation units. [8]

In highly migratory marine species, traditional population genetic analyses often fail to identify population structure. In tunas, traditional markers such as short-range PCR products, microsatellites and SNP-arrays have struggled to distinguish fish stocks from separate ocean basins. However, population genomic research using RAD sequencing in yellowfin tuna [9] [10] and albacore [11] [12] has been able to distinguish populations from different ocean basins and reveal fine-scale population structure. These studies identify putatively adaptive loci that reveal strong population structure, even though these sites represent a relatively small proportion of the overall DNA sequence data. In contrast, the majority of sequenced loci that are presumed to be selectively neutral do not reveal patterns of population differentiation, matching results for traditional DNA markers. [9] [10] [11] [12] The same pattern, in which putatively adaptive loci from RAD sequencing reveal population structure while traditional DNA markers provide limited insight, is also observed for other marine fishes, including striped marlin [13] and lingcod. [14]

Ethical implications

In the United States, federal regulations that govern human subject research stem from three ethical principles that were identified in the Belmont Report: respect for persons, beneficence and justice.

Acknowledgement of personal autonomy

In research that involves human subjects, the proper exercise of autonomy demands that research participants agree to enter into research voluntarily and with adequate information.

Protection of individuals

Beneficence entails two requirements: maximizing the possible benefits and minimizing the possible harms. Justice looks at how to fairly distribute the benefits and burdens of research. These protections are especially important for populations designated as vulnerable, usually defined as those at heightened risk of harm or exploitation, or with limited capacity for consent and/or autonomy. [15]

The modern definition of informed consent hinges primarily on three constructs: i) study information, ii) participants’ comprehension and understanding, and iii) voluntariness. First, the pillar of ‘information’ states that it is vital to disclose all information about a study to the participants. Further, all risks need to be divulged, regardless of the effect they may have on the participant's willingness to participate in the study. The second construct, ‘comprehension’, evaluates the mental capacity of participants and their ability to fully understand and process the information communicated to them by the researchers (including the risks that could arise from sharing their personal data with research institutions and the benefits to society that can result from their participation). It has also been suggested that comprehension and information are related, as comprehension measures how well an individual is able to grasp the information that is provided to them in the first pillar. Third, ‘voluntariness’ emphasizes the importance of a participant's consent being voluntary. In this respect, voluntariness includes not only the act of joining a research study, but also the act of withdrawing from it (dynamic consent). [16]

Individualism and communalism

Individuals may implicitly view their participation as a communal exercise rather than a personal decision, a stance that runs counter to the individualistic approach favored by modern research ethics guidelines for informed consent. Alternatively, inter-communal and intra-communal tensions can be aggravated if the prospect of a monetary or other economic reward is attached to participation, even at relatively low levels of compensation. [17]

Consent's paradox

The main argument against informed consent (particularly from the perspective of a big data setting) is the excessive load passed onto participants. Some argue that it is asking participants more than they are able to deliver, as they have the responsibility to decide on complex issues they lack the time, or capacity, to fully comprehend and assess. [18]

Equality

Distributive justice suggests that varied societal groups should have equal access to, and an equal chance of benefit from, interventions. However, it is well recognised that screening uptake is poorer in minority and lower socioeconomic groups.

Economic evaluation

Economic evaluation is essential in a public health programme, and in the current restrictive financial climate, the extensive investment required for population genetic screening would need to be well justified. For example, in breast cancer, BRCA screening in the context of high-risk populations has already been shown to be cost-effective in the USA and UK, and recent evidence has shown screening for hereditary breast and ovarian cancer genes in an unselected population to be cost-effective compared with testing based on clinical or family history criteria. [19]

Report back

For over two decades, researchers have faced the ethical dilemma of what information should be communicated back to study participants, an issue often referred to in genomics research as “return of results”. In recent years, increased evidence of potential benefits, minimal harms, and participant desire to receive results has led prominent bodies to support the practice of report back, as in the 2018 report of the National Academies of Sciences, Engineering, and Medicine (NASEM). NASEM recommends that investigators testing human biospecimens routinely consider whether and how to report back individual results and include plans in their protocols. These recommendations are rooted in principles of reciprocity, respect, and transparency. However, many challenges remain regarding the scope of what to report back to individuals and communities and how best to communicate findings to maximize the benefits and minimize harms. [20]

Respectful handling of biological samples

Proper consultation with the groups involved in the study can ensure that procedures related to the collection, storage, use, destruction, and repatriation of biological samples are conducted in accordance with the local cultural values and expectations. As samples obtained from the communities may hold important cultural and spiritual meanings, the same respectful treatment conferred to communities and participants should be extended to the management of the biological samples. All members of the research team should acknowledge the significance and value of the samples to their donors and communities. [21]

Privacy

Many companies use consumer data for ancillary objectives and share and/or sell information to third parties (Daviet et al., 2022). Corporate ownership of personal genomic information is worrisome because companies use data at their own discretion, completely out of the control of consumers. One of the largest direct-to-consumer genomic testing (DTCGT) companies, 23andMe, has confirmed that it resells users’ genomic data to clinical research and pharmaceutical establishments. This has implications for autonomy because individuals are unable to make decisions about the use of their own genomic information once testing is initiated.

An individual’s genomic data can potentially be used to draw conclusions about their relatives. When an individual submits a personal sample, they have indirectly submitted genomic data about their entire family without the family's consent. Unlike direct-to-consumer genomic ancestry testing, clinical testing does not provide results that link relatives. Privacy may be further compromised with genomic testing because of cybersecurity vulnerability or requests for data access from law enforcement during criminal searches. If individuals cannot control the access and use of their own genomic data, patient autonomy is nonexistent.

Genetic Information Nondiscrimination Act

The Genetic Information Nondiscrimination Act was passed to protect individuals against discrimination associated with their genomic information. This legislation prevents health insurers and employers from discriminating based on family history of disease and/or genomic test results. This has implications in the context of nonmaleficence. Individuals may assume that the benefits of genomic testing (GT) will outweigh the risks and that information gained will benefit them and their future medical management. However, positive test findings can have significant implications for future insurability for life, long-term care, or disability insurance, which the Act does not cover. If these risks are not communicated fairly and equitably during pretest counseling or informed consent processes, then the ethical principle of justice is in jeopardy as well.

Psychological impact

For some individuals, genomic testing (GT) results can be upsetting. In a survey of 23,196 consumers who had undergone GT, 61% of respondents reported findings that relayed new healthcare information about themselves or a relative. This included distressing news that a parent was not their biological parent, or that they had a sibling of whom they were unaware. Consumers who learned they were conceived via a donor (sperm, egg, or embryo) reported that they regretted their decision to pursue GT. Patients are at risk of uncovering life-altering information, [22] including situations of rape, abandonment, or other family confidences. [23] Ethically, pretest counseling offers opportunities to weigh concerns about beneficence and nonmaleficence and places patients in a position of autonomy where, once educated and informed, they can decide about testing at their discretion.

Patients are more likely to undergo GT if they feel their needs are unmet or if their family history and potential health risks are ignored. With beneficence in mind, healthcare systems can develop evidence-based policies and educational offerings to better prepare nurses for these situations, which can address future liability concerns.

Identifying the clinical utility of a test or a recommendation for prevention is already a challenging task because healthcare professionals work to link genomic variants to associated disease risks. False-positive results may lead to unnecessary invasive intervention, and false-negative results could lead to the absence of intervention for patients who would greatly benefit. The misinterpretation of results represents a liability concern.

A study demonstrated that multiple variants confirmed by sequencing had been incorrectly classified as pathogenic (Tandy-Connor et al., 2018). [24]

Mathematical models

Understanding and analyzing the vast amount of data that comes from population genomics studies requires various mathematical models. One method of analyzing these data is QTL mapping, which has been used to help find the genes that are responsible for adaptive phenotypes. [25] To quantify genetic differentiation between populations, a value known as the fixation index, or FST, is used. When used with Tajima's D, FST has been used to show how selection acts upon a population. [26] The McDonald–Kreitman test (or MK test) is also favored when looking for selection because it is not as sensitive to changes in a species' demography that would throw off other selection tests. [27]
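The three statistics named above can be illustrated with a short Python sketch. It is a minimal sketch using textbook formulas and is not the procedure of any cited study. FST is computed for a single biallelic locus as (H_T − H_S) / H_T. Tajima's D compares nucleotide diversity with the Watterson estimate from the number of segregating sites. The McDonald–Kreitman table is summarized by the neutrality index and by α = 1 − NI, two summaries commonly derived from it. All function and parameter names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1 - p) at a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def fst(freqs: Sequence[float], sizes: Sequence[int]) -> float:
    """Wright's F_ST = (H_T - H_S) / H_T for one biallelic locus.

    freqs: allele frequency in each subpopulation
    sizes: sample size of each subpopulation, used as weights
    """
    total = sum(sizes)
    weights = [n / total for n in sizes]
    p_bar = sum(w * p for w, p in zip(weights, freqs))
    h_t = expected_heterozygosity(p_bar)
    h_s = sum(w * expected_heterozygosity(p) for w, p in zip(weights, freqs))
    # A locus fixed in every subpopulation carries no information.
    return 0.0 if h_t == 0.0 else (h_t - h_s) / h_t


def tajimas_d(n: int, seg_sites: int, pi: float) -> float:
    """Tajima's D from n sequences, S segregating sites and mean
    pairwise differences pi (all for the same region)."""
    if n < 4:
        raise ValueError("at least 4 sequences are needed for a nonzero variance")
    if seg_sites == 0:
        return 0.0
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    # Numerator: pi minus the Watterson estimate S / a1.
    return (pi - seg_sites / a1) / math.sqrt(variance)


def mk_summary(dn: int, ds: int, pn: int, ps: int) -> Tuple[float, float]:
    """Neutrality index and alpha from a McDonald-Kreitman table.

    dn, ds: nonsynonymous and synonymous fixed differences
    pn, ps: nonsynonymous and synonymous polymorphisms
    Assumes ds, ps and dn are all nonzero.
    """
    neutrality_index = (pn / ps) / (dn / ds)
    return neutrality_index, 1.0 - neutrality_index
```

With hypothetical allele frequencies of 0.2 and 0.8 in two equally sized samples, `fst([0.2, 0.8], [50, 50])` gives approximately 0.36, while identical frequencies give 0. A negative Tajima's D indicates an excess of rare variants relative to neutral expectation, and a neutrality index below 1 indicates an excess of nonsynonymous fixed differences.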

Future developments

Most developments within population genomics stem from advances in sequencing technology. For example, restriction-site associated DNA sequencing, or RADSeq, is a relatively new technology that sequences at a lower complexity and delivers higher resolution at a reasonable cost. [28] High-throughput sequencing technologies are also a rapidly growing field that allows more information to be gathered on genomic divergence during speciation. [29] High-throughput sequencing is also very useful for SNP detection, which plays a key role in personalized medicine. [30] Another relatively new approach is reduced-representation library (RRL) sequencing, which discovers and genotypes SNPs and does not require a reference genome. [31]

Notes

  1. Luikart, G.; England, P. R.; Tallmon, D.; Jordan, S.; Taberlet, P. (2003). "The Power and Promise of Population Genomics: From Genotyping to Genome Typing". Nature Reviews Genetics. 4 (12): 981–994.
  2. Charlesworth, B. (2011). "Molecular population genomics: A short history" (PDF). Genetics Research. 92 (5–6): 397–411. doi:10.1017/S0016672310000522. PMID 21429271.
  3. Schilling, M. P.; Wolf, P. G.; Duffy, A. M.; Rai, H. S.; Rowe, C. A.; Richardson, B. A.; Mock, K. E. (2014). "Genotyping-by-Sequencing for Populus Population Genomics: An Assessment of Genome Sampling Patterns and Filtering Approaches". PLOS ONE. 9 (4): e95292. Bibcode:2014PLoSO...995292S. doi:10.1371/journal.pone.0095292. PMC 3991623. PMID 24748384.
  4. Fawcett, J. A.; Iida, T.; Takuno, S.; Sugino, R. P.; Kado, T.; Kugou, K.; Mura, S.; Kobayashi, T.; Ohta, K.; Nakayama, J. I.; Innan, H. (2014). "Population Genomics of the Fission Yeast Schizosaccharomyces pombe". PLOS ONE. 9 (8): e104241. Bibcode:2014PLoSO...9j4241F. doi:10.1371/journal.pone.0104241. PMC 4128662. PMID 25111393.
  5. Lachance, J.; Tishkoff, S. A. (2013). "Population Genomics of Human Adaptation". Annual Review of Ecology, Evolution, and Systematics. 44: 123–143. doi:10.1146/annurev-ecolsys-110512-135833. PMC 4221232. PMID 25383060.
  6. Begun, D. J.; Holloway, A. K.; Stevens, K.; Hillier, L. W.; Poh, Y. P.; Hahn, M. W.; Nista, P. M.; Jones, C. D.; Kern, A. D.; Dewey, C. N.; Pachter, L.; Myers, E.; Langley, C. H. (2007). "Population Genomics: Whole-Genome Analysis of Polymorphism and Divergence in Drosophila simulans". PLOS Biology. 5 (11): e310. doi:10.1371/journal.pbio.0050310. PMC 2062478. PMID 17988176.
  7. Jacquot, M.; Gonnet, M.; Ferquel, E.; Abrial, D.; Claude, A.; Gasqui, P.; Choumet, V. R.; Charras-Garrido, M.; Garnier, M.; Faure, B.; Sertour, N.; Dorr, N.; De Goër, J.; Vourc'h, G. L.; Bailly, X. (2014). "Comparative Population Genomics of the Borrelia burgdorferi Species Complex Reveals High Degree of Genetic Isolation among Species and Underscores Benefits and Constraints to Studying Intra-Specific Epidemiological Processes". PLOS ONE. 9 (4): e94384. Bibcode:2014PLoSO...994384J. doi:10.1371/journal.pone.0094384. PMC 3993988. PMID 24721934.
  8. Moore, Jean-Sébastien; Bourret, Vincent; Dionne, Mélanie; Bradbury, Ian; O'Reilly, Patrick; Kent, Matthew; Chaput, Gérald; Bernatchez, Louis (December 2014). "Conservation genomics of anadromous Atlantic salmon across its North American range: outlier loci identify the same patterns of population structure as neutral loci". Molecular Ecology. 23 (23): 5680–5697. Bibcode:2014MolEc..23.5680M. doi:10.1111/mec.12972. PMID 25327895. S2CID 12251497.
  9. Grewe, P.M.; Feutry, P.; Hill, P.L.; Gunasekera, R.M.; Schaefer, K.M.; Itano, D.G.; Fuller, D.W.; Foster, S.D.; Davies, C.R. (2015). "Evidence of discrete yellowfin tuna (Thunnus albacares) populations demands rethink of management for this globally important resource". Scientific Reports. 5: 16916. Bibcode:2015NatSR...516916G. doi:10.1038/srep16916. PMC 4655351. PMID 26593698.
  10. Pecoraro, Carlo; Babbucci, Massimiliano; Franch, Rafaella; Rico, Ciro; Papetti, Chiara; Chassot, Emmanuel; Bodin, Nathalie; Cariani, Alessia; Bargelloni, Luca; Tinti, Fausto (2018). "The population genomics of yellowfin tuna (Thunnus albacares) at global geographic scale challenges current stock delineation". Scientific Reports. 8 (1): 13890. Bibcode:2018NatSR...813890P. doi:10.1038/s41598-018-32331-3. PMC 6141456. PMID 30224658.
  11. Anderson, Giulia; Hampton, John; Smith, Neville; Rico, Ciro (2019). "Indications of strong adaptive population genetic structure in albacore tuna (Thunnus alalunga) in the southwest and central Pacific Ocean". Ecology and Evolution. 9 (18): 10354–10364. Bibcode:2019EcoEv...910354A. doi:10.1002/ece3.5554. PMC 6787800. PMID 31624554.
  12. Vaux, Felix; Bohn, Sandra; Hyde, John R.; O'Malley, Kathleen G. (2021). "Adaptive markers distinguish North and South Pacific Albacore amid low population differentiation". Evolutionary Applications. 14 (5): 1343–1364. Bibcode:2021EvApp..14.1343V. doi:10.1111/eva.13202. ISSN 1752-4571. PMC 8127716. PMID 34025772.
  13. Mamoozadeh, Nadya R.; Graves, John E.; McDowell, Jan R. (2020). "Genome-wide SNPs resolve spatiotemporal patterns of connectivity within striped marlin (Kajikia audax), a broadly distributed and highly migratory pelagic species". Evolutionary Applications. 13 (4): 677–698. Bibcode:2020EvApp..13..677M. doi:10.1111/eva.12892. PMC 7086058. PMID 32211060.
  14. Longo, Gary C.; Lam, Laurel; Basnett, Bonnie; Samhouri, Jameal; Hamilton, Scott; Andrews, Kelly; Williams, Greg; Goetz, Giles; McClure, Michelle; Nichols, Krista M. (2020). "Strong population differentiation in lingcod (Ophiodon elongatus) is driven by a small portion of the genome". Evolutionary Applications. 13 (10): 2536–2554. Bibcode:2020EvApp..13.2536L. doi:10.1111/eva.13037. PMC 7691466. PMID 33294007.
  15. Bankoff, Richard J; Perry, George H (December 2016). "Hunter–gatherer genomics: evolutionary insights and ethical considerations". Current Opinion in Genetics & Development. 41: 1–7. doi:10.1016/j.gde.2016.06.015. PMC 5360101. PMID 27400119.
  16. Rothstein, Mark A.; Epps, Phyllis Griffin (March 2001). "Ethical and legal implications of pharmacogenomics". Nature Reviews Genetics. 2 (3): 228–231. doi:10.1038/35056075. ISSN 1471-0056. PMID 11256075.
  17. Bankoff, Richard J; Perry, George H (December 2016). "Hunter–gatherer genomics: evolutionary insights and ethical considerations". Current Opinion in Genetics & Development. 41: 1–7. doi:10.1016/j.gde.2016.06.015. PMC 5360101. PMID 27400119.
  18. Dankar, Fida K.; Gergely, Marton; Malin, Bradley; Badji, Radja; Dankar, Samar K.; Shuaib, Khaled (2020). "Dynamic-informed consent: A potential solution for ethical dilemmas in population sequencing initiatives". Computational and Structural Biotechnology Journal. 18: 913–921. doi:10.1016/j.csbj.2020.03.027. PMID 32346464.
  19. The International HapMap Consortium; Foster, Morris W. (2004-06-01). "Integrating ethics and science in the International HapMap Project". Nature Reviews Genetics. 5 (6): 467–475. doi:10.1038/nrg1351. ISSN 1471-0056. PMC 2271136. PMID 15153999.
  20. Calluori, Stephanie; Heimke, Kaitlin Kirkpatrick; Caga-anan, Charlisse; Kaufman, David; Mechanic, Leah E.; McAllister, Kimberly A. (February 2025). "Ethical, Legal, and Social Implications of Gene-Environment Interaction Research". Genetic Epidemiology. 49 (1): e22591. doi:10.1002/gepi.22591. ISSN 0741-0395.
  21. Soares, Gustavo H.; Hedges, Joanne; Sethi, Sneha; Poirier, Brianna; Jamieson, Lisa (September 2023). "From biocolonialism to emancipation: considerations on ethical and culturally respectful omics research with indigenous Australians". Medicine, Health Care and Philosophy. 26 (3): 487–496. doi:10.1007/s11019-023-10151-1. ISSN 1386-7423. PMID 37171744.
  22. Vears, Danya F.; Savulescu, Julian; Christodoulou, John; Wall, Meaghan; Newson, Ainsley J. (2023-07-01). "Are We Ready for Whole Population Genomic Sequencing of Asymptomatic Newborns?". Pharmacogenomics and Personalized Medicine. 16: 681–691. doi:10.2147/PGPM.S376083. PMID 37415831.
  23. Verstrate, C. A.; Mahon, S. M. (2023-08-01). "Direct-to-Consumer Genomic Testing Through an Ethics Lens: Oncology Nursing Considerations". Clinical Journal of Oncology Nursing. 27 (4): 380–388. doi:10.1188/23.CJON.380-388. PMID 37677769.
  24. Chowdhury, Susmita; Dent, Tom; Pashayan, Nora; Hall, Alison; Lyratzopoulos, Georgios; Hallowell, Nina; Hall, Per; Pharoah, Paul; Burton, Hilary (June 2013). "Incorporating genomics into breast and prostate cancer screening: assessing the implications". Genetics in Medicine. 15 (6): 423–432. doi:10.1038/gim.2012.167. PMID 23412607.
  25. Stinchcombe, J. R.; Hoekstra, H. E. (2007). "Combining population genomics and quantitative genetics: Finding the genes underlying ecologically important traits". Heredity. 100 (2): 158–170. doi:10.1038/sj.hdy.6800937. PMID 17314923.
  26. Hohenlohe, P. A.; Bassham, S.; Etter, P. D.; Stiffler, N.; Johnson, E. A.; Cresko, W. A. (2010). "Population Genomics of Parallel Adaptation in Threespine Stickleback using Sequenced RAD Tags". PLOS Genetics. 6 (2): e1000862. doi:10.1371/journal.pgen.1000862. PMC 2829049. PMID 20195501.
  27. Harpur, B. A.; Kent, C. F.; Molodtsova, D.; Lebon, J. M. D.; Alqarni, A. S.; Owayss, A. A.; Zayed, A. (2014). "Population genomics of the honey bee reveals strong signatures of positive selection on worker traits". Proceedings of the National Academy of Sciences. 111 (7): 2614–2619. Bibcode:2014PNAS..111.2614H. doi:10.1073/pnas.1315506111. PMC 3932857. PMID 24488971.
  28. Davey, J. W.; Blaxter, M. L. (2011). "RADSeq: Next-generation population genetics". Briefings in Functional Genomics. 9 (5–6): 416–423. doi:10.1093/bfgp/elq031. PMC 3080771. PMID 21266344.
  29. Ellegren, H. (2014). "Genome sequencing and population genomics in non-model organisms". Trends in Ecology & Evolution. 29 (1): 51–63. Bibcode:2014TEcoE..29...51E. doi:10.1016/j.tree.2013.09.008. PMID 24139972.
  30. You, N.; Murillo, G.; Su, X.; Zeng, X.; Xu, J.; Ning, K.; Zhang, S.; Zhu, J.; Cui, X. (2012). "SNP calling using genotype model selection on high-throughput sequencing data". Bioinformatics. 28 (5): 643–650. doi:10.1093/bioinformatics/bts001. PMC 3338331. PMID 22253293.
  31. Greminger, M. P.; Stölting, K. N.; Nater, A.; Goossens, B.; Arora, N.; Bruggmann, R. M.; Patrignani, A.; Nussberger, B.; Sharma, R.; Kraus, R. H. S.; Ambu, L. N.; Singleton, I.; Chikhi, L.; Van Schaik, C. P.; Krützen, M. (2014). "Generation of SNP datasets for orangutan population genomics using improved reduced-representation sequencing and direct comparisons of SNP calling algorithms". BMC Genomics. 15: 16. doi:10.1186/1471-2164-15-16. PMC 3897891. PMID 24405840.