DNA encryption

DNA encryption is the process of hiding or obscuring genetic information by computational methods in order to improve genetic privacy in DNA sequencing processes. The human genome is long and complex, but important and identifying information can often be inferred from a small number of variable sites rather than by reading the entire genome. A whole human genome is a string of roughly 3.2 billion nucleotide base pairs, the building blocks of life, yet the genomes of any two individuals differ by only about 0.5%. That small fraction underlies human genetic diversity, differences in the pathology of disease, and ancestral history. [1] Emerging strategies use methods such as randomization algorithms and cryptographic approaches to de-identify a genetic sequence from the individual it belongs to and, fundamentally, to isolate only the necessary information while protecting the rest of the genome from unnecessary inquiry. The priority now is to determine which methods are robust and how policy should ensure the ongoing protection of genetic privacy.

History

In 2003, the National Human Genome Research Institute and its affiliated partners completed the sequencing of the first whole human genome, a project that cost just under $3 billion. [2] Four years later, James Watson, one of the co-discoverers of the structure of DNA, had his genome sequenced for less than $1.5 million. [3] As genetic sequencing technologies have proliferated, become more streamlined, and been adapted for clinical use, they can now provide detailed insight into individual genetic identities at a much lower cost, with biotech competitors vying for the title of the $1,000 genome. [4] Genetic material can now be extracted from a person's saliva, hair, skin, blood, or other sources, then sequenced, digitized, stored, and used for numerous purposes. Whenever data is digitized and stored, there is a possibility of privacy breaches. [5] While modern whole genome sequencing has allowed unprecedented access to and understanding of the human genome, along with excitement about the potential of personalized medicine, it has also prompted serious discussion of the ethical and privacy risks that accompany uncovering an individual's DNA sequence, the essential instructions of their being.

Research

Genetic sequencing is pivotal to producing scientific knowledge about the origins and prevention of disease and to developing meaningful therapeutic interventions. Much research uses large-group DNA samples or aggregate genome-wide datasets to compare and identify genes associated with particular diseases or phenotypes; consequently, there is strong opposition to restricting access to genome databases and strong support for strengthening such wide-scale research. For example, if an informed consent clause were enforced for all genetics research, existing genetic databases could not be reused for new studies: all datasets would either need to be destroyed at the end of every study, or all participants would need to re-authorize permissions for each new study. [1] Because genetic data can also reveal information about closely related family members, consent in the research process takes on an additional dimension. This raises the fundamental question of whether such restrictions are necessary privacy protections or a hindrance to scientific progress.

Clinical Use

In medicine, genetic sequencing is important not only for traditional uses, such as paternity tests, but also for making diagnosis and treatment easier. Personalized medicine has been heralded as the future of healthcare, as whole genome sequencing has made it possible to personalize treatment to an individual's expression and experience of disease. Because pharmacology and drug development are based on population studies, current treatments are normalized to population-wide statistics, which may reduce treatment efficacy for individuals, since each person's response to a disease and to drug therapy is shaped by their genetic predispositions. Genetic sequencing has already expedited prognostic counseling for monogenic diseases that require rapid, differential diagnosis in neonatal care. [6] However, the often blurred distinction between medical use and research use can complicate how privacy is handled in these two realms, as they often require different levels of consent and fall under different policies.

Commercial Use

In the consumer market, people have flocked to Ancestry.com and 23andMe to discover their heritage and learn about their genotypes. Because consumer transactions allow electronic click-wrap agreements to bypass the traditional forms of consent used in research and healthcare, consumers may not fully comprehend the implications of having their genetic sequence digitized and stored. Furthermore, corporate privacy policies often operate outside the realm of federal jurisdiction, exposing consumers to informational risks involving both their genetic privacy and their self-disclosed consumer profile, including family history, health status, race, ethnicity, social networks, and much more. [7] The mere existence of such databases invites privacy risks, as data storage inherently entails the possibility of data breaches and of government requests for datasets. 23andMe has already received four requests from the Federal Bureau of Investigation (FBI) to access consumer datasets; although those requests were denied, they raise a conundrum similar to that of the FBI–Apple encryption dispute. [7]

Forensic Use

DNA information can be used to solve criminal cases by establishing a match between a known suspect in one crime and the unknown perpetrator of an unsolved crime. However, DNA information is subject to errors with a quantifiable probability and should not, on its own, be treated as entirely reliable evidence. [8]
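To make "errors with a quantifiable probability" concrete, the arithmetic below estimates chance matches when a single crime-scene profile is searched against a whole database. It is a minimal sketch that assumes profiles are independent and share one random match probability; the probability used (1 in a billion) is purely hypothetical, and the database size of 5.5 million is the NDNAD figure reported later in this article. Relatives, partial profiles, and laboratory error are not modeled.

```python
def expected_false_matches(database_size: int, match_probability: float) -> float:
    """Expected number of unrelated profiles that match a query purely by chance."""
    return database_size * match_probability


def prob_at_least_one_false_match(database_size: int, match_probability: float) -> float:
    """Probability that searching every profile yields at least one chance match,
    assuming independent profiles (relatives, who share more DNA, are ignored)."""
    return 1.0 - (1.0 - match_probability) ** database_size


database_size = 5_500_000  # NDNAD size reported in this article
match_probability = 1e-9   # hypothetical random match probability, for illustration only
print(expected_false_matches(database_size, match_probability))
print(prob_at_least_one_false_match(database_size, match_probability))
```

The expected number of chance matches grows linearly with the size of the database searched.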

Policy

Because an individual's genomic sequence can reveal telling medical information about them and their family members, privacy proponents believe that protections should be in place to safeguard the privacy and identity of the user against possible discrimination by insurance companies or employers, which is the major concern voiced. Genetic discrimination has occurred in the past, often revealing how science can be misinterpreted by non-experts. In 1970, African-Americans were denied insurance coverage or charged higher premiums because they were known carriers of sickle-cell anemia, even though carriers do not have any medical problems themselves, and carrier status in fact confers some resistance to malaria. [9] The legitimacy of these policies has been challenged by scientists who condemn the attitude of genetic determinism, the view that genotype wholly determines phenotype. Environmental factors, differential developmental patterns, and findings from epigenetics indicate that gene expression is far more complex, and that genes are not a reliable diagnosis of an individual's medical future. [10]

Early legislation, such as the Genetic Information Nondiscrimination Act of 2008 (GINA) in the United States, has emerged in response to genetic exceptionalism, the heightened scrutiny expected of genomics research. In many cases, however, the scope and accountability of formal legislation are rather uncertain, as the science is advancing much faster than the law, and specialized ethics committees have had to fill this niche. [1] Much of the criticism targets policy's lack of understanding of the technical issues involved in genome sequencing, and its failure to address the fact that, in the event of a data breach, an individual's genome cannot be replaced, which complicates privacy protection even further. [1] Because computational genomics is such a technical field, translating expert language into policy is difficult, let alone into lay language. This creates a barrier to public understanding of what current genomic sequencing technologies can do, which ultimately makes it even harder to discuss how to protect genetic privacy without impeding scientific progress.

Across the world, each country has unique healthcare and research frameworks that produce different policy needs. Genetic privacy policy is further complicated by international collaborations in genetic research and by international biobanks, databases that store biological samples and DNA information. Furthermore, research and healthcare are not the only fields that require formal regulation; other areas of concern include the genetic privacy of people in the criminal justice system and of those who use private, consumer-based genomic sequencing.

Forensic Science

England and Wales

The National Criminal Intelligence DNA Database (NDNAD), the largest forensic DNA database in the world, holds DNA information of which 91% comes from residents of England and Wales. [8] The NDNAD stores the genetic information of people who were criminally convicted, those who were charged with but acquitted of a recordable offence, those who were arrested but never charged with a recordable offence, and those under counterterrorism control. Of the 5.5 million people in the database, representing 10% of the total population, 1.2 million have never been convicted of a crime. [8] In S and Marper v United Kingdom (2008), the European Court of Human Rights decided that the government must present sufficient justification for treating the DNA profiles of people in the criminal justice system differently from those of non-convicted individuals; essentially, retained biological materials and DNA information must not be abused. [8] The decision highlighted several issues with the existing system that pose privacy risks for the individuals involved: the storage of personal information alongside genetic information, the storage of DNA profiles that inherently allow genetic relationships to be determined, and, fundamentally, the fact that storing cellular samples and DNA profiles at all creates opportunities for privacy breaches. As a result, the Protection of Freedoms Act 2012 was enacted to ensure proper use of collected DNA materials and to regulate their storage and destruction. [8] However, many problems persist, as samples can still be retained indefinitely in databases regardless of whether the affected individual was convicted, including samples from juvenile offenders. Critics have argued that such long-term retention could stigmatize affected individuals, inhibit their re-integration into society, and leave the data open to misuse through discriminatory behavior innate to the criminal justice system. [8]

Germany

In 1990, the Federal Supreme Court of Germany and the Federal Constitutional Court of Germany decided that sections of the German Code of Criminal Procedure provided a justifiable legal basis for using genetic fingerprinting to identify criminals and exonerate the innocent. [8] The decisions, however, lacked specific details on how biological materials may be obtained and how genetic fingerprinting may be used; only the regulations for blood tests and physical examinations were explicitly outlined. In 1998, the German Parliament authorized the establishment of a national DNA database in response to mounting pressure to prevent sexual abuse and homicides involving children. [8] In 2001, the Federal Constitutional Court ruled this decision constitutional and supported by a compelling public interest, despite some criticism that it violated the right of informational self-determination. The court did, however, require that the retention of DNA information and samples be justified by evidence that the individual may commit a similar crime in the future. To address the legal uncertainty, the Act on Forensic DNA Analysis of 2005 introduced provisions setting out exact and limited legal grounds for the use of DNA-based information in criminal proceedings. [8] Some sections provide that DNA samples may only be used if they are necessary to accelerate the investigation or to eliminate suspects, and that genetic fingerprinting must be ordered by a court. Since the act's implementation, 8,000 new entries have been added to the database each month, raising questions about the necessity of such wide-scale data collection and about whether the wording of the provisions provides effective privacy protection. [8] A recent controversial decision by the German government expanded familial searching by DNA dragnet to identify genetic relatives of sexual and violent offenders, an action that the Federal Supreme Court of Germany had deemed to have no legal basis in 2012. [8]

South Korea

The National Forensic Service of South Korea and the Public Prosecution Authority of South Korea established separate DNA analysis departments in 1991. Public criticism initially held that the data collection had been enacted without considering the informational privacy of the subjects involved, but it turned to support after a series of high-profile cases. [8] In 2006, a bill proposed by the General Assembly on the collection and use of DNA information outlined crime categories governing the storage, control, and destruction of DNA samples and DNA information. However, the bill failed to pass because it would not have brought any significant change to actual practice: the incomplete crime categories it included applied only to obtaining biological information without an individual's consent, and the protocol for destroying collected samples was unclear, leaving them exposed to misuse. [8]

The DNA Information Act of 2009 attempted to resolve these weaknesses, including provisions stating that biologically sensitive information may only be collected from convicted individuals, confined suspects, and crime scenes. Genetic fingerprinting was made permissible for specific crimes, including arson, murder, kidnapping, rape or sexual molestation, trespassing in a residence at night to steal, larceny, burglary, and numerous other violent crimes. [8] The act also required a written warrant to acquire samples from convicted criminals or suspects who do not give written consent. All samples must be destroyed in a timely manner if the individual concerned is declared innocent or acquitted, if their prosecution is dismissed, or upon their death. [8] Importantly, if collected samples are used to identify individuals at a crime scene, the DNA information must be destroyed upon successful identification. However, the legislation still has several flaws and has drawn criticism regarding the presumption of innocence, the weak enforcement of sample destruction (only 2.03% of samples are deleted annually), and the written-warrant requirement (99.6% of samples are obtained without a warrant). There is also ongoing debate about whether the legislation violates the right of informational self-determination. [8]

Biobanks

United States

In the United States, biobanks fall primarily under the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule and the Federal Policy for the Protection of Human Subjects (Common Rule). Because neither rule was conceived to regulate biobanks, and because regulation is decentralized, applying and enforcing them has proved challenging; federal law also fails to directly address international policy and how data can be shared outside the EU-US Safe Harbor Agreement. [11] One area that needs clarification is how federal and state laws apply differently and specifically to different biobanks, researchers, or projects, a situation further complicated by the fact that most biobanks are part of larger entities or collaborate with other institutions, blurring the line between public and private interests. About 80% of all biobanks have internal oversight boards that regulate data collection, use, and distribution. Three basic models govern access to biobank samples and data: open access (unrestricted to anyone), tiered access (restrictions that depend on the nature of the project), and controlled access (tightly controlled). [11]
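The three access models can be pictured as a simple policy check. This is a hypothetical sketch only: the request fields (`wants_individual_level_data`, `has_ethics_approval`, `approved_by_oversight_board`) and the tiering rule are illustrative assumptions, not the rules of any actual biobank.

```python
from dataclasses import dataclass
from enum import Enum


class AccessModel(Enum):
    OPEN = "open"              # unrestricted to anyone
    TIERED = "tiered"          # restrictions depend on the nature of the project
    CONTROLLED = "controlled"  # tightly controlled access


@dataclass
class Request:
    # Hypothetical attributes of a data-access request, for illustration only.
    wants_individual_level_data: bool
    has_ethics_approval: bool
    approved_by_oversight_board: bool


def may_access(model: AccessModel, req: Request) -> bool:
    if model is AccessModel.OPEN:
        return True
    if model is AccessModel.TIERED:
        # Hypothetical tiering rule: aggregate data for anyone,
        # individual-level data only with ethics approval.
        return not req.wants_individual_level_data or req.has_ethics_approval
    # CONTROLLED: every request needs explicit approval from the oversight board.
    return req.approved_by_oversight_board


request = Request(wants_individual_level_data=True, has_ethics_approval=False,
                  approved_by_oversight_board=False)
for model in AccessModel:
    print(model.value, may_access(model, request))
```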

GINA prohibits health insurers from requiring genetic testing or requesting genetic information for enrollment purposes, and prohibits employers from requesting genetic testing or genetic information for any type of employment decision (hiring, promotion, or termination). However, insurers can request genetic information to determine coverage of a specific procedure. Some groups are also exempt from GINA's provisions, including insurers and employers of federal government employees and the military, as well as employers with fewer than 15 employees. [12]

China

China has a widespread network of hospitals and research institutes. It is currently implementing a plan to create a more cohesive framework for data sharing among existing biobanks, which were previously governed by overlapping and confusing regulations. Many biobanks operate independently or within larger networks, the most prominent being the Shanghai Biobank network. [13] Within this network, guidelines specify de-identification policies and explicitly endorse broad consent. Recently, the Chinese Constitution has formally recognized individual privacy as a distinct and independent constitutional right, and legislators have accordingly begun developing a Draft Ordinance on Human Genetics Resources to organize national laws on biobanking management measures, legal liability, and punishment for violations. International data sharing will be regulated even more strictly under these national laws. [13]

Australia

Biobanks in Australia are mainly regulated by healthcare privacy guidelines and human research ethics committees; no formal biobank legislation exists, but international data sharing is widely permitted. The National Health and Medical Research Council (NHMRC) develops guidelines for, and funds, many of these institutions. There is ongoing discussion about moving towards broad consent for biobanking. [14]

Consumer Genetic Testing

The Electronic Frontier Foundation, a privacy advocacy group, found that existing legislation does not formally ensure consumer privacy where DNA information is concerned. Genetic information stored by consumer businesses is not protected by HIPAA; therefore, these companies can share genetic information with third parties, subject only to their own privacy statements. [7] Most genetic testing companies share only anonymized, aggregated data, and only with users' consent. Ancestry.com and 23andMe do sell such data to research institutions and other organizations, and can ask for case-by-case consent to release non-anonymized data to other parties, including employers or insurers. 23andMe even warns that re-identification is possible. If a consumer explicitly refuses research use or requests that their data be destroyed, 23andMe may still use their identifying and behavioral information, such as browsing patterns and geographical location, for other marketing services. [7]

Areas of Concern

Many computational experts have developed, and are developing, more secure genomic sequencing systems to protect the field from misguided jurisdiction and the wrongful application of genetic data, and above all to protect the genetic privacy of individuals. Privacy-preserving technologies are currently being developed in four major areas of genetics research:

String searching and comparison

Paternity tests, genetic compatibility tests, and ancestry tests are all tools that rely on string searching and comparison algorithms. [1] Put simply, this is a needle-in-a-haystack approach, in which a dataset is searched for a matching “string”, the sequence or pattern of interest. As these types of testing have become more common and been adapted to consumer genomic models, such as smartphone apps and direct-to-consumer DNA tests, current privacy-securing methods focus on fortifying this process and protecting both healthcare and private use.
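A plaintext version of the needle-in-a-haystack search shows what the privacy-preserving variants need to protect: whoever runs it sees both the whole sequence and the pattern. The sketch below is a naive scan chosen for clarity; practical tools use indexed or linear-time algorithms instead.

```python
from __future__ import annotations


def find_all(sequence: str, pattern: str) -> list[int]:
    """Return every 0-based position where `pattern` occurs in `sequence`,
    including overlapping occurrences (naive O(n*m) scan)."""
    if not pattern:
        return []
    hits = []
    for i in range(len(sequence) - len(pattern) + 1):
        if sequence[i:i + len(pattern)] == pattern:
            hits.append(i)
    return hits


print(find_all("GATTACAGATTACA", "TTA"))  # -> [2, 9]
```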

Aggregate data release

The modern age of big data and large-scale genomic testing calls for processing systems that minimize privacy risks when releasing aggregate genomic data, which essentially means ensuring that no individual's data can be discerned within a genomic database. [1] Differential privacy formalizes this goal by bounding how much any released statistic can change when a single individual's data is added or removed, and many researchers provide "checks" on the stringency of existing infrastructures.
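A common way to achieve differential privacy for a released count is the Laplace mechanism: add noise whose scale is the statistic's sensitivity divided by the privacy parameter epsilon. The sketch below applies it to the alternate-allele count at one SNP. The cohort genotypes are hypothetical, and the sensitivity of 2 assumes each person contributes at most two alleles. The seeded `random.Random` is for reproducibility only and is not suitable for a real release.

```python
from __future__ import annotations

import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_allele_count(genotypes: list[int], epsilon: float, rng: random.Random) -> float:
    """Release one SNP's alternate-allele count with epsilon-differential privacy.

    Genotypes are 0, 1 or 2 copies of the allele, so adding or removing one
    person changes the count by at most 2: that is the sensitivity.
    """
    sensitivity = 2.0
    return sum(genotypes) + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)             # seeded only to make the example reproducible
cohort = [0, 1, 2, 1, 0, 0, 2, 1]  # hypothetical genotypes; the true count is 7
print(noisy_allele_count(cohort, epsilon=1.0, rng=rng))
```

Smaller values of epsilon give stronger privacy but larger noise.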

Alignment of raw genomic data

One of the most important developments in genomics is read mapping, in which millions of short sequences (reads) are aligned to a reference DNA sequence so that large datasets can be processed efficiently. Because this high-capacity process is often divided between public and private computing environments, it has several stages at which genetic privacy is particularly vulnerable; current studies therefore focus on how to carry out secure operations across two different data domains without sacrificing efficiency and accuracy. [1]
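To illustrate what aligning reads to a reference involves, here is a deliberately simple, ungapped mapper that reports every offset where a read matches with at most a given number of substitutions. It is a toy under stated assumptions: real mappers build an index of the reference and handle insertions and deletions, whereas this brute-force scan is only practical for tiny inputs. The reference and reads are made up.

```python
from __future__ import annotations


def map_read(reference: str, read: str, max_mismatches: int = 1) -> list[tuple[int, int]]:
    """Return (position, mismatches) for every offset where `read` aligns to
    `reference` with at most `max_mismatches` substitutions (no gaps)."""
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(1 for a, b in zip(window, read) if a != b)
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


reference = "ACGTACGTTAGCCGATAGGCTTACG"
for read in ["TAGCCG", "GATTGG"]:
    print(read, map_read(reference, read))
```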

Clinical use

With high-throughput genomic technology allowing unprecedented access to genetic information, personalized medicine is gaining momentum as the promised future of healthcare, making secure genomic testing models imperative for the progress of medicine. In particular, concerns center on how this process will involve multiple parties engaging with and accessing the data. [1] The distinction between genetic sequencing for medical and for research purposes is contentious, and whenever healthcare is involved, patient privacy must also be considered, as it may conflict with or complement genetic privacy.

Encryption Methods

Secure read mapping

Secure read mapping is essential to genomics research, as read mapping is important not only for DNA sequencing but also for identifying target regulatory molecules in RNA-Seq. One solution splits read mapping into two tasks on a hybrid computing platform: exact matching of reads using keyed hash values can be conducted on a public cloud, while alignment of reads is conducted on a private cloud. Because only keyed hash values are exposed to the public cloud, the privacy of the original sequence is preserved. [15] However, as alignment tends to be high-volume and computationally intensive, most sequencing schemes still in practice require third-party computing, which reintroduces privacy risks in the public cloud domain.
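The public/private split can be sketched with HMAC as the keyed hash: the private side tags every k-mer (seed) of the reference and of a read, the public side matches opaque tags, and the private side verifies candidate positions on the plaintext. The seed length, the choice of HMAC-SHA256, and the use of the read's first k bases as its only seed are assumptions for illustration, not details of the cited scheme [15]. Because equal k-mers yield equal tags, the frequency pattern of seeds remains visible to the public side, and this sketch does nothing to hide it.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets


def keyed_tag(kmer: str, key: bytes) -> bytes:
    """HMAC-SHA256 tag of one k-mer; it cannot be computed without the key."""
    return hmac.new(key, kmer.encode(), hashlib.sha256).digest()


def keyed_seed_index(seq: str, k: int, key: bytes) -> dict[bytes, list[int]]:
    """Map the tag of every k-mer in `seq` to the positions where that k-mer occurs."""
    index: dict[bytes, list[int]] = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(keyed_tag(seq[i:i + k], key), []).append(i)
    return index


# Private cloud: holds the key and the plaintext sequences.
key = secrets.token_bytes(32)
k = 4
reference = "ACGTACGTTAGCCGATAGGCTTACG"
read = "TAGCCGAT"
ref_index = keyed_seed_index(reference, k, key)
read_tag = keyed_tag(read[:k], key)  # seed = first k bases of the read

# Public cloud: exact matching on opaque tags only.
candidates = ref_index.get(read_tag, [])

# Private cloud: verifies (in a real mapper, extends and aligns) each candidate.
matches = [p for p in candidates if reference[p:p + len(read)] == read]
print(matches)  # -> [8]
```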

Secure string searching

Numerous genetic screening tests rely on string searching and have become commonplace in healthcare, so the privacy of these methods has been an important area of development. One protocol hides the position and size of partial substrings, allowing one party (the researcher or physician), who holds the digitized genome, and a second party (the research subject or patient), who has sole ownership of their DNA marker, to conduct secure genetic tests. Only the researcher or physician learns the outcome of the string searching and comparison, and neither party can access any other information, which preserves privacy. [16]
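The protocol in [16] is not reproduced here. As a hedged substitute, the toy below uses a different, well-known technique, Diffie-Hellman-style private set intersection, in which each party blinds hashed values with a secret exponent and only doubly blinded values are compared. The physician learns whether the marker occurs and the patient sees only blinded values, but unlike the cited protocol this sketch reveals the marker's length to the physician. The modulus (the Mersenne prime 2**127 - 1) and the hashing step are toy choices that are not secure, and the class and method names are hypothetical.

```python
from __future__ import annotations

import hashlib
import secrets

# Toy group: integers modulo the Mersenne prime 2**127 - 1. NOT secure; real
# protocols use vetted groups (e.g. elliptic curves) and proper hash-to-group maps.
P = 2**127 - 1


def hash_to_group(s: str) -> int:
    return int.from_bytes(hashlib.sha256(s.encode()).digest(), "big") % P


def random_exponent() -> int:
    return secrets.randbelow(P - 3) + 2  # secret exponent in [2, P - 2]


class Physician:
    """Holds the digitized genome; learns whether the marker occurs in it."""

    def __init__(self, genome: str):
        self.genome = genome
        self.a = random_exponent()

    def blinded_substrings(self, length: int) -> list[int]:
        subs = {self.genome[i:i + length] for i in range(len(self.genome) - length + 1)}
        return [pow(hash_to_group(s), self.a, P) for s in subs]

    def reblind(self, value: int) -> int:
        return pow(value, self.a, P)


class Patient:
    """Holds the private DNA marker; sees only blinded values from the physician."""

    def __init__(self, marker: str):
        self.marker = marker
        self.b = random_exponent()

    def blinded_marker(self) -> int:
        return pow(hash_to_group(self.marker), self.b, P)

    def reblind_and_shuffle(self, values: list[int]) -> list[int]:
        out = [pow(v, self.b, P) for v in values]
        secrets.SystemRandom().shuffle(out)  # hide which substring (position) matched
        return out


def run_test(genome: str, marker: str) -> bool:
    physician, patient = Physician(genome), Patient(marker)
    # Patient -> physician: H(marker)^b, which the physician raises to a.
    double_marker = physician.reblind(patient.blinded_marker())
    # Physician -> patient: {H(s)^a}; the patient raises each to b and shuffles.
    double_subs = patient.reblind_and_shuffle(physician.blinded_substrings(len(marker)))
    # (x^a)^b == (x^b)^a mod P, so equal inputs give equal doubly blinded values.
    return double_marker in double_subs


print(run_test("ACGTTAGCCGAT", "TAGC"))
```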

Secure genome query

The basis of personalized medicine and preventive healthcare is establishing genetic compatibility by comparing an individual's genome against known variations to estimate susceptibility to diseases such as breast cancer or diabetes, to evaluate pharmacogenomics, and to query biological relationships among individuals. [5] For disease risk tests, studies have proposed a privacy-preserving technique that uses homomorphic encryption and secure integer comparison, storing and processing sensitive data in encrypted form. To ensure privacy, the storage and processing unit (SPU) stores the patient's real single-nucleotide polymorphisms (SNPs), those actually observed in the patient, together with redundant entries from the set of potential SNPs. [17] [18] Another solution developed three protocols for securely computing edit distance by combining Yao's garbled circuits with a banded alignment algorithm. Its major drawback is its inability to perform large-scale computations while retaining accuracy. [19]
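The additive homomorphism that such disease-risk tests rely on can be shown with a toy Paillier cryptosystem, swapped in here as an example scheme: an encrypted weighted sum of a patient's genotypes is computed without decrypting any single genotype. The primes are far too small to be secure, the SNP genotypes and integer weights are hypothetical, weights must be non-negative integers, and the secure integer comparison and the SPU's redundant SNP entries described in [17] [18] are omitted.

```python
from __future__ import annotations

import math
import secrets

# Toy key from two small, well-known primes; real deployments use far larger primes.
P_, Q_ = 999_983, 1_000_003
N = P_ * Q_
N2 = N * N
LAM = math.lcm(P_ - 1, Q_ - 1)  # Carmichael function of N (math.lcm needs Python 3.9+)
MU = pow(LAM, -1, N)            # valid shortcut because the generator is g = N + 1


def encrypt(m: int) -> int:
    """Paillier encryption of a non-negative integer m < N, with fresh randomness."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    return ((pow(c, LAM, N2) - 1) // N) * MU % N


def add(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % N2


def scale(c: int, k: int) -> int:
    """Raising a ciphertext to a non-negative integer k multiplies the plaintext by k."""
    return pow(c, k, N2)


def encrypted_risk_score(enc_genotypes: list[int], weights: list[int]) -> int:
    """Weighted sum of encrypted genotypes, computed without decrypting any of them."""
    total = encrypt(0)
    for c, w in zip(enc_genotypes, weights):
        total = add(total, scale(c, w))
    return total


# Patient side: genotypes (copies of a risk allele, 0-2) at four hypothetical SNPs.
genotypes = [0, 1, 2, 1]
enc = [encrypt(g) for g in genotypes]

# Tester side: hypothetical integer weights, applied to ciphertexts only.
weights = [35, 12, 80, 5]
enc_score = encrypted_risk_score(enc, weights)

# Key holder: decrypts only the final score.
score = decrypt(enc_score)
print(score)  # -> 177
```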

Secure genome-wide association studies

Genome-wide association studies (GWAS) are important for locating specific variations in genome sequences that lead to disease. Privacy-preserving algorithms that identify SNPs significantly associated with diseases work by adding random noise to aggregate statistics to protect individual privacy. [20] Another study exploits linkage disequilibrium to select the most useful datasets while maximizing the protection of patient privacy with injected noise, although this approach may have limited power to detect disease associations. [21] Critics of these methods note that a substantial amount of noise is required to satisfy differential privacy even when only a small fraction of SNPs is released, which makes efficient research impractical. [5]
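The critics' point about noise can be illustrated with basic sequential composition: if a total privacy budget epsilon is split evenly across m released SNP counts, each count needs Laplace noise with scale m times the sensitivity divided by epsilon. The sketch assumes a sensitivity of 2 per count (at most two alleles per person) and even budget splitting; tighter composition bounds exist, so this is a simplified illustration rather than the method of [20] or [21].

```python
from __future__ import annotations

import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def per_snp_scale(total_epsilon: float, num_snps: int, sensitivity: float = 2.0) -> float:
    """Laplace scale per released count when `total_epsilon` is split evenly
    over `num_snps` counts (basic sequential composition)."""
    return sensitivity * num_snps / total_epsilon


def release_case_counts(counts: list[int], total_epsilon: float,
                        rng: random.Random) -> list[float]:
    """Noisy allele counts for several SNPs under one overall privacy budget."""
    scale = per_snp_scale(total_epsilon, len(counts))
    return [c + laplace_noise(scale, rng) for c in counts]


for m in (1, 10, 100, 1000):
    print(m, per_snp_scale(total_epsilon=1.0, num_snps=m))
```

For a total epsilon of 1, the per-SNP noise scale grows from 2 for a single SNP to 2,000 for a thousand SNPs.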

Authenticated encryption storage

Because genomic sequences have low complexity (highly repetitive content) and use a small set of predictable symbols, they call for encryption tools that resist attacks exploiting this structure, such as known-plaintext attacks (KPA). Cryfa [22] uses packing (to reduce storage size), a shuffling mechanism (to randomize symbol positions), and the AES cipher (Advanced Encryption Standard) to securely store FASTA, FASTQ, VCF, SAM and BAM files with authenticated encryption.
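Two of the stages described for Cryfa can be sketched as follows: a keyed permutation randomizes symbol positions, and the result is packed at 2 bits per base. This is not Cryfa's implementation, and the order of stages here is an assumption. The sketch also assumes the input contains only A, C, G and T, uses an integer seed as a stand-in for a key, and relies on Python's `random` module, which is not cryptographically secure. The AES authenticated-encryption stage is deliberately omitted, since it should come from a vetted cryptographic library rather than hand-written code.

```python
from __future__ import annotations

import random

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def keyed_permutation(n: int, seed: int) -> list[int]:
    perm = list(range(n))
    random.Random(seed).shuffle(perm)  # NOT cryptographically secure; illustration only
    return perm


def shuffle_symbols(seq: str, seed: int) -> str:
    perm = keyed_permutation(len(seq), seed)
    return "".join(seq[p] for p in perm)


def unshuffle_symbols(seq: str, seed: int) -> str:
    perm = keyed_permutation(len(seq), seed)
    out = [""] * len(seq)
    for i, p in enumerate(perm):
        out[p] = seq[i]  # shuffled[i] came from original position perm[i]
    return "".join(out)


def pack(seq: str) -> bytes:
    """Pack A/C/G/T into 2 bits per base, 4 bases per byte (last byte zero-padded)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            byte |= CODE[base] << (2 * j)
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Inverse of pack; `length` drops the padding bases in the last byte."""
    bases = [BASES[(byte >> (2 * j)) & 0b11] for byte in data for j in range(4)]
    return "".join(bases[:length])


seq = "ACGTTGCAAACCGGTT"
seed = 42                                  # stand-in for a secret key
packed = pack(shuffle_symbols(seq, seed))  # 16 bases -> 4 bytes
restored = unshuffle_symbols(unpack(packed, len(seq)), seed)
print(len(packed), restored == seq)        # -> 4 True
```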

Related Research Articles

<span class="mw-page-title-main">Genetic testing</span> Medical test

Genetic testing, also known as DNA testing, is used to identify changes in DNA sequence or chromosome structure. Genetic testing can also include measuring the results of genetic changes, such as RNA analysis as an output of gene expression, or through biochemical analysis to measure specific protein output. In a medical setting, genetic testing can be used to diagnose or rule out suspected genetic disorders, predict risks for specific conditions, or gain information that can be used to customize medical treatments based on an individual's genetic makeup. Genetic testing can also be used to determine biological relatives, such as a child's biological parentage through DNA paternity testing, or be used to broadly predict an individual's ancestry. Genetic testing of plants and animals can be used for similar reasons as in humans, to gain information used for selective breeding, or for efforts to boost genetic diversity in endangered populations.

<span class="mw-page-title-main">DNA sequencing</span> Process of determining the nucleic acid sequence

DNA sequencing is the process of determining the nucleic acid sequence – the order of nucleotides in DNA. It includes any method or technology that is used to determine the order of the four bases: adenine, guanine, cytosine, and thymine. The advent of rapid DNA sequencing methods has greatly accelerated biological and medical research and discovery.

<span class="mw-page-title-main">UK Biobank</span> Long-term biobank study of 500,000 people

UK Biobank is a large long-term biobank study in the United Kingdom (UK) which is investigating the respective contributions of genetic predisposition and environmental exposure to the development of disease. It began in 2006. UK Biobank has been cited as an important resource for cancer research.

deCODE genetics is a biopharmaceutical company based in Reykjavík, Iceland. The company was founded in 1996 by Kári Stefánsson with the aim of using population genetics studies to identify variations in the human genome associated with common diseases, and to apply these discoveries "to develop novel methods to identify, treat and prevent diseases."

<span class="mw-page-title-main">Personalized medicine</span> Medical model that tailors medical practices to the individual patient

Personalized medicine, also referred to as precision medicine, is a medical model that separates people into different groups—with medical decisions, practices, interventions and/or products being tailored to the individual patient based on their predicted response or risk of disease. The terms personalized medicine, precision medicine, stratified medicine and P4 medicine are used interchangeably to describe this concept though some authors and organisations use these expressions separately to indicate particular nuances.

Genetic discrimination occurs when people treat others differently because they have or are perceived to have a gene mutation(s) that causes or increases the risk of an inherited disorder. It may also refer to any and all discrimination based on the genotype of a person rather than their individual merits, including that related to race, although the latter would be more appropriately included under racial discrimination. Some legal scholars have argued for a more precise and broader definition of genetic discrimination: "Genetic discrimination should be defined as when an individual is subjected to negative treatment, not as a result of the individual's physical manifestation of disease or disability, but solely because of the individual's genetic composition." Genetic Discrimination is considered to have its foundations in genetic determinism and genetic essentialism, and is based on the concept of genism, i.e. distinctive human characteristics and capacities are determined by genes.

<span class="mw-page-title-main">Ancestry-informative marker</span>

In population genetics, an ancestry-informative marker (AIM) is a single-nucleotide polymorphism that exhibits substantially different frequencies between different populations. A set of many AIMs can be used to estimate the proportion of ancestry of an individual derived from each population.

Public health genomics is the use of genomics information to benefit public health. This is visualized as more effective preventive care and disease treatments with better specificity, tailored to the genetic makeup of each patient. According to the Centers for Disease Control and Prevention (U.S.), Public Health genomics is an emerging field of study that assesses the impact of genes and their interaction with behavior, diet and the environment on the population's health.

A DNA database or DNA databank is a database of DNA profiles which can be used in the analysis of genetic diseases, genetic fingerprinting for criminology, or genetic genealogy. DNA databases may be public or private, the largest ones being national DNA databases.

<span class="mw-page-title-main">Biobank</span> Repository of biological samples used for research

A biobank is a type of biorepository that stores biological samples for use in research. Biobanks have become an important resource in medical research, supporting many types of contemporary research like genomics and personalized medicine.

Personal genomics or consumer genetics is the branch of genomics concerned with the sequencing, analysis and interpretation of the genome of an individual. The genotyping stage employs different techniques, including single-nucleotide polymorphism (SNP) analysis chips, or partial or full genome sequencing. Once the genotypes are known, the individual's variations can be compared with the published literature to determine likelihood of trait expression, ancestry inference and disease risk.

1000 Genomes Project: International research effort on genetic variation

The 1000 Genomes Project (1KGP), which ran from January 2008 to 2015, was an international research effort to establish the most detailed catalogue of human genetic variation at the time. Using newly developed sequencing technologies, scientists planned to sequence the genomes of at least one thousand anonymous healthy participants from a number of different ethnic groups within the following three years. In 2010, the project finished its pilot phase, which was described in detail in a publication in the journal Nature. In 2012, the sequencing of 1092 genomes was announced in a Nature publication. In 2015, two papers in Nature reported results and the completion of the project, along with opportunities for future research.

A biorepository is a facility that collects, catalogs, and stores samples of biological material for laboratory research. Biorepositories collect and manage specimens from animals, plants, and other living organisms. Biorepositories store many different types of specimens, including samples of blood, urine, tissue, cells, DNA, RNA, and proteins. If the samples are from people, they may be stored with medical information along with written consent to use the samples in laboratory studies.

Whole genome sequencing: Determining nearly the entirety of the DNA sequence of an organism's genome at a single time

Whole genome sequencing (WGS), also known as full genome sequencing, complete genome sequencing, or entire genome sequencing, is the process of determining the entirety, or nearly the entirety, of the DNA sequence of an organism's genome at a single time. This entails sequencing all of an organism's chromosomal DNA as well as DNA contained in the mitochondria and, for plants, in the chloroplast.

Exome sequencing: Sequencing of all the exons of a genome

Exome sequencing, also known as whole exome sequencing (WES), is a genomic technique for sequencing all of the protein-coding regions of genes in a genome. It consists of two steps. The first step is to select only the subset of DNA that encodes proteins. These regions are known as exons; humans have about 180,000 exons, constituting about 1% of the human genome, or approximately 30 million base pairs. The second step is to sequence the exonic DNA using any high-throughput DNA sequencing technology.
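The figures quoted for the exome are mutually consistent, as a quick calculation shows. The Python below combines the 3.2-billion-base-pair genome size given earlier in this article with the approximate exon count and exome size above; because all three inputs are round estimates, the outputs are approximate too.

```python
GENOME_BP = 3.2e9      # approximate size of the human genome, in base pairs
EXOME_BP = 30e6        # approximate total length of all exons, in base pairs
EXON_COUNT = 180_000   # approximate number of human exons

exome_percent = 100 * EXOME_BP / GENOME_BP   # share of the genome that is exonic
mean_exon_bp = EXOME_BP / EXON_COUNT         # average exon length implied by these figures

print(f"exome share of genome: {exome_percent:.2f}%")  # exome share of genome: 0.94%
print(f"mean exon length: {mean_exon_bp:.0f} bp")      # mean exon length: 167 bp
```

So "about 1%" is a slight rounding of roughly 0.94%, and the implied average exon is under 200 base pairs long.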

DNA nanoball sequencing

DNA nanoball sequencing is a high-throughput sequencing technology that is used to determine the entire genomic sequence of an organism. The method uses rolling circle replication to amplify small fragments of genomic DNA into DNA nanoballs. Fluorescent nucleotides bind to complementary nucleotides and are then polymerized to anchor sequences bound to known sequences on the DNA template. The base order is determined via the fluorescence of the bound nucleotides. This DNA sequencing method allows large numbers of DNA nanoballs to be sequenced per run at lower reagent costs than other next-generation sequencing platforms. However, a limitation of this method is that it generates only short sequences of DNA, which makes mapping its reads to a reference genome challenging. After purchasing Complete Genomics, the Beijing Genomics Institute (BGI) refined DNA nanoball sequencing to sequence nucleotide samples on their own platform.
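Why short reads are harder to map can be shown with a toy example. The Python sketch below uses naive exact string matching (real aligners use indexed, mismatch-tolerant methods) against a made-up reference containing a repeated segment: a read lying entirely within the repeat matches at more than one position, so its origin is ambiguous, while a longer read that extends into unique flanking sequence matches only once. The reference and reads are hypothetical.

```python
def exact_match_positions(reference: str, read: str) -> list[int]:
    """Return every 0-based offset at which `read` occurs exactly in `reference`."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)  # allow overlapping occurrences
    return positions

# Hypothetical reference: the segment "GATTACA" occurs twice.
reference = "ACGTTGCA" + "GATTACA" + "CCGGAA" + "GATTACA" + "TTAGGC"

short_read = "GATTACA"        # lies entirely within the repeated segment
long_read = "GCAGATTACACC"    # spans the first copy plus its unique flanks

print(exact_match_positions(reference, short_read))  # [8, 21]
print(exact_match_positions(reference, long_read))   # [5]
```

The human genome contains many repeats far longer than a short read, so the same ambiguity arises at scale.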

Biobank ethics refers to the ethics pertaining to all aspects of biobanks. The issues examined in the field of biobank ethics are special cases of clinical research ethics.

Dynamic consent is an approach to informed consent that enables ongoing engagement and communication between individuals and the users and custodians of their data. It is designed to address the many issues raised by digital technologies in research and clinical care that enable the wide-scale use, linkage, analysis and integration of diverse datasets, and the use of AI and big data analyses. These issues include how to obtain informed consent in a rapidly changing environment; growing expectations that people should know how their data is being used; and increased legal and regulatory requirements for managing the secondary use of data in biobanks and other medical research infrastructure. The approach began to be implemented in 2007 by an Italian group that introduced ways to maintain an ongoing process of interaction between researcher and participant, in which "technology now allows the establishment of dynamic participant–researcher partnerships." The use of digital interfaces in this way was first described as 'Dynamic Consent' in the EnCoRe project. Dynamic Consent therefore describes a personalised, digital interface that enables two-way communication between participants and researchers, and is a practical example of how software can be developed to give research participants greater understanding of and control over how their data is used. It also enables clinical trial managers, researchers and clinicians to know what type of consent is attached to the data they hold, and gives them an easy way to seek new consent if the use of the data changes. It can support greater accountability and transparency, streamlining consent processes to enable compliance with regulatory requirements.
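The record-keeping side of dynamic consent can be sketched as a versioned consent history per participant. The Python below is a hypothetical data model, not the design of the EnCoRe project or any deployed system: each change of preference appends a new version (earlier versions are kept for audit), the latest version governs, and a helper lists proposed uses that the current consent does not cover and for which new consent would have to be sought. Purpose labels such as "cancer_research" are illustrative.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional

@dataclass(frozen=True)
class ConsentRecord:
    version: int
    permitted_purposes: frozenset[str]

@dataclass
class ParticipantConsent:
    participant_id: str
    history: list[ConsentRecord] = field(default_factory=list)

    def update(self, purposes: Iterable[str]) -> None:
        """Record a new consent decision; earlier versions remain for audit."""
        self.history.append(ConsentRecord(len(self.history) + 1, frozenset(purposes)))

    def current(self) -> Optional[ConsentRecord]:
        """The most recent decision governs; None if consent was never given."""
        return self.history[-1] if self.history else None

    def permits(self, purpose: str) -> bool:
        record = self.current()
        return record is not None and purpose in record.permitted_purposes

def uses_needing_new_consent(participant: ParticipantConsent, proposed: Iterable[str]) -> list[str]:
    """Proposed uses not covered by the participant's current consent."""
    return sorted(p for p in proposed if not participant.permits(p))

# Hypothetical participant who later widens their consent.
p = ParticipantConsent("P001")
p.update({"cancer_research"})
p.update({"cancer_research", "data_linkage"})
print(p.current().version)                                              # 2
print(uses_needing_new_consent(p, ["data_linkage", "commercial_use"]))  # ['commercial_use']
```

Keeping the full history, rather than overwriting a single flag, is what lets custodians show which consent applied to a given use at a given time.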

Genetic privacy involves the concept of personal privacy concerning the storage, repurposing, provision to third parties, and display of one's genetic information. This concept also encompasses privacy regarding the ability to identify specific individuals by their genetic sequence, and the potential to gain information on specific characteristics about that person via portions of their genetic information, such as their propensity for specific diseases or their immediate or distant ancestry.
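How few markers can single out a person is illustrated by a simple random-match calculation. The Python below is a simplified sketch that assumes unrelated individuals, Hardy-Weinberg equilibrium and statistically independent SNPs (no linkage disequilibrium or population structure), all of which real re-identification analyses must relax; the panel size and allele frequencies in the example are hypothetical.

```python
def genotype_match_probability(maf: float) -> float:
    """Chance two unrelated people share a genotype at one biallelic SNP.

    Under Hardy-Weinberg equilibrium with minor-allele frequency `maf`, the three
    genotypes have frequencies p^2, 2pq and q^2; a match is the sum of their squares.
    """
    p, q = maf, 1.0 - maf
    genotype_freqs = (p * p, 2 * p * q, q * q)
    return sum(f * f for f in genotype_freqs)

def random_match_probability(mafs) -> float:
    """Probability of matching at every SNP, assuming the SNPs are independent."""
    prob = 1.0
    for maf in mafs:
        prob *= genotype_match_probability(maf)
    return prob

# Hypothetical panel of 30 common SNPs, each with minor-allele frequency 0.5.
per_snp = genotype_match_probability(0.5)
panel = random_match_probability([0.5] * 30)
print(per_snp)         # 0.375
print(f"{panel:.1e}")  # 1.7e-13
```

With roughly 8 billion people alive, a match probability of about 1.7 × 10^-13 implies an expected number of chance matches well below one, which is why even a modest set of common SNPs can act as an identifier.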

Elective genetic and genomic testing are DNA tests performed for an individual who does not have an indication for testing. An elective genetic test analyzes selected sites in the human genome, while an elective genomic test analyzes the entire human genome. Some elective genetic and genomic tests require a physician to order the test to ensure that individuals understand the risks and benefits of testing as well as the results. Other DNA-based tests, such as genealogical DNA tests, do not require a physician's order. Elective testing is generally not paid for by health insurance companies. With the advent of personalized medicine, also called precision medicine, an increasing number of individuals are undertaking elective genetic and genomic testing.

References

  1. Ayday E, De Cristofaro E, Hubaux JP, Tsudik G (February 2015). "Whole genome sequencing: Revolutionary medicine or privacy nightmare?". Computer. 48 (2): 58–66. doi:10.1109/MC.2015.59. hdl:11693/22558. S2CID 2431877.
  2. Guttmacher AE, Collins FS (September 2003). "Welcome to the genomic era". The New England Journal of Medicine. 349 (10): 996–8. doi:10.1056/NEJMe038132. PMID 12954750. S2CID 32233347.
  3. Wadman M (April 2008). "James Watson's genome sequenced at high speed". Nature. 452 (7189): 788. Bibcode:2008Natur.452R....W. doi:10.1038/452788b. PMID 18431822. S2CID 205037214.
  4. Service RF (March 2006). "Gene sequencing. The race for the $1000 genome". Science. 311 (5767): 1544–6. doi:10.1126/science.311.5767.1544. PMID 16543431. S2CID 23411598.
  5. Akgün M, Bayrak AO, Ozer B, Sağıroğlu MŞ (August 2015). "Privacy preserving processing of genomic data: A survey". Journal of Biomedical Informatics. 56 (Supplement C): 103–11. doi:10.1016/j.jbi.2015.05.022. PMID 26056074.
  6. Saunders CJ, Miller NA, Soden SE, Dinwiddie DL, Noll A, Alnadi NA, et al. (October 2012). "Rapid whole-genome sequencing for genetic disease diagnosis in neonatal intensive care units". Science Translational Medicine. 4 (154): 154ra135. doi:10.1126/scitranslmed.3004041. PMC 4283791. PMID 23035047.
  7. Drabiak K (2017-01-01). "Caveat Emptor: How the Intersection of Big Data and Consumer Genomics Exponentially Increases Information Privacy Risks". Health Matrix: The Journal of Law-Medicine. 27 (1): 143. Gale A495831384.
  8. Lee J (March 2016). "The presence and future of the use of DNA-Information and the protection of genetic informational privacy: A comparative perspective". International Journal of Law, Crime and Justice. 44: 212–29. doi:10.1016/j.ijlcj.2015.10.001.
  9. Brown SM (2003). Essentials of Medical Genomics. John Wiley & Sons. p. 219. ISBN 978-0-471-27061-4.
  10. Weinhold B (March 2006). "Epigenetics: the science of change". Environmental Health Perspectives. 114 (3): A160–7. doi:10.1289/ehp.114-a160. PMC 1392256. PMID 16507447.
  11. Harrell HL, Rothstein MA (March 2016). "Biobanking Research and Privacy Laws in the United States". The Journal of Law, Medicine & Ethics. 44 (1): 106–27. doi:10.1177/1073110516644203. PMID 27256128. S2CID 3670078.
  12. Gammon A, Neklason DW (October 2015). "Confidentiality & the Risk of Genetic Discrimination: What Surgeons Need to Know". Surgical Oncology Clinics of North America. 24 (4): 667–81. doi:10.1016/j.soc.2015.06.004. PMC 4568442. PMID 26363536.
  13. Chen H, Chan B, Joly Y (2015). "Privacy and Biobanking in China: A Case of Policy in Transition". The Journal of Law, Medicine & Ethics. 43 (4): 726–42. doi:10.1111/jlme.12315. PMID 26711413. S2CID 23026547.
  14. Chalmers D (2015). "Biobanking and Privacy Laws in Australia". The Journal of Law, Medicine & Ethics. 43 (4): 703–13. doi:10.1111/jlme.12313. PMID 26711411. S2CID 34539242.
  15. Wang XF. "Large-Scale Privacy-Preserving Mapping of Human Genomic Sequences on Hybrid Clouds". www.informatics.indiana.edu. Retrieved 2017-11-03.
  16. De Cristofaro E, Faber S, Tsudik G (November 2013). "Secure genomic testing with size- and position-hiding private substring matching". Proceedings of the 12th ACM Workshop on Privacy in the Electronic Society. New York, NY, USA. pp. 107–118. doi:10.1145/2517840.2517849.
  17. "Privacy-Preserving Computation of Disease Risk by Using Genomic, Clinical, and Environmental Data". www.usenix.org. Retrieved 2017-11-03.
  18. Ayday E, Raisaro JL, Hubaux JP (2012). "Privacy-Enhancing Technologies for Medical Tests Using Genomic Data".
  19. Aziz MM, Alhadidi D, Mohammed N (July 2017). "Secure approximation of edit distance on genomic data". BMC Medical Genomics. 10 (Suppl 2): 41. doi:10.1186/s12920-017-0279-9. PMC 5547448. PMID 28786362.
  20. Johnson A, Shmatikov V (August 2013). "Privacy-preserving data exploration in genome-wide association studies". Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Vol. 2013. pp. 1079–1087. doi:10.1145/2487575.2487687. ISBN 9781450321747. PMC 4681528. PMID 26691928.
  21. Zhao Y, Wang X, Jiang X, Ohno-Machado L, Tang H (January 2015). "Choosing blindly but wisely: differentially private solicitation of DNA datasets for disease marker discovery". Journal of the American Medical Informatics Association. 22 (1): 100–8. doi:10.1136/amiajnl-2014-003043. PMC 4433380. PMID 25352565.
  22. Hosseini M, Pratas D, Pinho AJ (January 2019). "Cryfa: a secure encryption tool for genomic data". Bioinformatics. 35 (1): 146–148. doi:10.1093/bioinformatics/bty645. PMC 6298042. PMID 30020420.