openSNP is an open source website where users can share their genetic information. [1] Users upload their genotype data along with personal details such as gender, age, eye color, medical history, and Fitbit data. With its focus on patient-led research (PLR), the project has the potential to redefine the way health research is conducted.
"It promises to be a vital supplement to standard research: it can focus on conditions that are neglected by standard research, such as rare diseases or side effects, and can draw on a broader range of data and deliver outcomes more rapidly. It can also be a way of realising valuable forms of social interaction and support in cases where members of a community conduct PLR together, for example, patients suffering from the same illness." [2]
The name of the project is inspired by the single nucleotide polymorphism (SNP), a DNA variation at a specific location on a strand. Scientists have found correlations between certain SNPs and genetic predispositions, such as Mendelian disease. [3]
The code of the project is on GitHub and the CSS is licensed under the Apache 2.0 license. [1]
Because openSNP is an open-source social network that is freely available on the internet, questions have been raised about privacy and other risks. [4] Although the sign-up page warns potential users that the record will last forever, participants must decide for themselves whether the benefits outweigh the pitfalls. As health research progresses, scientific analysis places increasing emphasis on PLR, leading to growing demands for a new social contract to secure conditions for participants. [2] Human participant research can not only place subjects in potentially harmful situations but also lead to other risks, such as exploitation and self-experimentation in uncontrolled environments. There is also the risk of biases and distortions "arising from self-reporting and self-collected data". At present, however, the effects of genetic discrimination are unknown because of a lack of evidence. [5]
"Till date no systematic evaluation of the true value of anonymity with respect to the cost of genome information and insight has been assessed in real-life settings. This would require appropriate availability of information including caveats to whole genome assessment and analysis" [6]
Still, with the rise of open genomic research, privacy protection frameworks need strengthened efforts that go beyond "traditional legal and organizational safeguards", along with technical solutions such as data encryption and mutual understanding. [7] In an article published through the University of San Diego School of Law, Sejin Ahn argued that perhaps the most critical measures to strengthen are a legislative ban on re-identification and anti-discrimination protection. Ahn explains that these remedies must be addressed and updated in order to protect participants from privacy breaches. [8]
A survey of users of the site found that while most respondents 'were well aware of the privacy risks of their involvement in open genetic data sharing and considered the possibility of direct, personal repercussions troubling, they estimated the risk of this happening to be negligible'. [9]
The website provides a proof-of-concept mechanism for allowing anyone to be involved in any stage of genomics research. This model allows partnerships to form which can be independent of governments, academia or for-profit organisations and is a way of creating the enabling conditions for anyone to access, influence and get involved in every stage of the genomics research cycle. [10] The model reflects the value that users of such sites attach to sharing data as 'contributing to the common good of research'. [9]
The transparent open-source code arguably allows greater scrutiny and oversight than similar closed-source projects. [11]
The website was founded by German biologist Bastian Greshake.
In 2012, Greshake sent a vial of his saliva to 23andMe, a genomics company, to study his own DNA. His results suggested that he was at risk of prostate cancer, so he recommended that his father have a medical examination as well. The doctor found a growing tumor in his father's prostate and was able to catch it early. After receiving his results, Greshake posted them on GitHub, hoping to find other users willing to share their personal genetic makeup. Upon realizing that many people were unwilling to share or did not include much of the information needed for scientific research, Greshake created openSNP. [12]
"Maybe there are people who are interested in publishing their genetic information on the web to make it available, but those people don't have the opportunity," [12]
Greshake acknowledges that there are services that allow people to test their own genes and discover inherent predispositions, but they are often expensive or difficult to access. In 2013, the Food and Drug Administration (FDA) ordered 23andMe to stop marketing its spit-box screening tests, citing a lack of scientific evidence. In 2015, however, the FDA eased its restrictions and stated that carrier screening tests would not have to undergo preliminary review. [13]
Greshake hopes that making openSNP accessible and simple will not only get the public interested in their genetic makeup but also encourage them to take it down innovative avenues, such as turning openSNP data into music. [14]
The human genome is a complete set of nucleic acid sequences for humans, encoded as DNA within the 23 chromosome pairs in cell nuclei and in a small DNA molecule found within individual mitochondria. These are usually treated separately as the nuclear genome and the mitochondrial genome. Human genomes include both protein-coding DNA sequences and various types of DNA that does not encode proteins. The latter is a diverse category that includes DNA coding for non-translated RNA, such as that for ribosomal RNA, transfer RNA, ribozymes, small nuclear RNAs, and several types of regulatory RNAs. It also includes promoters and their associated gene-regulatory elements, DNA playing structural and replicatory roles, such as scaffolding regions, telomeres, centromeres, and origins of replication, plus large numbers of transposable elements, inserted viral DNA, non-functional pseudogenes and simple, highly repetitive sequences. Introns make up a large percentage of non-coding DNA. Some of this non-coding DNA is non-functional junk DNA, such as pseudogenes, but there is no firm consensus on the total amount of junk DNA.
In genetics and bioinformatics, a single-nucleotide polymorphism (SNP) is a germline substitution of a single nucleotide at a specific position in the genome that is present in a sufficiently large fraction of the population under consideration.
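The "sufficiently large fraction" criterion can be illustrated with a minimal sketch. It assumes a 1% cutoff for the second most common allele, which is a commonly used convention but is not stated above; the function names and the sample data are hypothetical.

```python
from collections import Counter

def allele_frequencies(alleles):
    """Return {allele: frequency} for the alleles observed at one position."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def is_snp(alleles, min_minor_freq=0.01):
    """True if the second most common allele reaches the assumed cutoff."""
    freqs = sorted(allele_frequencies(alleles).values(), reverse=True)
    return len(freqs) > 1 and freqs[1] >= min_minor_freq

# Hypothetical sample: alleles seen on 200 chromosomes at one position.
sample = ["A"] * 190 + ["G"] * 10
print(allele_frequencies(sample))
print(is_snp(sample))
```

A variant seen on only one chromosome in a thousand would fall below the assumed cutoff and would not be counted as a SNP under this sketch.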
Pharmacogenomics is the study of the role of the genome in drug response. Its name reflects its combining of pharmacology and genomics. Pharmacogenomics analyzes how the genetic makeup of a patient affects their response to drugs. It deals with the influence of acquired and inherited genetic variation on drug response, by correlating DNA mutations with pharmacokinetic, pharmacodynamic, and/or immunogenic endpoints.
Genetic genealogy is the use of genealogical DNA tests, i.e., DNA profiling and DNA testing, in combination with traditional genealogical methods, to infer genetic relationships between individuals. This application of genetics came to be used by family historians in the 21st century, as DNA tests became affordable. The tests have been promoted by amateur groups, such as surname study groups or regional genealogical groups, as well as research projects such as the Genographic Project.
UK Biobank is a large long-term biobank study in the United Kingdom (UK) which is investigating the respective contributions of genetic predisposition and environmental exposure to the development of disease. It began in 2006.
Genetic discrimination occurs when people treat others differently because they have or are perceived to have a gene mutation or mutations that cause or increase the risk of an inherited disorder. It may also refer to any and all discrimination based on the genotype of a person rather than their individual merits, including that related to race, although the latter would be more appropriately included under racial discrimination. Some legal scholars have argued for a more precise and broader definition of genetic discrimination: "Genetic discrimination should be defined as when an individual is subjected to negative treatment, not as a result of the individual's physical manifestation of disease or disability, but solely because of the individual's genetic composition." Genetic discrimination is considered to have its foundations in genetic determinism and genetic essentialism, and is based on the concept of genism, i.e. the idea that distinctive human characteristics and capacities are determined by genes.
In genomics, a genome-wide association study (GWA study, or GWAS) is an observational study of a genome-wide set of genetic variants in different individuals to see if any variant is associated with a trait. GWA studies typically focus on associations between single-nucleotide polymorphisms (SNPs) and traits like major human diseases, but can equally be applied to any other genetic variants and any other organisms.
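As a rough illustration of testing one variant for association with a trait, the sketch below computes a Pearson chi-square statistic on allele counts in cases versus controls. The choice of test, the function name and the counts are assumptions made for the example; real studies repeat a test like this across very many variants and must correct for multiple comparisons.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a nonzero total")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)

# Hypothetical allele counts for one SNP: (candidate allele, other allele).
cases = (60, 40)
controls = (40, 60)
stat = chi_square_2x2(*cases, *controls)
print(round(stat, 2))  # 8.0
```

A larger statistic indicates that the allele is distributed more unevenly between the two groups than chance alone would suggest.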
Personal genomics or consumer genetics is the branch of genomics concerned with the sequencing, analysis and interpretation of the genome of an individual. The genotyping stage employs different techniques, including single-nucleotide polymorphism (SNP) analysis chips, or partial or full genome sequencing. Once the genotypes are known, the individual's variations can be compared with the published literature to determine likelihood of trait expression, ancestry inference and disease risk.
Population genomics is the large-scale comparison of DNA sequences of populations. Population genomics is a neologism that is associated with population genetics. Population genomics studies genome-wide effects to improve our understanding of microevolution so that we may learn the phylogenetic history and demography of a population.
The Gene Wiki is a project within Wikipedia that aims to describe the relationships and functions of all human genes. It was established to transfer information from scientific resources to Wikipedia stub articles.
WGAViewer is a bioinformatics software tool which is designed to visualize, annotate, and help interpret the results generated from a genome wide association study (GWAS). Alongside the P values of association, WGAViewer allows a researcher to visualize and consider other supporting evidence, such as the genomic context of the SNP, linkage disequilibrium (LD) with ungenotyped SNPs, gene expression database, and the evidence from other GWAS projects, when determining the potential importance of an individual SNP.
Biobank ethics refers to the ethics pertaining to all aspects of biobanks. The issues examined in the field of biobank ethics are special cases of clinical research ethics.
De-identification is the process used to prevent someone's personal identity from being revealed. For example, data produced during human subject research might be de-identified to preserve the privacy of research participants. Biological data may be de-identified in order to comply with HIPAA regulations that define and stipulate patient privacy laws.
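A minimal sketch of one de-identification step follows: direct identifiers are dropped and replaced with a keyed pseudonym. The field names, the key and the record are hypothetical, and the use of HMAC-SHA256 is an assumption chosen for illustration, not a method prescribed by HIPAA or any specific project.

```python
import hashlib
import hmac

# Hypothetical field names for direct identifiers; a real dataset defines its own.
DIRECT_IDENTIFIERS = {"name", "email", "address"}

def deidentify(record, secret_key):
    """Return a copy of record without direct identifiers, plus a keyed pseudonym."""
    digest = hmac.new(secret_key, record["email"].encode("utf-8"), hashlib.sha256)
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["participant_id"] = digest.hexdigest()[:16]
    return cleaned

record = {
    "name": "Jane Doe",
    "email": "jane@example.org",
    "address": "1 Example Street",
    "eye_color": "brown",
    "rs0001": "AG",
}
print(deidentify(record, secret_key=b"replace-with-a-private-key"))
```

Removing names and addresses in this way does not by itself make genetic data anonymous, since the remaining genotype values can still carry identifying information.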
Predictive genomics is at the intersection of multiple disciplines: predictive medicine, personal genomics and translational bioinformatics. Specifically, predictive genomics deals with predicting future phenotypic outcomes in areas such as complex multifactorial diseases in humans. To date, the success of predictive genomics has been dependent on the genetic framework underlying these applications, typically explored in genome-wide association (GWA) studies. The identification of associated single-nucleotide polymorphisms underpins GWA studies in complex diseases, which have included type 2 diabetes (T2D), age-related macular degeneration (AMD) and Crohn's disease.
In the field of genetic sequencing, genotyping by sequencing, also called GBS, is a method to discover single nucleotide polymorphisms (SNPs) in order to perform genotyping studies, such as genome-wide association studies (GWAS). GBS uses restriction enzymes to reduce genome complexity and genotype multiple DNA samples. After digestion, PCR is performed to enlarge the fragment pool, and the GBS libraries are then sequenced using next-generation sequencing technologies, usually resulting in single-end reads of about 100 bp. It is relatively inexpensive and has been used in plant breeding. Although GBS is similar in approach to restriction-site-associated DNA sequencing (RAD-seq), the two methods differ in some substantial ways.
In genetics, a polygenic score (PGS) is a number that summarizes the estimated effect of many genetic variants on an individual's phenotype. The PGS is also called the polygenic index (PGI) or genome-wide score; in the context of disease risk, it is called a polygenic risk score or genetic risk score. The score reflects an individual's estimated genetic predisposition for a given trait and can be used as a predictor for that trait. It gives an estimate of how likely an individual is to have a given trait based only on genetics, without taking environmental factors into account; and it is typically calculated as a weighted sum of trait-associated alleles.
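The weighted sum can be sketched in a few lines. In this example each variant contributes its weight multiplied by the number of copies of the effect allele the individual carries (0, 1 or 2). The SNP identifiers and weights are hypothetical, and treating a missing genotype as zero copies is a simplifying assumption.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele counts; missing SNPs count as 0 (assumption)."""
    return sum(weight * dosages.get(snp, 0) for snp, weight in weights.items())

# Hypothetical per-allele weights, as might be estimated in an association study.
weights = {"rs0001": 0.5, "rs0002": -0.25, "rs0003": 0.125}
# Hypothetical individual: copies of the effect allele carried at each SNP.
dosages = {"rs0001": 2, "rs0002": 1, "rs0003": 0}

score = polygenic_score(dosages, weights)
print(score)  # 0.75
```

Here the score is 0.5 × 2 + (−0.25) × 1 + 0.125 × 0 = 0.75; on its own the number is only meaningful relative to the scores of other individuals computed with the same weights.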
Genetic privacy involves the concept of personal privacy concerning the storing, repurposing, provision to third parties, and displaying of information pertaining to one's genetic information. This concept also encompasses privacy regarding the ability to identify specific individuals by their genetic sequence, and the potential to gain information on specific characteristics about that person via portions of their genetic information, such as their propensity for specific diseases or their immediate or distant ancestry.
DNA encryption is the process of hiding or obscuring genetic information by a computational method in order to improve genetic privacy in DNA sequencing processes. The human genome is long and complex, but important and identifying information can be interpreted from smaller variations without reading the entire genome. A whole human genome is a string of 3.2 billion base-paired nucleotides, the building blocks of life, but genomes differ between individuals by only about 0.5%, an important 0.5% that accounts for all of human diversity, the pathology of different diseases, and ancestral story. Emerging strategies incorporate different methods, such as randomization algorithms and cryptographic approaches, to de-identify the genetic sequence from the individual and, fundamentally, to isolate only the necessary information while protecting the rest of the genome from unnecessary inquiry. The priority now is to ascertain which methods are robust, and how policy should ensure the ongoing protection of genetic privacy.
Amanda M. Hulse-Kemp is a computational biologist with the United States Department of Agriculture – Agricultural Research Service. She works in the Genomics and Bioinformatics Research Unit and is stationed on the North Carolina State University campus in Raleigh, North Carolina.
Eftychia ("Effy") Vayena is a Greek and Swiss bioethicist. Since 2017 she has held the chair of bioethics at the Swiss Federal Institute of Technology in Zurich (ETH Zurich). She is an elected member of the Swiss Academy of Medical Sciences.