A gene is said to be polymorphic if more than one allele occupies that gene's locus within a population. [1] In addition to having more than one allele at a specific locus, a gene is generally considered polymorphic only if each allele occurs in the population at a frequency of at least 1%. [2]
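As a minimal sketch of the 1% rule of thumb, the Python below counts alleles in a sample of diploid genotypes and checks whether at least two alleles each reach the threshold. The function names and the genotype-tuple input format are assumptions made for illustration. For loci with more than two alleles, the sketch reads the rule as requiring at least two alleles at or above 1%.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Frequency of each allele in a list of diploid genotypes such as ("A", "a")."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def is_polymorphic(genotypes, threshold=0.01):
    """Rule of thumb: at least two alleles, each at a frequency >= threshold."""
    freqs = allele_frequencies(genotypes)
    common_alleles = [allele for allele, f in freqs.items() if f >= threshold]
    return len(common_alleles) >= 2

# Hypothetical sample of 100 individuals: 90 "AA" homozygotes and 10 "Aa" heterozygotes.
sample = [("A", "A")] * 90 + [("A", "a")] * 10
print(allele_frequencies(sample))  # {'A': 0.95, 'a': 0.05}
print(is_polymorphic(sample))      # True
```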
Gene polymorphisms can occur in any region of the genome. The majority of polymorphisms are silent, meaning they do not alter the function or expression of a gene. [3] Some polymorphisms are visible. For example, in dogs the E locus can have any of five different alleles, known as E, Em, Eg, Eh, and e. [4] Varying combinations of these alleles contribute to the pigmentation and patterns seen in dog coats. [5]
A polymorphic variant of a gene can lead to abnormal expression of the gene or to production of an abnormal form of the protein; this abnormality may cause or be associated with disease. For example, in one polymorphic variant of the gene encoding the enzyme CYP4A11, cytosine replaces thymine at nucleotide position 8590 of the gene; this variant encodes a CYP4A11 protein in which serine replaces phenylalanine at amino acid position 434. [6] The variant protein has reduced enzyme activity in metabolizing arachidonic acid to the blood pressure-regulating eicosanoid 20-hydroxyeicosatetraenoic acid. One study found that humans carrying this variant in one or both of their CYP4A11 genes have an increased incidence of hypertension, ischemic stroke, and coronary artery disease. [6]
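The phenylalanine-to-serine change can be checked against the standard genetic code. Phenylalanine is encoded by TTT and TTC and serine by TCT and TCC, so a thymine-to-cytosine change at the second codon position converts one to the other. The sketch below builds the standard codon table in Python and applies such a change to a codon. The TTC codon is chosen only for illustration, since the actual codon context at CYP4A11 nucleotide 8590 is not given here.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order, "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def substitute(codon, position, new_base):
    """Return `codon` with the base at 0-based `position` replaced by `new_base`."""
    return codon[:position] + new_base + codon[position + 1:]

ref_codon = "TTC"                           # hypothetical phenylalanine codon
alt_codon = substitute(ref_codon, 1, "C")   # thymine -> cytosine at the second position
print(ref_codon, CODON_TABLE[ref_codon])    # TTC F
print(alt_codon, CODON_TABLE[alt_codon])    # TCC S
```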
Most notably, the genes coding for the major histocompatibility complex (MHC) are the most polymorphic genes known. MHC molecules are part of the immune system and interact with T cells. There are more than 32,000 different alleles of human MHC class I and II genes, and it has been estimated that there are 200 variants at the HLA-B and HLA-DRB1 loci alone. [7]
Some polymorphisms may be maintained by balancing selection.
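Heterozygote advantage (overdominance) is one classic form of balancing selection. The minimal Python sketch below iterates the standard single-locus selection recursion with hypothetical genotype fitnesses 1 - s (AA), 1 (Aa), and 1 - t (aa). Starting from either a low or a high frequency, allele A settles at the interior equilibrium t / (s + t) rather than being fixed or lost. The sketch assumes an infinite, randomly mating population with no mutation, migration, or drift.

```python
def next_frequency(p, w_AA, w_Aa, w_aa):
    """One generation of viability selection on allele A (frequency p)."""
    q = 1 - p
    mean_fitness = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    return p * (p * w_AA + q * w_Aa) / mean_fitness

def simulate_overdominance(p0, s, t, generations):
    """Heterozygote advantage: fitnesses AA = 1 - s, Aa = 1, aa = 1 - t."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p, 1 - s, 1.0, 1 - t)
    return p

s, t = 0.1, 0.3                                  # hypothetical selection coefficients
print(round(t / (s + t), 4))                     # predicted equilibrium: 0.75
for p0 in (0.05, 0.95):
    print(round(simulate_overdominance(p0, s, t, 500), 4))  # 0.75 from both starts
```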
A rule of thumb that is sometimes used is to classify genetic variants that occur below 1% allele frequency as mutations rather than polymorphisms. [8] However, since polymorphisms may occur at low allele frequency, this is not a reliable way to tell new mutations from polymorphisms. [9] A mutation is a change to an inherited genetic sequence.
Silent mutations cause no change in fitness, so selection, one of the pressures that can move a population away from Hardy-Weinberg equilibrium, has no impact on the accumulation of silent polymorphisms over time. Most often, a polymorphism is a variation at a single nucleotide (a single-nucleotide polymorphism, or SNP), but it can also be an insertion or deletion of one or more nucleotides, or a change in the number of times a short or longer sequence is repeated. Insertions, deletions, and repeat-number changes, like SNPs, are common in parts of DNA that do not directly code for protein, but they can still have major effects on gene expression. [11] [12] Polymorphisms that change fitness provide the raw material for evolution by natural selection. All genetic polymorphisms originate as mutations, but only mutations that arise in the germline and are not lethal can spread through a population. Polymorphisms are classified by what happens at the level of the individual mutation in the DNA sequence (or RNA sequence, in the case of RNA viruses) and by the mutation's effect on the phenotype (silent, or resulting in some change in function or fitness). They are also classified by whether the change affects the sequence of the resulting protein or the regulation of the gene's expression; regulatory changes can occur at sites that are typically, but not always, upstream of and adjacent to the gene. [13] [11]
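The sequence-level categories above can be sketched as a small Python function that compares a reference allele with an alternative allele. It assumes VCF-style allele strings, in which an indel is written together with one shared anchoring base (for example, reference "A" and alternative "AT" for an inserted T). The function name and the "MNP" (multi-nucleotide polymorphism) label are illustrative choices. Detecting changes in repeat number would additionally require examining the flanking sequence.

```python
def classify_variant(ref, alt):
    """Classify a variant from its reference and alternative allele strings.

    Assumes VCF-style alleles: indels include one shared left-anchoring base.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        return "no change"
    if len(ref) == len(alt):
        # Same length: a substitution of one base (SNP) or of several (MNP).
        return "SNP" if len(ref) == 1 else "MNP"
    return "insertion" if len(alt) > len(ref) else "deletion"

for ref, alt in [("C", "T"), ("A", "AT"), ("ATG", "A"), ("AC", "GT")]:
    print(ref, ">", alt, classify_variant(ref, alt))
```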
Polymorphisms can be identified in the laboratory using a variety of methods. Many methods employ PCR to amplify the sequence of a gene. Once amplified, polymorphisms and mutations in the sequence can be detected by DNA sequencing, either directly or after screening for variation with a method such as single strand conformation polymorphism analysis. [14]
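Once a PCR product has been sequenced, the simplest comparison is base-by-base against a reference. The Python sketch below assumes the two sequences are already aligned and of equal length, so it finds substitutions only, not insertions or deletions. It also treats "N" in the sample as an uncalled base. The function name and example sequences are hypothetical.

```python
def find_substitutions(reference, sample):
    """List (position, ref_base, sample_base) wherever two aligned,
    equal-length sequences differ. Positions are 1-based."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (i, r, s)
        for i, (r, s) in enumerate(zip(reference.upper(), sample.upper()), start=1)
        if r != s and s != "N"  # skip bases the sequencer could not call
    ]

reference = "ATGGCCTTCAAG"
sample = "ATGGCCTCCAAG"
print(find_substitutions(reference, sample))  # [(8, 'T', 'C')]
```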
A polymorphism can be any sequence difference. Examples include:
Many human diseases result from polymorphisms, and polymorphisms also play a significant role as risk factors for the development of disease. [19] Polymorphisms in genes for drug-metabolizing enzymes (especially the cytochrome P450 isoenzymes), for proteins involved in drug transport (into the body, into protected areas of the body such as the brain, or out of the body), and for specific cell-surface receptor proteins also alter the effects of various drugs. [13] This is a rapidly evolving area of drug safety research. [20] [21] Resources such as HapMap, dbSNP, Ensembl, the DNA Data Bank of Japan, DrugBank, the Kyoto Encyclopedia of Genes and Genomes (KEGG), GenBank, and other parts of the International Nucleotide Sequence Database Collaboration have become crucial in personalized medicine, bioinformatics, and pharmacogenomics. [22]
Polymorphisms have been discovered in multiple exons of XPD. XPD refers to "xeroderma pigmentosum group D" and is involved in nucleotide excision repair, a DNA repair mechanism. This pathway cuts out and removes segments of DNA that have been damaged by agents such as cigarette smoke and other inhaled environmental carcinogens. [23] Asp312Asn and Lys751Gln are the two common polymorphisms of XPD that each change a single amino acid. [24] The Asn and Gln alleles have been associated with reduced DNA repair efficiency. [25] Several studies have examined whether this diminished repair capacity is related to an increased risk of lung cancer, analyzing the XPD gene in lung cancer patients of varying age, gender, race, and pack-years of smoking. The results were mixed: some concluded that individuals homozygous for the Asn allele or homozygous for the Gln allele had an increased risk of developing lung cancer, [26] while others found no statistically significant association between either polymorphism and lung cancer susceptibility in smokers. [27] Research into the relationship between XPD polymorphisms and lung cancer risk is ongoing.
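Case-control studies like these commonly summarize an association with an odds ratio and its confidence interval. The Python sketch below uses the standard cross-product odds ratio with a Woolf (log-scale) 95% interval. The genotype counts are invented for illustration and are not taken from the cited studies. An interval that includes 1 is consistent with no association.

```python
import math

def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio for a 2x2 table with a Woolf (log-based) 95% confidence interval.

    "Exposed" here means carrying the genotype of interest (e.g. homozygous Gln).
    All four counts must be non-zero.
    """
    a, b = exposed_cases, unexposed_cases
    c, d = exposed_controls, unexposed_controls
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper

# Hypothetical counts: 40 of 200 cases and 25 of 200 controls are Gln/Gln homozygotes.
est, lo, hi = odds_ratio(40, 160, 25, 175)
print(f"OR = {est:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```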
Sequence analysis is becoming a cornerstone of personalized cancer medicine. It is increasingly important both for identifying the specific mutations in an individual's cancer, such as receptor mutations that can serve as molecular targets, and for understanding the inherited polymorphisms that play important roles in diagnosis, prognosis, and treatment. For example, when leukemia is treated with 6-mercaptopurine, toxicity largely depends on polymorphisms in multiple genes involved in the drug's metabolism. [28]
Asthma is an inflammatory disease of the lungs, and more than 100 loci have been identified as contributing to the development and severity of the condition. [29] A small number of these asthma-associated genes were identified using traditional linkage analysis and genome-wide association studies (GWAS). A number of studies have examined various polymorphisms of asthma-associated genes and how those polymorphisms interact with the carrier's environment. One example is the gene CD14, which has a polymorphism associated with increased amounts of CD14 protein as well as reduced serum IgE levels. [30] One study of 624 children examined their serum IgE levels in relation to this CD14 polymorphism. It found that, among children with the C allele of the CD14/-260 polymorphism, serum IgE levels differed according to the type of allergens they were regularly exposed to. [31] Children in regular contact with house pets showed higher serum IgE levels, while children regularly exposed to stable animals showed lower serum IgE levels. [31] Continued research into gene-environment interactions may lead to more specialized treatment plans based on an individual's surroundings.
An allele, or allelomorph, is a variant of the sequence of nucleotides at a particular location, or locus, on a DNA molecule.
In biology, a mutation is an alteration in the nucleic acid sequence of the genome of an organism, virus, or extrachromosomal DNA. Viral genomes contain either DNA or RNA. Mutations result from errors during DNA or viral replication, mitosis, or meiosis, or from other types of damage to DNA, which may then undergo error-prone repair or cause an error during other forms of repair or during replication. Mutations may also result from insertion or deletion of segments of DNA due to mobile genetic elements.
In molecular biology, restriction fragment length polymorphism (RFLP) is a technique that exploits variations in homologous DNA sequences, known as polymorphisms, in order to distinguish individuals, populations, or species or to pinpoint the locations of genes within a sequence. The term may refer to a polymorphism itself, as detected through the differing locations of restriction enzyme sites, or to a related laboratory technique by which such differences can be illustrated. In RFLP analysis, a DNA sample is digested into fragments by one or more restriction enzymes, and the resulting restriction fragments are then separated by gel electrophoresis according to their size.
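The digestion step can be simulated in Python to show how a single-base change that destroys a restriction site alters the fragment pattern. The sketch assumes a linear DNA molecule and uses the EcoRI recognition site GAATTC, which EcoRI cuts between the G and the first A. Fragment lengths count top-strand bases only, and the sequences are hypothetical. In the laboratory these fragments would appear as bands on the gel rather than being computed.

```python
import re

def digest(sequence, site="GAATTC", cut_offset=1):
    """Fragment lengths after cutting a linear sequence at every copy of `site`.

    `cut_offset` is the cut position inside the site (EcoRI cuts G^AATTC, so 1).
    """
    sequence = sequence.upper()
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

allele_1 = "AAAAGAATTCAAAAAAAAAAGAATTCAAAA"      # two EcoRI sites
allele_2 = allele_1[:22] + "G" + allele_1[23:]   # A->G turns the second site into GAGTTC
print(digest(allele_1))  # [5, 16, 9]
print(digest(allele_2))  # [5, 25]
```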
The human genome is a complete set of nucleic acid sequences for humans, encoded as DNA within the 23 chromosome pairs in cell nuclei and in a small DNA molecule found within individual mitochondria. These are usually treated separately as the nuclear genome and the mitochondrial genome. Human genomes include both protein-coding DNA sequences and various types of DNA that do not encode proteins. The latter is a diverse category that includes DNA coding for non-translated RNA, such as that for ribosomal RNA, transfer RNA, ribozymes, small nuclear RNAs, and several types of regulatory RNAs. It also includes promoters and their associated gene-regulatory elements; DNA playing structural and replicatory roles, such as scaffolding regions, telomeres, centromeres, and origins of replication; plus large numbers of transposable elements, inserted viral DNA, non-functional pseudogenes, and simple, highly repetitive sequences. Introns make up a large percentage of non-coding DNA. Some of this non-coding DNA is non-functional junk DNA, such as pseudogenes, but there is no firm consensus on the total amount of junk DNA.
A microsatellite is a tract of repetitive DNA in which certain DNA motifs are repeated, typically 5–50 times. Microsatellites occur at thousands of locations within an organism's genome. They have a higher mutation rate than other areas of DNA, leading to high genetic diversity. Microsatellites are often referred to as short tandem repeats (STRs) by forensic geneticists and in genetic genealogy, or as simple sequence repeats (SSRs) by plant geneticists.
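Counting repeat units is the basic operation behind comparing microsatellite (STR) alleles, because alleles differ in how many copies of the motif they carry. The Python sketch below returns the longest run of back-to-back copies of a given motif. The function name and the CA-repeat sequences are hypothetical, and real STR callers must also handle sequencing errors and imperfect repeats.

```python
def longest_tandem_run(sequence, motif):
    """Largest number of consecutive, back-to-back copies of `motif` in `sequence`."""
    sequence, motif = sequence.upper(), motif.upper()
    k = len(motif)
    best = 0
    for offset in range(k):  # a run can start at any position modulo the motif length
        run = 0
        for i in range(offset, len(sequence) - k + 1, k):
            if sequence[i:i + k] == motif:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best

short_allele = "GGT" + "CA" * 12 + "TTG"   # hypothetical (CA)12 allele
long_allele = "GGT" + "CA" * 15 + "TTG"    # hypothetical (CA)15 allele
print(longest_tandem_run(short_allele, "CA"), longest_tandem_run(long_allele, "CA"))  # 12 15
```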
In genetics and bioinformatics, a single-nucleotide polymorphism is a germline substitution of a single nucleotide at a specific position in the genome. Although certain definitions require the substitution to be present in a sufficiently large fraction of the population, many publications do not apply such a frequency threshold.
A frameshift mutation is a genetic mutation caused by an insertion or deletion (indel) of a number of nucleotides that is not divisible by three. Due to the triplet nature of gene expression by codons, the insertion or deletion can change the reading frame, resulting in a completely different translation from the original. The earlier in the sequence the deletion or insertion occurs, the more altered the protein. A frameshift mutation is not the same as a single-nucleotide polymorphism, in which a nucleotide is replaced rather than inserted or deleted. A frameshift mutation will in general cause the codons after the mutation to be read as different amino acids. It will also change which stop codon is encountered first, so the resulting polypeptide could be abnormally short or abnormally long and will most likely not be functional.
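The effect of a single-base deletion on the reading frame can be sketched in Python with the standard genetic code. The coding sequence below is hypothetical. Translation starts at the first base and stops at the first in-frame stop codon or at the end of the sequence.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order, "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna):
    """Translate codon by codon from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

original = "ATGAAAGCCTTTGGCCTGTAA"           # hypothetical CDS: ATG AAA GCC TTT GGC CTG TAA
frameshifted = original[:4] + original[5:]   # delete the fifth base (a 1-nt deletion)
print(translate(original))       # MKAFGL
print(translate(frameshifted))   # MKPLAC
```

After the deletion, every downstream codon is read in a new frame. The original TAA stop codon is no longer in frame, so in this example translation runs to the end of the sequence instead.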
The International HapMap Project was an organization that aimed to develop a haplotype map (HapMap) of the human genome, to describe the common patterns of human genetic variation. HapMap is used to find genetic variants affecting health, disease and responses to drugs and environmental factors. The information produced by the project is made freely available for research.
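Common patterns of variation in a haplotype map are typically summarized with linkage disequilibrium (LD) statistics between pairs of nearby SNPs. An r² near 1 means one SNP can stand in, or act as a "tag", for the other. The Python sketch below computes the standard D and r² from the four haplotype frequencies of two biallelic SNPs. The function name and frequencies are hypothetical, and in practice haplotype frequencies must first be estimated from genotype data.

```python
def linkage_disequilibrium(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, r_squared) for two biallelic SNPs from haplotype frequencies."""
    if abs(p_AB + p_Ab + p_aB + p_ab - 1) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab          # frequency of allele A at the first SNP
    p_B = p_AB + p_aB          # frequency of allele B at the second SNP
    denominator = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denominator == 0:
        raise ValueError("both SNPs must be polymorphic")
    D = p_AB - p_A * p_B       # deviation from independent association of alleles
    return D, D * D / denominator

print([round(x, 6) for x in linkage_disequilibrium(0.6, 0.0, 0.0, 0.4)])      # [0.24, 1.0]
print([round(x, 6) for x in linkage_disequilibrium(0.25, 0.25, 0.25, 0.25)])  # [0.0, 0.0]
```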
A genetic marker is a gene or DNA sequence with a known location on a chromosome that can be used to identify individuals or species. It can be described as a variation that can be observed. A genetic marker may be a short DNA sequence, such as a sequence surrounding a single base-pair change, or a long one, like minisatellites.
Nucleotide excision repair is a DNA repair mechanism. DNA damage occurs constantly because of chemicals, radiation and other mutagens. Three excision repair pathways exist to repair single stranded DNA damage: Nucleotide excision repair (NER), base excision repair (BER), and DNA mismatch repair (MMR). While the BER pathway can recognize specific non-bulky lesions in DNA, it can correct only damaged bases that are removed by specific glycosylases. Similarly, the MMR pathway only targets mismatched Watson-Crick base pairs.
In population genetics, an ancestry-informative marker (AIM) is a single-nucleotide polymorphism that exhibits substantially different frequencies between different populations. A set of many AIMs can be used to estimate the proportion of ancestry of an individual derived from each population.
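A minimal way to turn a panel of AIMs into an ancestry estimate is maximum likelihood over the admixture proportion. The Python sketch below assumes two source populations with known allele frequencies, unlinked markers, and Hardy-Weinberg proportions within the individual, and it scans a grid of candidate proportions. The allele frequencies and genotypes are hypothetical, and dedicated admixture software uses more elaborate models.

```python
import math

def ancestry_proportion(genotypes, freqs_pop1, freqs_pop2, steps=100):
    """Grid-search maximum-likelihood fraction of ancestry from population 1.

    genotypes: copies (0, 1 or 2) of the counted allele at each AIM.
    freqs_pop1 / freqs_pop2: frequency of that allele in each source population.
    """
    best_q, best_ll = 0.0, -math.inf
    for k in range(steps + 1):
        q = k / steps
        ll = 0.0
        for g, f1, f2 in zip(genotypes, freqs_pop1, freqs_pop2):
            f = q * f1 + (1 - q) * f2            # expected allele frequency in the individual
            f = min(max(f, 1e-12), 1 - 1e-12)    # guard against log(0)
            # Binomial log-likelihood; the constant binomial coefficient is omitted.
            ll += g * math.log(f) + (2 - g) * math.log(1 - f)
        if ll > best_ll:
            best_q, best_ll = q, ll
    return best_q

pop1 = [0.9, 0.1, 0.8, 0.2]   # hypothetical AIM allele frequencies
pop2 = [0.1, 0.9, 0.2, 0.8]
print(ancestry_proportion([2, 0, 2, 0], pop1, pop2))  # 1.0
print(ancestry_proportion([1, 1, 1, 1], pop1, pop2))  # 0.5
```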
Genotyping is the process of determining differences in the genetic make-up (genotype) of an individual by examining the individual's DNA sequence using biological assays and comparing it to another individual's sequence or a reference sequence. It reveals the alleles an individual has inherited from their parents. Traditionally, genotyping is the use of DNA sequences to define biological populations by use of molecular tools. It does not usually involve defining the genes of an individual.
Human genetic variation refers to the genetic differences within and among populations. There may be multiple variants of any given gene in the human population (alleles), a situation called polymorphism.
In molecular biology, an SNP array is a type of DNA microarray which is used to detect polymorphisms within a population. A single nucleotide polymorphism (SNP), a variation at a single site in DNA, is the most frequent type of variation in the genome. Around 335 million SNPs have been identified in the human genome, 15 million of which are present at frequencies of 1% or higher across different populations worldwide.
Deoxyribonuclease I is an endonuclease of the DNase family coded by the human gene DNASE1. DNase I is a nuclease that cleaves DNA preferentially at phosphodiester linkages adjacent to a pyrimidine nucleotide, yielding 5'-phosphate-terminated polynucleotides with a free hydroxyl group on position 3', on average producing tetranucleotides. It acts on single-stranded DNA, double-stranded DNA, and chromatin. In addition to its role as a waste-management endonuclease, it has been suggested to be one of the deoxyribonucleases responsible for DNA fragmentation during apoptosis.
SNP genotyping is the measurement of genetic variations of single nucleotide polymorphisms (SNPs) between members of a species. It is a form of genotyping, which is the measurement of more general genetic variation. SNPs are one of the most common types of genetic variation. An SNP is a single base pair mutation at a specific locus, usually consisting of two alleles. SNPs have been found to be involved in the etiology of many human diseases and are of particular interest in pharmacogenetics. Because SNPs are conserved during evolution, they have been proposed as markers for use in quantitative trait loci (QTL) analysis and in association studies in place of microsatellites. The use of SNPs is being extended in the HapMap project, which aims to provide the minimal set of SNPs needed to genotype the human genome. SNPs can also provide a genetic fingerprint for use in identity testing. The growing interest in SNPs has been reflected in the rapid development of a diverse range of SNP genotyping methods.
An allele-specific oligonucleotide (ASO) is a short piece of synthetic DNA complementary to the sequence of a variable target DNA. It acts as a probe for the presence of the target in a Southern blot assay or, more commonly, in the simpler dot blot assay. It is a common tool used in genetic testing, forensics, and molecular biology research.
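The defining property of an ASO is that it pairs perfectly with one allele but not with the other. The Python sketch below derives a probe as the reverse complement of an allele-specific target and checks for an exact complementary match. This idealizes hybridization, which in a real dot blot depends on probe length and wash stringency; the sequences and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def hybridizes(probe, target):
    """Idealized test: does the probe pair perfectly with some stretch of the target?"""
    return reverse_complement(probe) in target.upper()

allele_t = "CCGGACTTCAGTT"                 # hypothetical region carrying a T
allele_c = "CCGGACCTCAGTT"                 # same region carrying a C at that position
probe_t = reverse_complement("GGACTTCAG")  # ASO designed for the T allele
print(probe_t)                                                      # CTGAAGTCC
print(hybridizes(probe_t, allele_t), hybridizes(probe_t, allele_c))  # True False
```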
Disease gene identification is a process by which scientists identify the mutant genotypes responsible for an inherited genetic disorder. Mutations in these genes can include single nucleotide substitutions, single nucleotide additions/deletions, deletion of the entire gene, and other genetic abnormalities.
Single nucleotide polymorphism annotation is the process of predicting the effect or function of an individual SNP using SNP annotation tools. In SNP annotation the biological information is extracted, collected and displayed in a clear form amenable to query. SNP functional annotation is typically performed based on the available information on nucleic acid and protein sequences.
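A basic functional annotation of a coding SNP asks how it changes the affected codon. The Python sketch below classifies a substitution in an in-frame coding sequence as synonymous, missense, nonsense, or stop-loss using the standard genetic code. It assumes the sequence starts at the first base of a codon. It also ignores splicing, regulatory regions, and alternative transcripts, which full annotation tools such as Ensembl's Variant Effect Predictor take into account. The function name and sequence are hypothetical.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order, "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def annotate_coding_snp(cds, position, alt):
    """Classify a single-base substitution at 1-based `position` of an in-frame CDS.

    Returns (effect, change), where change is ref amino acid, codon number,
    alt amino acid (e.g. "F4S"); "*" denotes a stop codon.
    """
    cds = cds.upper()
    i = position - 1
    start = i - i % 3                     # first base of the affected codon
    ref_codon = cds[start:start + 3]
    offset = i - start                    # position of the SNP within the codon
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        effect = "synonymous"
    elif alt_aa == "*":
        effect = "nonsense"
    elif ref_aa == "*":
        effect = "stop-loss"
    else:
        effect = "missense"
    return effect, f"{ref_aa}{start // 3 + 1}{alt_aa}"

cds = "ATGAAAGCCTTTGGCTAA"                # hypothetical CDS: Met-Lys-Ala-Phe-Gly-stop
print(annotate_coding_snp(cds, 6, "G"))   # ('synonymous', 'K2K')
print(annotate_coding_snp(cds, 11, "C"))  # ('missense', 'F4S')
print(annotate_coding_snp(cds, 4, "T"))   # ('nonsense', 'K2*')
```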
Personalized genomics is the human genetics-derived study of analyzing and interpreting individualized genetic information by genome sequencing to identify genetic variations compared to the library of known sequences. International genetics communities have long cooperated on research projects to determine the DNA sequence of the human genome using DNA sequencing techniques. The most commonly used methods are whole exome sequencing and whole genome sequencing, both of which are used to identify genetic variations. As genome sequencing became more cost-effective over time, it became applicable in the medical field, helping scientists understand which genes are associated with specific diseases.