Mutational signatures are characteristic combinations of mutation types arising from specific mutagenesis processes such as DNA replication infidelity, exogenous and endogenous genotoxin exposures, defective DNA repair pathways, and DNA enzymatic editing. [1]
The term is used for two distinct concepts that are often conflated: mutagen signatures and tumor signatures. In its original use, a mutagen signature referred to a pattern of mutations made in the laboratory by a known mutagen and not made by other mutagens, unique to the mutagen just as a human signature is unique to the signer. This uniqueness allows the mutagen to be deduced from a cell's mutations. [2] Later, the phrase came to refer to a pattern of mutations characteristic of a tumor type, although usually unique neither to the tumor type nor to a mutagen. [3] [4] If a tumor mutational signature matches a unique mutagen mutational signature, it is valid to deduce the carcinogen exposure or mutagenesis process that occurred in the patient's distant past. [2] Increasingly refined tumor signatures are becoming assignable to mutagen signatures. [5]
Deciphering mutational signatures in cancer provides insight into the biological mechanisms involved in carcinogenesis and normal somatic mutagenesis. [6] Mutational signatures have shown their applicability in cancer treatment and cancer prevention. Advances in the field of oncogenomics have enabled the development and use of molecularly targeted therapy, but such therapies historically focused on inhibition of oncogenic drivers (e.g. EGFR gain-of-function mutation and EGFR inhibitor treatment in colorectal cancer [7] ). More recently, mutational signature profiling has proven successful in guiding oncological management and the use of targeted therapies (e.g. immunotherapy in mismatch repair deficient tumors of diverse cancer types, [8] and platinum chemotherapy and PARP inhibitors to exploit synthetic lethality in homologous recombination deficient breast cancer). [9]
The biological mutagenesis mechanisms underlying mutational signatures (e.g. COSMIC Signatures 1 to 30) include, but are not limited to: [lower-alpha 1] [4]
Cancer mutational signatures analyses require genomic data from cancer genome sequencing with paired-normal DNA sequencing in order to create the tumor mutation catalog (mutation types and counts) of a specific tumor. Different types of mutations (e.g. single nucleotide variants, indels, structural variants) can be used individually or in combination to model mutational signatures in cancer.
There are six classes of base substitution: C>A, C>G, C>T, T>A, T>C, T>G. The G>T substitution is considered equivalent to the C>A substitution because it is not possible to differentiate on which DNA strand (forward or reverse) the substitution initially occurred. Both the C>A and G>T substitutions are therefore counted as part of the "C>A" class. For the same reason the G>C, G>A, A>T, A>G and A>C mutations are counted as part of the "C>G", "C>T", "T>A", "T>C" and "T>G" classes respectively.
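The strand-collapsing rule above can be expressed as a small Python function. This is a minimal sketch; the name `substitution_class` is illustrative and not taken from any particular tool.

```python
# Watson-Crick complements for the four DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref: str, alt: str) -> str:
    """Collapse a base substitution onto its pyrimidine-reference class.

    A purine reference (G or A) is reported from the opposite strand,
    so G>T becomes C>A, A>G becomes T>C, and so on.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("reference and alternate bases are identical")
    if ref in ("G", "A"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


# The twelve possible substitutions collapse onto exactly six classes.
ALL_SUBS = [(r, a) for r in "ACGT" for a in "ACGT" if r != a]
print(sorted({substitution_class(r, a) for r, a in ALL_SUBS}))
# prints ['C>A', 'C>G', 'C>T', 'T>A', 'T>C', 'T>G']
```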
Taking into account the 5' and 3' adjacent bases (also called flanking base pairs or trinucleotide context) leads to 96 possible mutation types (6 substitution classes × 4 possible 5' bases × 4 possible 3' bases; e.g. A[C>A]A, A[C>A]T, etc.). The mutation catalog of a tumor is created by assigning each single nucleotide variant (SNV) (synonyms: base-pair substitution or substitution point mutation) to one of the 96 mutation types and counting the total number of substitutions for each of these 96 mutation types (see figure).
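A minimal sketch of building the 96-channel catalog follows. It assumes each SNV is supplied as its reference trinucleotide (read on the reference + strand) plus the alternate base; that input format and the helper names are hypothetical. Trinucleotides centred on a purine are reverse-complemented so that every channel has a pyrimidine in the middle.

```python
from collections import Counter
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]

# All 96 channels: 6 pyrimidine-reference classes x 4 five-prime x 4 three-prime bases.
CHANNELS = [f"{five}[{sub}]{three}"
            for sub in SUBSTITUTIONS
            for five, three in product("ACGT", repeat=2)]


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def channel(context: str, alt: str) -> str:
    """Map a reference trinucleotide (e.g. 'TGA') and alternate base to a channel.

    The middle base of `context` is the mutated reference base. Purine references
    are reverse-complemented so every channel has C or T in the middle.
    """
    context, alt = context.upper(), alt.upper()
    if context[1] in "GA":
        context, alt = reverse_complement(context), COMPLEMENT[alt]
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def mutation_catalog(snvs):
    """Count SNVs, given as (trinucleotide, alt) pairs, in all 96 channels."""
    counts = Counter(channel(ctx, alt) for ctx, alt in snvs)
    return {ch: counts.get(ch, 0) for ch in CHANNELS}


catalog = mutation_catalog([("ACG", "T"), ("CGT", "A"), ("TGA", "T")])
print(len(catalog), catalog["A[C>T]G"], catalog["T[C>A]A"])   # prints 96 2 1
```

The first two variants (C>T at ACG and G>A at CGT) are reverse complements of each other, so both are counted in the A[C>T]G channel.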
Once the mutation catalog (e.g. counts for each of the 96 mutation types) of a tumor is obtained, there are two approaches to deciphering the contributions of different mutational signatures to the tumor genomic landscape:
Identifying the contributions of diverse mutational signatures to carcinogenesis provides insight into tumor biology and can offer opportunities for targeted therapy.
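When the signatures themselves are already known (for example, from a reference catalog such as COSMIC), their contributions to a single tumor can be estimated by non-negative least squares, an approach often called refitting. The sketch below assumes NumPy and uses a brute-force search over signature subsets, chosen for transparency rather than speed; the function name and the synthetic signatures are hypothetical.

```python
from itertools import combinations

import numpy as np


def fit_exposures(catalog, signatures):
    """Non-negative least-squares fit of known signatures to one tumor catalog.

    catalog:    shape (96,), mutation counts for one tumor.
    signatures: shape (96, k), one known signature per column (each sums to 1).
    Returns a length-k array of estimated mutation counts per signature.

    Exhaustive search: for every subset of signatures, solve ordinary least
    squares and keep the lowest-residual solution whose coefficients are all
    non-negative. This is exact NNLS but costs 2**k solves, so it is only
    practical for a handful of candidate signatures.
    """
    catalog = np.asarray(catalog, dtype=float)
    signatures = np.asarray(signatures, dtype=float)
    k = signatures.shape[1]
    best = np.zeros(k)                       # empty subset: all exposures zero
    best_residual = np.linalg.norm(catalog)
    for size in range(1, k + 1):
        for subset in combinations(range(k), size):
            cols = list(subset)
            coef, *_ = np.linalg.lstsq(signatures[:, cols], catalog, rcond=None)
            if np.any(coef < 0):
                continue                     # violates non-negativity
            residual = np.linalg.norm(catalog - signatures[:, cols] @ coef)
            if residual < best_residual:
                best = np.zeros(k)
                best[cols] = coef
                best_residual = residual
    return best


# Synthetic check: three made-up signatures mixed with known exposures.
rng = np.random.default_rng(0)
sigs = rng.random((96, 3))
sigs /= sigs.sum(axis=0)                     # normalize each column to sum to 1
true_exposures = np.array([120.0, 40.0, 0.0])
tumor = sigs @ true_exposures
estimate = fit_exposures(tumor, sigs)        # close to [120, 40, 0]
```

Real catalogs are noisy, so in practice the fitted exposures only approximate the true contributions, and library NNLS solvers scale far better than this exhaustive search.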
Signature 3, seen in homologous recombination (HR) deficient tumors, is associated with an increased burden of large indels (up to 50 nucleotides) with overlapping microhomology at the breakpoints. [4] In such tumors, DNA double-strand breaks are repaired by the imprecise non-homologous end joining (NHEJ) or microhomology-mediated end joining (MMEJ) pathways instead of high-fidelity HR repair.
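Microhomology at a deletion breakpoint means that a short stretch at one end of the deleted sequence is also present in the adjacent flank. A minimal sketch of measuring it follows; the function name and the way flanks are passed in are hypothetical, and the definition used is a simplified one.

```python
def microhomology_length(deleted: str, left_flank: str, right_flank: str) -> int:
    """Bases of microhomology at the breakpoints of a deletion (hypothetical helper).

    deleted:     the deleted sequence.
    left_flank:  reference sequence immediately 5' of the deletion.
    right_flank: reference sequence immediately 3' of the deletion.

    Microhomology is taken as the longer of two matches: the start of the deleted
    sequence against the start of the right flank, or its end against the end of
    the left flank. This sketch does not separate microhomology from full
    tandem-repeat copies of the deleted sequence.
    """
    def shared_prefix(a: str, b: str) -> int:
        n = 0
        while n < min(len(a), len(b)) and a[n] == b[n]:
            n += 1
        return n

    deleted, left, right = deleted.upper(), left_flank.upper(), right_flank.upper()
    at_right = shared_prefix(deleted, right)
    # Match the 3' end of the deletion against the 3' end of the left flank.
    at_left = shared_prefix(deleted[::-1], left[::-1])
    return max(at_right, at_left)


# "ATC" begins both the deletion and the right flank: 3 bp of microhomology.
print(microhomology_length("ATCGTTAC", "GGCAT", "ATCCTT"))   # prints 3
```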
Signature 6, seen in tumors with microsatellite instability, also features enrichment of 1-bp indels in nucleotide repeat regions.
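For 1-bp indels, the relevant feature is the length of the mononucleotide run in which the indel falls. A minimal sketch for a 1-bp deletion, assuming the reference sequence around the event is available as a string; the function name and input format are hypothetical.

```python
def deletion_run_length(reference: str, position: int) -> int:
    """Length of the homopolymer run containing a 1-bp deletion.

    reference: reference sequence around the deletion (hypothetical input format).
    position:  0-based index of the deleted base within `reference`.
    """
    base = reference[position]
    start = position
    while start > 0 and reference[start - 1] == base:          # extend 5'
        start -= 1
    end = position
    while end + 1 < len(reference) and reference[end + 1] == base:   # extend 3'
        end += 1
    return end - start + 1


print(deletion_run_length("GCAAAAAAT", 4))   # prints 6: deletion inside a run of six As
```

Tallying deletions by run length in this way separates indels in long repeat tracts, where replication slippage is expected, from isolated ones.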
Homologous recombination deficiency leads not only to the Signature 3 substitution pattern but also to an increased burden of structural variants. In the absence of homologous recombination, non-homologous end joining leads to large structural variants such as chromosomal translocations, chromosomal inversions and copy number variants.
A brief description of selected mutational processes and their associated mutational signatures in cancer is given in the sections below. Some signatures are ubiquitous across diverse cancer types (e.g. Signature 1) while others tend to be associated with specific cancers (e.g. Signature 9 and lymphoid malignancies). [4]
Some mutational signatures feature a strong transcriptional strand bias, with substitutions preferentially affecting one of the DNA strands, either the transcribed or the untranscribed strand (Signatures 5, 7, 8, 10, 12, 16). [4]
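Strand bias is conventionally described for the pyrimidine of the mutated base pair. A minimal sketch of assigning one substitution within a gene to the transcribed or untranscribed strand follows; it assumes the gene strand comes from an annotation in which '+' means the mRNA matches the reference + strand, and the function name is hypothetical.

```python
def transcriptional_strand(ref_base: str, gene_strand: str) -> str:
    """Gene strand that carries the pyrimidine of the mutated base pair.

    ref_base:    reference base on the genome's + strand at the mutated position.
    gene_strand: '+' or '-' from the gene annotation ('+' means the coding,
                 i.e. untranscribed, strand is the reference + strand).
    Returns 'untranscribed' if the pyrimidine (C or T) sits on the coding strand,
    'transcribed' if it sits on the template strand.
    """
    if gene_strand not in ("+", "-"):
        raise ValueError("gene_strand must be '+' or '-'")
    pyrimidine_on_plus = ref_base.upper() in ("C", "T")
    coding_is_plus = gene_strand == "+"
    return "untranscribed" if pyrimidine_on_plus == coding_is_plus else "transcribed"
```

Counting each substitution class separately on the two strands across many genes then shows whether, and in which direction, a catalog is biased.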
Signature 1 features a predominance of C>T transitions in the Np[C>T]G trinucleotide context and correlates with the age of the patient at the time of cancer diagnosis. The underlying proposed biological mechanism is the spontaneous deamination of 5-methylcytosine. [4]
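As one illustration of reading a signature feature directly from a 96-channel catalog, the share of SNVs in the four Np[C>T]G channels can be computed as below. Channel labels are assumed to be written in the 5'[REF>ALT]3' form (e.g. A[C>T]G); the helper name is hypothetical, and a high share is consistent with, but not by itself proof of, a Signature 1 contribution.

```python
def cpg_ct_fraction(catalog: dict) -> float:
    """Fraction of SNVs in the four N[C>T]G channels (C>T at CpG sites).

    catalog: mapping from channel label, e.g. 'A[C>T]G', to mutation count.
    """
    total = sum(catalog.values())
    cpg = sum(catalog.get(f"{five}[C>T]G", 0) for five in "ACGT")
    return cpg / total if total else 0.0


print(cpg_ct_fraction({"A[C>T]G": 30, "T[C>T]G": 20, "T[C>A]A": 50}))   # prints 0.5
```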
Signature 5 has a predominance of T>C substitutions in the ApTpN trinucleotide context with transcriptional strand bias. [6]
Signature 3 displays high mutation counts across multiple mutation classes and is associated with germline and somatic BRCA1 and BRCA2 mutations in several cancer types (e.g. breast, pancreatic, ovarian, prostate). This signature results from DNA double-strand break repair deficiency (or homologous recombination deficiency). Signature 3 is associated with a high burden of indels with microhomology at the breakpoints. [6]
The APOBEC3 family of cytidine deaminase enzymes responds to viral infections by editing viral genomes, but the enzymatic activity of APOBEC3A and APOBEC3B has also been found to cause unwanted host genome editing and may even participate in oncogenesis in human papillomavirus-related cancers. [11]
Signature 2 and Signature 13 are enriched for C>T and C>G substitutions and are thought to arise from cytidine deaminase activity of the AID/APOBEC enzymes family. [6]
A germline deletion polymorphism involving APOBEC3A and APOBEC3B is associated with a high burden of Signature 2 and Signature 13 mutations. [12] This polymorphism is considered to confer a moderate-penetrance breast cancer risk (two-fold above background risk). [13] The exact roles and mechanisms underlying APOBEC-mediated genome editing are not yet fully delineated, but the activation-induced cytidine deaminase (AID)/APOBEC complex is thought to be involved in the host immune response to viral infections and in lipid metabolism. [14]
Both Signature 2 and Signature 13 feature substitutions arising from cytosine-to-uracil deamination by cytidine deaminases. Signature 2 has a higher proportion of T[C>T]N substitutions and Signature 13 a higher proportion of T[C>G]N substitutions. APOBEC3A- and APOBEC3B-mediated mutagenesis preferentially involves the lagging DNA strand during replication. [15]
Four COSMIC mutational signatures have been associated with DNA mismatch repair deficiency and found in tumors with microsatellite instability: Signatures 6, 15, 20 and 26. [6] Loss-of-function mutations in the MLH1, MSH2, MSH6 or PMS2 genes cause defective DNA mismatch repair.
Signature 10 has a transcriptional strand bias and is enriched for C>A substitutions in the TpCpT context as well as T>G substitutions in the TpTpT context. [6] Signature 10 is associated with altered function of DNA polymerase epsilon, which results in deficient DNA proofreading activity. Both germline and somatic POLE exonuclease domain mutations are associated with Signature 10. [16]
Somatic enrichment for G:C>T:A transversions has been associated with base excision repair (BER) deficiency and linked to defective MUTYH, a DNA glycosylase, in colorectal cancer. [17] Direct oxidative DNA damage creates 8-oxoguanine, which, if left unrepaired, leads to the incorporation of adenine instead of cytosine opposite it during DNA replication. MUTYH encodes the MutY adenine glycosylase, which excises the mismatched adenine from 8-oxoguanine:adenine base pairs, thereby enabling DNA repair mechanisms involving OGG1 (oxoguanine glycosylase) and NUDT1 (Nudix hydrolase 1, also known as MTH1, MutT homolog 1) to remove the damaged 8-oxoguanine. [18]
Selected exogenous genotoxins/carcinogens and their mutagen-induced DNA damage and repair mechanisms have been linked to specific molecular signatures.
Signature 9 has been identified in chronic lymphocytic leukemia and malignant B-cell lymphoma and features enrichment for T>G transversion events. It is thought to result from error-prone polymerase η (POLH gene)-associated mutagenesis. [4]
Recently, a polymerase η error-prone synthesis signature has been linked to non-hematological cancers (e.g. skin cancer) and was hypothesized to contribute to YCG motif mutagenesis, which could partly explain the increase in TC dinucleotide substitutions. [25]
During the 1990s, Curtis Harris at the US National Cancer Institute and Bert Vogelstein at the Johns Hopkins Oncology Center in Baltimore reviewed data showing that different types of cancer had their own distinctive suite of mutations in p53, which were likely to have been caused by different agents, [3] [26] such as the chemicals in tobacco smoke or ultraviolet light from the sun. [19] [27] With the advent of next-generation sequencing, Michael Stratton saw the potential for the technology to revolutionize our understanding of the genetic changes inside individual tumors, and set the Wellcome Sanger Institute's large banks of DNA-sequencing machines to read every single letter of DNA in a tumor. [28] By 2009, Stratton and his team had produced the first whole cancer genome sequences: detailed maps of all the genetic changes and mutations that had occurred within two individual cancers, a melanoma from the skin and a lung tumor. [29] [30] The melanoma and lung cancer genomes were powerful proof that the fingerprints of specific culprits could be seen in cancers with one major cause. Yet these tumors still contained many mutations that could not be explained by ultraviolet light or tobacco smoking.
The detective work became much more complicated for cancers with complex, multiple or even completely unknown origins. By way of analogy, imagine a forensic scientist dusting for fingerprints at a murder scene. The forensic scientist might strike it lucky and find a set of perfect prints on a windowpane or door handle that match a known killer. However, they are much more likely to uncover a mishmash of fingerprints belonging to a whole range of people, from the victim and potential suspects to innocent parties and police investigators, all laid on top of each other on all sorts of surfaces. [28] Cancer genomes are similar: multiple mutational patterns are commonly overlaid on one another, making the data difficult to interpret.
Ludmil Alexandrov, then a PhD student of Stratton's, devised a mathematical solution to this problem. Alexandrov demonstrated that mutational patterns from individual mutagens found in a tumor can be distinguished from one another using a mathematical approach called blind source separation. The newly disentangled patterns of mutations were termed mutational signatures. [28] In 2013, Alexandrov and Stratton published the first computational framework for deciphering mutational signatures from cancer genomics data. [31] Subsequently, they applied this framework to more than seven thousand cancer genomes, creating the first comprehensive map of mutational signatures in human cancer. [32] More than one hundred mutational signatures have since been identified across the repertoire of human cancer. [33] In April 2022, 58 new mutational signatures were described. [34] [35] [36]
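Blind source separation of mutation catalogs is commonly framed as non-negative matrix factorization (NMF): a count matrix V (96 channels × tumors) is approximated as W·H, where the columns of W are signatures and H holds each tumor's exposures. The 2013 Alexandrov and Stratton framework was built on NMF with additional resampling and clustering steps; the NumPy sketch below shows only the core factorization, using Lee and Seung multiplicative updates for the Kullback-Leibler divergence. Its name, defaults and synthetic data are illustrative assumptions.

```python
import numpy as np


def extract_signatures(catalogs, n_signatures, n_iter=2000, seed=0):
    """Factor a (96 x n_tumors) count matrix V into W @ H with W, H >= 0.

    Returns W (96 x n_signatures), columns normalized to sum to 1 (signatures),
    and H (n_signatures x n_tumors), mutations attributed to each signature.
    """
    V = np.asarray(catalogs, dtype=float)
    rng = np.random.default_rng(seed)
    n_channels, n_samples = V.shape
    W = rng.random((n_channels, n_signatures)) + 1e-3
    H = rng.random((n_signatures, n_samples)) + 1e-3
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        # Lee-Seung multiplicative updates for the KL divergence D(V || WH);
        # they keep every entry of W and H non-negative.
        H *= (W.T @ (V / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
        W *= ((V / (W @ H + eps)) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    scale = W.sum(axis=0)
    return W / scale, H * scale[:, None]  # rescaling leaves W @ H unchanged


# Synthetic check: 20 tumors built from two made-up signatures.
rng = np.random.default_rng(1)
W_true = rng.random((96, 2))
W_true /= W_true.sum(axis=0)
H_true = rng.integers(50, 500, size=(2, 20)).astype(float)
V = W_true @ H_true
W, H = extract_signatures(V, 2)
```

The number of signatures must be chosen in advance; published tools typically compare several values and assess the stability of the extracted signatures before settling on one.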