In genetics, a fusion gene is a hybrid gene formed from two previously independent genes. It can occur as a result of translocation, interstitial deletion, or chromosomal inversion. Fusion genes have been found to be prevalent in all main types of human neoplasia. [1] Once identified, these fusion genes play a prominent role as diagnostic and prognostic markers. [2]
The first fusion gene [1] was described in cancer cells in the early 1980s. The finding was based on the discovery in 1960 by Peter Nowell and David Hungerford in Philadelphia of a small abnormal marker chromosome in patients with chronic myeloid leukemia—the first consistent chromosome abnormality detected in a human malignancy, later designated the Philadelphia chromosome. [3] In 1973, Janet Rowley in Chicago showed that the Philadelphia chromosome had originated through a translocation between chromosomes 9 and 22, and not through a simple deletion of chromosome 22 as was previously thought. Several investigators in the early 1980s showed that the Philadelphia chromosome translocation led to the formation of a new BCR::ABL1 fusion gene, composed of the 3' part of the ABL1 gene in the breakpoint on chromosome 9 and the 5' part of a gene called BCR in the breakpoint in chromosome 22. In 1985 it was clearly established that the fusion gene on chromosome 22 produced an abnormal chimeric BCR::ABL1 protein with the capacity to induce chronic myeloid leukemia.
It has been known for 30 years that the corresponding gene fusion plays an important role in tumorigenesis. [4] Fusion genes can contribute to tumor formation because they can produce a much more active abnormal protein than non-fusion genes. Often, fusion genes are oncogenes that cause cancer; these include BCR-ABL, [5] TEL-AML1 (ALL with t(12;21)), AML1-ETO (M2 AML with t(8;21)), and TMPRSS2-ERG, which arises from an interstitial deletion on chromosome 21 and often occurs in prostate cancer. [6] In the case of TMPRSS2-ERG, the fusion product regulates prostate cancer by disrupting androgen receptor (AR) signaling and inhibiting AR expression through the oncogenic ETS transcription factor. [7] Most fusion genes are found in hematological cancers, sarcomas, and prostate cancer. [1] [8] BCAM-AKT2 is a fusion gene that is specific and unique to high-grade serous ovarian cancer. [9]
Oncogenic fusion genes may lead to a gene product with a new or different function from the two fusion partners. Alternatively, a proto-oncogene is fused to a strong promoter, so that its oncogenic function is activated by upregulation driven by the strong promoter of the upstream fusion partner. The latter is common in lymphomas, where oncogenes are juxtaposed to the promoters of the immunoglobulin genes. [10] Oncogenic fusion transcripts may also be caused by trans-splicing or read-through events. [11]
Because chromosomal translocations play such a significant role in neoplasia, a specialized database of chromosomal aberrations and gene fusions in cancer has been created: the Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer. [12]
The presence of certain chromosomal aberrations and their resulting fusion genes is commonly used in cancer diagnostics to establish a precise diagnosis. Chromosome banding analysis, fluorescence in situ hybridization (FISH), and reverse transcription polymerase chain reaction (RT-PCR) are common methods employed at diagnostic laboratories. These methods all have distinct shortcomings because of the very complex nature of cancer genomes. Recent developments such as high-throughput sequencing [13] and custom DNA microarrays promise more efficient methods. [14]
Gene fusion plays a key role in the evolution of gene architecture. Its effect can be observed when gene fusion occurs in coding sequences. [15] Duplication, sequence divergence, and recombination are the major contributors to gene evolution. [16] These events can probably produce new genes from already existing parts. When gene fusion happens in a non-coding sequence region, it can lead to misregulation of the expression of a gene that is now under the control of the cis-regulatory sequence of another gene. When it happens in coding sequences, gene fusion causes the assembly of a new gene, which allows new functions to appear by adding peptide modules to a multi-domain protein. [15] Methods that inventory gene fusion events on a large biological scale can provide insights into the multi-modular architecture of proteins. [17] [18] [19]
The purines adenine and guanine are two of the four information encoding bases of the universal genetic code. Biosynthesis of these purines occurs by similar, but not identical, pathways in different species of the three domains of life, the Archaea, Bacteria and Eukaryotes. A major distinctive feature of the purine biosynthetic pathways in Bacteria is the prevalence of gene fusions where two or more purine biosynthetic enzymes are encoded by a single gene. [20] Such gene fusions are almost exclusively between genes that encode enzymes that perform sequential steps in the biosynthetic pathway. Eukaryotic species generally exhibit the most common gene fusions seen in the Bacteria, but in addition have new fusions that potentially increase metabolic flux.
In recent years, next-generation sequencing technology has become available to screen for known and novel gene fusion events on a genome-wide scale. However, the precondition for large-scale detection is paired-end sequencing of the cell's transcriptome. Work on fusion gene detection is directed mainly towards data analysis and visualization. For example, researchers have developed a tool called Transcriptome Viewer (TViewer) to directly visualize detected gene fusions at the transcript level. [21]
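One reason paired-end data matter is that the two mates of a read pair normally align to the same gene, so pairs whose mates align to two different genes can hint at a fusion transcript. The following is a minimal sketch of that idea only, not the method used by TViewer or any published fusion caller. The `ReadPair` record, the `candidate_fusions` function, and the `min_support` threshold are hypothetical names introduced for illustration, and the input is assumed to be already aligned and annotated with gene names.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class ReadPair(NamedTuple):
    """Hypothetical record: the gene each mate of a paired-end read aligned to."""
    mate1_gene: Optional[str]  # None when the mate did not align to any gene
    mate2_gene: Optional[str]


def candidate_fusions(
    pairs: Iterable[ReadPair], min_support: int = 2
) -> Dict[Tuple[str, str], int]:
    """Count read pairs whose mates align to two different genes.

    Returns gene pairs (sorted alphabetically, so orientation is ignored)
    supported by at least `min_support` discordant read pairs.
    """
    support: Counter = Counter()
    for pair in pairs:
        # Skip pairs where either mate has no gene assignment.
        if pair.mate1_gene is None or pair.mate2_gene is None:
            continue
        # Concordant pairs (both mates in the same gene) carry no fusion signal.
        if pair.mate1_gene == pair.mate2_gene:
            continue
        genes = sorted((pair.mate1_gene, pair.mate2_gene))
        support[(genes[0], genes[1])] += 1
    return {genes: n for genes, n in support.items() if n >= min_support}


# Toy input, invented for illustration only.
example_pairs = [
    ReadPair("BCR", "ABL1"),
    ReadPair("ABL1", "BCR"),
    ReadPair("BCR", "BCR"),
    ReadPair(None, "ABL1"),
    ReadPair("TMPRSS2", "ERG"),
]
print(candidate_fusions(example_pairs))
```

In this toy input only the BCR and ABL1 pair reaches the default support threshold. Real pipelines additionally look for reads spanning the exact junction and filter out read-through transcripts and alignment artifacts, which this sketch does not attempt.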
Biologists may also deliberately create fusion genes for research purposes. The fusion of reporter genes to the regulatory elements of genes of interest allows researchers to study gene expression. Reporter gene fusions can be used to measure activity levels of gene regulators, identify the regulatory sites of genes (including the signals required), identify various genes that are regulated in response to the same stimulus, and artificially control the expression of desired genes in particular cells. [22] For example, by creating a fusion gene of a protein of interest and green fluorescent protein, the protein of interest may be observed in cells or tissue using fluorescence microscopy. [23] The protein synthesized when a fusion gene is expressed is called a fusion protein.
An oncogene is a gene that has the potential to cause cancer. In tumor cells, these genes are often mutated, or expressed at high levels.
The Philadelphia chromosome or Philadelphia translocation (Ph) is a specific genetic abnormality in chromosome 22 of leukemia cancer cells. This chromosome is defective and unusually short because of reciprocal translocation, t(9;22)(q34;q11), of genetic material between chromosome 9 and chromosome 22, and contains a fusion gene called BCR-ABL1. This gene is the ABL1 gene of chromosome 9 juxtaposed onto the breakpoint cluster region BCR gene of chromosome 22, coding for a hybrid protein: a tyrosine kinase signaling protein that is "always on", causing the cell to divide uncontrollably by interrupting the stability of the genome and impairing various signaling pathways governing the cell cycle.
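The shorthand t(9;22)(q34;q11) names the two chromosomes involved, followed by the band on each chromosome where the break occurs. As an illustration of how that notation is structured, the sketch below parses simple two-chromosome translocation strings. It is an assumption-laden example, not a full parser for cytogenetic nomenclature: the `Translocation` type and `parse_translocation` function are hypothetical names, and only the forms `t(A;B)` and `t(A;B)(bandA;bandB)` are handled.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Translocation(NamedTuple):
    """Hypothetical container for a simple two-chromosome translocation."""
    chromosomes: Tuple[str, str]
    bands: Optional[Tuple[str, str]]  # None when no breakpoint bands are given


# t(<chr>;<chr>) optionally followed by (<band>;<band>), e.g. t(9;22)(q34;q11).
# Bands are an arm (p or q) followed by digits and optional sub-band dots.
_PATTERN = re.compile(
    r"^t\(\s*([0-9XY]+)\s*;\s*([0-9XY]+)\s*\)"
    r"(?:\(\s*([pq][0-9.]*)\s*;\s*([pq][0-9.]*)\s*\))?$"
)


def parse_translocation(text: str) -> Translocation:
    """Split a simple translocation string into chromosomes and bands."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a simple two-chromosome translocation: {text!r}")
    chr_a, chr_b, band_a, band_b = match.groups()
    bands = (band_a, band_b) if band_a is not None else None
    return Translocation((chr_a, chr_b), bands)


print(parse_translocation("t(9;22)(q34;q11)"))
```

For the Philadelphia translocation this yields chromosomes 9 and 22 with breakpoint bands q34 and q11, where q denotes the long arm of each chromosome.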
Chronic myelogenous leukemia (CML), also known as chronic myeloid leukemia, is a cancer of the white blood cells. It is a form of leukemia characterized by the increased and unregulated growth of myeloid cells in the bone marrow and the accumulation of these cells in the blood. CML is a clonal bone marrow stem cell disorder in which a proliferation of mature granulocytes and their precursors is found; a characteristic increase in basophils is clinically relevant. It is a type of myeloproliferative neoplasm associated with a characteristic chromosomal translocation called the Philadelphia chromosome.
Tyrosine-protein kinase ABL1 also known as ABL1 is a protein that, in humans, is encoded by the ABL1 gene located on chromosome 9. c-Abl is sometimes used to refer to the version of the gene found within the mammalian genome, while v-Abl refers to the viral gene, which was initially isolated from the Abelson murine leukemia virus.
The breakpoint cluster region protein (BCR) also known as renal carcinoma antigen NY-REN-26 is a protein that in humans is encoded by the BCR gene. BCR is one of the two genes in the BCR-ABL fusion protein, which is associated with the Philadelphia chromosome. Two transcript variants encoding different isoforms have been found for this gene.
The gag-onc fusion protein is a general term for a fusion protein formed from a group-specific antigen ('gag') gene and that of an oncogene ('onc'), a gene that plays a role in the development of a cancer. The name is also written as Gag-v-Onc, with "v" indicating that the Onc sequence resides in a viral genome. Onc is a generic placeholder for a given specific oncogene, such as C-jun.
Fusion proteins or chimeric (kī-ˈmir-ik) proteins are proteins created through the joining of two or more genes that originally coded for separate proteins. Translation of this fusion gene results in a single or multiple polypeptides with functional properties derived from each of the original proteins. Recombinant fusion proteins are created artificially by recombinant DNA technology for use in biological research or therapeutics. Chimeric or chimera usually designate hybrid proteins made of polypeptides having different functions or physico-chemical patterns. Chimeric mutant proteins occur naturally when a complex mutation, such as a chromosomal translocation, tandem duplication, or retrotransposition creates a novel coding sequence containing parts of the coding sequences from two different genes. Naturally occurring fusion proteins are commonly found in cancer cells, where they may function as oncoproteins. The bcr-abl fusion protein is a well-known example of an oncogenic fusion protein, and is considered to be the primary oncogenic driver of chronic myelogenous leukemia.
ETV6 protein is a transcription factor that in humans is encoded by the ETV6 gene. The ETV6 protein regulates the development and growth of diverse cell types, particularly those of hematological tissues. However, its gene, ETV6, frequently suffers various mutations that lead to an array of potentially lethal cancers; that is, ETV6 is a clinically significant proto-oncogene in that it can fuse with other genes to drive the development and/or progression of certain cancers. ETV6 is also an anti-oncogene or tumor suppressor gene, in that mutations that encode a truncated and therefore inactive protein are also associated with certain types of cancers.
RNA-binding protein EWS is a protein that in humans is encoded by the EWSR1 gene on human chromosome 22, specifically 22q12.2. It is one of 3 proteins in the FET protein family.
ERG is an oncogene. ERG is a member of the ETS family of transcription factors. The ERG gene encodes for a protein, also called ERG, that functions as a transcriptional regulator. Genes in the ETS family regulate embryonic development, cell proliferation, differentiation, angiogenesis, inflammation, and apoptosis.
Protein SSX2 is a protein that in humans is encoded by the SSX2 gene.
Felix Mitelman is a Swedish geneticist and is professor of clinical genetics in Lund, Sweden. He is best known for his pioneering work on chromosome changes in cancer.
Pvt1 oncogene, also known as PVT1 or Plasmacytoma Variant Translocation 1 is a long non-coding RNA gene. In mice, this gene was identified as a breakpoint site in chromosome 6;15 translocations. These translocations are associated with murine plasmacytomas. The equivalent translocation in humans is t(2;8), which is associated with a rare variant of Burkitt's lymphoma. In rats, this breakpoint was shown to be a common site of proviral integration in retrovirally induced T lymphomas. Transcription of PVT1 is regulated by Myc.
Arul M. Chinnaiyan is a Hicks Endowed Professor of Pathology and professor of pathology and urology at the University of Michigan Medical School. He is also an investigator at the Howard Hughes Medical Institute (HHMI).
Solute carrier family 45 member 3 (SLC45A3), also known as prostate cancer-associated protein 6 or prostein, is a protein that in humans is encoded by the SLC45A3 gene.
Chimeric RNA, sometimes referred to as a fusion transcript, is composed of exons from two or more different genes that have the potential to encode novel proteins. These mRNAs are different from those produced by conventional splicing as they are produced by two or more gene loci.
Chromoplexy refers to a class of complex DNA rearrangement observed in the genomes of cancer cells. This phenomenon was first identified in prostate cancer by whole genome sequencing of prostate tumors. Chromoplexy causes genetic material from one or more chromosomes to become scrambled as multiple strands of DNA are broken and ligated to each other in a new configuration. In prostate cancer, chromoplexy may cause multiple oncogenic events within a single cell cycle, providing a proliferative advantage to a (pre-)cancerous cell. Several oncogenic mutations in prostate cancer occur through chromoplexy, such as disruption of the tumor suppressor gene PTEN or creation of the TMPRSS2-ERG fusion gene.
The Cancer Genome Anatomy Project (CGAP), created by the National Cancer Institute (NCI) in 1997 and introduced by Al Gore, is an online database on normal, pre-cancerous and cancerous genomes. It also provides tools for viewing and analysis of the data, allowing for identification of genes involved in various aspects of tumor progression. The goal of CGAP is to characterize cancer at a molecular level by providing a platform with readily accessible updated data and a set of tools such that researchers can easily relate their findings to existing knowledge. There is also a focus on development of software tools that improve the usage of large and complex datasets. The project is directed by Daniela S. Gerhard, and includes sub-projects or initiatives, with notable ones including the Cancer Chromosome Aberration Project (CCAP) and the Genetic Annotation Initiative (GAI). CGAP contributes to many databases, and organisations such as the NCBI contribute to CGAP's databases.
Clonal hypereosinophilia, also termed primary hypereosinophilia or clonal eosinophilia, is a grouping of hematological disorders all of which are characterized by the development and growth of a pre-malignant or malignant population of eosinophils, a type of white blood cell that occupies the bone marrow, blood, and other tissues. This population consists of a clone of eosinophils, i.e. a group of genetically identical eosinophils derived from a sufficiently mutated ancestor cell.
EWS/FLI1 is an oncogenic protein that is pathognomonic for Ewing sarcoma. It is found in approximately 90% of all Ewing sarcoma tumors with the remaining 10% of fusions substituting one fusion partner with a closely related family member.