Conjoined gene

A conjoined gene (CG) is a gene that gives rise to transcripts by combining at least part of one exon from each of two or more distinct known (parent) genes. The parent genes lie on the same chromosome, are in the same orientation, and usually (in about 95% of cases) translate independently into different proteins. In some cases, the transcripts formed by CGs are translated into chimeric or completely novel proteins.

Figure: Schematic representation of the formation of conjoined gene A-B from parent genes A and B.
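The definition above can be sketched as a simple coordinate check. This is a minimal illustration only, not the method used by any of the cited studies: the `Gene` record, the half-open exon intervals, and the functions `parent_genes` and `is_conjoined` are hypothetical names introduced here, and real annotation pipelines handle many cases (overlapping genes, antisense transcripts, alignment artifacts) that this sketch ignores.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open (start, end) coordinates; an assumption


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal record for a known (parent) gene."""
    name: str
    chromosome: str
    strand: str                  # "+" or "-"
    exons: Tuple[Interval, ...]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def parent_genes(transcript_exons: Sequence[Interval], chromosome: str,
                 strand: str, known_genes: Sequence[Gene]) -> List[str]:
    """Names of known genes on the same chromosome and strand that
    contribute at least part of one exon to the transcript."""
    parents = []
    for gene in known_genes:
        if gene.chromosome != chromosome or gene.strand != strand:
            continue  # parent genes must share chromosome and orientation
        if any(overlaps(t, e) for t in transcript_exons for e in gene.exons):
            parents.append(gene.name)
    return parents


def is_conjoined(transcript_exons: Sequence[Interval], chromosome: str,
                 strand: str, known_genes: Sequence[Gene]) -> bool:
    """A transcript looks conjoined if two or more distinct parents contribute."""
    return len(parent_genes(transcript_exons, chromosome, strand, known_genes)) >= 2


# Toy annotation: genes A and B on the "+" strand, gene C on the "-" strand.
GENES = [
    Gene("A", "chr1", "+", ((100, 200), (300, 400))),
    Gene("B", "chr1", "+", ((1000, 1100), (1200, 1300))),
    Gene("C", "chr1", "-", ((1500, 1600),)),
]

# Transcript joining the first exon of A, part of A's second exon, and an exon of B.
TRANSCRIPT_AB = [(100, 200), (350, 400), (1200, 1300)]
```

With this toy data, `is_conjoined(TRANSCRIPT_AB, "chr1", "+", GENES)` reports a conjoined transcript with parents A and B, whereas a transcript built only from exons of A, or one that touches gene C on the opposite strand, does not qualify.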

Conjoined genes are known by several alternative names, including combined gene and complex gene, [1] fusion gene, fusion protein, read-through transcript, co-transcribed genes, bridged genes, spanning genes, hybrid genes, and locus-spanning transcripts.

At present, 800 CGs have been identified in the entire human genome by different research groups, including Prakash et al., [2] Akiva et al., [3] Parra et al., [4] and Kim et al., [5] as well as in the 1% of the human genome covered by the ENCODE pilot project. [6] Of these CGs, 36% could be validated experimentally using RT-PCR and sequencing. However, only a very limited number of them are found in public human genome resources such as the Entrez Gene database, the UCSC Genome Browser and the Vertebrate Genome Annotation (Vega) database. More than 70% of human conjoined genes are conserved in other vertebrate genomes, with higher-order vertebrates showing more conservation, including the chimpanzee, the closest living relative of humans. CG formation is not limited to the human genome: some CGs have also been identified in other eukaryotic genomes, including mouse and Drosophila. A few web resources, such as ChimerDB and HYBRIDdb, include information about some CGs alongside other fusion genes. Another database, ConjoinG, is a comprehensive resource dedicated solely to the 800 conjoined genes identified in the entire human genome.

Related Research Articles

Exon: gene portion that is not removed during RNA splicing and becomes part of mature mRNA

An exon is any part of a gene that will encode a part of the final mature RNA produced by that gene after introns have been removed by RNA splicing. The term exon refers to both the DNA sequence within a gene and to the corresponding sequence in RNA transcripts. In RNA splicing, introns are removed and exons are covalently joined to one another as part of generating the mature messenger RNA. Just as the entire set of genes for a species constitutes the genome, the entire set of exons constitutes the exome.

Alternative splicing: process by which a single gene can code for multiple proteins

Alternative splicing, also called alternative RNA splicing or differential splicing, is a splicing process during gene expression that allows a single gene to code for multiple proteins. In this process, particular exons of a gene may be included within or excluded from the final, processed messenger RNA (mRNA) produced from that gene. The exons are therefore joined in different combinations, leading to different (alternative) mRNA strands. Consequently, the proteins translated from alternatively spliced mRNAs will differ in their amino acid sequence and, often, in their biological functions. Notably, alternative splicing allows the human genome to direct the synthesis of many more proteins than would be expected from its roughly 20,000 protein-coding genes.

Trans-splicing is a special form of RNA processing where exons from two different primary RNA transcripts are joined end to end and ligated. It is usually found in eukaryotes and mediated by the spliceosome, although some bacteria and archaea also have "half-genes" for tRNAs.

ALYREF

Aly/REF export factor, also known as THO complex subunit 4, is a protein that in humans is encoded by the ALYREF gene.

KLF12

Krueppel-like factor 12 is a protein that in humans is encoded by the KLF12 gene.

BBS5

Bardet–Biedl syndrome 5 protein is a protein that in humans is encoded by the BBS5 gene.

RBM9

RNA binding motif protein 9 (RBM9), also known as Rbfox2, is a protein which in humans is encoded by the RBM9 gene.

SSR4

Translocon-associated protein subunit delta is a protein that in humans is encoded by the SSR4 gene.

LHX6

LIM/homeobox protein Lhx6 is a protein that in humans is encoded by the LHX6 gene.

Kua-UEV

Ubiquitin-conjugating enzyme E2 variant 1, also known as Kua-UEV, is a human gene.

ZNF649

Zinc finger protein 649 is a protein that in humans is encoded by the ZNF649 gene, which lies on chromosome 19 and contains 5 exons.

GRINL1A

GRINL1A complex locus protein 1 is a protein that in humans is encoded by the GRINL1A gene.

ZNF366

Zinc finger protein 366, also known as DC-SCRIPT, is a protein that in humans is encoded by the ZNF366 gene. The ZNF366 gene was first identified in a study comparing 85 kb of Fugu rubripes sequence containing 17 genes with the homologous loci in the human draft genome.

ZNF41

Zinc finger protein 41 is a protein that in humans is encoded by the ZNF41 gene.

LAIR2

Leukocyte-associated immunoglobulin-like receptor 2 is a protein that in humans is encoded by the LAIR2 gene.

HMGN4

High mobility group nucleosome-binding domain-containing protein 4 is a transcription factor that in humans is encoded by the HMGN4 gene.

The Consensus Coding Sequence (CCDS) Project is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies. The CCDS project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier, and ensures that they are consistently represented by the National Center for Biotechnology Information (NCBI), Ensembl, and UCSC Genome Browser. The integrity of the CCDS dataset is maintained through stringent quality assurance testing and on-going manual curation.

De novo transcriptome assembly is the de novo sequence assembly method of creating a transcriptome without the aid of a reference genome.

Chimeric RNA, sometimes referred to as a fusion transcript, is composed of exons from two or more different genes and has the potential to encode novel proteins. These mRNAs differ from those produced by conventional splicing in that they are produced from two or more gene loci.

Short interspersed nuclear element

Short interspersed nuclear elements (SINEs) are non-autonomous, non-coding transposable elements (TEs) that are about 100 to 700 base pairs in length. They are a class of retrotransposons, DNA elements that amplify themselves throughout eukaryotic genomes, often through RNA intermediates. SINEs compose about 13% of the mammalian genome.

References

  1. Roginski, et al. (2004). "The human GRINL1A gene defines a complex transcription unit, an unusual form of gene organization in eukaryotes". Genomics. 84 (2): 265–276. doi:10.1016/j.ygeno.2004.04.004. PMID 15233991.
  2. Prakash T, Sharma VK, Adati N, Ozawa R, Kumar N, et al. (October 2010). Michalak P (ed.). "Expression of Conjoined Genes: Another Mechanism for Gene Regulation in Eukaryotes". PLOS ONE. 5 (10): e13284. Bibcode:2010PLoSO...513284P. doi:10.1371/journal.pone.0013284. PMC 2953495. PMID 20967262.
  3. Akiva P, Toporik A, Edelheit S, et al. (January 2006). "Transcription-mediated gene fusion in the human genome". Genome Research. 16 (1): 30–36. doi:10.1101/gr.4137606. PMC 1356126. PMID 16344562.
  4. Parra G, Reymond A, Dabbouseh N, et al. (January 2006). "Tandem chimerism as a means to increase protein complexity in the human genome". Genome Research. 16 (1): 37–44. doi:10.1101/gr.4145906. PMC 1356127. PMID 16344564.
  5. Kim P, Yoon S, Kim N, et al. (November 2009). "ChimerDB 2.0--a knowledgebase for fusion genes updated". Nucleic Acids Research. 38 (Database issue): D81–D85. doi:10.1093/nar/gkp982. PMC 2808913. PMID 19906715.
  6. Denoeud F, Kapranov P, Ucla C, et al. (June 2007). "Prominent use of distal 5' transcription start sites and discovery of a large number of additional exons in ENCODE regions". Genome Research. 17 (6): 746–759. doi:10.1101/gr.5660607. PMC 1891335. PMID 17567994.