Protein isoform

Protein A, B and C are isoforms encoded from the same gene through alternative splicing.

A protein isoform, or "protein variant", [1] is a member of a set of highly similar proteins that originate from a single gene or gene family and are the result of genetic differences. [2] While many perform the same or similar biological roles, some isoforms have unique functions. A set of protein isoforms may be formed from alternative splicing, variable promoter usage, or other post-transcriptional modifications of a single gene; post-translational modifications are generally not considered (for those, see Proteoforms). Through RNA splicing, different protein-coding segments (exons) of a gene, or even different parts of exons, can be selected from the pre-mRNA to form different mRNA sequences. Each unique sequence produces a specific form of a protein.
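The exon-selection idea described above can be illustrated with a minimal Python sketch. The gene, its exon labels and sequences, and the helper `splice` are all hypothetical and chosen only for illustration; real splicing depends on regulatory factors that are not modeled here.

```python
# Toy model of alternative splicing: a gene is an ordered set of exons.
# Exon names and sequences are invented for illustration only.
EXONS = {
    "E1": "AUGGCU",
    "E2": "GAAUUC",
    "E3": "CCGGAU",
    "E4": "UGGUAA",
}


def splice(included):
    """Join the selected exons, in genomic order, into one mRNA string."""
    return "".join(seq for name, seq in EXONS.items() if name in included)


isoform_a = splice({"E1", "E2", "E3", "E4"})  # all exons retained
isoform_b = splice({"E1", "E3", "E4"})        # exon E2 skipped
isoform_c = splice({"E1", "E2", "E4"})        # exon E3 skipped

# Each distinct exon combination yields a distinct transcript.
transcripts = {isoform_a, isoform_b, isoform_c}
print(len(transcripts))
```

Each distinct mRNA string stands in for a transcript that could, in principle, be translated into a different protein isoform.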


The discovery of isoforms could explain the discrepancy between the small number of protein coding regions of genes revealed by the human genome project and the large diversity of proteins seen in an organism: different proteins encoded by the same gene could increase the diversity of the proteome. Isoforms at the RNA level are readily characterized by cDNA transcript studies. Many human genes possess confirmed alternative splicing isoforms. It has been estimated that ~100,000 expressed sequence tags (ESTs) can be identified in humans. [1] Isoforms at the protein level can manifest in the deletion of whole domains or shorter loops, usually located on the surface of the protein. [3]

Definition

A single gene can produce multiple proteins that differ in both structure and composition; [4] [5] this process is regulated by the alternative splicing of mRNA, though it is not clear to what extent such a process affects the diversity of the human proteome, as the abundance of mRNA transcript isoforms does not necessarily correlate with the abundance of protein isoforms. [6] Three-dimensional protein structure comparisons can be used to help determine which, if any, isoforms represent functional protein products, and the structure of most isoforms in the human proteome has been predicted by AlphaFold and publicly released at isoform.io. [7] The specificity of translated isoforms is determined by the protein's structure and function, as well as by the cell type and developmental stage in which they are produced. [4] [5] Determining specificity becomes more complicated when a protein has multiple subunits and each subunit has multiple isoforms.

For example, 5' AMP-activated protein kinase (AMPK), an enzyme that performs different roles in human cells, has three subunits (α, β, and γ), each of which occurs in more than one isoform. [8]

In human skeletal muscle, the preferred form is α2β2γ1. [8] But in the human liver, the most abundant form is α1β2γ1. [8]
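To see why subunit isoforms complicate the question of specificity, one can enumerate the complexes that are possible in principle. The sketch below is a rough illustration only: it assumes the commonly cited AMPK isoform sets (α1, α2; β1, β2; γ1, γ2, γ3), which are not listed in the text above, and it assumes one subunit of each type per complex.

```python
from itertools import product

# Assumed isoform sets for each AMPK subunit (not taken from the text above).
ALPHA = ["α1", "α2"]
BETA = ["β1", "β2"]
GAMMA = ["γ1", "γ2", "γ3"]

# Every combination of one α, one β, and one γ isoform.
complexes = ["".join(combo) for combo in product(ALPHA, BETA, GAMMA)]

print(len(complexes))
print("α2β2γ1" in complexes, "α1β2γ1" in complexes)
```

Under these assumptions the tissue-preferred forms mentioned above are two entries in a larger combinatorial space, which is why specificity has to be established per tissue rather than per gene.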

Mechanism

Different mechanisms of RNA splicing

The primary mechanisms that produce protein isoforms are alternative splicing and variable promoter usage, though modifications due to genetic changes, such as mutations and polymorphisms, are sometimes also considered distinct isoforms. [9]

Alternative splicing is the main post-transcriptional modification process that produces mRNA transcript isoforms, and is a major molecular mechanism that may contribute to protein diversity. [5] The spliceosome, a large ribonucleoprotein, is the molecular machine inside the nucleus responsible for RNA cleavage and ligation, removing non-protein coding segments (introns). [10]

Because splicing is a process that occurs between transcription and translation, its primary effects have mainly been studied through genomics techniques—for example, microarray analyses and RNA sequencing have been used to identify alternatively spliced transcripts and measure their abundances. [9] Transcript abundance is often used as a proxy for the abundance of protein isoforms, though proteomics experiments using gel electrophoresis and mass spectrometry have demonstrated that the correlation between transcript and protein counts is often low, and that one protein isoform is usually dominant. [11] One 2015 study states that the cause of this discrepancy likely occurs after translation, though the mechanism is essentially unknown. [12] Consequently, although alternative splicing has been implicated as an important link between variation and disease, there is no conclusive evidence that it acts primarily by producing novel protein isoforms. [11]

Alternative splicing generally describes a tightly regulated process in which alternative transcripts are intentionally generated by the splicing machinery. However, such transcripts are also produced by splicing errors in a process called "noisy splicing," and are also potentially translated into protein isoforms. Although ~95% of multi-exonic genes are thought to be alternatively spliced, one study on noisy splicing observed that most of the different low-abundance transcripts are noise, and predicts that most alternative transcript and protein isoforms present in a cell are not functionally relevant. [13]

Other transcriptional and post-transcriptional regulatory steps can also produce different protein isoforms. [14] Variable promoter usage occurs when the transcriptional machinery of a cell (RNA polymerase, transcription factors, and other enzymes) begins transcription at different promoters (regions of DNA near a gene that serve as initial binding sites), resulting in slightly modified transcripts and protein isoforms.

Characteristics

Generally, one protein isoform is labeled as the canonical sequence based on criteria such as its prevalence and similarity to orthologous (or functionally analogous) sequences in other species. [15] Isoforms are assumed to have similar functional properties, as most have similar sequences and share some to most exons with the canonical sequence. However, some isoforms show much greater divergence (for example, through trans-splicing) and can share few to no exons with the canonical sequence. In addition, they can have different biological effects (in an extreme case, one isoform can promote cell survival while another promotes cell death) or can have similar basic functions but differ in their sub-cellular localization. [16] A 2016 study, however, functionally characterized all the isoforms of 1,492 genes and determined that most isoforms behave as "functional alloforms." The authors concluded that isoforms behave like distinct proteins after observing that the functions of most isoforms did not overlap. [17] Because the study was conducted on cells in vitro, it is not known whether the isoforms in the expressed human proteome share these characteristics. Additionally, because the function of each isoform must generally be determined separately, most identified and predicted isoforms still have unknown functions.
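The notion of an isoform sharing "some to most" or "few to no" exons with the canonical sequence can be made concrete with a small sketch. The exon labels, the isoform names, and the function `shared_exon_fraction` are hypothetical; this is not how any particular annotation database scores isoforms.

```python
def shared_exon_fraction(isoform, canonical):
    """Fraction of the canonical isoform's exons that also occur in `isoform`."""
    canonical_set = set(canonical)
    if not canonical_set:
        raise ValueError("canonical isoform has no exons")
    return len(canonical_set & set(isoform)) / len(canonical_set)


# Hypothetical exon compositions, for illustration only.
canonical = ["E1", "E2", "E3", "E4"]
isoforms = {
    "exon_skipping": ["E1", "E3", "E4"],  # one canonical exon missing
    "divergent": ["X1", "X2"],            # no exons in common
}

fractions = {
    name: shared_exon_fraction(exons, canonical)
    for name, exons in isoforms.items()
}
print(fractions)
```

A high fraction suggests, but does not establish, similar function; as the text notes, function generally has to be determined separately for each isoform.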

Glycoform

A glycoform is an isoform of a protein that differs only with respect to the number or type of attached glycans. Glycoproteins often consist of a number of different glycoforms, with alterations in the attached saccharide or oligosaccharide. These modifications may result from differences in biosynthesis during the process of glycosylation, or from the action of glycosidases or glycosyltransferases. Glycoforms may be detected through detailed chemical analysis of separated glycoforms, but are more conveniently detected through differential reaction with lectins, as in lectin affinity chromatography and lectin affinity electrophoresis. Typical examples of glycoproteins consisting of glycoforms are blood proteins such as orosomucoid, antitrypsin, and haptoglobin. An unusual glycoform variation, involving polysialic acids (PSA), is seen in the neuronal cell adhesion molecule (NCAM).

Examples

Monoamine oxidase, a family of enzymes that catalyze the oxidation of monoamines, exists in two isoforms, MAO-A and MAO-B.

References

  1. Brett D, Pospisil H, Valcárcel J, Reich J, Bork P (January 2002). "Alternative splicing and genome complexity". Nature Genetics. 30 (1): 29–30. doi:10.1038/ng803. PMID 11743582. S2CID 2724843.
  2. Schlüter H, Apweiler R, Holzhütter HG, Jungblut PR (September 2009). "Finding one's way in proteomics: a protein species nomenclature". Chemistry Central Journal. 3: 11. doi:10.1186/1752-153X-3-11. PMC 2758878. PMID 19740416.
  3. Kozlowski, L.; Orlowski, J.; Bujnicki, J. M. (2012). "Structure Prediction for Alternatively Spliced Proteins". Alternative pre-mRNA Splicing. p. 582. doi:10.1002/9783527636778.ch54. ISBN 9783527636778.
  4. Andreadis A, Gallego ME, Nadal-Ginard B (1987-01-01). "Generation of protein isoform diversity by alternative splicing: mechanistic and biological implications". Annual Review of Cell Biology. 3 (1): 207–42. doi:10.1146/annurev.cb.03.110187.001231. PMID 2891362.
  5. Breitbart RE, Andreadis A, Nadal-Ginard B (1987-01-01). "Alternative splicing: a ubiquitous mechanism for the generation of multiple protein isoforms from single genes". Annual Review of Biochemistry. 56 (1): 467–95. doi:10.1146/annurev.bi.56.070187.002343. PMID 3304142.
  6. Liu Y, Beyer A, Aebersold R (April 2016). "On the Dependency of Cellular Protein Levels on mRNA Abundance". Cell. 165 (3): 535–50. doi:10.1016/j.cell.2016.03.014. PMID 27104977.
  7. Sommer, Markus J.; Cha, Sooyoung; Varabyou, Ales; Rincon, Natalia; Park, Sukhwan; Minkin, Ilia; Pertea, Mihaela; Steinegger, Martin; Salzberg, Steven L. (2022-12-15). "Structure-guided isoform identification for the human transcriptome". eLife. 11: e82556. doi:10.7554/eLife.82556. PMC 9812405. PMID 36519529.
  8. Dasgupta B, Chhipa RR (March 2016). "Evolving Lessons on the Complex Role of AMPK in Normal Physiology and Cancer". Trends in Pharmacological Sciences. 37 (3): 192–206. doi:10.1016/j.tips.2015.11.007. PMC 4764394. PMID 26711141.
  9. Kornblihtt AR, Schor IE, Alló M, Dujardin G, Petrillo E, Muñoz MJ (March 2013). "Alternative splicing: a pivotal step between eukaryotic transcription and translation". Nature Reviews Molecular Cell Biology. 14 (3): 153–65. doi:10.1038/nrm3525. hdl:11336/21049. PMID 23385723. S2CID 54560052.
  10. Lee Y, Rio DC (2015-01-01). "Mechanisms and Regulation of Alternative Pre-mRNA Splicing". Annual Review of Biochemistry. 84 (1): 291–323. doi:10.1146/annurev-biochem-060614-034316. PMC 4526142. PMID 25784052.
  11. Tress ML, Abascal F, Valencia A (February 2017). "Alternative Splicing May Not Be the Key to Proteome Complexity". Trends in Biochemical Sciences. 42 (2): 98–110. doi:10.1016/j.tibs.2016.08.008. PMC 6526280. PMID 27712956.
  12. Battle A, Khan Z, Wang SH, Mitrano A, Ford MJ, Pritchard JK, Gilad Y (February 2015). "Genomic variation. Impact of regulatory variation from RNA to protein". Science. 347 (6222): 664–7. doi:10.1126/science.1260793. PMC 4507520. PMID 25657249.
  13. Pickrell JK, Pai AA, Gilad Y, Pritchard JK (December 2010). "Noisy splicing drives mRNA isoform diversity in human cells". PLOS Genetics. 6 (12): e1001236. doi:10.1371/journal.pgen.1001236. PMC 3000347. PMID 21151575.
  14. Smith LM, Kelleher NL (March 2013). "Proteoform: a single term describing protein complexity". Nature Methods. 10 (3): 186–7. doi:10.1038/nmeth.2369. PMC 4114032. PMID 23443629.
  15. Li HD, Menon R, Omenn GS, Guan Y (December 2014). "Revisiting the identification of canonical splice isoforms through integration of functional genomics and proteomics evidence" (PDF). Proteomics. 14 (23–24): 2709–18. doi:10.1002/pmic.201400170. PMC 4372202. PMID 25265570.
  16. Sundvall M, Veikkolainen V, Kurppa K, Salah Z, Tvorogov D, van Zoelen EJ, Aqeilan R, Elenius K (December 2010). "Cell death or survival promoted by alternative isoforms of ErbB4". Molecular Biology of the Cell. 21 (23): 4275–86. doi:10.1091/mbc.E10-04-0332. PMC 2993754. PMID 20943952.
  17. Yang X, Coulombe-Huntington J, Kang S, Sheynkman GM, Hao T, Richardson A, et al. (February 2016). "Widespread Expansion of Protein Interaction Capabilities by Alternative Splicing". Cell. 164 (4): 805–17. doi:10.1016/j.cell.2016.01.029. PMC 4882190. PMID 26871637.
  18. Barre L, Fournel-Gigleux S, Finel M, Netter P, Magdalou J, Ouzzine M (March 2007). "Substrate specificity of the human UDP-glucuronosyltransferase UGT2B4 and UGT2B7. Identification of a critical aromatic amino acid residue at position 33". The FEBS Journal. 274 (5): 1256–64. doi:10.1111/j.1742-4658.2007.05670.x. PMID 17263731.
  19. Pathoma, Fundamentals of Pathology