Ribosomal frameshifting, also known as translational frameshifting or translational recoding, is a biological phenomenon that occurs during translation that results in the production of multiple, unique proteins from a single mRNA. [1] The process can be programmed by the nucleotide sequence of the mRNA and is sometimes affected by the secondary, 3-dimensional mRNA structure. [2] It has been described mainly in viruses (especially retroviruses), retrotransposons and bacterial insertion elements, and also in some cellular genes. [3]
Small molecules, proteins, and nucleic acids have also been found to stimulate frameshifting. In December 2023, it was reported that in vitro-transcribed (IVT) mRNAs can cause ribosomal frameshifting during their translation, an effect observed in response to the BNT162b2 (Pfizer–BioNTech) anti-COVID-19 vaccine. [4]
Proteins are translated by reading tri-nucleotides on the mRNA strand, also known as codons, from one end of the mRNA to the other (from the 5' to the 3' end), starting at the start (initiation) codon AUG, which encodes the amino acid methionine. Each codon is translated into a single amino acid. The code itself is considered degenerate, meaning that a particular amino acid can be specified by more than one codon. However, a shift in the reading frame by any number of nucleotides that is not divisible by 3 will cause subsequent codons to be read differently. [5] This effectively changes the ribosomal reading frame.
In this example, the following sentence with three-letter words makes sense when read from the beginning:
|Start|THE CAT AND THE MAN ARE FAT ...
|Start|123 123 123 123 123 123 123 ...
However, if the reading frame is shifted by one letter to between the T and H of the first word (effectively a +1 frameshift when considering the 0 position to be the initial position of T),
T|Start|HEC ATA NDT HEM ANA REF AT...
-|Start|123 123 123 123 123 123 12...
then the sentence reads differently, making no sense.
In this example, the following sequence is a region of the human mitochondrial genome with the two overlapping genes MT-ATP8 and MT-ATP6. When read from the beginning, these codons make sense to a ribosome and can be translated into amino acids (AA) under the vertebrate mitochondrial code:
|Start|AAC GAA AAT CTG TTC GCT TCA ...
|Start|123 123 123 123 123 123 123 ...
| AA  | N   E   N   L   F   A   S  ...
However, if the reading frame is changed by starting one nucleotide downstream (effectively a "+1 frameshift" when considering the 0 position to be the initial position of A):
A|Start|ACG AAA ATC TGT TCG CTT CA...
-|Start|123 123 123 123 123 123 12...
 | AA  | T   K   I   C   S   L  ...
Because of this +1 frameshifting, the DNA sequence is read differently. The different codon reading frame therefore yields different amino acids.
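The two readings above can be reproduced with a short script. This is a minimal sketch, not a general translation tool: the codon table is deliberately partial and contains only the codons that occur in this example (each assignment is also valid in the vertebrate mitochondrial code), and the function names `codons` and `translate` are chosen here for illustration.

```python
# Partial codon table: only the codons that appear in this example.
# Assumption: a real tool would use the full vertebrate mitochondrial table.
CODON_TABLE = {
    "AAC": "N", "GAA": "E", "AAT": "N", "CTG": "L", "TTC": "F",
    "GCT": "A", "TCA": "S",
    "ACG": "T", "AAA": "K", "ATC": "I", "TGT": "C", "TCG": "S", "CTT": "L",
}


def codons(seq, offset=0):
    """Split seq into complete triplets, starting `offset` nucleotides in."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def translate(seq, offset=0):
    """Translate every complete codon of seq in the frame given by `offset`."""
    return "".join(CODON_TABLE[c] for c in codons(seq, offset))


SEQ = "AACGAAAATCTGTTCGCTTCA"

print(codons(SEQ, 0), translate(SEQ, 0))  # 0 frame: N E N L F A S
print(codons(SEQ, 1), translate(SEQ, 1))  # +1 frame: T K I C S L
```

The trailing incomplete codon (`CA` in the +1 frame) is dropped rather than translated, matching the `CA...` shown in the example.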
In the case of a translating ribosome, a frameshift can result either in nonsense (a premature stop codon after the frameshift) or in the creation of a completely new protein after the frameshift. In the case where a frameshift results in nonsense, the nonsense-mediated mRNA decay (NMD) pathway may destroy the mRNA transcript, so frameshifting would serve as a method of regulating the expression level of the associated gene. [6]
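To illustrate how a frameshift can expose a premature stop codon, the sketch below scans an mRNA in a chosen frame for the first stop codon of the standard genetic code (UAA, UAG, UGA). The sequence `MRNA` and the function name `first_stop` are made up for this illustration and do not come from any real gene.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}  # stop codons of the standard genetic code


def first_stop(mrna, start):
    """Index of the first stop codon read in frame from `start`, or None."""
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return i
    return None


# Hypothetical mRNA, 0 frame: AUG GCC UUG ACC GUA UGG CUA UAA
MRNA = "AUGGCCUUGACCGUAUGGCUAUAA"

print(first_stop(MRNA, 0))  # 0 frame: stop only at the end of the message
print(first_stop(MRNA, 7))  # after a +1 shift at index 6: reads UGA at once
print(first_stop(MRNA, 5))  # after a -1 shift at index 6: no stop in this frame
```

In this toy sequence the +1 frame terminates immediately, the kind of outcome that could expose a transcript to NMD, while the −1 frame runs to the end of the message without a stop.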
If a novel or off-target protein is produced, it can trigger other unknown consequences. [4]
In viruses this phenomenon may be programmed to occur at particular sites and allows the virus to encode multiple types of proteins from the same mRNA. Notable examples include HIV-1 (human immunodeficiency virus), [7] RSV (Rous sarcoma virus) [8] and the influenza virus (flu), [9] which all rely on frameshifting to create a proper ratio of 0-frame (normal translation) and "trans-frame" (encoded by frameshifted sequence) proteins. Its use in viruses is primarily for compacting more genetic information into a shorter amount of genetic material.
In eukaryotes it appears to play a role in regulating gene expression levels by generating premature stops and producing nonfunctional transcripts. [3] [10]
The most common type of frameshifting is −1 frameshifting or programmed −1 ribosomal frameshifting (−1 PRF). Other, rarer types of frameshifting include +1 and −2 frameshifting. [2] −1 and +1 frameshifting are believed to be controlled by different mechanisms, which are discussed below. Both mechanisms are kinetically driven.
In −1 frameshifting, the ribosome slips back one nucleotide and continues translation in the −1 frame. There are typically three elements that comprise a −1 frameshift signal: a slippery sequence, a spacer region, and an RNA secondary structure. The slippery sequence fits a X_XXY_YYH motif, where XXX is any three identical nucleotides (though some exceptions occur), YYY typically represents UUU or AAA, and H is A, C or U. Because the structure of this motif contains 2 adjacent 3-nucleotide repeats it is believed that −1 frameshifting is described by a tandem slippage model, in which the ribosomal P-site tRNA anticodon re-pairs from XXY to XXX and the A-site anticodon re-pairs from YYH to YYY simultaneously. These new pairings are identical to the 0-frame pairings except at their third positions. This difference does not significantly disfavor anticodon binding because the third nucleotide in a codon, known as the wobble position, has weaker tRNA anticodon binding specificity than the first and second nucleotides. [2] [11] In this model, the motif structure is explained by the fact that the first and second positions of the anticodons must be able to pair perfectly in both the 0 and −1 frames. Therefore, nucleotides 2 and 1 must be identical, and nucleotides 3 and 2 must also be identical, leading to a required sequence of 3 identical nucleotides for each tRNA that slips. [12]
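The canonical motif and the tandem slippage model can be sketched in a few lines. The regular expression below encodes only the canonical X_XXY_YYH form described above (three identical nucleotides, then UUU or AAA, then A, C or U) and therefore misses the exceptions the text mentions. The function names and the flanking nucleotides in `mrna` are assumptions made for illustration. UUUUUUA is the HIV heptamer, and UUUAAAC is the heptamer used by coronaviruses.

```python
import re

# Lookahead so that overlapping candidate sites are all reported.
# Group 2 is X (repeated three times); group 1 is the full heptamer.
SLIPPERY = re.compile(r"(?=(([ACGU])\2\2(?:AAA|UUU)[ACU]))")


def find_slippery_sites(mrna):
    """Return (index, heptamer) for every canonical X_XXY_YYH match."""
    return [(m.start(), m.group(1)) for m in SLIPPERY.finditer(mrna)]


def tandem_slippage(heptamer):
    """Codons in the (P site, A site) before and after a -1 slip."""
    zero_frame = (heptamer[1:4], heptamer[4:7])  # XXY, YYH
    minus_one = (heptamer[0:3], heptamer[3:6])   # XXX, YYY
    return zero_frame, minus_one


# Flanking nucleotides are made up; only the heptamer is meaningful.
mrna = "GGCAAUUUUUUAGGGAAG"
print(find_slippery_sites(mrna))

for site in ("UUUUUUA", "UUUAAAC"):
    zero_frame, minus_one = tandem_slippage(site)
    print(site, zero_frame, "->", minus_one)
```

For both heptamers, each codon in the −1 frame matches its 0-frame counterpart at the first and second positions and can differ only at the third (wobble) position, which is the property the tandem slippage model relies on.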
The slippery sequence for a +1 frameshift signal does not have the same motif, and instead appears to function by pausing the ribosome at a sequence encoding a rare amino acid. [13] Ribosomes do not translate proteins at a uniform rate; the rate depends on the sequence. Certain codons take longer to translate because the tRNAs that decode different codons are not equally abundant in the cytosol. [14] Because of this lag, short stretches of codons can control the rate of ribosomal frameshifting. Specifically, the ribosome must pause to wait for the arrival of a rare tRNA, and this increases the kinetic favorability of the ribosome and its associated tRNA slipping into the new frame. [13] [15] In this model, the change in reading frame is caused by a single tRNA slip rather than two.
Ribosomal frameshifting may be controlled by mechanisms found in the mRNA sequence (cis-acting). This generally refers to a slippery sequence, an RNA secondary structure, or both. A −1 frameshift signal consists of both elements separated by a spacer region typically 5–9 nucleotides long. [2] Frameshifting may also be induced by other molecules which interact with the ribosome or the mRNA (trans-acting).
Slippery sequences can potentially make the reading ribosome "slip" and skip a number of nucleotides (usually only 1) and read a completely different frame thereafter. In programmed −1 ribosomal frameshifting, the slippery sequence fits a X_XXY_YYH motif, where XXX is any three identical nucleotides (though some exceptions occur), YYY typically represents UUU or AAA, and H is A, C or U. In the case of +1 frameshifting, the slippery sequence contains codons for which the corresponding tRNA is rarer, and the frameshift is favored because the codon in the new frame has a more common associated tRNA. [13] One example of a slippery sequence is the polyA on mRNA, which is known to induce ribosome slippage even in the absence of any other elements. [16]
Efficient ribosomal frameshifting generally requires the presence of an RNA secondary structure to enhance the effects of the slippery sequence. [12] The RNA structure (which can be a stem-loop or pseudoknot) is thought to pause the ribosome on the slippery site during translation, forcing it to relocate and continue translation from the −1 position. It is believed that this occurs because the structure physically blocks movement of the ribosome by becoming stuck in the ribosome mRNA tunnel. [2] This model is supported by the fact that the strength of the pseudoknot has been positively correlated with the level of frameshifting for the associated mRNA. [3] [17]
Below are examples of predicted secondary structures for frameshift elements shown to stimulate frameshifting in a variety of organisms. The majority of the structures shown are stem-loops, with the exception of the ALIL (apical loop-internal loop) pseudoknot structure. In these images, the larger and incomplete circles of mRNA represent linear regions. The secondary "stem-loop" structures, where "stems" are formed by a region of mRNA base pairing with another region on the same strand, are shown protruding from the linear mRNA. The linear region of the HIV ribosomal frameshift signal contains a highly conserved UUU UUU A slippery sequence; many of the other predicted structures contain candidates for slippery sequences as well.
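The layout of a −1 frameshift signal (slippery sequence, spacer, downstream structure) can be expressed as simple index arithmetic. This is a sketch under stated assumptions: the heptamer length of 7 follows from the X_XXY_YYH motif, the 5–9 nucleotide spacer range is the typical value quoted above, and the example sequence and function names are hypothetical. No attempt is made to predict the RNA structure itself.

```python
SLIPPERY_LEN = 7       # length of an X_XXY_YYH heptamer
SPACER_RANGE = (5, 9)  # typical spacer length in nucleotides (from the text)


def structure_start_window(slippery_index):
    """Indices at which the downstream structure would typically begin."""
    end_of_slippery = slippery_index + SLIPPERY_LEN
    return range(end_of_slippery + SPACER_RANGE[0],
                 end_of_slippery + SPACER_RANGE[1] + 1)


def split_signal(mrna, slippery_index, spacer_len):
    """Cut an mRNA into (slippery sequence, spacer, downstream region)."""
    a = slippery_index + SLIPPERY_LEN
    b = a + spacer_len
    return mrna[slippery_index:a], mrna[a:b], mrna[b:]


# Made-up sequence: 2 leading nt, a heptamer, a 6-nt spacer, then the rest.
mrna = "GGUUUAAACGGGUUUGCGGCAUUGCCGC"
print(list(structure_start_window(2)))
print(split_signal(mrna, 2, 6))
```

A downstream stem-loop or pseudoknot that begins outside the reported window is not ruled out by this check; the window only reflects the typical spacing.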
The mRNA sequences in the images can be read according to a set of guidelines. While A, T, C, and G represent a particular nucleotide at a position, there are also letters that represent ambiguity which are used when more than one kind of nucleotide could occur at that position. The rules of the International Union of Pure and Applied Chemistry (IUPAC) are as follows: [18]
Symbol [18] | Description | Bases represented | Number of bases | Complement
---|---|---|---|---
A | Adenine | A | 1 | T
C | Cytosine | C | 1 | G
G | Guanine | G | 1 | C
T | Thymine | T | 1 | A
U | Uracil | U | 1 | A
W | Weak | A, T | 2 | W
S | Strong | C, G | 2 | S
M | aMino | A, C | 2 | K
K | Keto | G, T | 2 | M
R | puRine | A, G | 2 | Y
Y | pYrimidine | C, T | 2 | R
B | not A (B comes after A) | C, G, T | 3 | V
D | not C (D comes after C) | A, G, T | 3 | H
H | not G (H comes after G) | A, C, T | 3 | D
V | not T (V comes after T and U) | A, C, G | 3 | B
N | any Nucleotide (not a gap) | A, C, G, T | 4 | N
Z | Zero | (none) | 0 | Z
These symbols are also valid for RNA, except with U (uracil) replacing T (thymine). [18]
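The ambiguity codes lend themselves to a lookup table. The sketch below stores the bases each symbol represents, derives each symbol's complement from Watson–Crick pairing instead of hard-coding it, and expands a pattern into every concrete sequence it matches. The names `IUPAC_BASES`, `complement_symbol`, and `expand` are chosen for this illustration; U is handled by converting to T internally, and Z (zero bases) is omitted for simplicity.

```python
from itertools import product

# Bases represented by each symbol, written in alphabetical order.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "AT", "S": "CG", "M": "AC", "K": "GT", "R": "AG", "Y": "CT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complement_symbol(symbol):
    """Symbol whose base set is the complement of the given symbol's set."""
    target = "".join(sorted(PAIR[b] for b in IUPAC_BASES[symbol]))
    return next(s for s, bases in IUPAC_BASES.items() if bases == target)


def expand(pattern, rna=False):
    """List every concrete sequence matched by an IUPAC pattern."""
    choices = [IUPAC_BASES[s] for s in pattern.upper().replace("U", "T")]
    seqs = ["".join(p) for p in product(*choices)]
    return [s.replace("T", "U") for s in seqs] if rna else seqs


print(complement_symbol("R"), complement_symbol("B"))
print(expand("UUUAAAH", rna=True))
```

Expanding `UUUAAAH` as RNA yields the three heptamers ending in A, C, or U, which is how the H in the X_XXY_YYH slippery motif is read.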
Small molecules, proteins, and nucleic acids have been found to stimulate frameshifting. For example, the mechanism of a negative feedback loop in the polyamine synthesis pathway is based on polyamine levels stimulating an increase in +1 frameshifts, which results in production of an inhibitory enzyme. Certain proteins which are needed for codon recognition or which bind directly to the mRNA sequence have also been shown to modulate frameshifting levels. MicroRNA (miRNA) molecules may hybridize to an RNA secondary structure and affect its strength. [6]
Protein biosynthesis is a core biological process, occurring inside cells, balancing the loss of cellular proteins through the production of new proteins. Proteins perform a number of critical functions as enzymes, structural proteins or hormones. Protein synthesis is a very similar process for both prokaryotes and eukaryotes but there are some distinct differences.
Ribosomes are macromolecular machines, found within all cells, that perform biological protein synthesis. Ribosomal RNA forms the catalytic core of the ribosome, where this synthesis happens. Ribosomes link amino acids together in the order specified by the codons of messenger RNA molecules to form polypeptide chains. Ribosomes consist of two major components: the small and large ribosomal subunits. Each subunit consists of one or more ribosomal RNA molecules and many ribosomal proteins. The ribosomes and associated molecules are also known as the translational apparatus.
In biology, translation is the process in living cells in which proteins are produced using RNA molecules as templates. The generated protein is a sequence of amino acids. This sequence is determined by the sequence of nucleotides in the RNA. The nucleotides are considered three at a time. Each such triple results in addition of one specific amino acid to the protein being generated. The matching from nucleotide triple to amino acid is called the genetic code. Translation is performed by ribosomes, large complexes of functional RNA and proteins. Together with transcription, translation is part of the overall process called gene expression.
In molecular biology, a reading frame is a way of dividing the sequence of nucleotides in a nucleic acid molecule into a set of consecutive, non-overlapping triplets. Where these triplets equate to amino acids or stop signals during translation, they are called codons.
Transfer RNA is an adaptor molecule composed of RNA, typically 76 to 90 nucleotides in length, that serves as the physical link between the mRNA and the amino acid sequence of proteins. Transfer RNA (tRNA) does this by carrying an amino acid to the protein-synthesizing machinery of a cell called the ribosome. Complementation of a 3-nucleotide codon in a messenger RNA (mRNA) by a 3-nucleotide anticodon of the tRNA results in protein synthesis based on the mRNA code. As such, tRNAs are a necessary component of translation, the biological synthesis of new proteins in accordance with the genetic code.
A frameshift mutation is a genetic mutation caused by indels of a number of nucleotides in a DNA sequence that is not divisible by three. Due to the triplet nature of gene expression by codons, the insertion or deletion can change the reading frame, resulting in a completely different translation from the original. The earlier in the sequence the deletion or insertion occurs, the more altered the protein. A frameshift mutation is not the same as a single-nucleotide polymorphism in which a nucleotide is replaced, rather than inserted or deleted. A frameshift mutation will in general cause the reading of the codons after the mutation to code for different amino acids. The frameshift mutation will also alter the first stop codon encountered in the sequence. The polypeptide being created could be abnormally short or abnormally long, and will most likely not be functional.
The 5′ untranslated region is the region of a messenger RNA (mRNA) that is directly upstream from the initiation codon. This region is important for the regulation of translation of a transcript by differing mechanisms in viruses, prokaryotes and eukaryotes. While called untranslated, the 5′ UTR or a portion of it is sometimes translated into a protein product. This product can then regulate the translation of the main coding sequence of the mRNA. In many organisms, however, the 5′ UTR is completely untranslated, instead forming a complex secondary structure to regulate translation.
The Shine–Dalgarno (SD) sequence is a ribosomal binding site in bacterial and archaeal messenger RNA, generally located around 8 bases upstream of the start codon AUG. The RNA sequence helps recruit the ribosome to the messenger RNA (mRNA) to initiate protein synthesis by aligning the ribosome with the start codon. Once recruited, tRNA may add amino acids in sequence as dictated by the codons, moving downstream from the translational start site.
Transfer-messenger RNA is a bacterial RNA molecule with dual tRNA-like and messenger RNA-like properties. The tmRNA forms a ribonucleoprotein complex (tmRNP) together with Small Protein B (SmpB), Elongation Factor Tu (EF-Tu), and ribosomal protein S1. In trans-translation, tmRNA and its associated proteins bind to bacterial ribosomes which have stalled in the middle of protein biosynthesis, for example when reaching the end of a messenger RNA which has lost its stop codon. The tmRNA is remarkably versatile: it recycles the stalled ribosome, adds a proteolysis-inducing tag to the unfinished polypeptide, and facilitates the degradation of the aberrant messenger RNA. In the majority of bacteria these functions are carried out by standard one-piece tmRNAs. In other bacterial species, a permuted ssrA gene produces a two-piece tmRNA in which two separate RNA chains are joined by base-pairing.
Stem-loop intramolecular base pairing is a pattern that can occur in single-stranded RNA. The structure is also known as a hairpin or hairpin loop. It occurs when two regions of the same strand, usually complementary in nucleotide sequence when read in opposite directions, base-pair to form a double helix that ends in an unpaired loop. The resulting structure is a key building block of many RNA secondary structures. As an important secondary structure of RNA, it can direct RNA folding, protect structural stability for messenger RNA (mRNA), provide recognition sites for RNA binding proteins, and serve as a substrate for enzymatic reactions.
Ribosome shunting is a mechanism of translation initiation in which ribosomes bypass, or "shunt over", parts of the 5' untranslated region to reach the start codon. Some viral RNAs have been shown to use ribosome shunting as a more efficient form of translation during certain stages of the viral life cycle or when translation initiation factors are scarce. Some viruses known to use this mechanism include adenovirus, Sendai virus, human papillomavirus, duck hepatitis B pararetrovirus, rice tungro bacilliform virus, and cauliflower mosaic virus. In these viruses the ribosome is directly translocated from the upstream initiation complex to the start codon (AUG) without the need to unwind RNA secondary structures.
Eukaryotic translation is the biological process by which messenger RNA is translated into proteins in eukaryotes. It consists of four phases: initiation, elongation, termination, and recycling.
The Kozak consensus sequence is a nucleic acid motif that functions as the protein translation initiation site in most eukaryotic mRNA transcripts. Regarded as the optimum sequence for initiating translation in eukaryotes, the sequence is an integral aspect of protein regulation and overall cellular health as well as having implications in human disease. It ensures that a protein is correctly translated from the genetic message, mediating ribosome assembly and translation initiation. A wrong start site can result in non-functional proteins. As it has become more studied, expansions of the nucleotide sequence, bases of importance, and notable exceptions have arisen. The sequence was named after the scientist who discovered it, Marilyn Kozak. Kozak discovered the sequence through a detailed analysis of DNA genomic sequences.
In molecular biology, the coronavirus frameshifting stimulation element is a conserved stem-loop of RNA found in coronaviruses that can promote ribosomal frameshifting. Such RNA molecules interact with a downstream region to form a pseudoknot structure; the region varies according to the virus but pseudoknot formation is known to stimulate frameshifting. In the classical situation, a sequence 32 nucleotides downstream of the stem is complementary to part of the loop. In other coronaviruses, however, another stem-loop structure around 150 nucleotides downstream can interact with members of this family to form kissing stem-loops and stimulate frameshifting.
The HIV ribosomal frameshift signal is a programmed ribosomal frameshift (PRF) signal that human immunodeficiency virus (HIV) uses to translate several different proteins from the same sequence.
A ribosome binding site, or ribosomal binding site (RBS), is a sequence of nucleotides upstream of the start codon of an mRNA transcript that is responsible for the recruitment of a ribosome during the initiation of translation. Mostly, RBS refers to bacterial sequences, although internal ribosome entry sites (IRES) have been described in mRNAs of eukaryotic cells or viruses that infect eukaryotes. Ribosome recruitment in eukaryotes is generally mediated by the 5' cap present on eukaryotic mRNAs.
In biochemistry, wybutosine (yW) is a heavily modified nucleoside of phenylalanine transfer RNA that stabilizes interactions between the codons and anti-codons during protein synthesis. Ensuring accurate synthesis of protein is essential in maintaining health as defects in tRNA modifications are able to cause disease. In eukaryotic organisms, it is found only in position 37, 3'-adjacent to the anticodon, of phenylalanine tRNA. Wybutosine enables correct translation through the stabilization of the codon-anticodon base pairing during the decoding process.
Ribosomal pause refers to the queueing or stacking of ribosomes during translation of the nucleotide sequence of mRNA transcripts. These transcripts are decoded and converted into an amino acid sequence during protein synthesis by ribosomes. Pause sites in some mRNAs disturb translation. Ribosomal pausing occurs in both eukaryotes and prokaryotes. A more severe pause is known as a ribosomal stall.
A slippery sequence is a short stretch of nucleotides within a coding sequence that controls the rate and chance of ribosomal frameshifting. A slippery sequence can cause the reading ribosome to "slip." This allows a tRNA to shift by 1 base (−1) after it has paired with its codon, changing the reading frame. A −1 frameshift triggered by such a sequence is a programmed −1 ribosomal frameshift. The slippery sequence is followed by a spacer region and an RNA secondary structure. Such sequences are common in virus polyproteins.
ORF1ab refers collectively to two open reading frames (ORFs), ORF1a and ORF1b, that are conserved in the genomes of nidoviruses, a group of viruses that includes coronaviruses. The genes express large polyproteins that undergo proteolysis to form several nonstructural proteins with various functions in the viral life cycle, including proteases and the components of the replicase-transcriptase complex (RTC). Together the two ORFs are sometimes referred to as the replicase gene. They are related by a programmed ribosomal frameshift that allows the ribosome to continue translating past the stop codon at the end of ORF1a, in a −1 reading frame. The resulting polyproteins are known as pp1a and pp1ab.