ORF1ab

Replicase polyprotein (SARS-CoV)
Identifiers
Organism: SARS-CoV
Symbol: rep
UniProt: P0C6X7

Replicase polyprotein (SARS-CoV-2)
Identifiers
Organism: SARS-CoV-2
Symbol: rep
UniProt: P0DTD1

ORF1ab (also ORF1a/b) refers collectively to two open reading frames (ORFs), ORF1a and ORF1b, that are conserved in the genomes of nidoviruses, a group of viruses that includes coronaviruses. The genes express large polyproteins that undergo proteolysis to form several nonstructural proteins with various functions in the viral life cycle, including proteases and the components of the replicase-transcriptase complex (RTC). [1] [2] [3] Together the two ORFs are sometimes referred to as the replicase gene. [4] They are related by a programmed ribosomal frameshift that allows the ribosome to continue translating past the stop codon at the end of ORF1a, in a -1 reading frame. The resulting polyproteins are known as pp1a and pp1ab. [1] [2] [3] [4]

Expression

Genomic information
Figure: Genomic organisation of isolate Wuhan-Hu-1, the earliest sequenced sample of SARS-CoV-2, indicating the location of ORF1a and ORF1b.
NCBI genome ID: 86693
Genome size: 29,903 bases
Year of completion: 2020

ORF1a is the first open reading frame at the 5' end of the genome. Together, ORF1a and ORF1b occupy about two thirds of the genome, with the remaining third at the 3' end encoding the structural proteins and accessory proteins. [1] [2] [3] ORF1ab is translated from a 5' capped RNA by cap-dependent translation. [1] Nidoviruses use a complex system of discontinuous subgenomic RNA production to express the genes in their relatively large RNA genomes (typically 27-32 kb for coronaviruses [1]), but ORF1ab is translated directly from the genomic RNA. [5] ORF1ab sequences have also been observed in noncanonical subgenomic RNAs, though their functional significance is unclear. [5]
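
The "about two thirds" figure can be checked with a little coordinate arithmetic. The sketch below is illustrative only: the genome length is the one given in this article for isolate Wuhan-Hu-1, while the ORF coordinates are an assumption (the 1-based positions commonly annotated for reference sequence NC_045512.2) and the function names are hypothetical.

```python
# Genome length as given in this article for isolate Wuhan-Hu-1.
GENOME_LENGTH = 29_903

# ASSUMPTION: 1-based inclusive coordinates as commonly annotated for the
# reference sequence NC_045512.2; they are not stated in this article.
ORF1A_SEGMENTS = [(266, 13_483)]
# Because of the -1 frameshift, nucleotide 13,468 is read twice in ORF1ab,
# so the two segments overlap by one position.
ORF1AB_SEGMENTS = [(266, 13_468), (13_468, 21_555)]


def span_length(start, end):
    """Length in nucleotides of a 1-based inclusive interval."""
    return end - start + 1


def genome_fraction(segments, genome_length=GENOME_LENGTH):
    """Fraction of the genome covered from the first start to the last end."""
    first_start = segments[0][0]
    last_end = segments[-1][1]
    return span_length(first_start, last_end) / genome_length


def polyprotein_length(segments):
    """Amino acids encoded by the joined segments, excluding the stop codon."""
    translated_nt = sum(span_length(start, end) for start, end in segments)
    if translated_nt % 3 != 0:
        raise ValueError("joined segments are not a whole number of codons")
    return translated_nt // 3 - 1  # subtract the terminating stop codon


orf1ab_fraction = genome_fraction(ORF1AB_SEGMENTS)
pp1a_length = polyprotein_length(ORF1A_SEGMENTS)
pp1ab_length = polyprotein_length(ORF1AB_SEGMENTS)
```

Under these assumed coordinates, ORF1ab spans about 71% of the genome, and the computed lengths are 4,405 amino acids for pp1a and 7,096 for pp1ab.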

A programmed ribosomal frameshift allows the ribosome to bypass the stop codon that terminates ORF1a and continue translating in the -1 reading frame, producing the longer polyprotein pp1ab. The frameshift occurs at a slippery sequence that is followed by a pseudoknot RNA secondary structure. [1] Frameshifting efficiency has been measured at 20-50% for murine coronavirus [6] and 45-70% for SARS-CoV-2; [7] the resulting stoichiometry has been reported as roughly 1.5 to 2 times as much pp1a as pp1ab. [2]
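
The effect of a -1 frameshift on the reading frame can be illustrated with a toy example. The sketch below is not a model of the real genome: `TOY_RNA` is an invented sequence, the slippery heptanucleotide `UUUAAAC` is the one commonly reported for coronaviruses (an assumption here, since this article does not give the sequence), and the function names are hypothetical. The ratio function further assumes that every ribosome either terminates at the ORF1a stop codon (giving pp1a) or frameshifts (giving pp1ab).

```python
from itertools import product

# Standard genetic code, built in UCAG order; "*" marks stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

SLIPPERY_SITE = "UUUAAAC"  # ASSUMPTION: not stated in this article
# Invented toy sequence: the zero frame stops soon after the slippery site,
# while the -1 frame continues past that stop codon.
TOY_RNA = "AUGUCUUUAAACGGGUUUGCGGUGUAAGUUGA"


def translate(rna, start=0):
    """Translate from `start` until a stop codon or the end of the sequence."""
    peptide = []
    for i in range(start, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


def translate_with_minus_one_shift(rna, slippery_site=SLIPPERY_SITE):
    """Read in frame 0 through the slippery site, then back up one nucleotide."""
    site_start = rna.find(slippery_site)
    if site_start < 0:
        raise ValueError("slippery site not found")
    site_end = site_start + len(slippery_site)
    if site_end % 3 != 0:
        raise ValueError("slippery site does not end on a frame-0 codon boundary")
    upstream = translate(rna[:site_end])
    # The last nucleotide of the site is read again as the first base
    # of the next codon, which puts the ribosome in the -1 frame.
    downstream = translate(rna, start=site_end - 1)
    return upstream + downstream


def pp1a_to_pp1ab_ratio(frameshift_efficiency):
    """Ratio of pp1a to pp1ab if a fraction f of ribosomes frameshifts."""
    if not 0 < frameshift_efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return (1 - frameshift_efficiency) / frameshift_efficiency
```

For the toy sequence, `translate(TOY_RNA)` gives `MSLNGFAV` and `translate_with_minus_one_shift(TOY_RNA)` gives `MSLNRVCGVS`: the two products share their N-terminal residues and diverge after the slippery site. Under the simple two-outcome model, a pp1a:pp1ab ratio of 1.5 to 2 corresponds to a frameshifting efficiency of about 33-40%. Measured efficiencies and reported stoichiometries come from different experiments, so they need not agree under such a simplified model.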

Processing

Top: Organization of the coronavirus genome, illustrating nonstructural proteins within ORF1a and ORF1b. Middle: Domain organization of nsp14 (exonuclease and methyltransferase). Bottom: Components of the coronavirus replicase-transcriptase complex.

The polyproteins pp1a and pp1ab contain about 13 to 17 nonstructural proteins. [3] They undergo auto-proteolysis, releasing the nonstructural proteins through the action of internal cysteine protease domains. [1] [2] [3]

In coronaviruses, there are a total of 16 nonstructural proteins; the pp1a protein contains nsp1-11 and the pp1ab protein contains nsp1-10 and nsp12-16. Proteolytic processing is performed by two proteases: the papain-like protease domain of the multidomain protein nsp3 performs the N-terminal cleavages, up to nsp4, and the 3CL protease (also known as the main protease, nsp5) performs the remaining cleavages, from nsp5 through the polyprotein C-terminus. [1] [2] Proteins nsp12-16, the C-terminal components of the pp1ab polyprotein, contain the core enzymatic activities necessary for viral replication. [1] After proteolytic processing, several of the nonstructural proteins assemble into a large protein complex known as the replicase-transcriptase complex (RTC), which performs genome replication and transcription. [1] [2]
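
The division of labour between the two proteases can be written out as a small enumeration. This is a sketch under one reading of the paragraph above, namely that the papain-like protease handles every junction whose downstream protein is nsp4 or earlier and the 3CL protease handles the rest. The labels `PLpro` and `3CLpro` and the function names are illustrative shorthand.

```python
# Nonstructural proteins in each polyprotein, as listed in the text.
PP1A = list(range(1, 12))                          # nsp1-11
PP1AB = list(range(1, 11)) + list(range(12, 17))   # nsp1-10 and nsp12-16


def cleavage_plan(nsp_numbers, last_plpro_target=4):
    """Assign a protease to each junction between adjacent nsps.

    ASSUMPTION: a junction is cut by the papain-like protease when the
    downstream protein is nsp4 or earlier, otherwise by the 3CL protease.
    """
    plan = []
    for upstream, downstream in zip(nsp_numbers, nsp_numbers[1:]):
        protease = "PLpro" if downstream <= last_plpro_target else "3CLpro"
        plan.append((f"nsp{upstream}|nsp{downstream}", protease))
    return plan


def count_by_protease(plan):
    """Count how many junctions each protease is responsible for."""
    counts = {}
    for _junction, protease in plan:
        counts[protease] = counts.get(protease, 0) + 1
    return counts


pp1a_counts = count_by_protease(cleavage_plan(PP1A))
pp1ab_counts = count_by_protease(cleavage_plan(PP1AB))
```

Under this reading, pp1ab has 14 junctions, of which 3 are assigned to the papain-like protease and 11 to the 3CL protease, and pp1a has 10 junctions (3 and 7 respectively). Note that nsp10 is followed directly by nsp12 in pp1ab, because nsp11 is only produced from pp1a.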

Components

Core replicase domains

Phylogenetic relationships between nidoviruses and their pp1ab protein domain organization, with conserved domains highlighted. NendoU represents the endoribonuclease and 3CLpro represents the main 3C-like protease.

A set of five conserved "core replicase" protein domains is present in all nidovirus lineages (arteriviruses, mesoniviruses, roniviruses, and coronaviruses): from ORF1a, the main protease, flanked on either end by transmembrane domains; and from ORF1b, a nucleotidyltransferase domain known as NiRAN, the RNA-dependent RNA polymerase (RdRp), a zinc-binding domain, and a helicase. [3] [9] (These are sometimes counted as seven domains, with the transmembrane regions counted separately. [4]) In addition, an endoribonuclease domain is found in all nidoviruses that infect vertebrate hosts. Arteriviruses, which have smaller genomes than the other nidovirus lineages, lack methyltransferases as well as the proofreading exoribonuclease, a domain that is conserved in nidoviruses with larger genomes. [3] This proofreading functionality is thought to be required for sufficient fidelity to replicate large RNA genomes, but may also play additional roles in some viruses. [9]

Coronaviruses

In coronaviruses, pp1a and pp1ab together contain sixteen nonstructural proteins, which have the following functions: [1] [2] [10] [11]

Nonstructural proteins derived from coronavirus pp1a and pp1ab proteins
Nonstructural protein | Function
nonstructural protein 1 | Cellular mRNA degradation, host cell translation inhibition, interferon inhibition; not present in Gammacoronavirus
nonstructural protein 2 | Unknown; binds prohibitin
nonstructural protein 3 | Multi-domain protein with one or two papain-like protease domains for polyprotein processing; interferon antagonist; multiple other roles
nonstructural protein 4 | Double-membrane vesicle formation
nonstructural protein 5 | 3CL protease for polyprotein processing; interferon inhibition
nonstructural protein 6 | Double-membrane vesicle formation
nonstructural protein 7 | Cofactor and processivity factor for RdRp; forms complex with nsp8 and nsp12
nonstructural protein 8 | Cofactor and processivity factor for RdRp; forms complex with nsp7 and nsp12
nonstructural protein 9 | Single-stranded RNA binding
nonstructural protein 10 | Cofactor for nsp14 and nsp16
nonstructural protein 11 | Unknown
nonstructural protein 12 | RNA-dependent RNA polymerase (RdRp) and nucleotidyltransferase
nonstructural protein 13 | Helicase and RNA triphosphatase
nonstructural protein 14 | Proofreading exonuclease, RNA cap formation, guanosine N7-methyltransferase
nonstructural protein 15 | Endoribonuclease, immune evasion function
nonstructural protein 16 | Ribose 2'-O-methyltransferase, RNA cap formation

Evolution

The structure and organization of the genome, including ORF1a, ORF1b, and the frameshift separating them, are conserved among nidoviruses. Some "non-canonical" nidovirus structures have been described, mainly involving gene fusions. [4] The largest known nidovirus, planarian secretory cell nidovirus (PSCNV), with a 41 kb genome, has a non-canonical genome structure in which ORF1a, ORF1b, and the downstream ORFs encoding structural proteins are fused and expressed as a single large ORF encoding a polyprotein of over 13,000 amino acids. [4] [12] In these non-canonical genomes, other frameshift locations or stop codon readthrough may be used to regulate the stoichiometry of viral proteins. [4]

Nidoviruses vary widely in genome size, from arteriviruses with typically 12-15 kb genomes to coronaviruses at 27-32 kb. Their evolutionary history has been of research interest in understanding the replication of very large RNA genomes despite the relatively low-fidelity replication mechanism of the viral RNA-dependent RNA polymerase (RdRp). [4] The larger nidovirus genomes (above around 20 kb [3]) encode a proofreading exoribonuclease (nsp14 in coronaviruses) thought to be required for replication fidelity. [9] [1]

Among coronaviruses, ORF1ab is more highly conserved than the 3' ORFs encoding structural proteins. [11] Throughout the COVID-19 pandemic, SARS-CoV-2 genomes have been sequenced many times, resulting in the identification of thousands of distinct variants. In an analysis published in the Bulletin of the World Health Organization in July 2020, ORF1ab was the most frequently mutated gene, followed by the S gene encoding the spike protein. The most commonly mutated protein within ORF1ab was the papain-like protease (nsp3), and the single most commonly observed missense mutation was in the RNA-dependent RNA polymerase. [13] Some PCR tests for COVID-19 analyze the specimen for the ORF1ab gene, among others. [14]

References

  1. Hartenian E, Nandakumar D, Lari A, Ly M, Tucker JM, Glaunsinger BA (September 2020). "The molecular virology of coronaviruses". The Journal of Biological Chemistry. 295 (37): 12910–12934. doi:10.1074/jbc.REV120.013930. PMC 7489918. PMID 32661197.
  2. V'kovski P, Kratzel A, Steiner S, Stalder H, Thiel V (March 2021). "Coronavirus biology and replication: implications for SARS-CoV-2". Nature Reviews. Microbiology. 19 (3): 155–170. doi:10.1038/s41579-020-00468-6. PMC 7592455. PMID 33116300.
  3. Posthuma CC, Te Velthuis AJ, Snijder EJ (April 2017). "Nidovirus RNA polymerases: Complex enzymes handling exceptional RNA genomes". Virus Research. 234: 58–73. doi:10.1016/j.virusres.2017.01.023. PMC 7114556. PMID 28174054.
  4. Gulyaeva AA, Gorbalenya AE (January 2021). "A nidovirus perspective on SARS-CoV-2". Biochemical and Biophysical Research Communications. 538: 24–34. doi:10.1016/j.bbrc.2020.11.015. PMC 7664520. PMID 33413979.
  5. Wang D, Jiang A, Feng J, Li G, Guo D, Sajid M, et al. (May 2021). "The SARS-CoV-2 subgenome landscape and its novel regulatory features". Molecular Cell. 81 (10): 2135–2147.e5. doi:10.1016/j.molcel.2021.02.036. PMC 7927579. PMID 33713597.
  6. Irigoyen N, Firth AE, Jones JD, Chung BY, Siddell SG, Brierley I (February 2016). "High-Resolution Analysis of Coronavirus Gene Expression by RNA Sequencing and Ribosome Profiling". PLOS Pathogens. 12 (2): e1005473. doi:10.1371/journal.ppat.1005473. PMC 4769073. PMID 26919232.
  7. Finkel Y, Mizrahi O, Nachshon A, Weingarten-Gabbay S, Morgenstern D, Yahalom-Ronen Y, et al. (January 2021). "The coding capacity of SARS-CoV-2". Nature. 589 (7840): 125–130. Bibcode:2021Natur.589..125F. doi:10.1038/s41586-020-2739-1. PMID 32906143. S2CID 221624633.
  8. Smith EC, Denison MR (5 December 2013). "Coronaviruses as DNA wannabes: a new model for the regulation of RNA virus replication fidelity". PLOS Pathogens. 9 (12): e1003760. doi:10.1371/journal.ppat.1003760. PMC 3857799. PMID 24348241.
  9. Ogando NS, Ferron F, Decroly E, Canard B, Posthuma CC, Snijder EJ (7 August 2019). "The Curious Case of the Nidovirus Exoribonuclease: Its Role in RNA Synthesis and Replication Fidelity". Frontiers in Microbiology. 10: 1813. doi:10.3389/fmicb.2019.01813. PMC 6693484. PMID 31440227.
  10. Rohaim MA, El Naggar RF, Clayton E, Munir M (January 2021). "Structural and functional insights into non-structural proteins of coronaviruses". Microbial Pathogenesis. 150: 104641. doi:10.1016/j.micpath.2020.104641. PMC 7682334. PMID 33242646.
  11. Chen Y, Liu Q, Guo D (April 2020). "Emerging coronaviruses: Genome structure, replication, and pathogenesis". Journal of Medical Virology. 92 (4): 418–423. doi:10.1002/jmv.25681. PMC 7167049. PMID 31967327.
  12. Saberi A, Gulyaeva AA, Brubacher JL, Newmark PA, Gorbalenya AE (November 2018). "A planarian nidovirus expands the limits of RNA genome size". PLOS Pathogens. 14 (11): e1007314. doi:10.1371/journal.ppat.1007314. PMC 6211748. PMID 30383829. S2CID 53872740.
  13. Koyama T, Platt D, Parida L (July 2020). "Variant analysis of SARS-CoV-2 genomes". Bulletin of the World Health Organization. 98 (7): 495–504. doi:10.2471/BLT.20.253591. PMC 7375210. PMID 32742035.
  14. Richardson, Robin (August 22, 2021). "Open Wide". The Marshall News Messenger. pp. A1, A2. Retrieved 21 November 2022.