ORF3c

ORF3c
Identifiers
Organism SARS-CoV-2
Symbol ORF3c
UniProt P0DTG1

ORF3c is a gene found in coronaviruses of the subgenus Sarbecovirus, including SARS-CoV and SARS-CoV-2. It was first identified in the SARS-CoV-2 genome and encodes a 41-amino-acid non-structural protein of unknown function. [1] [2] [3] It is also present in the SARS-CoV genome, but was not recognized there until the SARS-CoV-2 homolog had been identified. [4]

Nomenclature

There has been significant confusion in the scientific literature over the nomenclature of the SARS-CoV-2 accessory proteins, especially for the several genes that overlap ORF3a. [4] The predicted protein product of the ORF3c gene has at least once been referred to as "3b protein", [5] but it should not be confused with the non-homologous gene ORF3b. [4] It has also been described under the names ORF3h [2] and ORF3a.iORF1. [6] The recommended nomenclature for SARS-CoV-2 uses the term ORF3c for this gene. [4]

Comparative genomics

ORF3c is an overlapping gene whose open reading frame overlaps both ORF3a and ORF3d in the SARS-CoV-2 genome. This potentially represents a rare example of all three possible reading frames of the same sequence region encoding functional proteins. [7] [4]
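
To make the idea of overlapping reading frames concrete, the following minimal Python sketch translates one nucleotide string in each of the three forward reading frames. The sequence TOY_SEQ and the helper names translate and three_frame_translation are illustrative assumptions: the sequence is invented, is written in the DNA alphabet, and is not taken from the SARS-CoV-2 genome.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Invented toy sequence (assumption); NOT a real coronavirus sequence.
TOY_SEQ = "ATGGCATGCCTGATAAGC"


def translate(seq, frame):
    """Translate seq on the forward strand, starting at offset frame (0, 1 or 2).

    Trailing nucleotides that do not fill a complete codon are ignored.
    Stop codons are shown as '*'.
    """
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def three_frame_translation(seq):
    """Return a dict mapping each forward frame offset to its translation."""
    return {frame: translate(seq, frame) for frame in range(3)}


if __name__ == "__main__":
    for frame, peptide in three_frame_translation(TOY_SEQ).items():
        print(f"frame +{frame + 1}: {peptide}")
```

Shifting the start position by a single nucleotide changes every codon, so the same stretch of sequence yields three unrelated peptides. Overlapping genes such as ORF3a, ORF3c and ORF3d are read from different frames of one shared region in this way.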

Bioinformatics analyses of Sarbecovirus sequences suggest that the sequence and length of ORF3c are well conserved, indicating that it is likely to encode a functional protein. [1] [3] [2] It appears to be subject to purifying selection. [1] [7]
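
As a rough illustration of what "conserved sequence and length" means in practice, the sketch below checks whether a set of peptides share one length and computes the fraction of alignment columns that are identical across all of them. It is a simplified stand-in, not the method of the cited analyses, and it does not measure purifying selection, which requires codon-level comparisons. The function names and the toy peptides are assumptions made for this example.

```python
def same_length(peptides):
    """Return True if all ungapped peptides have the same length."""
    return len({len(p.replace("-", "")) for p in peptides}) <= 1


def column_identity(aligned):
    """Fraction of alignment columns in which every sequence has the same residue.

    Sequences must already be aligned (equal length, '-' for gaps).
    Columns containing a gap are counted as not identical.
    """
    lengths = {len(s) for s in aligned}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    n_cols = lengths.pop()
    if n_cols == 0:
        return 0.0
    identical = sum(
        1 for col in zip(*aligned) if len(set(col)) == 1 and col[0] != "-"
    )
    return identical / n_cols


if __name__ == "__main__":
    # Invented toy peptides (assumption); not real ORF3c sequences.
    toy_alignment = ["MLLK", "MLIK", "MLLK"]
    print(same_length(toy_alignment), column_identity(toy_alignment))
```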

Properties

Ribosome profiling experiments confirm that the ORF3c gene expresses a protein product. [6] The relatively short 41-residue protein is predicted to contain a transmembrane domain and has features suggestive of a viroporin. [2]
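
One simple heuristic for spotting a candidate transmembrane segment is a sliding-window average of Kyte-Doolittle hydropathy values. The sketch below illustrates that general technique only; it is not necessarily the prediction method used in the cited study. The function names, the two toy peptides, and the default window of 19 residues with a threshold of 1.6 are assumptions chosen for illustration.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Return the mean hydropathy of every full window along the protein.

    Proteins shorter than the window give an empty list.
    """
    if len(protein) < window:
        return []
    scores = [KYTE_DOOLITTLE[aa] for aa in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(protein) - window + 1)
    ]


def has_candidate_tm_segment(protein, window=19, threshold=1.6):
    """True if any window's mean hydropathy exceeds the threshold (heuristic only)."""
    return any(score > threshold for score in hydropathy_profile(protein, window))


if __name__ == "__main__":
    # Invented toy peptides (assumption); neither is the real ORF3c protein.
    hydrophobic_toy = "MKK" + "L" * 20 + "KKD"
    hydrophilic_toy = "MKKDDEEKKRRSSNNQQDDEEKK"
    print(has_candidate_tm_segment(hydrophobic_toy))
    print(has_candidate_tm_segment(hydrophilic_toy))
```

A positive result from a window scan like this is only suggestive; dedicated transmembrane predictors and experimental evidence are needed to establish that a segment actually spans a membrane.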

References

  1. Firth AE (October 2020). "A putative new SARS-CoV protein, 3c, encoded in an ORF overlapping ORF3a". The Journal of General Virology. 101 (10): 1085–1089. doi:10.1099/jgv.0.001469. PMC 7660454. PMID 32667280.
  2. Cagliani R, Forni D, Clerici M, Sironi M (September 2020). "Coding potential and sequence conservation of SARS-CoV-2 and related animal viruses". Infection, Genetics and Evolution. 83: 104353. doi:10.1016/j.meegid.2020.104353. PMC 7199688. PMID 32387562.
  3. Jungreis I, Sealfon R, Kellis M (May 2021). "SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes". Nature Communications. 12 (1): 2642. doi:10.1038/s41467-021-22905-7. hdl:1721.1/130581. PMC 8113528. PMID 33976134.
  4. Jungreis I, Nelson CW, Ardern Z, Finkel Y, Krogan NJ, Sato K, et al. (June 2021). "Conflicting and ambiguous names of overlapping ORFs in the SARS-CoV-2 genome: A homology-based resolution". Virology. 558: 145–151. doi:10.1016/j.virol.2021.02.013. hdl:1721.1/130363. PMC 7967279. PMID 33774510.
  5. Pavesi A (July 2020). "New insights into the evolutionary features of viral overlapping genes by discriminant analysis". Virology. 546: 51–66. doi:10.1016/j.virol.2020.03.007. PMC 7157939. PMID 32452417.
  6. Finkel Y, Mizrahi O, Nachshon A, Weingarten-Gabbay S, Morgenstern D, Yahalom-Ronen Y, et al. (January 2021). "The coding capacity of SARS-CoV-2". Nature. 589 (7840): 125–130. doi:10.1038/s41586-020-2739-1. PMID 32906143.
  7. Nelson CW, Ardern Z, Goldberg TL, Meng C, Kuo CH, Ludwig C, et al. (October 2020). "Dynamically evolving novel overlapping gene as a factor in the SARS-CoV-2 pandemic". eLife. 9: e59633. doi:10.7554/eLife.59633. PMC 7655111. PMID 33001029.