| Betacoronavirus NS8 protein | |
|---|---|
| Identifiers | |
| Symbol | bCoV_NS8 |
| Pfam | PF12093 |
| InterPro | IPR022722 |

ORF8 is a gene that encodes a viral accessory protein, Betacoronavirus NS8 protein, in coronaviruses of the subgenus Sarbecovirus. It is one of the least well conserved and most variable parts of the genome. [2] [3] [4] [5] In some viruses, a deletion splits the region into two smaller open reading frames, called ORF8a and ORF8b, a feature present in many SARS-CoV viral isolates from later in the SARS epidemic, as well as in some bat coronaviruses. [4] [3] For this reason the full-length gene and its protein are sometimes called ORF8ab. [3] [6] The full-length gene, exemplified in SARS-CoV-2, encodes a protein with an immunoglobulin domain of unknown function, possibly involving interactions with the host immune system. [4] [3] [1] It is similar in structure to the ORF7a protein, suggesting it may have originated through gene duplication. [7] [8]
ORF8 in SARS-CoV-2 encodes a protein of 121 amino acid residues with an N-terminal signal sequence. [4] ORF8 forms a dimer that is covalently linked by disulfide bonds. [1] It has an immunoglobulin-like domain with distant similarity to the ORF7a protein. [1] [2] Despite a similar overall fold, an insertion in ORF8 is likely responsible for different protein-protein interactions and creates an additional dimerization interface. [1] [2] Unlike ORF7a, ORF8 lacks a transmembrane helix and is therefore not a transmembrane protein, [1] [4] though it has been suggested that it might have a membrane-anchored form. [3]
The ORF8 proteins of SARS-CoV and SARS-CoV-2 are highly divergent, with less than 20% sequence identity. [1] The full-length ORF8 in SARS-CoV encodes a protein of 122 residues. In many SARS-CoV isolates it is split into ORF8a and ORF8b, which separately express a 39-residue ORF8a protein and an 84-residue ORF8b protein. [6] It has been suggested that the ORF8a and ORF8b proteins may form a protein complex. [2] [9] The cysteine residue responsible for dimerization of the SARS-CoV-2 protein is not conserved in the SARS-CoV sequence. [1] The ORF8ab protein has also been reported to form disulfide-linked multimers. [10]
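As a rough illustration of what a "percent sequence identity" figure measures, the sketch below counts matching residues over the ungapped columns of a pairwise alignment. The function name `percent_identity` and the two toy strings are invented for illustration and are not real ORF8 sequences. Published identity values, including the one cited above, depend on the alignment method and on which denominator is chosen, so this is only one of several possible conventions.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: both inputs are rows of the same pairwise alignment,
    with '-' marking gaps. Other conventions (e.g. dividing by the full
    alignment length) give different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns with a residue in both sequences
    matches = 0   # compared columns where the residues are identical
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, NOT real ORF8 sequences).
toy_a = "MKFLV-FLGI"
toy_b = "MCLKVLFL-I"
print(percent_identity(toy_a, toy_b))
```

For real proteins the alignment itself would normally come from a dedicated alignment tool; this sketch only covers the counting step.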
The full-length SARS-CoV ORF8ab protein is post-translationally modified by N-glycosylation, [6] which is predicted to be conserved in the SARS-CoV-2 protein. [1] Under experimental conditions, both 8b and 8ab are ubiquitinated. [6]
Along with the genes for other accessory proteins, the ORF8 gene is located near those encoding the structural proteins, at the 3' end of the coronavirus RNA genome. Along with ORF6, ORF7a, and ORF7b, ORF8 is located between the membrane (M) and nucleocapsid (N) genes. [6] [4] The SARS-CoV-2 ORF8 protein has a signal sequence for trafficking to the endoplasmic reticulum (ER) [4] and has been experimentally localized to the ER. [11] It is probably a secreted protein. [4] [3]
There are variable reports in the literature regarding the localization of SARS-CoV ORF8a, ORF8b, or ORF8ab proteins. [6] It is unclear if ORF8b is expressed at significant levels under natural conditions. [10] [12] The full-length ORF8ab appears to localize to the ER. [12]
The function of the ORF8 protein is unknown. It is not essential for viral replication in either SARS-CoV [6] or SARS-CoV-2, [4] though there is conflicting evidence on whether loss of ORF8 affects the efficiency of viral replication. [13]
A function often suggested for the ORF8 protein is interaction with the host immune system. [13] The SARS-CoV-2 protein is thought to have a role in immunomodulation via immune evasion or suppression of host immune responses. [4] [1] [3] It has been reported to be a type I interferon antagonist and to downregulate class I MHC. [4] [3] The SARS-CoV-2 ORF8 protein is highly immunogenic, and high levels of antibodies to the protein have been found in patients with or recovered from COVID-19. [4] [14] One study reported that ORF8 acts as a transcription inhibitor. [15] [16]
It has been suggested that the SARS-CoV ORF8a protein assembles into multimers and forms a viroporin. [17]
The evolutionary history of ORF8 is complex. It is among the least conserved regions of the Sarbecovirus genome. [3] [2] [4] It is subject to frequent mutations and deletions, and has been described as "hypervariable" and a recombination hotspot. [3] It has been suggested that RNA secondary structures in the region are associated with genomic instability. [3] [19]
In SARS-CoV, the ORF8 region is thought to have originated through recombination among ancestral bat coronaviruses. [3] [6] [5] [20] Among the most distinctive features of this region in SARS-CoV is the emergence of a 29-nucleotide deletion that split the full-length open reading frame into two smaller ORFs, ORF8a and ORF8b. Viral isolates from early in the SARS epidemic have a full-length, intact ORF8, but the split structure emerged later in the epidemic. [3] [6] Similar split structures have since been observed in bat coronaviruses. [21] Mutations and deletions have also been seen in SARS-CoV-2 variants. [2] [19] Based on observations in SARS-CoV, it has been suggested that changes in ORF8 may be related to host adaptation, but it is possible that ORF8 does not affect fitness in human hosts. [19] [5] In SARS-CoV, a high dN/dS ratio has been observed in ORF8, consistent with positive selection or with relaxed selection. [5]
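A 29-nucleotide deletion is not a multiple of three, so it shifts the reading frame of everything downstream. The minimal sketch below shows how such a deletion can turn one open reading frame into two. The sequence is a synthetic toy built from repeated codons, not SARS-CoV sequence, and `find_orfs` is a hypothetical helper that scans only the three forward frames for ATG-to-stop stretches.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str) -> list[tuple[int, int]]:
    """Return (start, end) index pairs of ATG...stop ORFs in the three
    forward frames; `end` is exclusive and includes the stop codon.
    Only the first ATG before each stop is reported."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == "ATG" and start is None:
                start = i
            elif codon in STOP_CODONS and start is not None:
                orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


# Synthetic toy ORF (assumption: made up for illustration only).
upstream = "ATG" + "GCT" * 4            # start codon plus 4 codons
deleted = "GCT" * 9 + "GC"              # the 29 nt that will be removed
downstream = "T" + "GAT" + "ATG" + "GCT" * 3 + "TAA"

full_length = upstream + deleted + downstream
with_deletion = upstream + downstream

print(len(deleted) % 3)           # remainder 2, so the frame shifts
print(find_orfs(full_length))     # one ORF spanning the whole toy gene
print(find_orfs(with_deletion))   # two shorter ORFs in different frames
```

In the toy, the shifted frame runs into a stop codon shortly after the deletion site, while an ATG that was an internal codon of the full-length ORF now starts a second, shorter ORF ending at the original stop. Whether this matches the details of the real ORF8a/ORF8b arrangement is not something the sketch establishes.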
ORF8 encodes a protein whose immunoglobulin domain (Ig) has distant similarity to that of ORF7a. [1] It has been suggested that ORF8 may have evolved from ORF7a through gene duplication, [2] [7] [8] though some bioinformatics analyses suggest the similarity may be too low to support duplication, which is relatively uncommon in viruses. [19] Immunoglobulin domains are uncommon in coronaviruses; other than the subset of betacoronaviruses with ORF8 and ORF7a, only a small number of bat alphacoronaviruses have been identified as containing likely Ig domains, while they are absent from gammacoronaviruses and deltacoronaviruses. [2] [8] ORF8 is notably absent in MERS-CoV. [8] The beta and alpha Ig domains may be independent acquisitions, where ORF8 and ORF7a may have been acquired from host proteins. [2] It is also possible that the absence of ORF8 reflects gene loss in those lineages. [8]
Coronaviruses are a group of related RNA viruses that cause diseases in mammals and birds. In humans and birds, they cause respiratory tract infections that can range from mild to lethal. Mild illnesses in humans include some cases of the common cold, while more lethal varieties can cause SARS, MERS and COVID-19, which is causing the ongoing pandemic. In cows and pigs they cause diarrhea, while in mice they cause hepatitis and encephalomyelitis.
Severe-acute-respiratory-syndrome–related coronavirus is a species of virus consisting of many known strains. Two strains of the virus have caused outbreaks of severe respiratory diseases in humans: severe acute respiratory syndrome coronavirus 1, which caused the 2002–2004 outbreak of severe acute respiratory syndrome (SARS), and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which is causing the ongoing pandemic of COVID-19. There are hundreds of other strains of SARSr-CoV, which are only known to infect non-human mammal species: bats are a major reservoir of many strains of SARSr-CoV; several strains have been identified in Himalayan palm civets, which were likely ancestors of SARS-CoV-1.
Human coronavirus NL63 (HCoV-NL63) is a species of coronavirus, specifically a Setracovirus from among the Alphacoronavirus genus. It was identified in late 2004 in patients in the Netherlands by Lia van der Hoek and Krzysztof Pyrc using a novel virus discovery method, VIDISCA. The discovery was later confirmed by researchers in Rotterdam, the Netherlands. The virus is an enveloped, positive-sense, single-stranded RNA virus which enters its host cell by binding to ACE2. Infection with the virus has been confirmed worldwide, and has an association with many common symptoms and diseases. Associated diseases include mild to moderate upper respiratory tract infections, severe lower respiratory tract infection, croup and bronchiolitis.
Murine coronavirus (M-CoV) is a virus in the genus Betacoronavirus that infects mice. Belonging to the subgenus Embecovirus, murine coronavirus strains are enterotropic or polytropic. Enterotropic strains include mouse hepatitis virus (MHV) strains D, Y, RI, and DVIM, whereas polytropic strains, such as JHM and A59, primarily cause hepatitis, enteritis, and encephalitis. Murine coronavirus is an important pathogen in the laboratory mouse and the laboratory rat. It is the most studied coronavirus in animals other than humans, and has been used as an animal disease model for many virological and clinical studies.
Transmissible gastroenteritis virus or Transmissible gastroenteritis coronavirus (TGEV) is a coronavirus which infects pigs. It is an enveloped, positive-sense, single-stranded RNA virus which enters its host cell by binding to the APN receptor. The virus is a member of the genus Alphacoronavirus, subgenus Tegacovirus, species Alphacoronavirus 1.
ORF7a is a gene found in coronaviruses of the Betacoronavirus genus. It expresses the Betacoronavirus NS7A protein, a type I transmembrane protein with an immunoglobulin-like protein domain. It was first discovered in SARS-CoV, the virus that causes severe acute respiratory syndrome (SARS). The homolog in SARS-CoV-2, the virus that causes COVID-19, has about 85% sequence identity to the SARS-CoV protein.
Susan R. Weiss is an American microbiologist who is a Professor of Microbiology at the Perelman School of Medicine at the University of Pennsylvania. She holds vice chair positions for the Department of Microbiology and for Faculty Development. Her research considers the biology of coronaviruses, including SARS, MERS and SARS-CoV-2. As of March 2020, Weiss serves as Co-Director of the University of Pennsylvania/Penn Medicine Center for Research on Coronavirus and Other Emerging Pathogens.
ORF3b is a gene found in coronaviruses of the subgenus Sarbecovirus, encoding a short non-structural protein. It is present in both SARS-CoV and SARS-CoV-2, though the protein product has very different lengths in the two viruses. The encoded protein is significantly shorter in SARS-CoV-2, at only 22 amino acid residues compared to 153–155 in SARS-CoV. Both the longer SARS-CoV and shorter SARS-CoV-2 proteins have been reported as interferon antagonists. It is unclear whether the SARS-CoV-2 gene expresses a functional protein.
The envelope (E) protein is the smallest and least well-characterized of the four major structural proteins found in coronavirus virions. It is an integral membrane protein less than 110 amino acid residues long; in SARS-CoV-2, the causative agent of COVID-19, the E protein is 75 residues long. Although it is not necessarily essential for viral replication, absence of the E protein may produce abnormally assembled viral capsids or reduced replication. E is a multifunctional protein and, in addition to its role as a structural protein in the viral capsid, it is thought to be involved in viral assembly, likely functions as a viroporin, and is involved in viral pathogenesis.
The membrane (M) protein is an integral membrane protein that is the most abundant of the four major structural proteins found in coronaviruses. The M protein organizes the assembly of coronavirus virions through protein-protein interactions with other M protein molecules as well as with the other three structural proteins, the envelope (E), spike (S), and nucleocapsid (N) proteins.
The nucleocapsid (N) protein is a protein that packages the positive-sense RNA genome of coronaviruses to form ribonucleoprotein structures enclosed within the viral capsid. The N protein is the most highly expressed of the four major coronavirus structural proteins. In addition to its interactions with RNA, N forms protein-protein interactions with the coronavirus membrane protein (M) during the process of viral assembly. N also has additional functions in manipulating the cell cycle of the host cell. The N protein is highly immunogenic and antibodies to N are found in patients recovered from SARS and COVID-19.
Spike (S) glycoprotein is the largest of the four major structural proteins found in coronaviruses. The spike protein assembles into trimers that form large structures, called spikes or peplomers, that project from the surface of the virion. The distinctive appearance of these spikes when visualized using negative stain transmission electron microscopy, "recalling the solar corona", gives the virus family its main name.
ORF3c is a gene found in coronaviruses of the subgenus Sarbecovirus, including SARS-CoV and SARS-CoV-2. It was first identified in the SARS-CoV-2 genome and encodes a 41 amino acid non-structural protein of unknown function. It is also present in the SARS-CoV genome, but was not recognized until the identification of the SARS-CoV-2 homolog.
ORF3a is a gene found in coronaviruses of the subgenus Sarbecovirus, including SARS-CoV and SARS-CoV-2. It encodes an accessory protein about 275 amino acid residues long, which is thought to function as a viroporin. It is the largest accessory protein and was the first of the SARS-CoV accessory proteins to be described.
ORF7b is a gene found in coronaviruses of the genus Betacoronavirus, which expresses the accessory protein Betacoronavirus NS7b protein. It is a short, highly hydrophobic transmembrane protein of unknown function.
ORF6 is a gene that encodes a viral accessory protein in coronaviruses of the subgenus Sarbecovirus, including SARS-CoV and SARS-CoV-2. It is not present in MERS-CoV. It is thought to reduce the immune system response to viral infection through interferon antagonism.
ORF9b is a gene that encodes a viral accessory protein in coronaviruses of the subgenus Sarbecovirus, including SARS-CoV and SARS-CoV-2. It is an overlapping gene whose open reading frame is entirely contained within the N gene, which encodes coronavirus nucleocapsid protein. The encoded protein is 97 amino acid residues long in SARS-CoV and 98 in SARS-CoV-2, in both cases forming a protein dimer.
ORF10 is an open reading frame (ORF) found in the genome of the SARS-CoV-2 coronavirus. It is 38 codons long. It is not conserved in all Sarbecoviruses. In studies prompted by the COVID-19 pandemic, ORF10 attracted research interest as one of two viral accessory protein genes not conserved between SARS-CoV and SARS-CoV-2 and was initially described as a protein-coding gene likely under positive selection. However, although it is sometimes included in lists of SARS-CoV-2 accessory genes, experimental and bioinformatics evidence suggests ORF10 is likely not a functional protein-coding gene.
ORF1ab refers collectively to two open reading frames (ORFs), ORF1a and ORF1b, that are conserved in the genomes of nidoviruses, a group of viruses that includes coronaviruses. The genes express large polyproteins that undergo proteolysis to form several nonstructural proteins with various functions in the viral life cycle, including proteases and the components of the replicase-transcriptase complex (RTC). Together the two ORFs are sometimes referred to as the replicase gene. They are related by a programmed ribosomal frameshift that allows the ribosome to continue translating past the stop codon at the end of ORF1a, in a -1 reading frame. The resulting polyproteins are known as pp1a and pp1ab.
The nidoviral papain-like protease is a papain-like protease protein domain encoded in the genomes of nidoviruses. It is expressed as part of a large polyprotein from the ORF1a gene and has cysteine protease enzymatic activity responsible for proteolytic cleavage of some of the N-terminal viral nonstructural proteins within the polyprotein. A second protease also encoded by ORF1a, called the 3C-like protease or main protease, is responsible for the majority of further cleavages. Coronaviruses have one or two papain-like protease domains; in SARS-CoV and SARS-CoV-2, one PLPro domain is located in coronavirus nonstructural protein 3 (nsp3). Arteriviruses have two to three PLP domains. In addition to their protease activity, PLP domains function as deubiquitinating enzymes (DUBs) that can cleave the isopeptide bond found in ubiquitin chains. They are also "deISGylating" enzymes that remove the ubiquitin-like domain interferon-stimulated gene 15 (ISG15) from cellular proteins. These activities are likely responsible for antagonizing the activity of the host innate immune system. Because they are essential for viral replication, papain-like protease domains are considered drug targets for the development of antiviral drugs against human pathogens such as MERS-CoV, SARS-CoV, and SARS-CoV-2.