| Nucleocapsid protein | |
|---|---|
| Model of the external structure of the SARS-CoV-2 virion. [1] The N protein, contained entirely within the virion, is not visible. ● Blue: envelope ● Turquoise: spike glycoprotein (S) ● Red: envelope proteins (E) ● Green: membrane proteins (M) ● Orange: glycans | |
| Identifiers | |
| Symbol | CoV_nucleocap |
| Pfam | PF00937 |
| InterPro | IPR001218 |
The nucleocapsid (N) protein packages the positive-sense RNA genome of coronaviruses to form ribonucleoprotein structures enclosed within the viral capsid. [2] [3] In addition to its interactions with RNA, N forms protein-protein interactions with the coronavirus membrane protein (M) during viral assembly. [2] [3] N also has functions in manipulating the cell cycle of the host cell. [3] [4] The N protein is highly immunogenic, and antibodies to N are found in patients who have recovered from SARS and COVID-19. [5]
The N protein is composed of two protein domains connected by an intrinsically disordered region (IDR) known as the linker region, with additional disordered segments at each terminus. [2] [3] A third small domain at the C-terminal tail appears to have an ordered alpha helical secondary structure and may be involved in the formation of higher-order oligomeric assemblies. [6] In SARS-CoV, the causative agent of SARS, the N protein is 422 amino acid residues long [2] and in SARS-CoV-2, the causative agent of COVID-19, it is 419 residues long. [6] [7]
Both the N-terminal and C-terminal domains are capable of binding RNA. The C-terminal domain forms a dimer that is likely to be the native functional state. [2] Parts of the IDR, particularly a conserved sequence motif rich in serine and arginine residues (the SR-rich region), may also be implicated in dimer formation, though reports on this vary. [2] [3] Although higher-order oligomers formed through the C-terminal domain have been observed crystallographically, it is unclear if these structures have a physiological role. [2] [8]
The C-terminal dimer has been structurally characterized by X-ray crystallography for several coronaviruses and has a highly conserved structure. [6] The N-terminal domain (sometimes known as the RNA-binding domain, although other parts of the protein also interact with RNA) has also been crystallized and has been studied by nuclear magnetic resonance spectroscopy in the presence of RNA. [9]
The N protein is post-translationally modified by phosphorylation at sites predominantly located in the IDR, particularly in the SR-rich region. [2] [10] It can be arginine-methylated by protein arginine methyltransferase 1 (PRMT1) at residues R95 and R177. [11] Treatment with the type I PRMT inhibitor MS023, or substitution of R95 or R177 with lysine, inhibited the interaction of the N protein with the 5' UTR of SARS-CoV-2 genomic RNA, an interaction required for viral packaging. In several coronaviruses, ADP-ribosylation of the N protein has also been reported. [12] [10] The SARS-CoV N protein has been observed to be SUMOylated, and the N proteins of several coronaviruses, including SARS-CoV-2, have been observed to be proteolytically cleaved; the functional significance of these modifications is unclear. [10] [13] [14]
The N protein is one of the most highly expressed coronaviral proteins in infected host cells. [15] Like the other structural proteins, the gene encoding the N protein is located toward the 3' end of the genome. [3]
The N protein is localized primarily to the cytoplasm. [3] In many coronaviruses, a population of N protein is localized to the nucleolus, [3] [4] [16] a localization thought to be associated with its effects on the cell cycle. [4]
The N protein binds to RNA to form ribonucleoprotein (RNP) structures for packaging the genome into the viral capsid. [2] [3] The RNP particles formed are roughly spherical and are organized in flexible helical structures inside the virus. [2] [3] Formation of RNPs is thought to involve allosteric interactions between RNA and multiple RNA-binding regions of the protein. [2] [8] Dimerization of N is important for assembly of RNPs. Encapsidation of the genome occurs through interactions between N and M. [2] [3] N is essential for viral assembly. [3] N also serves as an RNA chaperone, assisting the formation of structure in the genomic RNA. [3] [8]
Synthesis of genomic RNA appears to involve participation by the N protein. N is physically colocalized with the viral RNA-dependent RNA polymerase early in the replication cycle and forms interactions with non-structural protein 3, a component of the replicase-transcriptase complex. [3] Although N appears to facilitate efficient replication of genomic RNA, it is not required for RNA transcription in all coronaviruses. [3] [18] In at least one coronavirus, transmissible gastroenteritis virus (TGEV), N is involved in template switching during the production of subgenomic mRNAs, a process that is a distinctive feature of viruses in the order Nidovirales. [3] [18] [19]
Coronaviruses manipulate the cell cycle of the host cell through various mechanisms. In several coronaviruses, including SARS-CoV, the N protein has been reported to cause cell cycle arrest in S phase through interactions with cyclin-CDK complexes. [3] [4] In SARS-CoV, a cyclin box-binding region in the N protein can serve as a cyclin-CDK phosphorylation substrate. [3] Trafficking of N to the nucleolus may also play a role in cell cycle effects. [4] More broadly, N may be involved in reducing protein translation in the host cell. [3]
The N protein is involved in viral pathogenesis via its effects on components of the immune system. In SARS-CoV, [3] [20] [21] MERS-CoV, [22] and SARS-CoV-2, [23] [24] N has been reported to suppress interferon responses.
The sequences and structures of N proteins from different coronaviruses, particularly the C-terminal domains, appear to be well conserved. [2] [6] [25] Similarities between the structure and topology of the N proteins of coronaviruses and arteriviruses suggest a common evolutionary origin and support the classification of these two groups in the common order Nidovirales. [2] [3]
Examination of SARS-CoV-2 sequences collected during the COVID-19 pandemic found that missense mutations were most common in the central linker region of the protein, suggesting this relatively unstructured region is more tolerant of mutations than the structured domains. [6] A separate study of SARS-CoV-2 sequences identified at least one site in the N protein under positive selection. [26]
Because the N protein is well conserved, does not appear to recombine frequently, and elicits a strong T-cell response, it has been studied as a potential target for coronavirus vaccines. [27] [28] [25] [29] One such experimental vaccine, the candidate UB-612, targets the N protein along with other viral proteins in an attempt to induce broad immunity. [30] [31]