The RNA-binding Proteins Database (RBPDB) is a biological database of RNA-binding protein specificities that includes experimental observations of RNA-binding sites. The experimental results, both in vitro and in vivo, are drawn from the primary literature. [1] It covers four metazoan species: Homo sapiens, Mus musculus, Drosophila melanogaster, and Caenorhabditis elegans. The RNA-binding domains included in the database are the RNA recognition motif, K homology, CCCH zinc finger, and other domains. As of 2021, the latest RBPDB release (v1.3, September 2012) includes 1,171 RNA-binding proteins. [2]
Transcription and translation are organized differently in prokaryotes and eukaryotes. Unlike in prokaryotes, the two processes occur separately in eukaryotes: transcription in the nucleus and translation in the cytoplasm. Because of this separation, eukaryotes process pre-mRNA through post-transcriptional modification, which includes splicing, editing and polyadenylation. RNA-binding proteins (RBPs) play a critical role in this process. RBPs bind RNA with differing specificities and affinities. [3] [4] [5] Each RBP contains at least one RNA-binding domain, and most contain several. Well-characterized domains include the RNA-binding domain (RBD, also known as the RNP domain or RNA recognition motif, RRM); the K-homology (KH) domain (type I and type II); the RGG (Arg-Gly-Gly) box; the Sm domain; the DEAD/DEAH box; the zinc finger (ZnF, mostly C-x8-C-x5-C-x3-H); the double-stranded RNA-binding domain (dsRBD); the cold-shock domain; the Pumilio/FBF (PUF or Pum-HD) domain; and the Piwi/Argonaute/Zwille (PAZ) domain. [6] [7]
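The zinc finger spacing mentioned above can be read as a simple sequence pattern. The following is a minimal sketch, assuming the common CCCH spacing C-x8-C-x5-C-x3-H (where x is any amino acid); the function name and the toy sequence are hypothetical, and a pattern match only flags a candidate, not a confirmed zinc finger.

```python
import re

# Assumed CCCH spacing: C-x8-C-x5-C-x3-H, with x standing for any residue.
# The lookahead lets overlapping candidates be reported as well.
CCCH_PATTERN = re.compile(r"(?=(C.{8}C.{5}C.{3}H))")


def find_ccch_fingers(protein_seq):
    """Return (start, matched_text) for every candidate CCCH motif."""
    seq = protein_seq.upper()
    return [(m.start(), m.group(1)) for m in CCCH_PATTERN.finditer(seq)]


# Hypothetical toy sequence, not a real protein.
toy = "MA" + "C" + "A" * 8 + "C" + "A" * 5 + "C" + "A" * 3 + "H" + "GG"
print(find_ccch_fingers(toy))
```

Positions are 0-based offsets into the input string.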
RBPs are built from multiple binding domains, which are drawn from a small set of basic modular units. Compared with a single motif, an RBP with multiple motifs can recognize a much longer stretch of nucleic acid. RBPs bind RNA through weak interactions, and multiple motifs greatly enlarge the interaction surface. As a result, a multi-domain RBP can bind RNA with higher specificity and affinity than a single domain can. [8] The RNA-binding protein database has three main domain categories: the RNA recognition motif (RRM), the K-homology (KH) domain, and zinc fingers.
Lunde and colleagues describe the different types of RNA-binding protein motifs and their specific functions. [7]
The RNA recognition motif (RRM) contains about 80–90 amino acids that form a four-stranded anti-parallel β-sheet with two helices (βαββαβ topology). The β-sheet plays a critical role in RNA recognition. Usually, three conserved residues on the β-sheet are central to this recognition: an Arg or Lys residue forms a salt bridge to the phosphodiester backbone, and two aromatic residues make stacking interactions with the nucleobases. Each of the four β-strands recognizes one nucleotide. However, with exposed loops and additional secondary structure, an RRM can recognize up to 8 nucleotides. [7] [9]
The K-homology (KH) domain was first identified in human heterogeneous nuclear ribonucleoprotein (hnRNP) K, from which the family takes its name. It binds both ssDNA and ssRNA and is found in eukaryotes, eubacteria and archaea. The domain contains about 70 amino acids, and its signature sequence is (I/L/V)IGXXGXX(I/L/V). All KH domains contain a three-stranded β-sheet and three α-helices. There are two subfamilies: the type I KH domain (βααββα topology) and the type II KH domain (αββααβ topology). In both types, the GXXG loop, the flanking helices, the β-strand and the variable loop between β2 and β3 (type I) or between α2 and β2 (type II) play a very important role in recognizing RNA. [7] [10]
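The signature sequence (I/L/V)IGXXGXX(I/L/V) translates directly into a regular expression. The sketch below is illustrative only: the function name and the toy sequence are hypothetical, X is treated as any residue, and a hit marks a candidate signature rather than a verified KH domain.

```python
import re

# (I/L/V)IGXXGXX(I/L/V): a branched aliphatic residue, then I-G, two free
# positions, G, two more free positions, and a closing I, L or V.
KH_SIGNATURE = re.compile(r"(?=([ILV]IG..G..[ILV]))")


def find_kh_signatures(protein_seq):
    """Return (start, matched_text) for each candidate KH signature."""
    seq = protein_seq.upper()
    return [(m.start(), m.group(1)) for m in KH_SIGNATURE.finditer(seq)]


# Hypothetical toy sequence, not taken from any database entry.
toy_kh = "MKTLIGKGGKNIAAA"
print(find_kh_signatures(toy_kh))
```

The central G..G in the expression corresponds to the GXXG loop described above.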
Zinc fingers are domains that contain zinc-coordinating residues. There are three main types: Cys2His2 (CCHH), CCCH and CCHC. Generally, several repeats of the domain work together in a protein. When a CCHH zinc finger binds DNA, residues in its recognition α-helix form hydrogen bonds to Watson–Crick base pairs in the major groove. When it binds RNA, the same residues used to recognize DNA may still be used. The way zinc fingers distinguish the two types of nucleic acid may involve distinct structural arrangements of the domain. CCCH and CCHC zinc fingers bind AU-rich RNA elements. In contrast to CCHH zinc fingers, the shape of the protein is the primary determinant of their specificity. [7] [11]
Ray and Kazan's paper addresses the sequence preferences of RBPs. In their approach, a single RBP is incubated with a vast molar excess of a complex pool of RNAs. The protein is recovered by affinity selection, and the associated RNAs are interrogated by microarray and computational analyses. Their results show that RNA-binding proteins have sequence preferences and that identical or closely related RBPs bind similar RNA sequences. [12]
The RNA-binding protein database (RBPDB) currently contains 1,171 RNA-binding proteins from Homo sapiens, Mus musculus, Drosophila melanogaster, and Caenorhabditis elegans. Proteins can be searched by domain or by species. Either search leads to a detailed list of proteins that includes gene symbol, annotation ID, synonyms, gene description, species, RNA-binding domain, number of experiments and homologs. The link on the number of experiments leads to the research articles related to the protein. Users can also search for experiments related to a specific RNA-binding sequence, and the site can predict binding sites in a user-supplied sequence.
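Binding-site prediction of this kind is often done by sliding a position weight matrix (PWM) along the sequence. The sketch below shows that generic technique only; it is not confirmed to be how RBPDB scores sequences. The matrix values, the uniform background, the threshold and all names are hypothetical.

```python
import math

# Hypothetical 4-position matrix of base probabilities favouring "UGCA".
TOY_PWM = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "U": 0.85},
    {"A": 0.05, "C": 0.05, "G": 0.85, "U": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "U": 0.05},
    {"A": 0.85, "C": 0.05, "G": 0.05, "U": 0.05},
]
BACKGROUND = 0.25  # assumed uniform base frequency


def score_window(window, pwm=TOY_PWM):
    """Sum of per-position log2 odds against the uniform background."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(pwm, window))


def scan(rna, pwm=TOY_PWM, min_score=0.0):
    """Return (start, window, score) for each window scoring >= min_score."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input
    width = len(pwm)
    hits = []
    for i in range(len(rna) - width + 1):
        window = rna[i:i + width]
        if any(base not in "ACGU" for base in window):
            continue  # skip windows with ambiguous bases
        score = score_window(window, pwm)
        if score >= min_score:
            hits.append((i, window, score))
    return hits


print(scan("AAUGCAAA"))
```

A real matrix would be derived from experimentally determined binding sites, and the threshold would be chosen per protein.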
In molecular biology, a transcription factor (TF) is a protein that controls the rate of transcription of genetic information from DNA to messenger RNA, by binding to a specific DNA sequence. The function of TFs is to regulate—turn on and off—genes in order to make sure that they are expressed in the desired cells at the right time and in the right amount throughout the life of the cell and the organism. Groups of TFs function in a coordinated fashion to direct cell division, cell growth, and cell death throughout life; cell migration and organization during embryonic development; and intermittently in response to signals from outside the cell, such as a hormone. There are approximately 1600 TFs in the human genome. Transcription factors are members of the proteome as well as regulome.
A zinc finger is a small protein structural motif that is characterized by the coordination of one or more zinc ions (Zn2+), which stabilize the fold. The term was originally coined to describe the finger-like appearance of a hypothesized structure from the African clawed frog (Xenopus laevis) transcription factor IIIA. However, it has since been found to encompass a wide variety of differing protein structures in eukaryotic cells. Xenopus laevis TFIIIA was demonstrated to contain zinc and to require the metal for function in 1983, the first such reported zinc requirement for a gene regulatory protein, followed soon thereafter by the Krüppel factor in Drosophila. The zinc finger often appears as a metal-binding domain in multi-domain proteins.
Histone acetyltransferases (HATs) are enzymes that acetylate conserved lysine amino acids on histone proteins by transferring an acetyl group from acetyl-CoA to form ε-N-acetyllysine. DNA is wrapped around histones, and, by transferring an acetyl group to the histones, genes can be turned on and off. In general, histone acetylation increases gene expression.
DNA-binding proteins are proteins that have DNA-binding domains and thus have a specific or general affinity for single- or double-stranded DNA. Sequence-specific DNA-binding proteins generally interact with the major groove of B-DNA, because it exposes more functional groups that identify a base pair.
SR proteins are a conserved family of proteins involved in RNA splicing. SR proteins are named because they contain a protein domain with long repeats of serine and arginine amino acid residues, whose standard abbreviations are "S" and "R" respectively. SR proteins are ~200-600 amino acids in length and composed of two domains, the RNA recognition motif (RRM) region and the RS domain. SR proteins are more commonly found in the nucleus than the cytoplasm, but several SR proteins are known to shuttle between the nucleus and the cytoplasm.
RNA-binding proteins are proteins that bind to double- or single-stranded RNA in cells and participate in forming ribonucleoprotein complexes. RBPs contain various structural motifs, such as the RNA recognition motif (RRM), dsRNA-binding domain, zinc finger and others. They are cytoplasmic and nuclear proteins. However, since most mature RNA is exported from the nucleus relatively quickly, most RBPs in the nucleus exist as complexes of protein and pre-mRNA called heterogeneous ribonucleoprotein particles (hnRNPs). RBPs have crucial roles in various cellular processes such as cellular function, transport and localization. They especially play a major role in post-transcriptional control of RNAs, such as splicing, polyadenylation, mRNA stabilization, mRNA localization and translation. Eukaryotic cells express diverse RBPs with unique RNA-binding activity and protein–protein interaction. According to the Eukaryotic RBP Database (EuRBPDB), there are 2961 genes encoding RBPs in humans. During evolution, the diversity of RBPs greatly increased with the increase in the number of introns. Diversity enabled eukaryotic cells to utilize RNA exons in various arrangements, giving rise to a unique RNP (ribonucleoprotein) for each RNA. Although RBPs have a crucial role in post-transcriptional regulation of gene expression, relatively few RBPs have been studied systematically. It has now become clear that RNA–RBP interactions play important roles in many biological processes among organisms.
A DNA-binding domain (DBD) is an independently folded protein domain that contains at least one structural motif that recognizes double- or single-stranded DNA. A DBD can recognize a specific DNA sequence or have a general affinity to DNA. Some DNA-binding domains may also include nucleic acids in their folded structure.
Therapeutic gene modulation refers to the practice of altering the expression of a gene at one of various stages, with a view to alleviate some form of ailment. It differs from gene therapy in that gene modulation seeks to alter the expression of an endogenous gene whereas gene therapy concerns the introduction of a gene whose product aids the recipient directly.
Cleavage and polyadenylation specificity factor (CPSF) is involved in the cleavage of the 3' signaling region from a newly synthesized pre-messenger RNA (pre-mRNA) molecule in the process of gene transcription. In eukaryotes, messenger RNA precursors (pre-mRNA) are transcribed in the nucleus from DNA by the enzyme, RNA polymerase II. The pre-mRNA must undergo post-transcriptional modifications, forming mature RNA (mRNA), before they can be transported into the cytoplasm for translation into proteins. The post-transcriptional modifications are: the addition of a 5' m7G cap, splicing of intronic sequences, and 3' cleavage and polyadenylation.
Poly(A)-binding protein is an RNA-binding protein which triggers the binding of eukaryotic initiation factor 4 complex (eIF4G) directly to the poly(A) tail of mRNA, which is 200-250 nucleotides long. The poly(A) tail is located on the 3' end of mRNA and was discovered by Mary Edmonds, who also characterized the poly(A) polymerase enzyme that generates the poly(A) tail. The binding protein is also involved in processing mRNA precursors by helping polyadenylate polymerase add the poly(A) nucleotide tail to the pre-mRNA before translation. The nuclear isoform selectively binds to around 50 nucleotides and stimulates the activity of polyadenylate polymerase by increasing its affinity towards RNA. Poly(A)-binding protein is also present during stages of mRNA metabolism including nonsense-mediated decay and nucleocytoplasmic trafficking. The poly(A)-binding protein may also protect the tail from degradation and regulate mRNA production. Without these two proteins working in tandem, the poly(A) tail would not be added and the RNA would degrade quickly.
The signal recognition particle RNA is part of the signal recognition particle (SRP) ribonucleoprotein complex. SRP recognizes the signal peptide and binds to the ribosome, halting protein synthesis. SRP-receptor is a protein that is embedded in a membrane and contains a transmembrane pore. When the SRP-ribosome complex binds to SRP-receptor, SRP releases the ribosome and drifts away. The ribosome resumes protein synthesis, but now the protein is moving through the SRP-receptor transmembrane pore.
Synaptotagmin-binding, cytoplasmic RNA-interacting protein (SYNCRIP), also known as heterogeneous nuclear ribonucleoprotein (hnRNP) Q or NS1-associated protein-1 (NSAP-1), is a protein that in humans is encoded by the SYNCRIP gene. As the name implies, SYNCRIP is localized predominantly in the cytoplasm. It is evolutionarily conserved across eukaryotes and participates in several cellular and disease pathways, especially in neuronal and muscular development. In humans, there are three isoforms, all of which are associated in vitro with pre-mRNAs, mRNA splicing intermediates, and mature mRNA-protein complexes, including mRNA turnover.
RNA-binding protein 4 is a protein that in humans is encoded by the RBM4 gene.
Zinc finger protein chimeras are chimeric proteins composed of a DNA-binding zinc finger protein domain and another domain through which the protein exerts its effect. The effector domain may be a transcriptional activator (A) or repressor (R), a methylation domain (M) or a nuclease (N).
RNA recognition motif, RNP-1, is a putative RNA-binding domain of about 90 amino acids that is known to bind single-stranded RNAs. It is found in many eukaryotic proteins.
TAL (transcription activator-like) effectors (TALEs) are proteins secreted by some β- and γ-proteobacteria. Most of these are Xanthomonads. Plant pathogenic Xanthomonas bacteria are especially known for TALEs, produced via their type III secretion system. These proteins can bind promoter sequences in the host plant and activate the expression of plant genes that aid bacterial infection. The TALE domain responsible for binding to DNA is known to have 1.5 to 33.5 short sequences that are repeated multiple times. Each of these repeats was found to be specific for a certain base pair of the DNA. These repeats also have repeat variable diresidues (RVDs) that can detect specific DNA base pairs. They recognize plant DNA sequences through a central repeat domain consisting of a variable number of ~34 amino acid repeats. There appears to be a one-to-one correspondence between the identity of two critical amino acids in each repeat and each DNA base in the target sequence. These proteins are interesting to researchers both for their role in disease of important crop species and for the relative ease of retargeting them to bind new DNA sequences. Similar proteins can be found in the pathogenic bacteria Ralstonia solanacearum and Burkholderia rhizoxinica, as well as in yet unidentified marine microorganisms. The term TALE-likes is used to refer to the putative protein family encompassing the TALEs and these related proteins.
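The one-to-one correspondence between the two critical amino acids of each repeat and a DNA base can be written as a lookup table. The following is a minimal sketch assuming the commonly cited pairings NI to A, HD to C, NG to T and NN to G; NN is also reported to tolerate A, which this simplification ignores. The function name and the example repeat array are hypothetical.

```python
# Commonly cited pairings between the two critical residues of a repeat
# and the DNA base it recognizes (simplified; NN can also tolerate A).
RVD_TO_BASE = {"NI": "A", "HD": "C", "NG": "T", "NN": "G"}


def predict_target(rvds):
    """Translate a hyphen-separated residue-pair string into a DNA target.

    Residue pairs missing from the table are reported as 'N' (any base).
    """
    pairs = [p.strip().upper() for p in rvds.split("-") if p.strip()]
    return "".join(RVD_TO_BASE.get(p, "N") for p in pairs)


# Hypothetical repeat array, not a real effector.
print(predict_target("NI-HD-NG-NN-HD"))
```

Reversing the table gives a starting point for designing a repeat array against a chosen DNA sequence, which is the retargeting use described above.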
Zinc finger transcription factors, or ZF-TFs, are transcription factors composed of a zinc finger-binding domain and any of a variety of transcription-factor effector domains that exert their modulatory effect in the vicinity of any sequence to which the protein domain binds.
The WRKY domain is found in the WRKY transcription factor family, a class of transcription factors. The WRKY domain is found almost exclusively in plants although WRKY genes appear present in some diplomonads, social amoebae and other amoebozoa, and fungi incertae sedis. They appear absent in other non-plant species. WRKY transcription factors have been a significant area of plant research for the past 20 years. The WRKY DNA-binding domain recognizes the W-box (T)TGAC(C/T) cis-regulatory element.
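The W-box element (T)TGAC(C/T) can be located in a DNA sequence with a short scan. This is a sketch under stated assumptions: the leading T is treated as optional, both strands are searched, and the function names and test sequence are hypothetical.

```python
import re

# Core W-box: TGAC followed by C or T. The optional leading T is reported
# as a flag so that each site is listed only once.
W_BOX_CORE = re.compile(r"(?=(TGAC[CT]))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_w_boxes(dna):
    """Return (strand, start, core, has_leading_T) for each W-box core.

    Minus-strand starts are offsets into the reverse-complemented sequence.
    """
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for m in W_BOX_CORE.finditer(seq):
            start = m.start()
            has_leading_t = start > 0 and seq[start - 1] == "T"
            hits.append((strand, start, m.group(1), has_leading_t))
    return hits


# Hypothetical promoter fragment.
print(find_w_boxes("AATTGACCAA"))
```

A sequence match alone does not show that a WRKY factor binds the site in vivo.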
Archaeal transcription factor B is a protein family of extrinsic transcription factors that guide the initiation of RNA transcription in organisms that fall under the domain of Archaea. It is homologous to eukaryotic TFIIB and, more distantly, to bacterial sigma factor. Like these proteins, it is involved in forming transcription preinitiation complexes. Its structure includes several conserved motifs which interact with DNA and other transcription factors, notably the single type of RNA polymerase that performs transcription in Archaea.
Zinc finger protein 226 is a protein that in humans is encoded by the ZNF226 gene.