StarBase (biological database)

StarBase
Content
Description microRNA-mRNA interaction maps from Argonaute CLIP-Seq and Degradome-Seq data.
Contact
Research center Sun Yat-sen University
Laboratory Key Laboratory of Gene Engineering of the Ministry of Education
Authors Jian-Hua Yang
Primary citation Yang et al. (2011) [1]
Release date 2010
Access
Website http://starbase.sysu.edu.cn/

StarBase [2] is a database for decoding miRNA-mRNA, miRNA-lncRNA, [3] miRNA-sncRNA, miRNA-circRNA, [3] miRNA-pseudogene, protein-lncRNA, [4] protein-ncRNA and protein-mRNA interactions, as well as ceRNA networks, [5] from CLIP-Seq (HITS-CLIP, PAR-CLIP, iCLIP, CLASH) and degradome sequencing data. [1] [6] StarBase provides the miRFunction and ceRNAFunction web tools, which predict the functions of ncRNAs (miRNAs, lncRNAs, pseudogenes) and protein-coding genes from the miRNA and ceRNA [7] regulatory networks. StarBase also provides a Pan-Cancer Analysis Platform for deciphering pan-cancer networks of lncRNAs, miRNAs, ceRNAs and RNA-binding proteins (RBPs) by mining clinical and expression profiles of 14 cancer types (more than six thousand samples) from The Cancer Genome Atlas (TCGA) Data Portal.
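As a rough illustration of what an interaction map of this kind contains, the following Python sketch indexes miRNA-target records by miRNA and filters them by the amount of CLIP-Seq support. The record fields, the sample entries and the `targets_by_mirna` helper are all hypothetical; they are not StarBase's schema, data or API.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Interaction(NamedTuple):
    """One miRNA-target record (hypothetical fields, not StarBase's schema)."""
    mirna: str
    target: str
    target_type: str       # e.g. "mRNA", "lncRNA", "circRNA"
    clip_experiments: int  # number of CLIP-Seq datasets supporting the site


# Made-up records used only to exercise the function below.
RECORDS = [
    Interaction("miR-A", "GENE1", "mRNA", 5),
    Interaction("miR-A", "LNC1", "lncRNA", 2),
    Interaction("miR-B", "GENE1", "mRNA", 1),
    Interaction("miR-B", "GENE2", "mRNA", 3),
]


def targets_by_mirna(records, min_support: int = 1,
                     target_type: Optional[str] = None) -> dict:
    """Group target names by miRNA, keeping only sufficiently supported records."""
    index = defaultdict(set)
    for rec in records:
        if rec.clip_experiments < min_support:
            continue  # too few supporting CLIP-Seq datasets
        if target_type is not None and rec.target_type != target_type:
            continue  # caller asked for a single class of target
        index[rec.mirna].add(rec.target)
    return dict(index)
```

Calling `targets_by_mirna(RECORDS, min_support=2)` drops the weakly supported miR-B/GENE1 record, while `target_type="lncRNA"` restricts the map to miRNA-lncRNA pairs.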

Related Research Articles

Non-coding RNA: Class of ribonucleic acid that is not translated into proteins

A non-coding RNA (ncRNA) is a functional RNA molecule that is not translated into a protein. The DNA sequence from which a functional non-coding RNA is transcribed is often called an RNA gene. Abundant and functionally important types of non-coding RNAs include transfer RNAs (tRNAs) and ribosomal RNAs (rRNAs), as well as small RNAs such as microRNAs, siRNAs, piRNAs, snoRNAs, snRNAs, exRNAs, scaRNAs and the long ncRNAs such as Xist and HOTAIR.

Regulation of gene expression: Modifying mechanisms used by cells to increase or decrease the production of specific gene products

Regulation of gene expression, or gene regulation, includes a wide range of mechanisms that are used by cells to increase or decrease the production of specific gene products. Sophisticated programs of gene expression are widely observed in biology, for example to trigger developmental pathways, respond to environmental stimuli, or adapt to new food sources. Virtually any step of gene expression can be modulated, from transcriptional initiation, to RNA processing, and to the post-translational modification of a protein. Often, one gene regulator controls another, and so on, in a gene regulatory network.

RNA-binding proteins (RBPs) are proteins that bind to double- or single-stranded RNA in cells and participate in forming ribonucleoprotein complexes. RBPs contain various structural motifs, such as the RNA recognition motif (RRM), dsRNA binding domain, zinc finger and others. They are cytoplasmic and nuclear proteins. However, since most mature RNA is exported from the nucleus relatively quickly, most RBPs in the nucleus exist as complexes of protein and pre-mRNA called heterogeneous ribonucleoprotein particles (hnRNPs). RBPs have crucial roles in various cellular processes such as cellular function, transport and localization. They play an especially important role in post-transcriptional control of RNAs, including splicing, polyadenylation, mRNA stabilization, mRNA localization and translation. Eukaryotic cells express diverse RBPs with unique RNA-binding activity and protein–protein interaction. According to the Eukaryotic RBP Database (EuRBPDB), there are 2961 genes encoding RBPs in humans. During evolution, the diversity of RBPs greatly increased with the increase in the number of introns. This diversity enabled eukaryotic cells to utilize RNA exons in various arrangements, giving rise to a unique RNP (ribonucleoprotein) for each RNA. Although RBPs have a crucial role in post-transcriptional regulation of gene expression, relatively few RBPs have been studied systematically. It has now become clear that RNA–RBP interactions play important roles in many biological processes across organisms.

Cross-linking and immunoprecipitation (CLIP) is a method used in molecular biology that combines UV crosslinking with immunoprecipitation in order to identify RNA binding sites of proteins on a transcriptome-wide scale, thereby increasing our understanding of post-transcriptional regulatory networks. CLIP can be used either with antibodies against endogenous proteins, or with common peptide tags or affinity purification, which makes it possible to profile model organisms or RBPs that otherwise lack suitable antibodies.

Argonaute: Protein that plays a role in RNA silencing process

The Argonaute protein family, first discovered for its evolutionarily conserved stem cell function, plays a central role in RNA silencing processes as essential components of the RNA-induced silencing complex (RISC). RISC is responsible for the gene silencing phenomenon known as RNA interference (RNAi). Argonaute proteins bind different classes of small non-coding RNAs, including microRNAs (miRNAs), small interfering RNAs (siRNAs) and Piwi-interacting RNAs (piRNAs). Small RNAs guide Argonaute proteins to their specific targets through sequence complementarity, which then leads to mRNA cleavage, translation inhibition, and/or the initiation of mRNA decay.

ChIP-sequencing, also known as ChIP-seq, is a method used to analyze protein interactions with DNA. ChIP-seq combines chromatin immunoprecipitation (ChIP) with massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins. It can be used to map global binding sites precisely for any protein of interest. Previously, ChIP-on-chip was the most common technique utilized to study these protein–DNA relations.

Long non-coding RNA: Non-protein coding transcripts longer than 200 nucleotides

Long non-coding RNAs (lncRNAs) are a type of RNA, generally defined as transcripts longer than 200 nucleotides that are not translated into protein. This arbitrary limit distinguishes long ncRNAs from small non-coding RNAs, such as microRNAs (miRNAs), small interfering RNAs (siRNAs), Piwi-interacting RNAs (piRNAs), small nucleolar RNAs (snoRNAs), and other short RNAs. Given that some lncRNAs have been reported to have the potential to encode small proteins or micro-peptides, the latest definition of lncRNA is a class of RNA molecules of over 200 nucleotides that have no or limited coding capacity. Long intervening/intergenic noncoding RNAs (lincRNAs) are lncRNA sequences that do not overlap protein-coding genes.
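The length convention above can be written as a one-line rule. The sketch below is only an illustration of that 200-nucleotide cutoff; the function and constant names are hypothetical, and a real annotation pipeline would also assess coding capacity, which this example ignores.

```python
LONG_NCRNA_MIN_LENGTH = 200  # nucleotides; the arbitrary cutoff described in the text


def classify_ncrna_by_length(length_nt: int) -> str:
    """Label a non-coding transcript as small or long from its length alone."""
    if length_nt <= 0:
        raise ValueError("transcript length must be a positive number of nucleotides")
    # "More than 200 nucleotides" is read strictly here, so a transcript of
    # exactly 200 nt falls on the small side; that boundary choice is an assumption.
    if length_nt > LONG_NCRNA_MIN_LENGTH:
        return "long ncRNA"
    return "small ncRNA"
```

For example, a 22-nucleotide miRNA is labelled `"small ncRNA"`, while a transcript of several thousand nucleotides is labelled `"long ncRNA"`.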

Post-transcriptional regulation is the control of gene expression at the RNA level. It occurs once the RNA polymerase has been attached to the gene's promoter and is synthesizing the nucleotide sequence. Therefore, as the name indicates, it occurs between the transcription phase and the translation phase of gene expression. These controls are critical for the regulation of many genes across human tissues. Post-transcriptional regulation also plays a major role in cell physiology and is implicated in pathologies such as cancer and neurodegenerative diseases.

GeneCards is a database of human genes, which provides genomic, proteomic, transcriptomic, genetic, medical, and functional information on all known and predicted human genes. It is being developed and maintained by the Crown Human Genome Center at the Weizmann Institute of Science, in collaboration with LifeMap Sciences.

This list of microRNA databases and microRNA target databases is a compilation of databases, web portals and servers used for microRNAs and their targets. MicroRNAs (miRNAs) represent an important class of small non-coding RNAs (ncRNAs) that regulate gene expression by targeting messenger RNAs.

PAR-CLIP is a biochemical method for identifying the binding sites of cellular RNA-binding proteins (RBPs) and microRNA-containing ribonucleoprotein complexes (miRNPs). The method relies on the incorporation of ribonucleoside analogs that are photoreactive, such as 4-thiouridine (4-SU) and 6-thioguanosine (6-SG), into nascent RNA transcripts by living cells. Irradiation of the cells by ultraviolet light of 365 nm wavelength induces efficient crosslinking of photoreactive nucleoside–labeled cellular RNAs to interacting RBPs. Immunoprecipitation of the RBP of interest is followed by isolation of the crosslinked and coimmunoprecipitated RNA. The isolated RNA is converted into a cDNA library and is deep sequenced using next-generation sequencing technology.

High-throughput sequencing of RNA isolated by crosslinking immunoprecipitation (HITS-CLIP) is a variant of CLIP for genome-wide mapping of protein–RNA binding sites or RNA modification sites in vivo. HITS-CLIP was originally used to generate genome-wide protein-RNA interaction maps for the neuron-specific RNA-binding proteins and splicing factors NOVA1 and NOVA2; since then, maps have been generated for a number of other splicing factors, including PTB, RbFox2, SFRS1 and hnRNP C, and even for N6-Methyladenosine (m6A) mRNA modifications.

In molecular biology, competing endogenous RNAs (ceRNAs) regulate other RNA transcripts by competing for shared microRNAs (miRNAs). Models for ceRNA regulation describe how changes in the expression of one or multiple miRNA targets alter the number of unbound miRNAs and lead to observable changes in miRNA activity, i.e., in the abundance of other miRNA targets. Models of ceRNA regulation differ greatly. Some describe the kinetics of target-miRNA-target interactions, where changes in the expression of one target species sequester one miRNA species and lead to dysregulation of the other target species. Others attempt to model more realistic cellular scenarios, where multiple RNA targets affect multiple miRNAs and where each target pair is co-regulated by multiple miRNA species. Some models focus on mRNA 3' UTRs as targets, and others consider long non-coding RNA targets as well. It is evident that our molecular-biochemical understanding of ceRNA regulation remains incomplete.

Within the field of molecular biology, the epitranscriptome includes all the biochemical modifications of the RNA within a cell. In analogy to epigenetics that describes "functionally relevant changes to the genome that do not involve a change in the nucleotide sequence", epitranscriptomics involves all functionally relevant changes to the transcriptome that do not involve a change in the ribonucleotide sequence. Thus, the epitranscriptome can be defined as the ensemble of such functionally relevant changes.

Pan-cancer analysis aims to examine the similarities and differences among the genomic and cellular alterations found across diverse tumor types. International efforts have performed pan-cancer analysis on exomes and the whole genomes of cancers, the latter including their non-coding regions. In 2018, The Cancer Genome Atlas (TCGA) Research Network used exome, transcriptome, and DNA methylome data to develop an integrated picture of commonalities, differences, and emergent themes across tumor types.

Competing endogenous RNAs hypothesis: ceRNAs regulate other RNA transcripts by competing for shared microRNAs. They play important roles in developmental, physiological and pathological processes, such as cancer. Multiple classes of ncRNAs and protein-coding mRNAs function as key ceRNAs (sponges) and regulate the expression of mRNAs in plants and mammalian cells.
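Because the hypothesis rests on two transcripts sharing miRNAs, candidate ceRNA pairs are often screened by asking whether their miRNA sets overlap more than expected by chance. The Python sketch below scores that overlap with a one-sided hypergeometric test. It is one common way to do this, not necessarily the procedure used by any particular database, and the function name and inputs are hypothetical.

```python
from math import comb


def shared_mirna_pvalue(mirnas_a, mirnas_b, total_mirnas: int) -> float:
    """P(overlap >= observed) if B's miRNAs were drawn at random from the pool.

    mirnas_a, mirnas_b: miRNAs targeting transcript A and transcript B.
    total_mirnas: size of the background miRNA pool (an assumption of the test).
    """
    a, b = set(mirnas_a), set(mirnas_b)
    n_a, n_b = len(a), len(b)
    if total_mirnas < max(n_a, n_b):
        raise ValueError("background pool is smaller than one of the miRNA sets")
    shared = len(a & b)
    # Upper tail of the hypergeometric distribution: sum the probability of
    # every overlap size from the observed one up to the largest possible.
    favourable = sum(
        comb(n_a, k) * comb(total_mirnas - n_a, n_b - k)
        for k in range(shared, min(n_a, n_b) + 1)
    )
    return favourable / comb(total_mirnas, n_b)
```

A small p-value means the two transcripts share more miRNAs than random sets of the same sizes would. Whether a pair actually behaves as ceRNAs also depends on expression levels and binding-site strength, which this test does not consider.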

In molecular biology, circular RNAs (circRNAs) are a class of circular RNA molecules found across all kingdoms of life. Studies in 2013 suggested that circRNAs play important regulatory roles in miRNA activity. Researchers found that the CDR1as circRNA (also known as ciRS-7) acts as a miR-7 super-sponge that contains about 70 target sites for miR-7 on a single transcript. Another circRNA, the testis-specific sex-determining region Y (Sry), was also found to act as a miR-138 sponge. These examples suggest that miRNA sponge effects achieved by circRNA formation may be a general phenomenon. As miR-7 modulates the expression of several oncogenes, ciRS-7/miR-7 interactions may play important roles in cancer-related pathways. circRNA has also been implicated in viral infection, where it sequesters anti-viral protein to enhance viral replication.

RNA Modification Base (RMBase) is designed for decoding the landscape of RNA modifications identified from high-throughput sequencing data. It contains ~124200 N6-Methyladenosines (m6A), ~9500 pseudouridine (Ψ) modifications, ~1000 5-methylcytosine (m5C) modifications, ~1210 2′-O-methylations (2′-O-Me) and ~3130 other types of RNA modifications. RMBase demonstrated thousands of RNA modifications located within mRNAs, regulatory ncRNAs, miRNA target sites and disease-related SNPs.

Short interspersed nuclear element

Short interspersed nuclear elements (SINEs) are non-autonomous, non-coding transposable elements (TEs) that are about 100 to 700 base pairs in length. They are a class of retrotransposons, DNA elements that amplify themselves throughout eukaryotic genomes, often through RNA intermediates. SINEs compose about 13% of the mammalian genome.

ncRNA therapy

A majority of the human genome is made up of non-protein-coding DNA, meaning that most of its sequence is not used to encode proteins. However, even though these regions do not code for protein, they have other functions and carry necessary regulatory information. Non-coding RNAs (ncRNAs) can be classified by size. Small non-coding RNA is usually categorized as being under 200 nucleotides in length, whereas long non-coding RNA is greater than 200 nucleotides. ncRNAs can also be categorized by their function within the cell, as infrastructural or regulatory. Infrastructural ncRNAs seem to have a housekeeping role in translation and splicing and include species such as rRNA, tRNA and snRNA. Regulatory ncRNAs are involved in the modification of other RNAs.

References

  1. Yang, Jian-Hua; Li Jun-Hao; Shao Peng; Zhou Hui; Chen Yue-Qin; Qu Liang-Hu (Jan 2011). "starBase: a database for exploring microRNA-mRNA interaction maps from Argonaute CLIP-Seq and Degradome-Seq data". Nucleic Acids Res. England. 39 (Database issue): D202-9. doi:10.1093/nar/gkq1056. PMC 3013664. PMID 21037263.
  2. "starBase or ENCORI: Decoding the Encyclopedia of RNA Interactomes". rnasysu.com.
  3. "LncRNABase miRNA-lncRNA interactions: decoding miRNA-lncRNA interaction maps". starbase.sysu.edu.cn. Archived from the original on 2013-09-22.
  4. "RBP-LncRNA interactions, starBase: decoding RNA-LncRNA interaction maps". starbase.sysu.edu.cn. Archived from the original on 2013-09-22.
  5. "CeRNABase starBase: Decoding ceRNA regulatory networks from CLIP-Seq (PAR-CLIP, HITS-CLIP, iCLIP, CLASH) data. Competing endogenous RNAs (CeRNAs)". Archived from the original on 2013-09-22. Retrieved 2013-12-08.
  6. Li, JH; Liu, S; Zhou, H; Qu, LH; Yang, JH (Dec 1, 2013). "starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data". Nucleic Acids Research. 42 (1): D92-7. doi:10.1093/nar/gkt1248. PMC 3964941. PMID 24297251.
  7. "Pan-Cancer ceRNA Regulatory Network. StarBase: Decoding miRNA-target, miRNA-ceRNA and protein-RNA interaction maps from CLIP-Seq (PAR-CLIP, HITS-CLIP, iCLIP, CLASH) data". starbase.sysu.edu.cn. Archived from the original on 15 March 2014. Retrieved 13 January 2022.