Sfold

Original author(s) Ye Ding and Charles E. Lawrence
Developer(s) Dang Long and Chaochun Liu (application modeling); Clarence Chan, Adam Wolenc, William A. Rennie and Charles S. Carmack (software development)
Initial release 1 April 2003
Repository github.com/Ding-RNA-Lab/Sfold
Operating system Linux
Website www.healthresearch.org/sfold-software-for-sirna/

Sfold is a software program that predicts probable RNA secondary structures through structure ensemble sampling and centroid predictions, [1] [2] with a focus on assessing RNA target accessibility. [3] Its major applications are the rational design of siRNAs [4] for the suppression of gene expression and the identification of targets for regulatory RNAs, particularly microRNAs. [5] [6]

Development

The core RNA secondary structure prediction algorithm is based on rigorous statistical (stochastic) sampling from the Boltzmann ensemble of RNA secondary structures, which enables statistical characterization of any local structural feature of potential interest to experimental investigators. A review on nucleic acid structure prediction [7] highlighted the potential of structure sampling as described in a prototype algorithm. [8] After the mature algorithms for Sfold were published, [1] [2] the sampling approach became the focus of a review, [9] and both the sampling approach and the centroid predictions were discussed in a comprehensive review. [10] As an application module of the Sfold package, the STarMir program [11] has been widely used for its capability in modeling target accessibility. [6] STarMir was described in an independent study on microRNA target prediction, [12] and its predictions have been used in an attempt to derive improved predictions. [13] Predictions by Sfold have led to new biological insights. [14] The ideas of ensemble sampling and centroids have been adopted by others, not only for RNA problems but also for other fundamental problems in computational biology and genomics. [15] [16] [17] [18] [19]

An implementation of stochastic sampling has been included in two widely used RNA software packages, RNAstructure [20] and the ViennaRNA Package, [21] both of which are also based on the Turner RNA thermodynamic parameters. [22] Sfold was featured on a Nucleic Acids Research cover [23] and was highlighted in Science NetWatch. [24] The underlying model for STarMir [11] was featured in the Cell Biology section of Nature Research Highlights. [25]
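
The ensemble-based quantities discussed in this section can be illustrated with a minimal sketch. It is not Sfold's code and does not perform Boltzmann sampling. It assumes a sample of pseudoknot-free structures is already available in dot-bracket notation, and the four structures below are invented for illustration. The centroid is built from the base pairs that occur in more than half of the sampled structures, and accessibility is estimated as the fraction of sampled structures in which a base is unpaired. The function names are hypothetical.

```python
from collections import Counter


def parse_pairs(dot_bracket):
    """Return the set of (i, j) base pairs (0-based) in a dot-bracket string."""
    stack, pairs = [], set()
    for j, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.add((stack.pop(), j))
    return pairs


def centroid_and_accessibility(sample):
    """Summarize a sample of equal-length, pseudoknot-free structures.

    Returns the centroid structure (base pairs with sample frequency > 0.5)
    and, for each position, the fraction of structures in which it is unpaired.
    """
    n, m = len(sample[0]), len(sample)
    pair_counts = Counter()
    unpaired_counts = [0] * n
    for structure in sample:
        pairs = parse_pairs(structure)
        pair_counts.update(pairs)
        paired = {k for pair in pairs for k in pair}
        for k in range(n):
            if k not in paired:
                unpaired_counts[k] += 1
    centroid = ["."] * n
    for (i, j), count in pair_counts.items():
        if count / m > 0.5:  # majority base pairs form the centroid
            centroid[i], centroid[j] = "(", ")"
    accessibility = [count / m for count in unpaired_counts]
    return "".join(centroid), accessibility


# Hypothetical sample standing in for structures drawn from the ensemble.
sample = [
    "((...)).",
    "((...)).",
    "(.....).",
    ".(...)..",
]
centroid, accessibility = centroid_and_accessibility(sample)
print(centroid)       # ((...)).
print(accessibility)  # [0.25, 0.25, 1.0, 1.0, 1.0, 0.25, 0.25, 1.0]
```

Two conflicting or crossing base pairs cannot both appear in more than half of a pseudoknot-free sample, so the majority rule always yields a valid structure. In practice the sample would come from a stochastic sampling program rather than a hand-written list.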

Distribution

Sfold runs on Linux. It is freely available to the scientific community for non-commercial applications and is available under license for commercial applications. Both the source code and the executables are available on GitHub.

References

  1. Ding, Y; Lawrence, CE (2003). "A statistical sampling algorithm for RNA secondary structure prediction". Nucleic Acids Res. 31 (24): 7280–301. doi:10.1093/nar/gkg938. PMC 297010. PMID 14654704.
  2. Ding, Y; Chan, CY; Lawrence, CE (2005). "RNA secondary structure prediction by centroids in a Boltzmann weighted ensemble". RNA. 11 (8): 1157–1166. doi:10.1261/rna.2500605. PMC 1370799. PMID 16043502.
  3. Ding, Y; Lawrence, CE (2001). "Statistical prediction of single-stranded regions in RNA secondary structure and application to predicting effective antisense target sites and beyond". Nucleic Acids Research. 29 (5): 1034–46. doi:10.1093/nar/29.5.1034. PMC 29728. PMID 11222752.
  4. Elbashir, SM; Harborth, J; Lendeckel, W; Yalcin, A; Weber, K; Tuschl, T (2001). "Duplexes of 21-nucleotide RNAs mediate RNA interference in cultured mammalian cells". Nature. 411 (6836): 494–8. doi:10.1038/35078107. PMID 11373684. S2CID 710341.
  5. Lee, RC; Feinbaum, RL; Ambros, V (1993). "The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14". Cell. 75 (5): 843–54. doi:10.1016/0092-8674(93)90529-y. PMID 8252621. S2CID 205020975.
  6. Long, D; Lee, R; Williams, P; Chan, CY; Ambros, V; Ding, Y (2007). "Potent effect of target secondary structure on microRNA function". Nat Struct Mol Biol. 14 (4): 287–94. doi:10.1038/nsmb1226. PMID 17401373. S2CID 650349.
  7. Zuker, M. (2000). "Calculating nucleic acid secondary structure". Curr. Opin. Struct. Biol. 10 (3): 303–310. doi:10.1016/s0959-440x(00)00088-9. PMID 10851192.
  8. Ding, Y.; Lawrence, C. E. (1999). "A Bayesian Statistical Algorithm for RNA Secondary Structure Prediction". Computers & Chemistry. 23 (3–4): 387–400. doi:10.1016/S0097-8485(99)00010-8. PMID 10404626.
  9. Mathews, David H. (2006). "Revolutions in RNA Secondary Structure Prediction". Journal of Molecular Biology. 359 (3): 526–532. doi:10.1016/j.jmb.2006.01.067. ISSN 0022-2836. PMID 16500677.
  10. Seetin, Matthew G.; Mathews, David H. (2012), "RNA Structure Prediction: An Overview of Methods", Bacterial Regulatory RNA, Methods in Molecular Biology, vol. 905, Totowa, NJ: Humana Press, pp. 99–122, doi:10.1007/978-1-61779-949-5_8, ISBN 978-1-61779-948-8, PMID 22736001, retrieved 2023-12-05
  11. Rennie, William; Liu, Chaochun; Carmack, C. Steven; Wolenc, Adam; Kanoria, Shaveta; Lu, Jun; Long, Dang; Ding, Ye (2014-05-06). "STarMir: a web server for prediction of microRNA binding sites". Nucleic Acids Research. 42 (W1): W114–W118. doi:10.1093/nar/gku376. ISSN 1362-4962. PMC 4086099. PMID 24803672.
  12. Wong, Leon; You, Zhu-Hong; Guo, Zhen-Hao; Yi, Hai-Cheng; Chen, Zhan-Heng; Cao, Mei-Yuan (2020-07-09). "MIPDH: A Novel Computational Model for Predicting microRNA–mRNA Interactions by DeepWalk on a Heterogeneous Network". ACS Omega. 5 (28): 17022–17032. doi:10.1021/acsomega.9b04195. ISSN 2470-1343. PMC 7376568. PMID 32715187.
  13. Ullah, Abu Z.M. Dayem; Sahoo, Sudhakar; Steinhöfel, Kathleen; Albrecht, Andreas A. (2012). "Derivative scores from site accessibility and ranking of miRNA target predictions". International Journal of Bioinformatics Research and Applications. 8 (3/4): 171–191. doi:10.1504/ijbra.2012.048966. ISSN 1744-5485. PMID 22961450.
  14. Adams, L. (2017). "Pri-miRNA processing: structure is the key". Nature Reviews Genetics. 18 (3): 145. doi:10.1038/nrg.2017.6. PMID 28138147. S2CID 30513706.
  15. Huang, F. W.; Qin, Jing; Reidys, Christian M; Stadler, Peter F (2009). "Target prediction and a statistical sampling algorithm for RNA-RNA interaction". Bioinformatics. 26 (2): 175–181. doi:10.1093/bioinformatics/btp635. PMC 2804298. PMID 19910305.
  16. Harmanci, Arif Ozgun; Sharma, Gaurav; Mathews, David H (2009). "Stochastic sampling of the RNA structural alignment space". Nucleic Acids Research. 37 (12): 4063–4075. doi:10.1093/nar/gkp276. PMC 2709569. PMID 19429694.
  17. Hamada, M; Kiryu, H; Mituyama, T; Asai, K (2009). "Prediction of RNA secondary structure using generalized centroid estimators". Bioinformatics. 25 (4): 465–473. doi:10.1093/bioinformatics/btn601. PMID 19095700.
  18. Carvalho, L. E.; Lawrence, C. E. (2008). "Centroid estimation in discrete high-dimensional spaces with applications in biology". Proc Natl Acad Sci. 105 (9): 3209–14. Bibcode:2008PNAS..105.3209C. doi:10.1073/pnas.0712329105. PMC 2265131. PMID 18305160.
  19. Newberg, L. A.; Thompson, W. A.; Conlan, S; Smith, T. M.; McCue, L. A.; Lawrence, C. E. (2007). "A phylogenetic Gibbs sampler that yields centroid solutions for cis-regulatory site prediction". Bioinformatics. 23 (14): 1718–27. doi:10.1093/bioinformatics/btm241. PMC 2268014. PMID 17488758.
  20. Bellaousov, S; Reuter, JS; Seetin, MG; Mathews, DH (2013). "RNAstructure: Web servers for RNA secondary structure prediction and analysis". Nucleic Acids Research. 41 (Web Server Issue): W471–4. doi:10.1093/nar/gkt290. PMC 3692136. PMID 23620284.
  21. Gruber, AR; Lorenz, R; Bernhart, SH; Neuböck, R; Hofacker, IL (2008). "The Vienna RNA websuite". Nucleic Acids Research. 36 (Web Server Issue): W70–4. doi:10.1093/nar/gkn188. PMC 2447809. PMID 18424795.
  22. Mathews, DH; Sabina, J; Turner, DH (1999). "Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure". J. Mol. Biol. 288 (5): 911–40. doi:10.1006/jmbi.1999.2700. PMID 10329189.
  23. https://academic.oup.com/nar/article/31/24/7280/2904423
  24. "TOOLS: Nucleic Acid Origami". Science. 300 (5621): 873. 2003. doi:10.1126/science.300.5621.873d. S2CID 220109027.
  25. "Research highlights". Nature. 446 (7136): 586–587. 2007. Bibcode:2007Natur.446..586.. doi:10.1038/446586a. ISSN 0028-0836.