BioUML

Original author(s): Fedor A. Kolpakov
Developer(s): BioUML team
Initial release: 2002
Stable release: v. 2023.3 (September 2023)
Written in: Java
Available in: English
Type: Bioinformatics
Website: https://www.biouml.org/

BioUML (Biological Universal Modeling Language) [1] [2] is an open-source, web-based software platform written in Java, with support for JavaScript and R. Its main fields of application are systems biology and data analysis: visualization of biological data, modeling of biological systems, and access to bioinformatics databases [2] [3]. It was originally developed by Fedor Kolpakov in 2002 as a collaboration between Biosoft.Ru and the Institute of Systems Biology in Novosibirsk, Russia. [3]
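To illustrate what modeling and simulating a biological system involves, the following Java sketch simulates a hypothetical one-reaction model A → B with mass-action kinetics (d[A]/dt = -k[A], d[B]/dt = +k[A]) using a fixed-step fourth-order Runge-Kutta integrator. The model, the rate constant and the class name MassActionSketch are assumptions chosen for illustration; this is not BioUML's model format or simulation API.

```java
/**
 * Minimal sketch: simulate the hypothetical reaction A -> B with
 * mass-action rate k*[A] using classical fourth-order Runge-Kutta (RK4).
 * Illustration only; not BioUML's simulation API.
 */
public class MassActionSketch {

    // Right-hand side of the ODE system: d[A]/dt = -k[A], d[B]/dt = +k[A]
    static double[] derivatives(double[] y, double k) {
        double rate = k * y[0];
        return new double[] { -rate, rate };
    }

    // Returns y + scale * dy, element by element
    static double[] add(double[] y, double[] dy, double scale) {
        double[] out = new double[y.length];
        for (int i = 0; i < y.length; i++) {
            out[i] = y[i] + scale * dy[i];
        }
        return out;
    }

    // Advances the state y by one RK4 step of size h
    static double[] rk4Step(double[] y, double h, double k) {
        double[] k1 = derivatives(y, k);
        double[] k2 = derivatives(add(y, k1, h / 2), k);
        double[] k3 = derivatives(add(y, k2, h / 2), k);
        double[] k4 = derivatives(add(y, k3, h), k);
        double[] next = new double[y.length];
        for (int i = 0; i < y.length; i++) {
            next[i] = y[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]);
        }
        return next;
    }

    // Integrates from t = 0 to tEnd with fixed step h; returns {[A], [B]}
    static double[] simulate(double a0, double b0, double k, double tEnd, double h) {
        double[] y = { a0, b0 };
        int steps = (int) Math.round(tEnd / h);
        for (int s = 0; s < steps; s++) {
            y = rk4Step(y, h, k);
        }
        return y;
    }

    public static void main(String[] args) {
        double k = 0.5; // hypothetical rate constant (1/s)
        double[] y = simulate(10.0, 0.0, k, 4.0, 0.01);
        double exact = 10.0 * Math.exp(-k * 4.0); // analytic [A](t) = [A]0 * e^(-kt)
        System.out.printf("[A] = %.6f (exact %.6f), [B] = %.6f%n", y[0], exact, y[1]);
    }
}
```

Because the two derivatives always sum to zero, [A] + [B] stays constant, and the analytic solution [A](t) = [A]0 e^(-kt) gives a direct check on the numerical result.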


Available versions

The current release of BioUML is version 2023.3 released in September 2023. [4]

BioUML Server gives BioUML clients (the Workbench and the Web Edition) access over the Internet to data and analysis methods installed on the server.

BioUML Workbench is a Java application that can work standalone or as a "thick client" for the BioUML server edition.

BioUML Web Edition is a web-browser-based "thin client" for the BioUML server edition that provides most of the functionality of the BioUML Workbench. It uses AJAX and the HTML5 <canvas> element for interactive data editing and visual modeling.

The platform has been developed continuously since 2002 [4] and offers data analysis and visualization for scientists engaged in complex molecular biology research. The system supports the formal description of the structure and function of biological systems and includes tools for research in genomics, proteomics, transcriptomics and metabolomics. BioUML has a modular architecture, which makes adding new tools relatively simple; this has allowed many third-party tools to be integrated into the platform since its release.[citation needed]
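The modular design described above can be sketched, under stated assumptions, as a registry in which every tool implements a shared interface and is added without modifying the host. The interface AnalysisTool, the class ToolRegistry and the example tools "log2" and "normalize" are hypothetical names chosen for illustration, not BioUML's actual plug-in mechanism.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/**
 * Minimal sketch of a modular tool registry. Each tool is registered under a
 * unique name and the host talks to it only through AnalysisTool.
 * All names are hypothetical; this is not BioUML's plug-in API.
 */
public class ToolRegistrySketch {

    /** Hypothetical contract that every (third-party) tool implements. */
    interface AnalysisTool {
        String name();
        double[] run(double[] input);
    }

    /** Registry the host queries to find and run tools. */
    static class ToolRegistry {
        private final Map<String, AnalysisTool> tools = new LinkedHashMap<>();

        // Adding a tool touches only the registry, not the rest of the host
        void register(AnalysisTool tool) {
            if (tools.putIfAbsent(tool.name(), tool) != null) {
                throw new IllegalArgumentException("Duplicate tool: " + tool.name());
            }
        }

        double[] run(String name, double[] input) {
            AnalysisTool tool = tools.get(name);
            if (tool == null) {
                throw new IllegalArgumentException("Unknown tool: " + name);
            }
            return tool.run(input);
        }

        // Tool names in registration order
        List<String> names() {
            return new ArrayList<>(tools.keySet());
        }
    }

    /** Wraps a function as a named tool (stands in for a separately packaged module). */
    static AnalysisTool tool(String name, UnaryOperator<double[]> body) {
        return new AnalysisTool() {
            @Override public String name() { return name; }
            @Override public double[] run(double[] input) { return body.apply(input); }
        };
    }

    /** Builds a registry with two hypothetical example tools. */
    static ToolRegistry defaultRegistry() {
        ToolRegistry registry = new ToolRegistry();
        // Base-2 logarithm of each value, e.g. for expression data
        registry.register(tool("log2", in -> {
            double[] out = new double[in.length];
            for (int i = 0; i < in.length; i++) {
                out[i] = Math.log(in[i]) / Math.log(2);
            }
            return out;
        }));
        // Scales values so that they sum to 1
        registry.register(tool("normalize", in -> {
            double sum = 0;
            for (double v : in) sum += v;
            double[] out = new double[in.length];
            for (int i = 0; i < in.length; i++) {
                out[i] = in[i] / sum;
            }
            return out;
        }));
        return registry;
    }

    public static void main(String[] args) {
        ToolRegistry registry = defaultRegistry();
        System.out.println("Available tools: " + registry.names());
        System.out.println(Arrays.toString(registry.run("log2", new double[] { 1, 2, 8 })));
    }
}
```

Keeping the host dependent only on the interface is what makes adding a tool cheap; a real platform would typically also discover modules at runtime (for example with Java's ServiceLoader) instead of registering them in code.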

Application and usage

In 2007, BioUML was used to visualize data from CYCLONET, an integrated database on cell cycle regulation and carcinogenesis. [5]

Next-generation sequencing (NGS) and other high-throughput methods produce very large data sets (so-called "big data"), from around 100 terabytes upwards. BioUML can disseminate and analyze such data and produce visualizations and simulations from it; it supports parameter fitting and several analysis techniques needed to handle large amounts of raw data. Because research is often a collaboration across multiple institutions, managing these volumes poses technical challenges in storage, delivery and sharing. A typical genome data set might contain 500 terabytes of data that may need to be shared, often internationally over Internet2. Valex LLC created proprietary data compression mechanisms for the NCBI Short Read Archive project [6] that allow raw research data to be delivered at speeds of up to 40 Gbit/s. To provide a complete solution for such collaborative research, the developers of BioUML have built a new hardware/software system in partnership with Valex LLC. This version of BioUML is called Bio datomics.
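As a rough worked example of why delivery speed matters at this scale, assume decimal units (1 TB = 10^12 bytes), the 500-terabyte data set mentioned above, and a sustained 40 Gbit/s with no protocol overhead and no compression. The transfer time is then:

```latex
t = \frac{500 \times 10^{12}\,\text{B} \times 8\,\text{bit/B}}{40 \times 10^{9}\,\text{bit/s}}
  = \frac{4 \times 10^{15}\,\text{bit}}{4 \times 10^{10}\,\text{bit/s}}
  = 10^{5}\,\text{s} \approx 27.8\,\text{h}
```

Under the same assumptions, a 1 Gbit/s link would need 4 × 10^6 s, or roughly 46 days, for the same data set.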

References

  1. Kolpakov, Fedor; Akberdin, Ilya; Kashapov, Timur; Kiselev, Ilya; Kolmykov, Semyon; Kondrakhin, Yury; Kutumova, Elena; Mandrik, Nikita; Pintus, Sergey; Ryabova, Anna; Sharipov, Ruslan; Yevshin, Ivan; Kel, Alexander (2019-07-02). "BioUML: an integrated environment for systems biology and collaborative analysis of biomedical data". Nucleic Acids Research. 47 (W1): W225–W233. doi:10.1093/nar/gkz440. ISSN 0305-1048. PMC 6602424. PMID 31131402.
  2. Kolpakov, Fedor; Akberdin, Ilya; Kiselev, Ilya; Kolmykov, Semyon; Kondrakhin, Yury; Kulyashov, Mikhail; Kutumova, Elena; Pintus, Sergey; Ryabova, Anna; Sharipov, Ruslan; Yevshin, Ivan; Zhatchenko, Sergey; Kel, Alexander (2022-05-10). "BioUML - towards a universal research platform". Nucleic Acids Research. 50 (W1): W124–W131. Via Oxford Academic.
  3. Kolpakov, Fedor A (2002). "BioUML - framework for visual modeling and simulation of biological systems". ResearchGate. Retrieved 2024-07-21.
  4. "BioUML development history - BioUML platform". wiki.biouml.org. Retrieved 2024-07-21.
  5. Kolpakov, F; Poroikov, V; Sharipov, R; Kondrakhin, Y; Zakharov, A; Lagunin, A; Milanesi, L; Kel, A (2007). "CYCLONET--an integrated database on cell cycle regulation and carcinogenesis". Nucleic Acids Res. 35 (Database issue): D550–6. doi:10.1093/nar/gkl912. PMC 1899094. PMID 17202170.
  6. "Galter Health Sciences Library & Learning Center | News".