BioUML

Original author(s) Fedor A. Kolpakov
Developer(s) BioUML team
Initial release 2002
Stable release v. 2023.3 / September 2023
Written in Java
Available in English
Type Bioinformatics
Website https://www.biouml.org/

BioUML (Biological Universal Modeling Language) [1] [2] is an open-source, web-based software platform written in Java with support for JavaScript and R. Its main application areas are systems biology and data analysis: visualization of biological data, modeling of biological systems, and access to bioinformatics databases. [2] [3] It was originally developed by Fedor Kolpakov in 2002 as a collaboration between Biosoft.Ru and the Institute of Systems Biology in Novosibirsk, Russia. [3]

Available versions

The current release of BioUML is version 2023.3, released in September 2023. [4]

BioUML Server offers access to data and analysis methods installed server-side for BioUML clients (workbench and web edition) over the Internet.

BioUML Workbench is a Java application that can work standalone or as a "thick client" for the BioUML server edition.

BioUML Web Edition is a browser-based "thin client" for the BioUML server edition and provides most of the functionality of the BioUML workbench. It uses AJAX and HTML5 <canvas> technology for interactive data editing and visual modeling.

The platform has been developed continuously since 2002 [4] and offers data analysis and visualization for scientists involved in complex molecular biology research. The system allows for a formalized description of the structure and function of biological systems and includes tools for research in genomics, proteomics, transcriptomics and metabolomics. The BioUML platform has a modular architecture, which makes adding new tools relatively simple and has allowed many third-party tools to be integrated into the platform over the years it has been available. [citation needed]

Application and usage

In 2007, BioUML was used to visualize data from Cyclonet, an integrated database on cell cycle regulation and carcinogenesis. [5]

Next-generation sequencing (NGS) and other high-throughput methods create huge data sets ("big data") of 100 terabytes and upwards. BioUML can disseminate and analyze such data and produce visualizations and simulations. It allows for parameter fitting and supports several analysis techniques required to deal with large amounts of raw data. Managing data at this scale poses technical challenges in storage, delivery, and sharing, because research is often a collaboration across multiple institutions. A typical genome data set might contain 500 terabytes of data, which may need to be shared, often internationally, using Internet2 technology. Valex LLC created proprietary data compression mechanisms for the NCBI Short Read Archive Project [6] that allow raw research data to be delivered at speeds of up to 40 Gbit/s. To provide a full solution for such collaborative research, the makers of BioUML have developed a new hardware/software system in partnership with Valex LLC. This version of BioUML is called Bio datomics.
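To put the figures quoted above in perspective, the following minimal sketch estimates how long data sets of 100 and 500 terabytes would take to move at 40 Gbit/s. It is a back-of-the-envelope calculation, not a description of how BioUML or Valex LLC transfer data. The function name `transfer_hours` is hypothetical, and the sketch assumes decimal terabytes (10^12 bytes), a sustained peak rate, and no protocol overhead or compression gain.

```python
def transfer_hours(size_terabytes: float, rate_gbit_per_s: float) -> float:
    """Idealized transfer time in hours.

    Assumptions (not from the source): decimal terabytes, a constant
    line rate, and no protocol overhead or compression.
    """
    bits = size_terabytes * 1e12 * 8           # terabytes -> bytes -> bits
    seconds = bits / (rate_gbit_per_s * 1e9)   # Gbit/s -> bit/s
    return seconds / 3600                      # seconds -> hours


if __name__ == "__main__":
    for size_tb in (100, 500):
        hours = transfer_hours(size_tb, 40)
        print(f"{size_tb} TB at 40 Gbit/s: about {hours:.1f} hours")
```

Under these assumptions, a 500-terabyte data set needs a little over a day even at the peak rate, which illustrates why storage, delivery, and sharing are treated as challenges in their own right.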

References

  1. Kolpakov, Fedor; Akberdin, Ilya; Kashapov, Timur; Kiselev, Ilya; Kolmykov, Semyon; Kondrakhin, Yury; Kutumova, Elena; Mandrik, Nikita; Pintus, Sergey; Ryabova, Anna; Sharipov, Ruslan; Yevshin, Ivan; Kel, Alexander (2019-07-02). "BioUML: an integrated environment for systems biology and collaborative analysis of biomedical data". Nucleic Acids Research. 47 (W1): W225–W233. doi:10.1093/nar/gkz440. ISSN 0305-1048. PMC 6602424. PMID 31131402.
  2. Kolpakov, Fedor; Akberdin, Ilya; Kiselev, Ilya; Kolmykov, Semyon; Kondrakhin, Yury; Kulyashov, Mikhail; Kutumova, Elena; Pintus, Sergey; Ryabova, Anna; Sharipov, Ruslan; Yevshin, Ivan; Zhatchenko, Sergey; Kel, Alexander (2022-05-10). "BioUML - towards a universal research platform". Nucleic Acids Research. 50 (W1): W124–W131. doi:10.1093/nar/gkac286 via Oxford Academic.
  3. Kolpakov, Fedor A (2002). "BioUML - framework for visual modeling and simulation of biological systems". ResearchGate. Retrieved 2024-07-21.
  4. "BioUML development history - BioUML platform". wiki.biouml.org. Retrieved 2024-07-21.
  5. Kolpakov, F; Poroikov, V; Sharipov, R; Kondrakhin, Y; Zakharov, A; Lagunin, A; Milanesi, L; Kel, A (2007). "CYCLONET--an integrated database on cell cycle regulation and carcinogenesis". Nucleic Acids Res. 35 (Database issue): D550–6. doi:10.1093/nar/gkl912. PMC 1899094. PMID 17202170.
  6. "Galter Health Sciences Library & Learning Center | News".