BioBIKE

Last updated
BioBIKE
Initial release2002 (2002)
Written in Lisp
Operating system Unix-like
Available inEnglish
Type Scientific workflow, Symbolic Computing, Bioinformatics, Artificial Intelligence
License MIT Open Source
Website GitHub Repo

BioBike [1] [2] (nee. BioLingua [3] ) is a cloud-based, through-the-web programmable (Paas) symbolic biocomputing and bioinformatics platform that aims to make computational biology, and especially intelligent biocomputing (that is, the application of Artificial Intelligence to computational biology) accessible to research scientists who are not expert programmers. [4]

Contents

Unique capabilities

BioBIKE is an integrated symbolic biocomputing and bioinformatics platform, built from the start as an entirely (what is now called) cloud-based architecture where all computing is done in remote servers, and all user access is accomplished through web browsers.

BioBIKE has a built-in frame system in which all objects, data, and knowledge are represented. This enables code written either in the native Lisp, in the visual programming language, or systems of rules expressed in the SNARK theorem prover to access the whole of biological knowledge in an integrated manner.

For its time (released in 2002) it was unique in permitting users to create fully functional biocomputing programs that run on the back-end servers entirely through the web browser UI. (In modern terms it was one of the first PaaS (Platform as a Service) systems, predating even Salesforce in this capability.) Initially this programming was carried out in raw Lisp, but Jeff Elhai's team at VCU, with NSF funding, created an entirely graphical programming environment on top of BioBIKE based upon the Boxer-style programming environments. [1]

Composite of BioBIKE Visual Programming Language Style Interaction depicting function definition as well as complex flow control. BioBIKEVPL.png
Composite of BioBIKE Visual Programming Language Style Interaction depicting function definition as well as complex flow control.

Being a multi-headed, multi-threaded, multi-user, multi-tenancy cloud-based system, BioBIKE users were able to directly work together through their web browsers, remotely sharing the same listener and memory space. This permitted a unique sort of collaboration, discussed in Shrager (2007). [5]

A specialized offshoot of BioBIKE called "BioDeducta" includes SRI's SNARK theorem prover, offering unique "deductive biocomputing" capabilities. [2]

Implementation

BioBIKE is open-source software implemented using the Lisp programming language. Continuing development takes place by the BioBIKE team [6] centered at Virginia Commonwealth University .

History

BioBIKE was originally called "BioLingua", and was developed by Jeff Shrager at The Carnegie Inst. of Washington Dept. of Plant Biology, and JP Massar with funding from NASA's Astrobiology Division. Shrager and Massar wanted to create a web-based, multi-user Lisp Machine, specialized for bioinformatics. Other early contributors to the project included Mike Travers, and Jeff Elhai of VCU. Elhai obtained continuing funding from the National Science Foundation for the project, which was renamed BioBIKE. Elhai and colleagues added BioBIKE's unique visual programming language. Shrager, meanwhile, collaborated with Richard Waldinger at SRI to build SRI's (SNARK) theorem prover into BioBIKE, creating a deductive biocomputing system, called BioDeducta. [2]

Composite of Early BioBIKE Lisp-Listener Style Interaction depicting knowledge base frames, graphical I/O, and through-the-web Lisp programmability. BioBIKEListener2005.png
Composite of Early BioBIKE Lisp-Listener Style Interaction depicting knowledge base frames, graphical I/O, and through-the-web Lisp programmability.

Instances

There used to be a number of BioBIKE verticals in different biological domains, including viral pathogens, cyanobacteria and other bacteria, Arabidopsis thaliana, and several others described in the references.

See also

Related Research Articles

Bioinformatics Computational analysis of large, complex sets of biological data

Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding biological data, in particular when the data sets are large and complex. As an interdisciplinary field of science, bioinformatics combines biology, computer science, information engineering, mathematics and statistics to analyze and interpret the biological data. Bioinformatics has been used for in silico analyses of biological queries using mathematical and statistical techniques.

Planner is a programming language designed by Carl Hewitt at MIT, and first published in 1969. First, subsets such as Micro-Planner and Pico-Planner were implemented, and then essentially the whole language was implemented as Popler by Julian Davies at the University of Edinburgh in the POP-2 programming language. Derivations such as QA4, Conniver, QLISP and Ether were important tools in artificial intelligence research in the 1970s, which influenced commercial developments such as Knowledge Engineering Environment (KEE) and Automated Reasoning Tool (ART).

Bio-inspired computing, short for biologically inspired computing, is a field of study which seeks to solve computer science problems using models of biology. It relates to connectionism, social behavior, and emergence. Within computer science, bio-inspired computing relates to artificial intelligence and machine learning. Bio-inspired computing is a major subset of natural computation.

BioJava is an open-source software project dedicated to provide Java tools to process biological data. BioJava is a set of library functions written in the programming language Java for manipulating sequences, protein structures, file parsers, Common Object Request Broker Architecture (CORBA) interoperability, Distributed Annotation System (DAS), access to AceDB, dynamic programming, and simple statistical routines. BioJava supports a huge range of data, starting from DNA and protein sequences to the level of 3D protein structures. The BioJava libraries are useful for automating many daily and mundane bioinformatics tasks such as to parsing a Protein Data Bank (PDB) file, interacting with Jmol and many more. This application programming interface (API) provides various file parsers, data models and algorithms to facilitate working with the standard data formats and enables rapid application development and analysis.

BioPerl Collection of Perl modules for bioinformatics

BioPerl is a collection of Perl modules that facilitate the development of Perl scripts for bioinformatics applications. It has played an integral role in the Human Genome Project.

Biopython Collection of open-source Python software tools for computational biology

The Biopython project is an open-source collection of non-commercial Python tools for computational biology and bioinformatics, created by an international association of developers. It contains classes to represent biological sequences and sequence annotations, and it is able to read and write to a variety of file formats. It also allows for a programmatic means of accessing online databases of biological information, such as those at NCBI. Separate modules extend Biopython's capabilities to sequence alignment, protein structure, population genetics, phylogenetics, sequence motifs, and machine learning. Biopython is one of a number of Bio* projects designed to reduce code duplication in computational biology.

Tom Knight is an American synthetic biologist and computer engineer, who was formerly a senior research scientist at the MIT Computer Science and Artificial Intelligence Laboratory, a part of the MIT School of Engineering. He now works at the synthetic biology company Ginkgo Bioworks, which he cofounded in 2008.

The Gene Ontology (GO) is a major bioinformatics initiative to unify the representation of gene and gene product attributes across all species. More specifically, the project aims to: 1) maintain and develop its controlled vocabulary of gene and gene product attributes; 2) annotate genes and gene products, and assimilate and disseminate annotation data; and 3) provide tools for easy access to all aspects of the data provided by the project, and to enable functional interpretation of experimental data using the GO, for example via enrichment analysis. GO is part of a larger classification effort, the Open Biomedical Ontologies, being one of the Initial Candidate Members of the OBO Foundry.

Bioconductor is a free, open source and open development software project for the analysis and comprehension of genomic data generated by wet lab experiments in molecular biology.

Biomedical text mining refers to the methods and study of how text mining may be applied to texts and literature of the biomedical and molecular biology domains. As a field of research, biomedical text mining incorporates ideas from natural language processing, bioinformatics, medical informatics and computational linguistics. The strategies developed through studies in this field are frequently applied to the biomedical and molecular biology literature available through services such as PubMed.

Carole Goble British computer scientist

Carole Anne Goble, is a British academic who is Professor of Computer Science at the University of Manchester. She is principal investigator (PI) of the myGrid, BioCatalogue and myExperiment projects and co-leads the Information Management Group (IMG) with Norman Paton.

Robert David Stevens

Robert David Stevens is a professor of bio-health informatics. and Head of Department of Computer Science at The University of Manchester

SNARK, , is a theorem prover for multi-sorted first-order logic intended for applications in artificial intelligence and software engineering, developed at SRI International.

Richard Jay Waldinger is a computer science researcher at SRI International's Artificial Intelligence Center whose interests focus on the application of automated deductive reasoning to problems in software engineering and artificial intelligence.

Galaxy (computational biology)

Galaxy is a scientific workflow, data integration, and data and analysis persistence and publishing platform that aims to make computational biology accessible to research scientists that do not have computer programming or systems administration experience. Although it was initially developed for genomics research, it is largely domain agnostic and is now used as a general bioinformatics workflow management system.

Lawrence Hunter

Lawrence E. Hunter is a Professor and Director of the Center for Computational Pharmacology and of the Computational Bioscience Program at the University of Colorado School of Medicine and Professor of Computer Science at the University of Colorado Boulder. He is an internationally known scholar, focused on computational biology, knowledge-driven extraction of information from the primary biomedical literature, the semantic integration of knowledge resources in molecular biology, and the use of knowledge in the analysis of high-throughput data, as well as for his foundational work in computational biology, which led to the genesis of the major professional organization in the field and two international conferences.

BioMart

BioMart is a community-driven project to provide a single point of access to distributed research data. The BioMart project contributes open source software and data services to the international scientific community. Although the BioMart software is primarily used by the biomedical research community, it is designed in such a way that any type of data can be incorporated into the BioMart framework. The BioMart project originated at the European Bioinformatics Institute as a data management solution for the Human Genome Project. Since then, BioMart has grown to become a multi-institute collaboration involving various database projects on five continents.

A bioinformatics workflow management system is a specialized form of workflow management system designed specifically to compose and execute a series of computational or data manipulation steps, or a workflow, that relate to bioinformatics.

A deductive classifier is a type of artificial intelligence inference engine. It takes as input a set of declarations in a frame language about a domain such as medical research or molecular biology. For example, the names of classes, sub-classes, properties, and restrictions on allowable values. The classifier determines if the various declarations are logically consistent and if not will highlight the specific inconsistent declarations and the inconsistencies among them. If the declarations are consistent the classifier can then assert additional information based on the input. For example, it can add information about existing classes, create additional classes, etc. This differs from traditional inference engines that trigger off of IF-THEN conditions in rules. Classifiers are also similar to theorem provers in that they take as input and produce output via First Order Logic. Classifiers originated with KL-ONE Frame languages. They are increasingly significant now that they form a part in the enabling technology of the Semantic Web. Modern classifiers leverage the Web Ontology Language. The models they analyze and generate are called ontologies.


The BioCompute Object (BCO) Project is a community-driven initiative to build a framework for standardizing and sharing computations and analyses generated from High-throughput sequencing. The project has since been standardized as IEEE 2791-2020, and the project files are maintained in an open source repository. The July 22nd, 2020 edition of the Federal Register announced that the FDA now supports the use of BioCompute in regulatory submissions, and the inclusion of the standard in the Data Standards Catalog for the submission of HTS data in NDAs, ANDAs, BLAs, and INDs to CBER, CDER, and CFSAN.

Originally started as a collaborative contract between the George Washington University and the Food and Drug Administration, the project has grown to include over 20 universities, biotechnology companies, public-private partnerships and pharmaceutical companies including Seven Bridges and Harvard Medical School. The BCO aims to ease the exchange of HTS workflows between various organizations, such as the FDA, pharmaceutical companies, contract research organizations, bioinformatic platform providers, and academic researchers. Due to the sensitive nature of regulatory filings, few direct references to material can be published. However, the project is currently funded to train FDA Reviewers and administrators to read and interpret BCOs, and currently has 4 publications either submitted or nearly submitted.

References

  1. 1 2 Elhai, J.; Taton, A.; Massar, J.; Myers, J. K.; Travers, M.; Casey, J.; Slupesky, M.; Shrager, J. (2009). "BioBIKE: A Web-based, programmable, integrated biological knowledge base". Nucleic Acids Research. 37 (Web Server issue): W28–W32. doi:10.1093/nar/gkp354. PMC   2703918 . PMID   19433511.
  2. 1 2 3 Shrager, J.; Waldinger, R.; Stickel, M.; Massar, J. P. (2007). Futrelle, Robert (ed.). "Deductive Biocomputing". PLOS ONE. 2 (4): e339. Bibcode:2007PLoSO...2..339S. doi: 10.1371/journal.pone.0000339 . PMC   1838522 . PMID   17415407.
  3. Massar, J. P.; Travers, M.; Elhai, J.; Shrager, J. (2004). "BioLingua: A programmable knowledge environment for biologists". Bioinformatics. 21 (2): 199–207. doi: 10.1093/bioinformatics/bth465 . PMID   15308539.
  4. Jeff Elhai: Humans, Computers, and the Route to Biological Insights: Regaining Our Capacity for Surprise. Journal of Computational Biology 18(7): 867–878 (2011)
  5. J Shrager (2007) The Evolution of BioBike: Community Adaptation of a Biocomputing Platform. Studies in History and Philosophy of Science, 38, 642–656.
  6. "交通事故について弁護士に相談できる – 相談するだけでいい場合と訴訟を視野に入れて相談をした方がいい場合について".