Computational biology

Last updated

Computational biology involves the development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, ecological, behavioral, and social systems. [1] The field is broadly defined and includes foundations in biology, applied mathematics, statistics, biochemistry, chemistry, biophysics, molecular biology, genetics, genomics, computer science and evolution. [2]

Biology is the natural science that studies life and living organisms, including their physical structure, chemical processes, molecular interactions, physiological mechanisms, development and evolution. Despite the complexity of the science, there are certain unifying concepts that consolidate it into a single, coherent field. Biology recognizes the cell as the basic unit of life, genes as the basic unit of heredity, and evolution as the engine that propels the creation and extinction of species. Living organisms are open systems that survive by transforming energy and decreasing their local entropy to maintain a stable and vital condition defined as homeostasis.

Applied mathematics Application of mathematical methods to other fields

Applied mathematics is the application of mathematical methods by different fields such as science, engineering, business, computer science, and industry. Thus, applied mathematics is a combination of mathematical science and specialized knowledge. The term "applied mathematics" also describes the professional specialty in which mathematicians work on practical problems by formulating and studying mathematical models. In the past, practical applications have motivated the development of mathematical theories, which then became the subject of study in pure mathematics where abstract concepts are studied for their own sake. The activity of applied mathematics is thus intimately connected with research in pure mathematics.

Statistics study of the collection, organization, analysis, interpretation, and presentation of data

Statistics is a branch of mathematics dealing with data collection, organization, analysis, interpretation and presentation. In applying statistics to, for example, a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model process to be studied. Populations can be diverse topics such as "all people living in a country" or "every atom composing a crystal". Statistics deals with all aspects of data, including the planning of data collection in terms of the design of surveys and experiments. See glossary of probability and statistics.


Computational biology is different from biological computing, which is a subfield of computer science and computer engineering using bioengineering and biology to build computers, but is similar to bioinformatics, which is an interdisciplinary science using computers to store and process biological data.

Bio computers use systems of biologically derived molecules—such as DNA and proteins—to perform computational calculations involving storing, retrieving, and processing data.

Computer science study of the theoretical foundations of information and computation

Computer science is the study of processes that interact with data and that can be represented as data in the form of programs. It enables the use of algorithms to manipulate, store, and communicate digital information. A computer scientist studies the theory of computation and the practice of designing software systems.

Computer engineering discipline integrating computer science and electrical engineering to develop computer hardware and software

Computer engineering is a branch of engineering that integrates several fields of computer science and electronic engineering required to develop computer hardware and software. Computer engineers usually have training in electronic engineering, software design, and hardware–software integration instead of only software engineering or electronic engineering. Computer engineers are involved in many hardware and software aspects of computing, from the design of individual microcontrollers, microprocessors, personal computers, and supercomputers, to circuit design. This field of engineering not only focuses on how computer systems themselves work, but also how they integrate into the larger picture.


Computational Biology, which includes many aspects of bioinformatics, is the science of using biological data to develop algorithms or models to understand biological systems and relationships. Until recently, biologists did not have access to very large amounts of data. This data has now become commonplace, particularly in molecular biology and genomics. Researchers were able to develop analytical methods for interpreting biological information, but were unable to share them quickly among colleagues. [3]

Bioinformatics Software tools for understanding biological data

Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding biological data. As an interdisciplinary field of science, bioinformatics combines biology, computer science, information engineering, mathematics and statistics to analyze and interpret biological data. Bioinformatics has been used for in silico analyses of biological queries using mathematical and statistical techniques.

Molecular biology branch of biology that deals with the molecular basis of biological activity

Molecular biology is a branch of biology that concerns the molecular basis of biological activity between biomolecules in the various systems of a cell, including the interactions between DNA, RNA, proteins and their biosynthesis, as well as the regulation of these interactions. Writing in Nature in 1961, William Astbury described molecular biology as:

...not so much a technique as an approach, an approach from the viewpoint of the so-called basic sciences with the leading idea of searching below the large-scale manifestations of classical biology for the corresponding molecular plan. It is concerned particularly with the forms of biological molecules and [...] is predominantly three-dimensional and structural – which does not mean, however, that it is merely a refinement of morphology. It must at the same time inquire into genesis and function.

Genomics discipline in genetics

Genomics is an interdisciplinary field of biology focusing on the structure, function, evolution, mapping, and editing of genomes. A genome is an organism's complete set of DNA, including all of its genes. In contrast to genetics, which refers to the study of individual genes and their roles in inheritance, genomics aims at the collective characterization and quantification of genes, which direct the production of proteins with the assistance of enzymes and messenger molecules. In turn, proteins make up body structures such as organs and tissues as well as control chemical reactions and carry signals between cells. Genomics also involves the sequencing and analysis of genomes through uses of high throughput DNA sequencing and bioinformatics to assemble and analyze the function and structure of entire genomes. Advances in genomics have triggered a revolution in discovery-based research and systems biology to facilitate understanding of even the most complex biological systems such as the brain.

Bioinformatics began to develop in the early 1970s. It was considered the science of analyzing informatics processes of various biological systems. At this time, research in artificial intelligence was using network models of the human brain in order to generate new algorithms. This use of biological data to develop other fields pushed biological researchers to revisit the idea of using computers to evaluate and compare large data sets. By 1982, information was being shared among researchers through the use of punch cards. The amount of data being shared began to grow exponentially by the end of the 1980s. This required the development of new computational methods in order to quickly analyze and interpret relevant information. [3]

In computer science, artificial intelligence (AI), sometimes called machine intelligence, is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and other animals. Computer science defines AI research as the study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals. More specifically, Kaplan and Haenlein define AI as “a system’s ability to correctly interpret external data, to learn from such data, and to use those learnings to achieve specific goals and tasks through flexible adaptation”. Colloquially, the term "artificial intelligence" is used to describe machines that mimic "cognitive" functions that humans associate with other human minds, such as "learning" and "problem solving".

Since the late 1990s, computational biology has become an important part of developing emerging technologies for the field of biology. [4] The terms computational biology and evolutionary computation have a similar name, but are not to be confused. Unlike computational biology, evolutionary computation is not concerned with modeling and analyzing biological data. It instead creates algorithms based on the ideas of evolution across species. Sometimes referred to as genetic algorithms, the research of this field can be applied to computational biology. While evolutionary computation is not inherently a part of computational biology, Computational evolutionary biology is a subfield of it. [5]

Evolutionary computation Trial and error problem solvers with a metaheuristic or stochastic optimization character

In computer science, evolutionary computation is a family of algorithms for global optimization inspired by biological evolution, and the subfield of artificial intelligence and soft computing studying these algorithms. In technical terms, they are a family of population-based trial and error problem solvers with a metaheuristic or stochastic optimization character.

Genetic algorithm competitive algorithm for searching a problem space

In computer science and operations research, a genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to generate high-quality solutions to optimization and search problems by relying on bio-inspired operators such as mutation, crossover and selection. John Holland introduced genetic algorithms in 1960 based on the concept of Darwin’s theory of evolution; afterwards, his student David E. Goldberg extended GA in 1989.

Computational biology has been used to help sequence the human genome, create accurate models of the human brain, and assist in modeling biological systems. [3]


Computational anatomy

Computational anatomy is a discipline focusing on the study of anatomical shape and form at the visible or gross anatomical scale of morphology. It involves the development and application of computational, mathematical and data-analytical methods for modeling and simulation of biological structures. It focuses on the anatomical structures being imaged, rather than the medical imaging devices. Due to the availability of dense 3D measurements via technologies such as magnetic resonance imaging (MRI), computational anatomy has emerged as a subfield of medical imaging and bioengineering for extracting anatomical coordinate systems at the morphome scale in 3D.

The original formulation of computational anatomy is as a generative model of shape and form from exemplars acted upon via transformations. [6] The diffeomorphism group is used to study different coordinate systems via coordinate transformations as generated via the Lagrangian and Eulerian velocities of flow from one anatomical configuration in to another. It relates with shape statistics and morphometrics, with the distinction that diffeomorphisms are used to map coordinate systems, whose study is known as diffeomorphometry.

Computational biomodeling

Computational biomodeling is a field concerned with building computer models of biological systems. Computational biomodeling aims to develop and use visual simulations in order to assess the complexity of biological systems. This is accomplished through the use of specialized algorithms, and visualization software. These models allow for prediction of how systems will react under different environments. This is useful for determining if a system is robust. A robust biological system is one that “maintain their state and functions against external and internal perturbations”, [7] which is essential for a biological system to survive. Computational biomodeling generates a large archive of such data, allowing for analysis from multiple users. While current techniques focus on small biological systems, researchers are working on approaches that will allow for larger networks to be analyzed and modeled. A majority of researchers believe that this will be essential in developing modern medical approaches to creating new drugs and gene therapy. [7] A useful modelling approach is to use Petri nets via tools such as esyN [8]

Computational genomics (Computational genetics)

A partially sequenced genome. Genome viewer screenshot small.png
A partially sequenced genome.

Computational genomics is a field within genomics which studies the genomes of cells and organisms. It is sometimes referred to as Computational and Statistical Genetics and encompasses much of Bioinformatics. The Human Genome Project is one example of computational genomics. This project looks to sequence the entire human genome into a set of data. Once fully implemented, this could allow for doctors to analyze the genome of an individual patient. [9] This opens the possibility of personalized medicine, prescribing treatments based on an individual’s pre-existing genetic patterns. This project has created many similar programs. Researchers are looking to sequence the genomes of animals, plants, bacteria, and all other types of life. [10]

One of the main ways that genomes are compared is by homology. Homology is the study of biological structures and nucleotide sequences in different organisms that come from a common ancestor. Research suggests that between 80 and 90% of genes in newly sequenced prokaryotic genomes can be identified this way. [10]

This field is still in development. An untouched project in the development of computational genomics is the analysis of intergenic regions. Studies show that roughly 97% of the human genome consists of these regions. [10] Researchers in computational genomics are working on understanding the functions of non-coding regions of the human genome through the development of computational and statistical methods and via large consortia projects such as ENCODE (The Encyclopedia of DNA Elements) and the Roadmap Epigenomics Project.

Computational neuroscience

Computational neuroscience is the study of brain function in terms of the information processing properties of the structures that make up the nervous system. It is a subset of the field of neuroscience, and looks to analyze brain data to create practical applications. [11] It looks to model the brain in order to examine specific types aspects of the neurological system. Various types of models of the brain include:

It is the work of computational neuroscientists to improve the algorithms and data structures currently used to increase the speed of such calculations.

Computational pharmacology

Computational pharmacology (from a computational biology perspective) is “the study of the effects of genomic data to find links between specific genotypes and diseases and then screening drug data”. [13] The pharmaceutical industry requires a shift in methods to analyze drug data. Pharmacologists were able to use Microsoft Excel to compare chemical and genomic data related to the effectiveness of drugs. However, the industry has reached what is referred to as the Excel barricade. This arises from the limited number of cells accessible on a spreadsheet. This development led to the need for computational pharmacology. Scientists and researchers develop computational methods to analyze these massive data sets. This allows for an efficient comparison between the notable data points and allows for more accurate drugs to be developed. [14]

Analysts project that if major medications fail due to patents, that computational biology will be necessary to replace current drugs on the market. Doctoral students in computational biology are being encouraged to pursue careers in industry rather than take Post-Doctoral positions. This is a direct result of major pharmaceutical companies needing more qualified analysts of the large data sets required for producing new drugs. [14]

Computational evolutionary biology

Computational biology has assisted the field of evolutionary biology in many capacities. This includes:

Cancer computational biology

Cancer computational biology is a field that aims to determine the future mutations in cancer through an algorithmic approach to analyzing data. Research in this field has led to the use of high-throughput measurement. High throughput measurement allows for the gathering of millions of data points using robotics and other sensing devices. This data is collected from DNA, RNA, and other biological structures. Areas of focus include determining the characteristics of tumors, analyzing molecules that are deterministic in causing cancer, and understanding how the human genome relates to the causation of tumors and cancer. [16]

Computational neuropsychiatry

Computational neuropsychiatry is the emerging field that uses mathematical and computer-assisted modeling of brain mechanisms involved in mental disorders. It was already demonstrated by several initiatives that computational modeling is an important contribution to understand neuronal circuits that could generate mental functions and dysfunctions. [17] [18] [19]

Software and tools

Computational Biologists use a wide range of software. These range from command line programs to graphical and web-based programs.

Open source software

Open source software provides a platform to develop computational biological methods. Specifically, open source means that every person and/or entity can access and benefit from software developed in research. PLOS cites four main reasons for the use of open source software including:


There are several large conferences that are concerned with computational biology. Some notable examples are Intelligent Systems for Molecular Biology (ISMB), European Conference on Computational Biology (ECCB) and Research in Computational Molecular Biology (RECOMB).


There are numerous journals dedicated to computational biology. Some notable examples include Journal of Computational Biology and PLOS Computational Biology. The PLOS computational biology journal is a peer-reviewed journal that has many notable research projects in the field of computational biology. They provide reviews on software, tutorials for open source software, and display information on upcoming computational biology conferences. PLOS Computational Biology is an open access journal. The publication may be openly used provided the author is cited. [21] Recently a new open access journal Computational Molecular Biology was launched.

Computational biology, bioinformatics and mathematical biology are all interdisciplinary approaches to the life sciences that draw from quantitative disciplines such as mathematics and information science. The NIH describes computational/mathematical biology as the use of computational/mathematical approaches to address theoretical and experimental questions in biology and, by contrast, bioinformatics as the application of information science to understand complex life-sciences data. [1]

Specifically, the NIH defines

Computational biology: The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems. [1]

Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data. [1]

While each field is distinct, there may be significant overlap at their interface. [1]

See also

Related Research Articles

Computer science is the study of the theoretical foundations of information and computation and their implementation and application in computer systems. One well known subject classification system for computer science is the ACM Computing Classification System devised by the Association for Computing Machinery.

Bio-inspired computing, short for biologically inspired computing, is a field of study that loosely knits together subfields related to the topics of connectionism, social behaviour and emergence. It is often closely related to the field of artificial intelligence, as many of its pursuits can be linked to machine learning. It relies heavily on the fields of biology, computer science and mathematics. Briefly put, it is the use of computers to model the living phenomena, and simultaneously the study of life to improve the usage of computers. Biologically inspired computing is a major subset of natural computation.

Systems biology computational and mathematical modeling of complex biological systems

Systems biology is the computational and mathematical modeling of complex biological systems. It is a biology-based interdisciplinary field of study that focuses on complex interactions within biological systems, using a holistic approach to biological research.

Computational science is a rapidly growing multidisciplinary field that uses advanced computing capabilities to understand and solve complex problems. It is an area of science which spans many disciplines, but at its core it involves the development of models and simulations to understand natural systems.

Modelling biological systems is a significant task of systems biology and mathematical biology. Computational systems biology aims to develop and use efficient algorithms, data structures, visualization and communication tools with the goal of computer modelling of biological systems. It involves the use of computer simulations of biological systems, including cellular subsystems, to both analyze and visualize the complex connections of these cellular processes.

Biocomplexity Institute of Virginia Tech

The Biocomplexity Institute of Virginia Tech is a research organization specializing in bioinformatics, computational biology, and systems biology. The Institute has more than 250 personnel, including over 50 tenured and research faculty. Research at the Institute involves collaboration in diverse disciplines such as mathematics, computer science, biology, plant pathology, biochemistry, systems biology, statistics, economics, synthetic biology and medicine. The institute develops -omic and bioinformatic tools and databases that can be applied to the study of human, animal and plant diseases as well as the discovery of new vaccine, drug and diagnostic targets.

Computational genomics refers to the use of computational and statistical analysis to decipher biology from genome sequences and related data, including both DNA and RNA sequence as well as other "post-genomic" data. These, in combination with computational and statistical approaches to understanding the function of the genes and statistical association analysis, this field is also often referred to as Computational and Statistical Genetics/genomics. As such, computational genomics may be regarded as a subset of bioinformatics and computational biology, but with a focus on using whole genomes to understand the principles of how the DNA of a species controls its biology at the molecular level and beyond. With the current abundance of massive biological datasets, computational studies have become one of the most important means to biological discovery.

Neuroinformatics is a research field concerned with the organization of neuroscience data by the application of computational models and analytical tools. These areas of research are important for the integration and analysis of increasingly large-volume, high-dimensional, and fine-grain experimental data. Neuroinformaticians provide computational tools, mathematical models, and create interoperable databases for clinicians and research scientists. Neuroscience is a heterogeneous field, consisting of many and various sub-disciplines. In order for our understanding of the brain to continue to deepen, it is necessary that these sub-disciplines are able to share data and findings in a meaningful way; Neuroinformaticians facilitate this.

Computational engineering

Not to be confused with computer engineering.

Paulien Hogeweg is a Dutch theoretical biologist and complex systems researcher studying biological systems as dynamic information processing systems at many interconnected levels. In 1970, with Ben Hesper and she defined the term bioinformatics as "the study of informatic processes in biotic systems".

Physiomics is a systematic study of physiome in biology. Physiomics employs bioinformatics to construct networks of physiological features that are associated with genes, proteins and their networks. A few of the methods for determining individual relationships between the DNA sequence and physiological function include metabolic pathway engineering and RNAi analysis. The relationships derived from methods such as these are organized and processed computationally to form distinct networks. Computer models use these experimentally determined networks to develop further predictions of gene function.

Igor Goryanin is a systems biologist, who holds a Henrik Kacser Chair in Computational Systems Biology at the University of Edinburgh, and leads the Computational Systems Biology and Bioinformatics group, School of Informatics. He heads the Biological Systems Unit at Okinawa Institute Science and Technology, Japan.

Cellular model

Creating a cellular model has been a particularly challenging task of systems biology and mathematical biology. It involves developing efficient algorithms, data structures, visualization and communication tools to orchestrate the integration of large quantities of biological data with the goal of computer modeling.

Natural computing, also called natural computation, is a terminology introduced to encompass three classes of methods: 1) those that take inspiration from nature for the development of novel problem-solving techniques; 2) those that are based on the use of computers to synthesize natural phenomena; and 3) those that employ natural materials to compute. The main fields of research that compose these three branches are artificial neural networks, evolutionary algorithms, swarm intelligence, artificial immune systems, fractal geometry, artificial life, DNA computing, and quantum computing, among others.

Translational bioinformatics (TBI) is an emerging field in the study of health informatics, focused on the convergence of molecular bioinformatics, biostatistics, statistical genetics and clinical informatics. Its focus is on applying informatics methodology to the increasing amount of biomedical and genomic data to formulate knowledge and medical tools, which can be utilized by scientists, clinicians, and patients. Furthermore, it involves applying biomedical research to improve human health through the use of computer-based information system. TBI employs data mining and analyzing biomedical informatics in order to generate clinical knowledge for application. Clinical knowledge includes finding similarities in patient populations, interpreting biological information to suggest therapy treatments and predict health outcomes.

Gary Stormo American geneticist

Gary Stormo is an American geneticist and currently Joseph Erlanger Professor in the Department of Genetics and the Center for Genome Sciences and Systems Biology at Washington University School of Medicine in St Louis. He is considered as one of the pioneers of bioinformatics and genomics. His research combines experimental and computational approaches in order to identify and predict regulatory sequences in DNA and RNA, and their contributions to the regulatory networks that control gene expression.

Victor V. Solovyev bioinformatician

Victor V. Solovyev is Chief Scientific Officer of Softberry Inc.. He has previously served as a Professor of Computer Science in the Computer, Electrical and Mathematical Sciences and Engineering Division at King Abdullah University of Science and Technology (KAUST) (2013-2015) and in the Department of Computer Science, Royal Holloway College, University of London (2003-2012). He served on the Editorial Board of Mathematical Biosciences and was a founder of Softberry Inc..

The Institute of Mathematical Problems of Biology RAS is a research institute specializing in computational biology and bioinformatics. The objective of the institute is elaboration of mathematical and computational methods for biological research, as well as implementation of these methods directly addressing the problems of computational biology.


  1. 1 2 3 4 5 "NIH working definition of bioinformatics and computational biology" (PDF). Biomedical Information Science and Technology Initiative. 17 July 2000. Archived from the original (PDF) on 5 September 2012. Retrieved 18 August 2012.
  2. "About the CCMB". Center for Computational Molecular Biology. Retrieved 18 August 2012.
  3. 1 2 3 Hogeweg, Paulien (7 March 2011). "The Roots of Bioinformatics in Theoretical Biology". PLOS Computational Biology. 3. 7: e1002021. doi:10.1371/journal.pcbi.1002021. PMC   3068925 Lock-green.svg. PMID   21483479.
  4. Bourne, Philip. "Rise and Demise of Bioinformatics? Promise and Progress". PLoS Computational Biology. 8: e1002487. doi:10.1371/journal.pcbi.1002487. PMC   3343106 Lock-green.svg. PMID   22570600.
  5. Foster, James (June 2001). "Evolutionary Computation". Nature Reviews Genetics. doi:10.1038/35076523.
  6. Grenander, Ulf; Miller, Michael I. (1998-12-01). "Computational Anatomy: An Emerging Discipline". Q. Appl. Math. 56 (4): 617–694.
  7. 1 2 Kitano, Hiroaki (14 November 2002). "Computational systems biology". Nature. 420 (6912): 206–10. doi:10.1038/nature01254. PMID   12432404.
  8. Favrin, Bean (2 September 2014). "esyN: Network Building, Sharing and Publishing". PLOS ONE. 9: e106035. doi:10.1371/journal.pone.0106035. PMC   4152123 Lock-green.svg. PMID   25181461.
  9. "Genome Sequencing to the Rest of Us". Scientific American.
  10. 1 2 3 Koonin, Eugene (6 March 2001). "Computational Genomics". Curr. Biol. 11 (5): 155–158. doi:10.1016/S0960-9822(01)00081-1. PMID   11267880.
  11. "BU Neuroscience".
  12. 1 2 Sejnowski, Terrence; Christof Koch; Patricia S. Churchland (9 September 1988). "Computational Neuroscience". 4871. 241.
  13. Price, Michael. "Computational Biologists: The Next Pharma Scientists?".
  14. 1 2 Jessen, Walter. "Pharma's shifting strategy means more jobs for computational biologists".
  15. Antonio Carvajal-Rodríguez (2012). "Simulation of Genes and Genomes Forward in Time". Current Genomics. Bentham Science Publishers Ltd. 11 (1): 58–61. doi:10.2174/138920210790218007. PMC   2851118 Lock-green.svg. PMID   20808525.
  16. Yakhini, Zohar. "Cancer Computational Biology". BMC.
  17. Dauvermann, Maria R., Heather C. Whalley, André Schmidt, Graham L. Lee, Liana Romaniuk, Neil Roberts, Eve C. Johnstone, Stephen M. Lawrie, and Thomas WJ Moorhead. "Computational neuropsychiatry–schizophrenia as a cognitive brain network disorder." Frontiers in psychiatry 5 (2014).
  18. Tretter, Felix, and M. Albus. "“Computational Neuropsychiatry” of Working Memory Disorders in Schizophrenia: The Network Connectivity in Prefrontal Cortex-Data and Models." Pharmacopsychiatry 40, no. S 1 (2007): S2-S16.
  19. Marin-Sanguino, A., and E. R. Mendoza. "Hybrid modeling in computational neuropsychiatry." Pharmacopsychiatry 41, no. S 01 (2008): S85-S88.
  20. "The PLOS Computational Biology Software Section". PLOS Computational Biology. 8: e1002799. doi:10.1371/journal.pcbi.1002799.
  21. "PLOS Computational Biology".