Paulien Hogeweg | |
---|---|
Born | 24 December 1943 (age 80) |
Nationality | Dutch |
Alma mater | Universiteit van Amsterdam, Utrecht University |
Known for | Coining the term bioinformatics (1970); Cellular Potts model |
Scientific career | |
Fields | Theoretical biology, bioinformatics, prebiotic evolution, in silico evolution |
Institutions | Utrecht University |
Website | https://tbb.bio.uu.nl/ph/ |
Paulien Hogeweg (born 1943) is a Dutch theoretical biologist and complex systems researcher studying biological systems as dynamic information processing systems at many interconnected levels. In 1970, together with Ben Hesper, she defined the term bioinformatics [2] [3] [4] as "the study of informatic processes in biotic systems".
Born in Amsterdam, the Netherlands, Hogeweg graduated with a master's degree from the University of Amsterdam in 1969. In her final year as a biology master's student, she published her study of aquatic plants, Structure of aquatic vegetation: a comparison of aquatic vegetation in India, the Netherlands and Czechoslovakia. [5] While volunteering at Leiden University, she began her Ph.D. studies at Utrecht University, publishing seven articles based on this work. She received her doctorate from Utrecht University in 1976 with the thesis "Topics in Biological Pattern Analysis", [6] which addressed pattern formation and pattern recognition in biology.
After completing her master's degree in biology, she volunteered at a lab at Leiden University, where she met Hesper and coined the term bioinformatics, which she defines as “the study of information processes in biotic systems.” [7] In 1977, Hogeweg and Ben Hesper opened a research lab dedicated to bioinformatics. In 1990, Hogeweg published an influential paper on prebiotic evolution: Spiral wave structure in pre-biotic evolution: hypercycles stable against parasites. In 1991, she became a full professor of Theoretical Biology at Utrecht University (UU), and since 2008 she has been an honorary professor at UU. Hogeweg has served on the editorial boards of the Journal of Theoretical Biology, the Bulletin of Mathematical Biology, Biosystems, Artificial Life, and Ecological Informatics.
Starting with asynchronous extensions of L-systems, she pioneered agent-based modeling of the development of social structure in animal societies. Her models used the opportunity-based "ToDo" principle, in which agents "do what there is to do", and a "DoDom" principle for dominance ranking, also known as the winner-loser effect. [8] This type of research later became popular in artificial life. [7]
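The dominance update behind the winner-loser effect can be sketched in a few lines. This is a minimal sketch assuming one common formulation (as in DomWorld-style models): the chance of winning is proportional to relative dominance, and the winner gains, and the loser loses, in proportion to how unexpected the outcome was. The step size, the floor value, and the function names are illustrative assumptions, not the exact rules of [8], and the opportunity-based "ToDo" movement rules are left out.

```python
import random


def win_probability(dom_i: float, dom_j: float) -> float:
    """Chance that agent i defeats agent j, proportional to relative dominance."""
    return dom_i / (dom_i + dom_j)


def fight(dom: list[float], i: int, j: int, step: float = 0.1, rng=random) -> int:
    """Resolve one encounter and update dominance values in place.

    Assumed winner-loser rule: the winner gains and the loser loses by
    step * (outcome - expected), so unexpected wins shift dominance more.
    Returns the index of the winner.
    """
    p = win_probability(dom[i], dom[j])
    i_wins = rng.random() < p
    delta = step * ((1.0 if i_wins else 0.0) - p)
    # Floor at a small positive value so win probabilities stay defined.
    dom[i] = max(dom[i] + delta, 0.01)
    dom[j] = max(dom[j] - delta, 0.01)
    return i if i_wins else j


def simulate(n_agents: int = 6, n_encounters: int = 2000, seed: int = 1) -> list[float]:
    """Random pairwise encounters, starting from identical agents."""
    rng = random.Random(seed)
    dom = [1.0] * n_agents
    for _ in range(n_encounters):
        i, j = rng.sample(range(n_agents), 2)
        fight(dom, i, j, rng=rng)
    return dom


if __name__ == "__main__":
    # Final dominance values, highest first.
    print(sorted(simulate(), reverse=True))
```

Because each outcome feeds back into the next win probability, repeated encounters between initially identical agents tend to spread their dominance values apart, which is the self-reinforcing effect the principle describes.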
When the first biological sequence data became available (from the EMBL), she developed a tree-based algorithm for multiple sequence alignment, [6] an approach that is now common practice in sequence alignment and phylogeny. At about the same time, she pioneered folding algorithms for predicting RNA secondary structures. [9] She also used RNA folding as a non-linear genotype-to-phenotype mapping to study evolution on complex fitness landscapes. [10] [11]
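RNA secondary-structure prediction is a dynamic-programming problem over nested base pairs. The sketch below does not reproduce the folding algorithms of [9]; it swaps in the simpler Nussinov recursion, which maximises the number of nested Watson-Crick and G-U pairs instead of using free energies. The minimum hairpin-loop length of 3 and the uppercase RNA alphabet (A, C, G, U) are assumptions.

```python
# Allowed pairs: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> tuple[int, list[tuple[int, int]]]:
    """Return the maximum number of nested pairs and one optimal pair list (0-based)."""
    n = len(seq)
    if n == 0:
        return 0, []
    # dp[i][j] = best pair count within seq[i..j]; entries with j - i <= min_loop stay 0.
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # i or j left unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # split into two independent halves
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best

    # Traceback: re-derive which case produced each optimal value.
    pairs: list[tuple[int, int]] = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
        elif dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in PAIRS and dp[i][j] == dp[i + 1][j - 1] + 1:
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return dp[0][n - 1], sorted(pairs)
```

For example, `nussinov("GGGAAAUCC")` returns `(3, [(0, 8), (1, 7), (2, 6)])`: a single hairpin closing the three-nucleotide AAA loop. Energy-based methods replace the pair count with loop and stacking free energies but keep the same recursive structure.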
She produced the first phase-space trajectory of a chaotic attractor in an ecological food-chain model of three differential equations, long before chaos became popular. [12] She pioneered the use of cellular automata for studying spatial ecological and evolutionary processes, and demonstrated that spatial pattern formation can reverse evolutionary selection pressures. [13] [14]
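A three-level food chain of this kind can be integrated numerically to trace a trajectory through (x, y, z) phase space. The sketch assumes the widely used Hastings-Powell form, with saturating (Holling type II) functional responses and its commonly quoted parameter values; it is not the model of [12], and whether a given run is chaotic is not checked here.

```python
def food_chain(state, a1=5.0, b1=3.0, a2=0.1, b2=2.0, d1=0.4, d2=0.01):
    """Right-hand side for prey x, consumer y and top predator z (assumed Hastings-Powell form)."""
    x, y, z = state
    f1 = a1 * x / (1.0 + b1 * x)  # saturating uptake of x by y
    f2 = a2 * y / (1.0 + b2 * y)  # saturating uptake of y by z
    return (
        x * (1.0 - x) - f1 * y,
        f1 * y - f2 * z - d1 * y,
        f2 * z - d2 * z,
    )


def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def trajectory(state=(0.76, 0.16, 9.9), dt=0.05, steps=20000):
    """Integrate the model and return every visited (x, y, z) point."""
    points = [state]
    for _ in range(steps):
        state = rk4_step(food_chain, state, dt)
        points.append(state)
    return points
```

Plotting the returned points in three dimensions, after discarding an initial transient, shows the shape of the attractor; any plotting library will do.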
Extending the Cellular Potts model (CPM) to study morphogenesis and development, she modeled the complete life cycle of Dictyostelium discoideum using simple rules for chemotaxis and differential adhesion. [15] [16] This CPM approach is now used for modeling in various areas of developmental biology and for modeling the migration of immune cells in lymphoid tissues. The CPM is also used in EvoDevo research.
In recent years, Hogeweg has continued her research on co-evolutionary dynamics and morphogenesis, expanding on “adaptive genomics” and studying the interface between gene regulation and evolution in cellular organisms. Her research also addresses evolvability at the level of genome organization and regulatory networks, and has shown that RNA molecules can increase in complexity as a result of interactions between secondary structure and spatial pattern formation. [17]
Hogeweg has participated in diverse research groups in the biological sciences. Her contributions range from computational methods, such as the tree-based multiple sequence alignment algorithm that has become standard practice, to bioinformatics theory, to which her work has contributed greatly.
Bioinformatics is an interdisciplinary field of science that develops methods and software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, chemistry, physics, computer science, computer programming, information engineering, mathematics and statistics to analyze and interpret biological data. The process of analyzing and interpreting data is sometimes referred to as computational biology; however, the distinction between the two terms is often disputed. To some, the term computational biology refers to building and using models of biological systems.
Computational biology refers to the use of data analysis, mathematical modeling and computational simulations to understand biological systems and relationships. An intersection of computer science, biology, and big data, the field also has foundations in applied mathematics, chemistry, and genetics. It differs from biological computing, a subfield of computer science and engineering which uses bioengineering to build computers.
In bioinformatics, sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of a wide range of analytical methods to understand its features, function, structure, or evolution. It can be performed on the entire genome, transcriptome or proteome of an organism, and can also involve only selected segments or regions, like tandem repeats and transposable elements. Methodologies used include sequence alignment, searches against biological databases, and others.
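Sequence alignment, one of the methodologies listed above, is commonly done with dynamic programming. A minimal sketch of global pairwise alignment (Needleman-Wunsch with a linear gap penalty), assuming illustrative scores of +1 for a match, -1 for a mismatch and -2 per gap; real tools use substitution matrices and affine gap costs.

```python
def global_alignment(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b), using '-' for gaps.
    """
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # prefix of a aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap  # prefix of b aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Traceback from the bottom-right corner, preferring diagonal moves on ties.
    i, j = len(a), len(b)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))
```

For example, `global_alignment("ACGT", "AGT")` returns `(1, "ACGT", "A-GT")`: three matches and one gap.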
In theoretical linguistics and computational linguistics, probabilistic context-free grammars (PCFGs) extend context-free grammars, similar to how hidden Markov models extend regular grammars. Each production is assigned a probability. The probability of a derivation (parse) is the product of the probabilities of the productions used in that derivation. These probabilities can be viewed as parameters of the model, and for large problems it is convenient to learn these parameters via machine learning. A probabilistic grammar's validity is constrained by the context of its training dataset.
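The product rule for derivation probabilities can be made concrete with a toy grammar. Everything here (the grammar, its probabilities, and the helper names) is hypothetical; in practice the probabilities would be estimated from a training corpus.

```python
from collections import defaultdict
from math import prod

# Hypothetical toy grammar: (left-hand side, right-hand side) -> rule probability.
RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("she",)): 0.6,
    ("NP", ("fish",)): 0.4,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("V",)): 0.3,
    ("V", ("eats",)): 1.0,
}


def is_normalised(rules, tol=1e-9):
    """Check that the rules for each non-terminal sum to probability 1."""
    totals = defaultdict(float)
    for (lhs, _rhs), p in rules.items():
        totals[lhs] += p
    return all(abs(t - 1.0) < tol for t in totals.values())


def derivation_probability(derivation, rules=RULES):
    """Probability of a derivation = product of the probabilities of the rules it uses."""
    return prod(rules[rule] for rule in derivation)


# Leftmost derivation of "she eats fish".
SHE_EATS_FISH = [
    ("S", ("NP", "VP")),
    ("NP", ("she",)),
    ("VP", ("V", "NP")),
    ("V", ("eats",)),
    ("NP", ("fish",)),
]
```

Here the derivation probability is 1.0 × 0.6 × 0.7 × 1.0 × 0.4 = 0.168.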
Chargaff's rules state that in the DNA of any species and any organism, the amount of guanine should be equal to the amount of cytosine and the amount of adenine should be equal to the amount of thymine. Further, a 1:1 stoichiometric ratio of purine and pyrimidine bases should exist. This pattern is found in both strands of the DNA. They were discovered by Austrian-born chemist Erwin Chargaff in the late 1940s.
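Chargaff's first rule follows from complementary base pairing, which a short computation makes visible: counting bases over a strand plus its reverse complement always gives A/T = G/C = 1. The helper names are illustrative, and the input is assumed to contain only A, C, G and T.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Partner strand of a DNA sequence, read 5' to 3'."""
    return strand.upper().translate(_COMPLEMENT)[::-1]


def _ratio(a: int, b: int) -> float:
    return a / b if b else float("nan")


def chargaff_ratios(strand: str, duplex: bool = True) -> dict:
    """Base ratios for one strand, or for the duplex it forms with its complement."""
    seq = strand.upper()
    if duplex:
        seq += reverse_complement(seq)
    c = Counter(seq)
    return {
        "A/T": _ratio(c["A"], c["T"]),
        "G/C": _ratio(c["G"], c["C"]),
        "purine/pyrimidine": _ratio(c["A"] + c["G"], c["C"] + c["T"]),
    }
```

For the strand `ATGCGGCTA`, the duplex ratios are all 1.0, whereas the single strand alone (`duplex=False`) gives A/T = 1.0, G/C = 1.5 and purine/pyrimidine = 1.25, so short single strands need not follow the rule.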
Comparative genomics is a branch of biological research that examines genome sequences across a wide range of species, from bacteria to mice, chimpanzees, and humans. This large-scale, holistic approach compares two or more genomes to discover the similarities and differences between them and to study the biology of the individual genomes. Comparing whole genome sequences provides a highly detailed view of how organisms are related to each other at the gene level, giving researchers insight into genetic relationships and evolutionary changes. The major principle of comparative genomics is that common features of two organisms will often be encoded within the DNA that is evolutionarily conserved between them. Comparative genomics therefore provides a powerful tool for studying evolutionary changes among organisms, helping to identify genes that are conserved or common among species, as well as genes that give each organism its unique characteristics. Moreover, these studies can be performed at different levels of the genome to obtain multiple perspectives on the organisms.
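One crude, alignment-free way to quantify shared sequence between genomes is the Jaccard similarity of their k-mer sets. This is a sketch under simplifying assumptions (exact k-mer matches only, one strand, a default k of 4); real comparative-genomics pipelines add whole-genome alignment and orthology inference.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k (empty if the sequence is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_similarity(a: str, b: str, k: int = 4) -> float:
    """Shared k-mers divided by all k-mers seen in either sequence."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0
```

With k = 3, `ACGTAC` and `ACGTTT` share two of their six distinct 3-mers (ACG and CGT), giving a similarity of 1/3.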
Cis-regulatory elements (CREs) or cis-regulatory modules (CRMs) are regions of non-coding DNA which regulate the transcription of neighboring genes. CREs are vital components of genetic regulatory networks, which in turn control morphogenesis, the development of anatomy, and other aspects of embryonic development, studied in evolutionary developmental biology.
Biomolecular structure is the intricate folded, three-dimensional shape that is formed by a molecule of protein, DNA, or RNA, and that is important to its function. The structure of these molecules may be considered at any of several length scales ranging from the level of individual atoms to the relationships among entire protein subunits. This useful distinction among scales is often expressed as a decomposition of molecular structure into four levels: primary, secondary, tertiary, and quaternary. The scaffold for this multiscale organization of the molecule arises at the secondary level, where the fundamental structural elements are the molecule's various hydrogen bonds. This leads to several recognizable domains of protein structure and nucleic acid structure, including such secondary-structure features as alpha helices and beta sheets for proteins, and hairpin loops, bulges, and internal loops for nucleic acids. The terms primary, secondary, tertiary, and quaternary structure were introduced by Kaj Ulrik Linderstrøm-Lang in his 1951 Lane Medical Lectures at Stanford University.
Nucleic acid structure prediction is a computational method to determine secondary and tertiary nucleic acid structure from its sequence. Secondary structure can be predicted from one or several nucleic acid sequences. Tertiary structure can be predicted from the sequence, or by comparative modeling.
This list of structural comparison and alignment software is a compilation of software tools and web portals used in pairwise or multiple structural comparison and structural alignment.
CompuCell3D (CC3D) is a three-dimensional C++ and Python software problem-solving environment for simulations of biocomplexity problems, integrating multiple mathematical models of morphogenesis. These include the cellular Potts model (CPM), or Glazier-Graner-Hogeweg model (GGH), which can model cell clustering, growth, division, death, adhesion, and volume and surface area constraints; partial differential equation solvers for modeling reaction–diffusion of external chemical fields; and cell type automata for differentiation. By integrating these models, CompuCell3D enables modeling of cellular reactions to external chemical fields, such as secretion or resorption, and responses such as chemotaxis and haptotaxis.
In computational biology, a Cellular Potts model is a computational model of cells and tissues. It is used to simulate individual and collective cell behavior, tissue morphogenesis and cancer development. CPM describes cells as deformable objects with a certain volume, that can adhere to each other and to the medium in which they live. The formalism can be extended to include cell behaviours such as cell migration, growth and division, and cell signalling. The first CPM was proposed for the simulation of cell sorting by François Graner and James A. Glazier as a modification of a large-Q Potts model. CPM was then popularized by Paulien Hogeweg for studying morphogenesis. Although the model was developed to describe biological cells, it can also be used to model individual parts of a biological cell, or even regions of fluid.
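The core of a CPM is a Metropolis Monte Carlo loop over copy attempts, scored by a Hamiltonian. This minimal 2D sketch assumes the basic form with only an adhesion term (contact energies J between neighbouring lattice sites that belong to different cells) and a volume constraint lambda * (V - V_target)^2. The parameter values and the class and method names are hypothetical; this is not CompuCell3D's API, and perimeter constraints, chemotaxis, growth and division are omitted.

```python
import math
import random
from collections import Counter

MEDIUM = 0  # spin value used for the extracellular medium


def neighbours(x, y, size):
    """Four nearest neighbours on a periodic (torus) lattice."""
    return [((x + dx) % size, (y + dy) % size) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]


class CellularPotts:
    """Minimal 2D CPM with adhesion and a volume constraint (illustrative only)."""

    def __init__(self, size=20, target_volume=16, lam=1.0,
                 j_cell_cell=2.0, j_cell_medium=4.0, temperature=2.0, seed=0):
        self.size = size
        self.grid = [[MEDIUM] * size for _ in range(size)]
        self.target, self.lam = target_volume, lam
        self.j_cc, self.j_cm = j_cell_cell, j_cell_medium
        self.temperature = temperature
        self.rng = random.Random(seed)
        self.volume = Counter()

    def add_square_cell(self, cell_id, x0, y0, side):
        """Paint a square cell, then recount all volumes from the grid."""
        for x in range(x0, x0 + side):
            for y in range(y0, y0 + side):
                self.grid[x % self.size][y % self.size] = cell_id
        self.volume = Counter(s for row in self.grid for s in row if s != MEDIUM)

    def adhesion(self, a, b):
        """Contact energy J between two spins; zero inside the same cell."""
        if a == b:
            return 0.0
        return self.j_cm if MEDIUM in (a, b) else self.j_cc

    def volume_energy(self, v):
        return self.lam * (v - self.target) ** 2

    def total_energy(self):
        """Hamiltonian: adhesion over each neighbour pair once, plus volume terms."""
        h = 0.0
        for x in range(self.size):
            for y in range(self.size):
                s = self.grid[x][y]
                h += self.adhesion(s, self.grid[(x + 1) % self.size][y])
                h += self.adhesion(s, self.grid[x][(y + 1) % self.size])
        return h + sum(self.volume_energy(v) for v in self.volume.values())

    def delta_h(self, x, y, new):
        """Energy change if site (x, y) takes spin `new`."""
        old = self.grid[x][y]
        dh = 0.0
        for nx, ny in neighbours(x, y, self.size):
            s = self.grid[nx][ny]
            dh += self.adhesion(new, s) - self.adhesion(old, s)
        if old != MEDIUM:
            v = self.volume[old]
            dh += self.volume_energy(v - 1) - self.volume_energy(v)
        if new != MEDIUM:
            v = self.volume[new]
            dh += self.volume_energy(v + 1) - self.volume_energy(v)
        return dh

    def set_spin(self, x, y, new):
        """Apply a copy and keep the volume bookkeeping in sync."""
        old = self.grid[x][y]
        self.grid[x][y] = new
        if old != MEDIUM:
            self.volume[old] -= 1
        if new != MEDIUM:
            self.volume[new] += 1

    def step(self):
        """One Metropolis copy attempt from a random neighbour; True if accepted."""
        x, y = self.rng.randrange(self.size), self.rng.randrange(self.size)
        nx, ny = self.rng.choice(neighbours(x, y, self.size))
        new = self.grid[nx][ny]
        if new == self.grid[x][y]:
            return False
        dh = self.delta_h(x, y, new)
        if dh <= 0 or self.rng.random() < math.exp(-dh / self.temperature):
            self.set_spin(x, y, new)
            return True
        return False
```

Copies that lower the energy are always accepted, while uphill copies are accepted with probability exp(-ΔH/T); this lets cell boundaries fluctuate while the volume term keeps each cell near its target size.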
Gustavo Caetano-Anollés is Professor of Bioinformatics in the Department of Crop Sciences, University of Illinois at Urbana-Champaign. He is an expert in the field of evolutionary and comparative genomics.
Anders Krogh is a bioinformatician at the University of Copenhagen, where he leads the university's bioinformatics center. He is known for his pioneering work on the use of hidden Markov models in bioinformatics, and is co-author of a widely used textbook in bioinformatics. In addition, he also co-authored one of the early textbooks on neural networks. His current research interests include promoter analysis, non-coding RNA, gene prediction and protein structure prediction.
Nucleic acid secondary structure is the base-pairing interactions within a single nucleic acid polymer or between two polymers. It can be represented as a list of the bases that are paired in a nucleic acid molecule. The secondary structures of biological DNAs and RNAs tend to be different: biological DNA mostly exists as fully base-paired double helices, while biological RNA is single-stranded and often forms complex and intricate base-pairing interactions due to its increased ability to form hydrogen bonds stemming from the extra hydroxyl group in the ribose sugar.
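A common compact way to write such a list is dot-bracket notation (used, for example, by the ViennaRNA package), in which matching parentheses mark paired bases and dots mark unpaired ones. A sketch converting between the two representations; the function names are illustrative, and pseudoknots cannot be expressed with a single bracket type.

```python
def dot_bracket_to_pairs(structure: str) -> list[tuple[int, int]]:
    """Parse '((...))'-style notation into sorted 0-based (i, j) base pairs."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def pairs_to_dot_bracket(pairs: list[tuple[int, int]], length: int) -> str:
    """Inverse conversion for nested (pseudoknot-free) pair lists."""
    chars = ["."] * length
    for i, j in pairs:
        chars[i], chars[j] = "(", ")"
    return "".join(chars)
```

A stack is enough because nested pairs close in the reverse order in which they open.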
In chemistry, a hypercycle is an abstract model of organization of self-replicating molecules connected in a cyclic, autocatalytic manner. It was introduced in an ordinary differential equation (ODE) form by the Nobel Prize in Chemistry winner Manfred Eigen in 1971 and subsequently further extended in collaboration with Peter Schuster. It was proposed as a solution to the error threshold problem encountered when modelling the replicative molecules that hypothetically existed on the primordial Earth. As such, it offered an explanation of how life on Earth could have begun using only relatively short genetic sequences, which in theory were too short to store all essential information. The hypercycle is a special case of the replicator equation. The most important properties of hypercycles are autocatalytic growth, competition between cycles, once-for-ever selective behaviour, utilization of small selective advantages, rapid evolvability, increased information capacity, and selection against parasitic branches.
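The ODE form can be sketched for the elementary hypercycle, in which each member's replication is catalysed by its predecessor: dx_i/dt = x_i (k_i x_{i-1} - Φ), with Φ = Σ_j k_j x_j x_{j-1}. This assumes concentrations normalised to sum to 1 (Φ is the dilution flux that keeps them there); the rate constants, the initial state, and the forward-Euler step are illustrative choices.

```python
def hypercycle_rates(x, k):
    """dx_i/dt for the elementary hypercycle; member i is catalysed by member i-1."""
    n = len(x)
    # x[i - 1] with i = 0 wraps to x[-1], closing the cycle.
    growth = [k[i] * x[i] * x[i - 1] for i in range(n)]
    phi = sum(growth)  # dilution flux keeping the total concentration at 1
    return [growth[i] - x[i] * phi for i in range(n)]


def simulate_hypercycle(x0, k, dt=0.01, steps=10000):
    """Forward-Euler integration; adequate for a sketch, not for precise dynamics."""
    x = list(x0)
    for _ in range(steps):
        rates = hypercycle_rates(x, k)
        x = [xi + dt * ri for xi, ri in zip(x, rates)]
    return x
```

Summing the rates gives Φ(1 - Σx_i), which is zero on the simplex, so the total concentration stays at 1 while the members compete. With equal rate constants, the uniform state x_i = 1/n is a fixed point.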
Machine learning in bioinformatics is the application of machine learning algorithms to bioinformatics, including genomics, proteomics, microarrays, systems biology, evolution, and text mining.
Non-coding RNAs have been discovered using both experimental and bioinformatic approaches. Bioinformatic approaches can be divided into three main categories. The first involves homology search, although these techniques are by definition unable to find new classes of ncRNAs. The second category includes algorithms designed to discover specific types of ncRNAs that have similar properties. Finally, some discovery methods are based on very general properties of RNA, and are thus able to discover entirely new kinds of ncRNAs.
James Alexander Glazier is a biophysicist and bioengineer, author, and educator best known for his contributions to the field of multiscale modeling, pattern formation, and morphogenesis in biological systems. Glazier has published numerous articles in leading scientific journals, and his work has been widely recognized within the scientific community. He has also been influential in promoting the use of computational modeling and simulation in the study of complex biological phenomena.