BEAST 2

Bayesian evolutionary analysis by sampling trees
Stable release
2.7.4 / 2023-03-20
Repository https://github.com/CompEvol/beast2
Written in Java
Operating system Windows, Mac OS X, Linux
Type Phylogenetics software
License GNU Lesser General Public License
Website http://www.beast2.org/

BEAST 2 is a cross-platform program for Bayesian analysis of molecular sequences. [1] It uses Markov chain Monte Carlo (MCMC) to estimate rooted, timed phylogenies under a range of substitution and clock models and a variety of tree priors. An associated tool, BEAUti, sets up standard analyses, which are specified in XML. BEAST 2 is a complete rewrite of the earlier (still actively developed) BEAST program [2] and as such draws on a large body of work. A notable feature of BEAST 2 is its package system, which has simplified the process of implementing novel models.

Taming the BEAST is a community-driven resource that teaches the use of BEAST 2 and related phylogenetic software. [3]

BEAUti

BEAUti stands for "Bayesian Evolutionary Analysis Utility." It is a graphical user interface (GUI) used to create the input files for BEAST 2. It lets users specify the options and settings for a phylogenetic analysis, such as the data file, the model of molecular evolution, and the prior distributions of model parameters. BEAUti also lets users set the parameters of the MCMC run, such as the chain length and the sampling frequency. This makes setting up a BEAST 2 analysis more user-friendly, as users do not need to edit XML input files by hand.
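To illustrate what chain length and sampling frequency control, here is a minimal sketch of a generic random-walk Metropolis sampler in Python. It is not BEAST 2's implementation and does not use any BEAST 2 API; the function name `metropolis_chain` and the parameters `chain_length`, `log_every`, and `step` are hypothetical names chosen for this example, and the target is a one-dimensional standard normal density rather than a phylogenetic posterior.

```python
import math
import random


def metropolis_chain(log_density, start, chain_length, log_every, step=0.5, seed=1):
    """Random-walk Metropolis sampler that records every `log_every`-th state.

    All names here are illustrative assumptions, not BEAST 2 settings.
    """
    rng = random.Random(seed)
    state = start
    current = log_density(state)
    samples = []
    for iteration in range(1, chain_length + 1):
        # Symmetric Gaussian proposal around the current state.
        proposal = state + rng.gauss(0.0, step)
        candidate = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(state)).
        # 1.0 - rng.random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            state, current = proposal, candidate
        # The chain always advances; only every `log_every`-th state is kept.
        if iteration % log_every == 0:
            samples.append(state)
    return samples


# Toy target: standard normal log-density, up to an additive constant.
samples = metropolis_chain(lambda x: -0.5 * x * x, 0.0, chain_length=10_000, log_every=100)
print(len(samples))  # 100 recorded states
```

A longer chain gives the sampler more iterations to explore the target, while a larger sampling interval stores fewer, less autocorrelated states.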

BEAUti allows users to easily install and manage different packages, such as models of molecular evolution or coalescent models. These packages can be installed directly from within BEAUti. This makes it easier to add new functionality to the analysis without needing to manually download and install the packages.

CBAN

The Comprehensive BEAST Archive Network (CBAN) is the official repository for BEAST 2 packages. Packages in CBAN can be installed via BEAUti out of the box. [4]

LinguaPhylo

A related project is LinguaPhylo (LPhy). [5] LPhy is a probabilistic programming language for defining phylogenetic analyses with a syntax similar to OpenBUGS. It provides a way to generate BEAST 2 XML files (and similar model specifications for other phylogenetics packages) without needing to write the XML by hand.

MASTER/remaster

A particularly important package in the BEAST 2 ecosystem is remaster [6] (formerly MASTER [7] ). Remaster is a tool for simulating population trajectories and phylogenies from birth-death and coalescent models. It is used in simulation studies to validate novel methodologies.
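As a rough illustration of what simulating a population trajectory from a birth-death model involves, the sketch below runs a Gillespie-style simulation of a linear birth-death process in Python. It is not remaster's algorithm or interface; the function `simulate_birth_death` and all of its parameters are hypothetical, and it assumes constant per-individual birth and death rates whose sum is positive.

```python
import random


def simulate_birth_death(birth_rate, death_rate, initial_size, max_time, seed=1):
    """Gillespie simulation of a linear birth-death process.

    Returns a list of (time, population_size) pairs, starting at time 0.
    Illustrative only; assumes birth_rate + death_rate > 0.
    """
    rng = random.Random(seed)
    time, size = 0.0, initial_size
    trajectory = [(time, size)]
    while size > 0:
        # Each individual gives birth or dies independently, so the total
        # event rate scales with the current population size.
        total_rate = (birth_rate + death_rate) * size
        time += rng.expovariate(total_rate)
        if time > max_time:
            break
        # The next event is a birth with probability birth / (birth + death).
        if rng.random() < birth_rate / (birth_rate + death_rate):
            size += 1
        else:
            size -= 1
        trajectory.append((time, size))
    return trajectory


trajectory = simulate_birth_death(birth_rate=1.0, death_rate=0.5,
                                  initial_size=10, max_time=5.0)
print(trajectory[0])  # (0.0, 10)
```

The simulation stops when the population goes extinct or the next event would fall after `max_time`.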

Related Research Articles

In biology, phylogenetics is the study of the evolutionary history and relationships among or within groups of organisms. These relationships are determined by phylogenetic inference, methods that focus on observed heritable traits, such as DNA sequences, protein amino acid sequences, or morphology. The result of such an analysis is a phylogenetic tree—a diagram containing a hypothesis of relationships that reflects the evolutionary history of a group of organisms.

Molecular phylogenetics is the branch of phylogeny that analyzes genetic, hereditary molecular differences, predominantly in DNA sequences, to gain information on an organism's evolutionary relationships. From these analyses, it is possible to determine the processes by which diversity among species has been achieved. The result of a molecular phylogenetic analysis is expressed in a phylogenetic tree. Molecular phylogenetics is one aspect of molecular systematics, a broader term that also includes the use of molecular data in taxonomy and biogeography.

The molecular clock is a figurative term for a technique that uses the mutation rate of biomolecules to deduce the time in prehistory when two or more life forms diverged. The biomolecular data used for such calculations are usually nucleotide sequences for DNA, RNA, or amino acid sequences for proteins.

Biopython: collection of open-source Python software tools for computational biology

The Biopython project is an open-source collection of non-commercial Python tools for computational biology and bioinformatics, created by an international association of developers. It contains classes to represent biological sequences and sequence annotations, and it is able to read and write to a variety of file formats. It also allows for a programmatic means of accessing online databases of biological information, such as those at NCBI. Separate modules extend Biopython's capabilities to sequence alignment, protein structure, population genetics, phylogenetics, sequence motifs, and machine learning. Biopython is one of a number of Bio* projects designed to reduce code duplication in computational biology.

Euarchontoglires: superorder of mammals

Euarchontoglires, synonymous with Supraprimates, is a clade and a superorder of mammals, the living members of which belong to one of the five following groups: rodents, lagomorphs, treeshrews, primates, and colugos.

Coalescent theory is a model of how alleles sampled from a population may have originated from a common ancestor. In the simplest case, coalescent theory assumes no recombination, no natural selection, and no gene flow or population structure, meaning that each variant is equally likely to have been passed from one generation to the next. The model looks backward in time, merging alleles into a single ancestral copy according to a random process in coalescence events. Under this model, the expected time between successive coalescence events increases almost exponentially back in time. Variance in the model comes from both the random passing of alleles from one generation to the next, and the random occurrence of mutations in these alleles.
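The growth of waiting times back in time can be made concrete under one specific assumption: the standard Kingman coalescent with constant population size, which the paragraph above does not name explicitly. In that model, while k lineages remain, the time to the next coalescence is exponentially distributed with rate k(k-1)/2 in coalescent time units, so its expectation is 2/(k(k-1)). The sketch below (with the hypothetical function name `expected_waiting_times`) tabulates these expectations exactly.

```python
from fractions import Fraction


def expected_waiting_times(sample_size):
    """Expected time spent with k lineages, for k = sample_size down to 2.

    Assumes the Kingman coalescent with constant population size;
    times are in coalescent units.
    """
    return {k: Fraction(2, k * (k - 1)) for k in range(sample_size, 1, -1)}


times = expected_waiting_times(5)
for k, t in times.items():
    print(k, t)
# 5 1/10
# 4 1/6
# 3 1/3
# 2 1

# Expected time to the most recent common ancestor: 2 * (1 - 1/n).
print(sum(times.values()))  # 8/5
```

For a sample of five, the final interval with two lineages accounts for more than half of the expected tree height, which is why deep coalescence times dominate the genealogy.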

A phylogenetic network is any graph used to visualize evolutionary relationships between nucleotide sequences, genes, chromosomes, genomes, or species. They are employed when reticulation events such as hybridization, horizontal gene transfer, recombination, or gene duplication and loss are believed to be involved. They differ from phylogenetic trees by the explicit modeling of richly linked networks, by means of the addition of hybrid nodes instead of only tree nodes. Phylogenetic trees are a subset of phylogenetic networks. Phylogenetic networks can be inferred and visualised with software such as SplitsTree, the R package phangorn, and, more recently, Dendroscope. A standard format for representing phylogenetic networks is a variant of Newick format which is extended to support networks as well as trees.

Computational phylogenetics, phylogeny inference, or phylogenetic inference focuses on computational and optimization algorithms, heuristics, and approaches involved in phylogenetic analyses. The goal is to find a phylogenetic tree representing optimal evolutionary ancestry between a set of genes, species, or taxa. Maximum likelihood, parsimony, Bayesian, and minimum evolution are typical optimality criteria used to assess how well a phylogenetic tree topology describes the sequence data. Nearest Neighbour Interchange (NNI), Subtree Prune and Regraft (SPR), and Tree Bisection and Reconnection (TBR), known as tree rearrangements, are deterministic algorithms used to search for the optimal phylogenetic tree. The space and landscape in which the optimal phylogenetic tree is sought are known as the phylogeny search space.

Ancestral reconstruction is the extrapolation back in time from measured characteristics of individuals, populations, or species to their common ancestors. It is an important application of phylogenetics, the reconstruction and study of the evolutionary relationships among individuals, populations or species to their ancestors. In the context of evolutionary biology, ancestral reconstruction can be used to recover different kinds of ancestral character states of organisms that lived millions of years ago. These states include the genetic sequence, the amino acid sequence of a protein, the composition of a genome, a measurable characteristic of an organism (phenotype), and the geographic range of an ancestral population or species. This is desirable because it allows us to examine parts of phylogenetic trees corresponding to the distant past, clarifying the evolutionary history of the species in the tree. Since modern genetic sequences are essentially a variation of ancient ones, access to ancient sequences may identify other variations and organisms which could have arisen from those sequences. In addition to genetic sequences, one might attempt to track the changing of one character trait to another, such as fins turning to legs.

Bayesian inference of phylogeny combines the information in the prior and in the data likelihood to create the so-called posterior probability of trees, which is the probability that the tree is correct given the data, the prior and the likelihood model. Bayesian inference was introduced into molecular phylogenetics in the 1990s by three independent groups: Bruce Rannala and Ziheng Yang in Berkeley, Bob Mau in Madison, and Shuying Li at the University of Iowa, the last two being PhD students at the time. The approach has become very popular since the release of the MrBayes software in 2001, and is now one of the most popular methods in molecular phylogenetics.

Archaeopteryx is an interactive computer software program, written in Java, for viewing, editing, and analyzing phylogenetic trees. This type of program can be used for a variety of analyses of molecular data sets, but is particularly designed for phylogenomics. Besides tree description formats with limited expressiveness, it also implements the phyloXML format. Archaeopteryx is the successor to Java program A Tree Viewer (ATV).

Ziheng Yang FRS is a Chinese biologist. He holds the R.A. Fisher Chair of Statistical Genetics at University College London, and is the Director of R.A. Fisher Centre for Computational Biology at UCL. He was elected a Fellow of the Royal Society in 2006.

Viral phylodynamics is defined as the study of how epidemiological, immunological, and evolutionary processes act and potentially interact to shape viral phylogenies. Since the coining of the term in 2004, research on viral phylodynamics has focused on transmission dynamics in an effort to shed light on how these dynamics impact viral genetic variation. Transmission dynamics can be considered at the level of cells within an infected host, individual hosts within a population, or entire populations of hosts.

Cross-species transmission (CST), also called interspecies transmission, host jump, or spillover, is the transmission of an infectious pathogen, such as a virus, between hosts belonging to different species. Once introduced into an individual of a new host species, the pathogen may cause disease for the new host and/or acquire the ability to infect other individuals of the same species, allowing it to spread through the new host population. The phenomenon is most commonly studied in virology, but cross-species transmission may also occur with bacterial pathogens or other types of microorganisms.

Bacterial phylodynamics is the study of immunology, epidemiology, and phylogenetics of bacterial pathogens to better understand the evolutionary role of these pathogens. Phylodynamic analysis includes analyzing genetic diversity, natural selection, and population dynamics of infectious disease pathogen phylogenies during pandemics and studying intra-host evolution of viruses. Phylodynamics combines the study of phylogenetic analysis, ecological, and evolutionary processes to better understand the mechanisms that drive spatiotemporal incidence and phylogenetic patterns of bacterial pathogens. Bacterial phylodynamics uses genome-wide single-nucleotide polymorphisms (SNP) in order to better understand the evolutionary mechanism of bacterial pathogens. Many phylodynamic studies have been performed on viruses, specifically RNA viruses which have high mutation rates. The field of bacterial phylodynamics has grown substantially due to the advancement of next-generation sequencing and the amount of data available.

The multispecies coalescent process is a stochastic process model that describes the genealogical relationships for a sample of DNA sequences taken from several species. It represents the application of coalescent theory to the case of multiple species. The multispecies coalescent results in cases where the relationships among species for an individual gene can differ from the broader history of the species. It has important implications for the theory and practice of phylogenetics and for understanding genome evolution.

Tanja Stadler: German mathematician and professor of computational evolution

Tanja Stadler is a mathematician and professor of computational evolution at the Swiss Federal Institute of Technology. She is the current president of the Swiss Scientific Advisory Panel COVID-19 and Vice-Chair of the Department of Biosystems Science and Engineering at ETH Zürich.

In the field of epidemiology, source attribution refers to a category of methods with the objective of reconstructing the transmission of an infectious disease from a specific source, such as a population, individual, or location. For example, source attribution methods may be used to trace the origin of a new pathogen that recently crossed from another host species into humans, or from one geographic region to another. It may be used to determine the common source of an outbreak of a foodborne infectious disease, such as a contaminated water supply. Finally, source attribution may be used to estimate the probability that an infection was transmitted from one specific individual to another, i.e., "who infected whom".

References

  1. Bouckaert, Remco (2019-04-08). "BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis". PLOS Computational Biology. 15 (4): e1006650. Bibcode:2019PLSCB..15E6650B. doi:10.1371/journal.pcbi.1006650. PMC 6472827. PMID 30958812. S2CID 104294209.
  2. Suchard, Marc (2018-06-08). "Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10". Virus Evolution. 4 (1): vey016. doi:10.1093/ve/vey016. PMC 6007674. PMID 29942656.
  3. Barido-Sottani, Joëlle (January 2018). "Taming the BEAST—A Community Teaching Material Resource for BEAST 2". Systematic Biology. 67 (1): 170–174. doi:10.1093/sysbio/syx060. PMC 5925777. PMID 28673048.
  4. "Comprehensive BEAST Archive Network".
  5. Drummond, Alexei J.; Chen, Kylie; Mendes, Fábio K.; Xie, Dong (2023). "LinguaPhylo: A probabilistic model specification language for reproducible phylogenetic analyses". PLOS Computational Biology. 19 (7): e1011226. doi:10.1371/journal.pcbi.1011226. PMC 10381047. PMID 37463154.
  6. Vaughan, Timothy G. (2024). "ReMASTER: improved phylodynamic simulation for BEAST 2.7". Bioinformatics. 40 (1): btae015. doi:10.1093/bioinformatics/btae015.
  7. Vaughan, Timothy G.; Drummond, Alexei J. (2013). "A Stochastic Simulator of Birth–Death Master Equations with Application to Phylodynamics". Molecular Biology and Evolution. 30 (6): 1480–1493. doi:10.1093/molbev/mst057.