Treefinder

Developer(s): Gangolf Jobb
Stable release: March 2011 (1 March 2011)
Available in: English
Type: Bioinformatics tool
Website: www.treefinder.de

Treefinder is a computer program for the likelihood-based reconstruction of phylogenetic trees from molecular sequences. It was written by Gangolf Jobb, a former researcher at the University of Munich, Germany, and was originally released in 2004. Treefinder is free of charge, though the most recent license prohibits its use in the USA and eight European countries.

Overview

A platform-independent graphical environment integrates a standard suite of analyses: phylogeny reconstruction, bootstrap analysis, model selection, hypothesis testing, tree calibration, and manipulation of trees and sequence data. Treefinder is scriptable through a proprietary scripting language called TL.
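To illustrate what a bootstrap analysis involves, the following is a minimal Python sketch of nonparametric bootstrapping, in which alignment columns are resampled with replacement to form a replicate data set. It is not Treefinder's own code and is not written in TL; the function name `bootstrap_alignment` and the dictionary-based alignment format are assumptions made for this example.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    `alignment` maps taxon names to equal-length sequences (a hypothetical
    format chosen for this sketch). Columns are drawn with replacement.
    """
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    if any(len(alignment[name]) != n_sites for name in names):
        raise ValueError("all sequences must have the same length")
    # Draw as many column indices as there are sites, with replacement.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


example = {"taxon1": "ACGTAC", "taxon2": "ACGTTC", "taxon3": "TCGTAC"}
replicate = bootstrap_alignment(example, random.Random(1))
```

In a full analysis, a tree would be reconstructed from each replicate, and the support for a branch would be the fraction of replicate trees that contain it.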

Treefinder has an efficient tree search algorithm that can infer trees with thousands of species within a short time. Result trees are displayed and can then be saved as a reconstruction report, which may serve as an input for further analysis, for example hypothesis testing. The report contains all information about the tree and the models used. Treefinder also supports exporting results as NEWICK or NEXUS files.
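As an illustration of the Newick format mentioned above, the following minimal Python sketch reads a simple Newick string into nested tuples and lists its leaf names. It is not part of Treefinder; the function names `parse_newick` and `leaf_names` are hypothetical, and the sketch assumes unquoted labels and no comments.

```python
def parse_newick(text):
    """Parse a simple Newick string into (name, branch_length, children) tuples."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume '('
            while True:
                children.append(parse_node())
                if text[pos] == ",":
                    pos += 1
                    continue
                if text[pos] == ")":
                    pos += 1
                    break
                raise ValueError(f"unexpected character at position {pos}")
        # Optional node label, read up to the next delimiter.
        start = pos
        while text[pos] not in ":,);":
            pos += 1
        name = text[start:pos]
        # Optional branch length after ':'.
        length = None
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",);":
                pos += 1
            length = float(text[start:pos])
        return (name, length, children)

    root = parse_node()
    if text[pos] != ";":
        raise ValueError("unexpected trailing characters")
    return root


def leaf_names(node):
    """Collect leaf labels from left to right."""
    name, _, children = node
    if not children:
        return [name]
    return [leaf for child in children for leaf in leaf_names(child)]


tree = parse_newick("((A:0.1,B:0.2):0.05,C:0.3);")
```

Here `leaf_names(tree)` lists the taxa in the order in which they appear in the string.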

The software supports a broad collection of models of sequence evolution. The June 2008 release implements 7 models of nucleotide substitution (HKY, TN, J1, J2, J3 (= TIM), TVM, GTR), 14 empirical models of amino acid substitution (BLOSUM, cpREV, Dayhoff, JTT, LG, mtArt, mtMam, mtREV, PMB, rtREV, betHIV, witHIV, VT, WAG), 4 substitution models of structured rRNA (bactRNA, eukRNA, euk23RNA, mitoRNA), the 6-state "Dayhoff Groups" protein model (DG), 2-state and 3-state models of DNA (GTR3, GTR2), a parametric mixed model (MIX) mixing the empirical models of proteins or rRNA, and also a user-definable GTR-type model (MAP) mapping characters to states as needed. Three models of among-site rate heterogeneity are available (Gamma, Gamma+I, I), which can be combined with any of the substitution models. One can assume different models for different partitions of a sequence alignment, and partitions may be assumed to evolve at different speeds. All parameters of the models can be estimated from the data by maximization of likelihood. Certain TL expressions, the "model expressions", allow the concise notation of complex models, together with their parameters and optimization modes.
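To make the idea of a substitution model with likelihood-estimated parameters concrete, the following Python sketch builds an HKY rate matrix, computes transition probabilities P(t) = exp(Qt), and estimates the distance between two aligned sequences by a crude grid search over the likelihood. It is an illustration under stated assumptions, not Treefinder's implementation: all function names are hypothetical, the matrix exponential uses a simple scaling-and-squaring Taylor approximation, and base frequencies and kappa are treated as known rather than optimized.

```python
import math

BASES = "ACGT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def hky_rate_matrix(pi, kappa):
    """HKY rate matrix, scaled to one expected substitution per unit time."""
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(BASES):
        for j, y in enumerate(BASES):
            if i != j:
                q[i][j] = pi[j] * (kappa if (x, y) in TRANSITIONS else 1.0)
        q[i][i] = -sum(q[i])  # rows of a rate matrix sum to zero
    mean_rate = -sum(pi[i] * q[i][i] for i in range(4))
    return [[v / mean_rate for v in row] for row in q]


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_probabilities(q, t, squarings=10, terms=12):
    """Approximate exp(Q t) by scaling and squaring with a Taylor series."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[v * scale for v in row] for row in q]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def pairwise_log_likelihood(seq_a, seq_b, pi, kappa, t):
    """Log-likelihood of two aligned, gap-free sequences separated by distance t."""
    p = transition_probabilities(hky_rate_matrix(pi, kappa), t)
    total = 0.0
    for x, y in zip(seq_a, seq_b):
        i, j = BASES.index(x), BASES.index(y)
        total += math.log(pi[i] * p[i][j])
    return total


def estimate_distance(seq_a, seq_b, pi, kappa):
    """Crude maximum-likelihood estimate of t on a fixed grid (illustration only)."""
    grid = [k / 100 for k in range(1, 301)]
    return max(grid, key=lambda t: pairwise_log_likelihood(seq_a, seq_b, pi, kappa, t))
```

A real implementation would evaluate the likelihood over a whole tree, add rate heterogeneity across sites, and use a numerical optimizer rather than a grid.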

Treefinder's original publication from 2004 [1] has been cited more than a thousand times in the scientific literature.[citation needed]

Controversy

On February 1, 2015, Jobb disallowed the use of Treefinder in the USA in order to make a political statement. [2] On October 1, 2015, he changed the license terms again to exclude use in Germany, Austria, France, the Netherlands, Belgium, Great Britain, Sweden, and Denmark, countries that he claimed "host most of the non-european immigrants". In an accompanying statement, he decried the handling of the European migrant crisis by European countries. [2] [3] BMC Evolutionary Biology, the journal that published the original application note, has since retracted it, stating that the license change violated the journal's policy. [4] [5]

Related Research Articles

Bioinformatics: Computational analysis of large, complex sets of biological data

Bioinformatics is an interdisciplinary field of science that develops methods and software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, chemistry, physics, computer science, computer programming, information engineering, mathematics and statistics to analyze and interpret biological data. The subsequent process of analyzing and interpreting data is referred to as computational biology.

Sequence alignment: Process in bioinformatics that identifies equivalent sites within molecular sequences

In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. Aligned sequences of nucleotide or amino acid residues are typically represented as rows within a matrix. Gaps are inserted between the residues so that identical or similar characters are aligned in successive columns. Sequence alignments are also used for non-biological sequences, for example to calculate the distance cost between strings in a natural language or to display financial data.

A phylogenetic tree, phylogeny or evolutionary tree is a graphical representation that shows the evolutionary history of a set of species or taxa over a specific period of time. In other words, it is a branching diagram or a tree showing the evolutionary relationships among various biological species or other entities based upon similarities and differences in their physical or genetic characteristics. In evolutionary biology, all life on Earth is theoretically part of a single phylogenetic tree, indicating common ancestry. Phylogenetics is the study of phylogenetic trees. The main challenge is to find a phylogenetic tree representing optimal evolutionary ancestry between a set of species or taxa. Computational phylogenetics focuses on the algorithms involved in finding the optimal phylogenetic tree in the phylogenetic landscape.

Molecular phylogenetics is the branch of phylogeny that analyzes genetic, hereditary molecular differences, predominantly in DNA sequences, to gain information on an organism's evolutionary relationships. From these analyses, it is possible to determine the processes by which diversity among species has been achieved. The result of a molecular phylogenetic analysis is expressed in a phylogenetic tree. Molecular phylogenetics is one aspect of molecular systematics, a broader term that also includes the use of molecular data in taxonomy and biogeography.

Substitution model: Description of the process by which states in sequences change into each other and back

In biology, substitution models, also called models of sequence evolution, are Markov models that describe changes over evolutionary time. These models describe evolutionary changes in macromolecules, such as DNA sequences or protein sequences, that can be represented as a sequence of symbols. Substitution models are used to calculate the likelihood of phylogenetic trees using multiple sequence alignment data. Thus, substitution models are central to maximum likelihood estimation of phylogeny as well as Bayesian inference in phylogeny. Estimates of evolutionary distances are typically calculated using substitution models. Substitution models are also central to phylogenetic invariants because they are necessary to predict site pattern frequencies given a tree topology. Substitution models are also necessary to simulate sequence data for a group of organisms related by a specific tree.

Conserved sequence: Similar DNA, RNA or protein sequences within genomes or among species

In evolutionary biology, conserved sequences are identical or similar sequences in nucleic acids or proteins across species, or within a genome, or between donor and receptor taxa. Conservation indicates that a sequence has been maintained by natural selection.

Phylogenomics is the intersection of the fields of evolution and genomics. The term has been used in multiple ways to refer to analysis that involves genome data and evolutionary reconstructions. It is a group of techniques within the larger fields of phylogenetics and genomics. Phylogenomics draws information by comparing entire genomes, or at least large portions of genomes. Phylogenetics compares and analyzes the sequences of single genes, or a small number of genes, as well as many other types of data. Four major areas fall under phylogenomics:

A phylogenetic network is any graph used to visualize evolutionary relationships between nucleotide sequences, genes, chromosomes, genomes, or species. Phylogenetic networks are employed when reticulation events such as hybridization, horizontal gene transfer, recombination, or gene duplication and loss are believed to be involved. They differ from phylogenetic trees by the explicit modeling of richly linked networks, by means of the addition of hybrid nodes instead of only tree nodes. Phylogenetic trees are a subset of phylogenetic networks. Phylogenetic networks can be inferred and visualised with software such as SplitsTree, the R package phangorn, and, more recently, Dendroscope. A standard format for representing phylogenetic networks is a variant of the Newick format that is extended to support networks as well as trees.

Computational phylogenetics, phylogeny inference, or phylogenetic inference focuses on computational and optimization algorithms, heuristics, and approaches involved in phylogenetic analyses. The goal is to find a phylogenetic tree representing optimal evolutionary ancestry between a set of genes, species, or taxa. Maximum likelihood, parsimony, Bayesian, and minimum evolution are typical optimality criteria used to assess how well a phylogenetic tree topology describes the sequence data. Nearest Neighbour Interchange (NNI), Subtree Prune and Regraft (SPR), and Tree Bisection and Reconnection (TBR), known as tree rearrangements, are deterministic algorithms used to search for the optimal phylogenetic tree. The space and the landscape of searching for the optimal phylogenetic tree is known as the phylogeny search space.
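As a small illustration of a tree rearrangement, the following Python sketch lists the two Nearest Neighbour Interchange neighbours across a single internal edge. The nested-tuple representation and the function name `nni_neighbors` are assumptions made for this example; a real tree search would apply the move to every internal edge and score each candidate under the chosen optimality criterion.

```python
def nni_neighbors(tree):
    """Return the two NNI neighbours across the central edge of ((a, b), (c, d)).

    Each of a, b, c and d may be a leaf label or a nested subtree; the move
    swaps one subtree from each side of the edge.
    """
    (a, b), (c, d) = tree
    return [((a, c), (b, d)), ((a, d), (b, c))]


neighbors = nni_neighbors((("A", "B"), ("C", "D")))
```

Together with the starting tree, the two neighbours are the three possible ways of grouping the four subtrees around one internal edge.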

Multiple sequence alignment: Alignment of more than two molecular sequences

Multiple sequence alignment (MSA) is the process or the result of sequence alignment of three or more biological sequences, generally protein, DNA, or RNA. These alignments are used to infer evolutionary relationships via phylogenetic analysis and can highlight homologous features between sequences. Alignments highlight mutation events such as point mutations, insertion mutations and deletion mutations, and alignments are used to assess sequence conservation and infer the presence and activity of protein domains, tertiary structures, secondary structures, and individual amino acids or nucleotides.

Ancestral reconstruction is the extrapolation back in time from measured characteristics of individuals, populations, or species to their common ancestors. It is an important application of phylogenetics, the reconstruction and study of the evolutionary relationships among individuals, populations or species to their ancestors. In the context of evolutionary biology, ancestral reconstruction can be used to recover different kinds of ancestral character states of organisms that lived millions of years ago. These states include the genetic sequence, the amino acid sequence of a protein, the composition of a genome, a measurable characteristic of an organism (phenotype), and the geographic range of an ancestral population or species. This is desirable because it allows us to examine parts of phylogenetic trees corresponding to the distant past, clarifying the evolutionary history of the species in the tree. Since modern genetic sequences are essentially a variation of ancient ones, access to ancient sequences may identify other variations and organisms which could have arisen from those sequences. In addition to genetic sequences, one might attempt to track the changing of one character trait to another, such as fins turning to legs.

Bayesian inference of phylogeny combines the information in the prior and in the data likelihood to create the so-called posterior probability of trees, which is the probability that the tree is correct given the data, the prior and the likelihood model. Bayesian inference was introduced into molecular phylogenetics in the 1990s by three independent groups: Bruce Rannala and Ziheng Yang in Berkeley, Bob Mau in Madison, and Shuying Li at the University of Iowa, the last two being PhD students at the time. The approach has become very popular since the release of the MrBayes software in 2001, and is now one of the most popular methods in molecular phylogenetics.

18S ribosomal RNA is a part of the ribosomal RNA. The S in 18S represents Svedberg units. 18S rRNA is an SSU rRNA, a component of the eukaryotic ribosomal small subunit (40S). 18S rRNA is the structural RNA for the small component of eukaryotic cytoplasmic ribosomes, and thus one of the basic components of all eukaryotic cells.

Biological data visualization is a branch of bioinformatics concerned with the application of computer graphics, scientific visualization, and information visualization to different areas of the life sciences. This includes visualization of sequences, genomes, alignments, phylogenies, macromolecular structures, systems biology, microscopy, and magnetic resonance imaging data. Software tools used for visualizing biological data range from simple, standalone programs to complex, integrated systems.

Ziheng Yang FRS is a Chinese biologist. He holds the R.A. Fisher Chair of Statistical Genetics at University College London, and is the Director of the R.A. Fisher Centre for Computational Biology at UCL. He was elected a Fellow of the Royal Society in 2006.

Americhelydia: Clade of turtles

Americhelydia is a clade of turtles that consists of sea turtles, snapping turtles, the Central American river turtle and mud turtles, supported by several lines of molecular work. Prior to these studies, some morphological and developmental work had considered sea turtles to be basal members of Cryptodira and kinosternids to be related to the trionychians in the clade Trionychoidea. Americhelydia and Testudinoidea, both clades within Durocryptodira, split apart during the early Cretaceous.

In molecular phylogenetics, relationships among individuals are determined using character traits, such as DNA, RNA or protein, which may be obtained using a variety of sequencing technologies. High-throughput next-generation sequencing has become a popular technique in transcriptomics, which represents a snapshot of gene expression. In eukaryotes, making phylogenetic inferences using RNA is complicated by alternative splicing, which produces multiple transcripts from a single gene. As such, a variety of approaches may be used to improve phylogenetic inference using transcriptomic data obtained from RNA-Seq and processed using computational phylogenetics.

Arndt von Haeseler is a German bioinformatician and evolutionary biologist. He is the scientific director of the Max F. Perutz Laboratories at the Vienna Biocenter and a professor of bioinformatics at the University of Vienna and the Medical University of Vienna.

References

  1. Jobb, Gangolf; von Haeseler, Arndt; Strimmer, Korbinian (2004). "TREEFINDER: a powerful graphical analysis environment for molecular phylogenetics". BMC Evolutionary Biology. 4 (1). Springer Nature: 18. doi:10.1186/1471-2148-4-18. ISSN 1471-2148. PMC 459214. PMID 15222900. (Retracted, see doi:10.1186/s12862-015-0513-z, PMID 26542699, Retraction Watch)
  2. "News". Treefinder.de. 2015-09-30. Retrieved 2018-02-06.
  3. "Scientist says researchers in immigrant-friendly nations can't use his software". Science | AAAS. 2015-09-29. Retrieved 2018-02-06.
  4. "Retraction Note: TREEFINDER: a powerful graphical analysis environment for molecular phylogenetics. BMC Evolutionary Biology". 2015-11-11. Retrieved 2015-11-11.
  5. "BMC retracts paper by scientist who banned use of his software by immigrant-friendly countries". Retraction Watch. 2015-11-11.