Developer(s) | Nick Patterson, Robert Maier, David Reich |
---|---|
Initial release | 2012 |
Repository | |
Written in | C, C++, R |
Operating system | Windows, Linux, etc. |
Type | Population genetics |
Website | uqrmaie1 |
ADMIXTOOLS (or AdmixTools) is a software package that is primarily used for analyzing admixture in population genetics. The original version was developed as a set of standalone C programs by Nick Patterson and colleagues and published in 2012. [1] [2] A reimplemented version, ADMIXTOOLS 2, was developed as an R package by Robert Maier and colleagues and published in 2023. [3] [4]
Most ADMIXTOOLS programs are based on fitting demographic models to f-statistics, which are calculated from population allele frequencies. [5]
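As a rough illustration of how f-statistics follow from population allele frequencies, the sketch below computes naive f2, f3 and f4 values as averages over SNPs. It is not the ADMIXTOOLS implementation: the function names and toy frequencies are made up for this example, and no sample-size bias correction or standard-error estimation is attempted.

```python
from statistics import mean

# Each population is a list of allele frequencies, one value per SNP.
# Function names and data are illustrative assumptions, not ADMIXTOOLS code.

def f2(a, b):
    """f2(A, B): mean squared allele-frequency difference."""
    return mean((pa - pb) ** 2 for pa, pb in zip(a, b))

def f3(c, a, b):
    """f3(C; A, B): mean of (c - a) * (c - b).
    A clearly negative value suggests that C is admixed between A and B."""
    return mean((pc - pa) * (pc - pb) for pc, pa, pb in zip(c, a, b))

def f4(a, b, c, d):
    """f4(A, B; C, D): mean of (a - b) * (c - d)."""
    return mean((pa - pb) * (pc - pd) for pa, pb, pc, pd in zip(a, b, c, d))

# Toy data: population C is an equal mixture of A and B at every SNP.
freq_a = [0.1, 0.8, 0.3, 0.6]
freq_b = [0.9, 0.2, 0.7, 0.4]
freq_c = [0.5 * (pa + pb) for pa, pb in zip(freq_a, freq_b)]

f2_ab = f2(freq_a, freq_b)
f3_c_ab = f3(freq_c, freq_a, freq_b)
f4_abab = f4(freq_a, freq_b, freq_a, freq_b)
```

With these toy frequencies, f3(C; A, B) comes out negative (it equals minus one quarter of f2(A, B) for an exact 50/50 mixture), which is the kind of signal admixture tests based on f3 look for.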
qpGraph is a software program that is part of the ADMIXTOOLS [2] software package developed by Patterson et al. (2012). qpGraph evaluates graph-based models of population relationships with genetic admixture. [1] It estimates likelihoods of graphs with a fixed topology, [6] [7] while adjusting graph parameters to fit observed f-statistics. [8]
ADMIXTOOLS 2 adds functionality for finding optimized graph topologies, similar to programs like Treemix. [9]
Related statistical tools in the ADMIXTOOLS software package include qpAdm, [10] qpfst, qpF4ratio, qp3Pop, qpBound, qpDstat, and qpWave. [11] qpDstat and qpWave test whether populations form clades, while qpAdm estimates ancestry proportions. [4] qpAdm is often used in conjunction with CP/NNLS. [12] [13]
Biostatistics is a branch of statistics that applies statistical methods to a wide range of topics in biology. It encompasses the design of biological experiments, the collection and analysis of data from those experiments and the interpretation of the results.
Computational biology refers to the use of data analysis, mathematical modeling and computational simulations to understand biological systems and relationships. An intersection of computer science, biology, and big data, the field also has foundations in applied mathematics, chemistry, and genetics. It differs from biological computing, a subfield of computer science and engineering which uses bioengineering to build computers.
BioRuby is a collection of open-source Ruby code, comprising classes for computational molecular biology and bioinformatics. It contains classes for DNA and protein sequence analysis, sequence alignment, biological database parsing, structural biology and other bioinformatics tasks. BioRuby is released under the GNU GPL version 2 or Ruby licence and is one of a number of Bio* projects, designed to reduce code duplication.
The Yamnaya culture or the Yamna culture, also known as the Pit Grave culture or Ochre Grave culture, is a late Copper Age to early Bronze Age archaeological culture of the region between the Southern Bug, Dniester, and Ural rivers, dating to 3300–2600 BC. It was discovered by Vasily Gorodtsov following his archaeological excavations near the Donets River in 1901–1903. Its name derives from its characteristic burial tradition: Я́мная is a Russian adjective that means 'related to pits', as these people used to bury their dead in tumuli (kurgans) containing simple pit chambers. Recent research has found that Mikhaylovka, on the lower Dnieper River in Ukraine, formed the Core Yamnaya culture.
Orange is an open-source data visualization, machine learning and data mining toolkit. It features a visual programming front-end for exploratory qualitative data analysis and interactive data visualization.
Genetics and archaeogenetics of South Asia is the study of the genetics and archaeogenetics of the ethnic groups of South Asia. It aims at uncovering these groups' genetic histories. The geographic position of the Indian subcontinent makes its biodiversity important for the study of the early dispersal of anatomically modern humans across Asia.
Coalescent theory is a model of how alleles sampled from a population may have originated from a common ancestor. In the simplest case, coalescent theory assumes no recombination, no natural selection, and no gene flow or population structure, meaning that each variant is equally likely to have been passed from one generation to the next. The model looks backward in time, merging alleles into a single ancestral copy according to a random process in coalescence events. Under this model, the expected time between successive coalescence events increases almost exponentially back in time. Variance in the model comes from both the random passing of alleles from one generation to the next, and the random occurrence of mutations in these alleles.
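The growth in waiting times can be made concrete with a small calculation. The sketch below assumes the standard neutral coalescent for a diploid population of constant size N (so 2N gene copies), where the expected time during which k lineages remain is 4N / (k(k − 1)) generations. The function names and the example values are chosen only for illustration.

```python
def expected_waiting_times(n_samples, pop_size):
    """Expected generations spent with k distinct lineages, for k = n, ..., 2.

    Assumes the standard neutral coalescent in a diploid population of
    constant size `pop_size`, where k lineages coalesce at rate
    k * (k - 1) / (4 * pop_size) per generation.
    """
    return {k: 4 * pop_size / (k * (k - 1)) for k in range(n_samples, 1, -1)}

def expected_tmrca(n_samples, pop_size):
    """Expected time to the most recent common ancestor of the sample."""
    return sum(expected_waiting_times(n_samples, pop_size).values())

times = expected_waiting_times(10, 1000)
for k, t in times.items():
    print(f"{k:2d} lineages: {t:8.1f} generations on average")
print(f"expected TMRCA: {expected_tmrca(10, 1000):.1f} generations")
```

Under these assumptions the final interval, when only two lineages remain, accounts for more than half of the expected total time back to the common ancestor.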
In population genetics, an ancestry-informative marker (AIM) is a single-nucleotide polymorphism that exhibits substantially different frequencies between different populations. A set of many AIMs can be used to estimate the proportion of ancestry of an individual derived from each population.
David Emil Reich is an American geneticist known for his research into the population genetics of ancient humans, including their migrations and the mixing of populations, discovered by analysis of genome-wide patterns of mutations. He is professor in the department of genetics at the Harvard Medical School, and an associate of the Broad Institute. Reich was highlighted as one of Nature's 10 for his contributions to science in 2015. He received the Dan David Prize in 2017, the NAS Award in Molecular Biology, the Wiley Prize, and the Darwin–Wallace Medal in 2019. In 2021 he was awarded the Massry Prize.
GenePattern is a freely available computational biology open-source software package originally created and developed at the Broad Institute for the analysis of genomic data. Designed to enable researchers to develop, capture, and reproduce genomic analysis methodologies, GenePattern was first released in 2004. GenePattern is currently developed at the University of California, San Diego.
Population structure is the presence of a systematic difference in allele frequencies between subpopulations. In a randomly mating population, allele frequencies are expected to be roughly similar between groups. However, mating tends to be non-random to some degree, causing structure to arise. For example, a barrier like a river can separate two groups of the same species and make it difficult for potential mates to cross; if a mutation occurs, over many generations it can spread and become common in one subpopulation while being completely absent in the other.
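One common way to quantify such a difference is the fixation index, computed here as (H_T − H_S) / H_T from expected heterozygosities. The sketch below is a simplified illustration for a single biallelic locus and two equally sized subpopulations; the function name and the frequencies are hypothetical, and real estimators also correct for sample size.

```python
def fst_two_pops(p1, p2):
    """Fixation index for one biallelic locus in two equally sized subpopulations.

    p1 and p2 are the frequencies of the same allele in each subpopulation.
    Returns (H_T - H_S) / H_T, where H_T is the expected heterozygosity of
    the pooled population and H_S is the mean within-subpopulation value.
    """
    p_mean = (p1 + p2) / 2
    h_total = 2 * p_mean * (1 - p_mean)
    if h_total == 0:
        # The locus is not variable anywhere, so there is no structure to measure.
        return 0.0
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total

# A mutation common on one side of the barrier and absent on the other.
structured = fst_two_pops(0.9, 0.0)
# The same allele frequency on both sides: no structure at this locus.
unstructured = fst_two_pops(0.4, 0.4)
```

Values near 0 indicate similar allele frequencies in both groups, while values near 1 indicate that the groups are close to fixed for different alleles.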
GeneNetwork is a combined database and open-source bioinformatics data analysis software resource for systems genetics. This resource is used to study gene regulatory networks that link DNA sequence differences to corresponding differences in gene and protein expression and to variation in traits such as health and disease risk. Data sets in GeneNetwork are typically made up of large collections of genotypes and phenotypes from groups of individuals, including humans, strains of mice and rats, and organisms as diverse as Drosophila melanogaster, Arabidopsis thaliana, and barley. The inclusion of genotypes makes it practical to carry out web-based gene mapping to discover those regions of genomes that contribute to differences among individuals in mRNA, protein, and metabolite levels, as well as differences in cell function, anatomy, physiology, and behavior.
Interbreeding between archaic and modern humans occurred during the Middle Paleolithic and early Upper Paleolithic. The interbreeding happened in several independent events that included Neanderthals and Denisovans, as well as several unidentified hominins.
Mega2 is a data manipulation software for applied statistical genetics. Mega is an acronym for Manipulation Environment for Genetic Analysis.
Pathway is the term from molecular biology for a curated schematic representation of a well-characterized segment of the molecular physiological machinery, such as a metabolic pathway describing an enzymatic process within a cell or tissue, or a signaling pathway model representing a regulatory process that might, in its turn, enable a metabolic or another regulatory process downstream. A typical pathway model starts with an extracellular signaling molecule that activates a specific receptor, thus triggering a chain of molecular interactions. A pathway is most often represented as a relatively small graph with gene, protein, and/or small molecule nodes connected by edges of known functional relations. While a simpler pathway might appear as a chain, complex pathway topologies with loops and alternative routes are much more common. Computational analyses employ special formats of pathway representation. In the simplest form, however, a pathway might be represented as a list of member molecules with order and relations unspecified. Such a representation, generally called a Functional Gene Set (FGS), can also refer to other functionally characterised groups such as protein families, Gene Ontology (GO) and Disease Ontology (DO) terms, etc. In bioinformatics, methods of pathway analysis might be used to identify key genes or proteins within a previously known pathway in relation to a particular experiment or pathological condition, or to build a pathway de novo from proteins that have been identified as key affected elements. By examining changes in, for example, gene expression in a pathway, its biological activity can be explored. Most frequently, however, pathway analysis refers to a method of initial characterization and interpretation of an experimental condition that was studied with omics tools or a genome-wide association study. Such studies might identify long lists of altered genes. A visual inspection is then challenging and the information is hard to summarize, since the altered genes map to a broad range of pathways, processes, and molecular functions. In such situations, the most productive way of exploring the list is to identify enrichment of specific FGSs in it. The general approach of enrichment analyses is to identify FGSs whose members were most frequently or most strongly altered in the given condition, in comparison to a gene set sampled by chance. In other words, enrichment can map canonical prior knowledge structured in the form of FGSs to the condition represented by altered genes.
In archaeogenetics, the term Ancient North Eurasian (ANE) is the name given to an ancestral component that represents the lineage of the people of the Mal'ta–Buret' culture and populations closely related to them, such as the Upper Paleolithic individuals from Afontova Gora in Siberia. Genetic studies also revealed that the ANE are closely related to the remains of the preceding Yana culture, which were named Ancient North Siberians (ANS). Ancient North Eurasians derive most of their ancestry from a West Eurasian source that arrived in Siberia via the "northern route", but also derive a significant amount of their ancestry from an East Eurasian source that arrived in Siberia via the "southern route".
The complementarity plot (CP) is a graphical tool for structural validation of atomic models for both folded globular proteins and protein-protein interfaces. It is based on a probabilistic representation of preferred amino acid side-chain orientation, analogous to the preferred backbone orientation of Ramachandran plots. It can potentially serve to elucidate protein folding as well as binding. Upgraded versions of the software suite are available and maintained on GitHub for both folded globular proteins and inter-protein complexes. The software is included in the bioinformatic tool suites OmicTools and Delphi tools.
FlexAID is a molecular docking software that can use small molecules and peptides as ligands and proteins and nucleic acids as docking targets. As the name suggests, FlexAID supports full ligand flexibility as well as side-chain flexibility of the target. It does so using a soft scoring function based on the complementarity of the two surfaces.
CP/NNLS, standing for "ChromoPainter (CP) non-negative least squares (NNLS)", is a statistical method used in genetics. "ChromoPainter" is the name of a tool for finding haplotypes in sequence data, in which each individual is "painted" as a combination of all other sequences. Its output is used in principal component analysis (PCA) to create data summaries, or to date admixture events. Non-negative least squares (NNLS) is a form of least-squares regression in which the fitted coefficients are constrained to be non-negative. Here it is applied to the output of ChromoPainter.
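The NNLS step can be sketched in a few lines. The example below is an assumption-laden illustration rather than the method used by any particular CP/NNLS pipeline: it treats a target's painting profile as a mixture of hypothetical source profiles and solves the non-negative least-squares problem with plain projected gradient descent instead of a dedicated NNLS solver. All names and numbers are made up, and the coefficients are not forced to sum to one.

```python
def nnls_projected_gradient(A, b, iterations=5000):
    """Approximately minimise ||A x - b||^2 subject to x >= 0.

    A is a list of rows and b a list of targets. Uses projected gradient
    descent with a fixed step of 1 / ||A||_F^2, a simple stand-in for a
    dedicated NNLS solver such as the Lawson-Hanson algorithm.
    """
    n_rows, n_cols = len(A), len(A[0])
    step = 1.0 / sum(v * v for row in A for v in row)
    x = [0.0] * n_cols
    for _ in range(iterations):
        residual = [sum(A[i][j] * x[j] for j in range(n_cols)) - b[i]
                    for i in range(n_rows)]
        gradient = [sum(A[i][j] * residual[i] for i in range(n_rows))
                    for j in range(n_cols)]
        # Gradient step, then projection onto the non-negative orthant.
        x = [max(0.0, x[j] - step * gradient[j]) for j in range(n_cols)]
    return x

# Hypothetical painting profiles: fraction of the genome copied from three
# donor groups (rows) for each of three candidate source populations (columns).
sources = [
    [0.7, 0.1, 0.2],
    [0.2, 0.3, 0.6],
    [0.1, 0.6, 0.2],
]
# Toy target built as 25% of the first source plus 75% of the second.
target = [0.25 * row[0] + 0.75 * row[1] for row in sources]

weights = nnls_projected_gradient(sources, target)
```

On this toy input the recovered weights are close to 0.25, 0.75 and 0 for the three sources, since the target was constructed as exactly that mixture.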