Analysis of molecular variance

Analysis of molecular variance (AMOVA) is a statistical model for the molecular variation in a single species, typically biological. [1] The name and the model are both inspired by analysis of variance (ANOVA). The method was developed by Laurent Excoffier, Peter Smouse and Joseph Quattro at Rutgers University in 1992.
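As a rough illustration of the ANOVA-like partitioning, the following Python sketch computes a one-level AMOVA (populations only, no higher grouping) from haplotypes coded as strings. The function names `squared_distance` and `amova_one_level` and the toy data are illustrative assumptions, not the interface of Arlequin or any other package. The sketch takes the number of differing sites as the squared distance between two haplotypes and omits the permutation test normally used to assess significance.

```python
from collections import Counter
from itertools import combinations


def squared_distance(a, b):
    """Number of differing sites between two equal-length haplotypes.

    For 0/1-coded haplotypes this equals the squared Euclidean distance.
    """
    return sum(x != y for x, y in zip(a, b))


def amova_one_level(haplotypes, populations):
    """Partition squared distances into among- and within-population parts.

    Assumes at least two populations and more individuals than populations.
    """
    n_total = len(haplotypes)
    sizes = Counter(populations)  # individuals per population
    n_pops = len(sizes)

    # Sums of squared deviations, accumulated over all unordered pairs.
    ssd_total = 0.0
    ssd_within = 0.0
    for i, j in combinations(range(n_total), 2):
        d2 = squared_distance(haplotypes[i], haplotypes[j])
        ssd_total += d2 / n_total
        if populations[i] == populations[j]:
            ssd_within += d2 / sizes[populations[i]]
    ssd_among = ssd_total - ssd_within

    # Mean squared deviations, using the usual ANOVA degrees of freedom.
    msd_among = ssd_among / (n_pops - 1)
    msd_within = ssd_within / (n_total - n_pops)

    # Average sample size, corrected for unequal population sizes.
    n0 = (n_total - sum(n * n for n in sizes.values()) / n_total) / (n_pops - 1)

    # Variance components and the among-population share (Phi_ST).
    var_within = msd_within
    var_among = (msd_among - msd_within) / n0
    phi_st = var_among / (var_among + var_within)

    return {
        "ssd_among": ssd_among,
        "ssd_within": ssd_within,
        "var_among": var_among,
        "var_within": var_within,
        "phi_st": phi_st,
    }


if __name__ == "__main__":
    # Toy data: two populations of three haplotypes with four sites each.
    haps = ["0000", "0000", "0001", "1111", "1111", "1110"]
    pops = ["A", "A", "A", "B", "B", "B"]
    print(round(amova_one_level(haps, pops)["phi_st"], 4))  # prints 0.8125
```

In the toy data, most pairwise differences fall between the two populations, so the among-population component accounts for most of the total variance.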

Since developing AMOVA, Excoffier has written a program for running such analyses. This program, called Arlequin, runs on Windows and is freely available on Excoffier's website. There are also implementations in the R language, in the ade4 and pegas packages, both available on CRAN (the Comprehensive R Archive Network). Another implementation is in Info-Gen, which also runs on Windows; its student version is free and fully functional. The application's native language is Spanish, but an English version is also available.

An additional free statistical package, GenAlEx, [2] is geared toward teaching as well as research and allows complex genetic analyses to be run and compared within the commonly used Microsoft Excel interface. The software calculates analyses such as AMOVA and supports comparisons with closely related statistics, including F-statistics and Shannon's index.

References

  1. Excoffier, L.; Smouse, P. E.; Quattro, J. M. (June 1992). "Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data". Genetics. 131 (2): 479–91. doi:10.1093/genetics/131.2.479. ISSN 0016-6731. PMC 1205020. PMID 1644282.
  2. Peakall, R.; Smouse, P. E. (2012). "GenAlEx 6.5: genetic analysis in Excel. Population genetic software for teaching and research-an update". Bioinformatics. 28: 2537–2539.