ASReml

Last updated

ASReml is a statistical software package for fitting linear mixed models using restricted maximum likelihood, a technique commonly used in plant and animal breeding and quantitative genetics as well as other fields. It is notable for its ability to fit very large and complex data sets efficiently, due to its use of the average information algorithm [1] and sparse matrix methods.

Mixed model statistical model containing both fixed effects and random effects

A mixed model is a statistical model containing both fixed effects and random effects. These models are useful in a wide variety of disciplines in the physical, biological and social sciences. They are particularly useful in settings where repeated measurements are made on the same statistical units, or where measurements are made on clusters of related statistical units. Because of their advantage in dealing with missing values, mixed effects models are often preferred over more traditional approaches such as repeated measures ANOVA.

In statistics, the restrictedmaximum likelihood (REML) approach is a particular form of maximum likelihood estimation that does not base estimates on a maximum likelihood fit of all the information, but instead uses a likelihood function calculated from a transformed set of data, so that nuisance parameters have no effect.

Plant breeding The art and science of changing the traits of plants in order to produce desired characteristics

Plant breeding is the science of changing the traits of plants in order to produce desired characteristics. It has been used to improve the quality of nutrition in products for humans and animals. Plant breeding can be accomplished through many different techniques ranging from simply selecting plants with desirable characteristics for propagation, to methods that make use of knowledge of genetics and chromosomes, to more complex molecular techniques. Genes in a plant are what determine what type of qualitative or quantitative traits it will have. Plant breeders strive to create a specific outcome of plants and potentially new plant varieties.

It was originally developed by Arthur Gilmour. [2]

ASREML can be used in Windows, Linux, and as an add-on to S-PLUS and R.

S-PLUS is a commercial implementation of the S programming language sold by TIBCO Software Inc.

R (programming language) programming language for statistical computing

R is a programming language and free software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and data analysis. Polls, data mining surveys, and studies of scholarly literature databases show substantial increases in popularity in recent years. as of March 2019, R ranks 14th in the TIOBE index, a measure of popularity of programming languages.

Related Research Articles

Binary search algorithm Search algorithm finding the position of a target value within a sorted array

In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the middle element of the array. If they are not equal, the half in which the target cannot lie is eliminated and the search continues on the remaining half, again taking the middle element to compare to the target value, and repeating this until the target value is found. If the search ends with the remaining half being empty, the target is not in the array. Even though the idea is simple, implementing binary search correctly requires attention to some subtleties about its exit conditions and midpoint calculation, particularly if the values in the array are not all of the whole numbers in the range.

Hash table Associates data values with key values - a lookup table

In computing, a hash table is a data structure that implements an associative array abstract data type, a structure that can map keys to values. A hash table uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found.

Supervised learning machine learning task of learning a function that maps an input to an output based on example input-output pairs

Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. It infers a function from labeled training data consisting of a set of training examples. In supervised learning, each example is a pair consisting of an input object and a desired output value. A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples. An optimal scenario will allow for the algorithm to correctly determine the class labels for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way.

In computer science, a sorting algorithm is an algorithm that puts elements of a list in a certain order. The most frequently used orders are numerical order and lexicographical order. Efficient sorting is important for optimizing the efficiency of other algorithms which require input data to be in sorted lists. Sorting is also often useful for canonicalizing data and for producing human-readable output. More formally, the output of any sorting algorithm must satisfy two conditions:

  1. The output is in nondecreasing order ;
  2. The output is a permutation of the input.
Linear programming

Linear programming is a method to achieve the best outcome in a mathematical model whose requirements are represented by linear relationships. Linear programming is a special case of mathematical programming.

Perceptron algorithm for supervised learning of binary classifiers

In machine learning, the perceptron is an algorithm for supervised learning of binary classifiers. A binary classifier is a function which can decide whether or not an input, represented by a vector of numbers, belongs to some specific class. It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector.

Time complexity An estimate of time taken for running an algorithm

In computer science, the time complexity is the computational complexity that describes the amount of time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by the algorithm, supposing that each elementary operation takes a fixed amount of time to perform. Thus, the amount of time taken and the number of elementary operations performed by the algorithm are taken to differ by at most a constant factor.

Cluster analysis Task of grouping a set of objects so that objects in the same group (or cluster) are more similar to each other than to those in other clusters

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics.

In machine learning and statistics, feature selection, also known as variable selection, attribute selection or variable subset selection, is the process of selecting a subset of relevant features for use in model construction. Feature selection techniques are used for four reasons:

Stochastic gradient descent, also known as incremental gradient descent, is an iterative method for optimizing a differentiable objective function, a stochastic approximation of gradient descent optimization. A 2018 article implicitly credits Herbert Robbins and Sutton Monro for developing SGD in their 1951 article titled "A Stochastic Approximation Method"; see Stochastic approximation for more information. It is called stochastic because samples are selected randomly instead of as a single group or in the order they appear in the training set.

Code-excited linear prediction (CELP) is a speech coding algorithm originally proposed by M. R. Schroeder and B. S. Atal in 1985. At the time, it provided significantly better quality than existing low bit-rate algorithms, such as residual-excited linear prediction and linear predictive coding vocoders. Along with its variants, such as algebraic CELP, relaxed CELP, low-delay CELP and vector sum excited linear prediction, it is currently the most widely used speech coding algorithm. It is also used in MPEG-4 Audio speech coding. CELP is commonly used as a generic term for a class of algorithms and not for a particular codec.

Genstat

Genstat is a statistical software package with data analysis capabilities, particularly in the field of agriculture.

Electronic circuit simulation circuit behavior replication; uses mathematical models to replicate the behavior of an actual electronic device or circuit

Electronic circuit simulation uses mathematical models to replicate the behavior of an actual electronic device or circuit. Simulation software allows for modeling of circuit operation and is an invaluable analysis tool. Due to its highly accurate modeling capability, many colleges and universities use this type of software for the teaching of electronics technician and electronics engineering programs. Electronics simulation software engages the user by integrating him or her into the learning experience. These kinds of interactions actively engage learners to analyze, synthesize, organize, and evaluate content and result in learners constructing their own knowledge.

Ensemble learning in statistics and machine learning, using multiple algorithms

In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. Unlike a statistical ensemble in statistical mechanics, which is usually infinite, a machine learning ensemble consists of only a concrete finite set of alternative models, but typically allows for much more flexible structure to exist among those alternatives.

Determining the number of clusters in a data set, a quantity often labelled k as in the k-means algorithm, is a frequent problem in data clustering, and is a distinct issue from the process of actually solving the clustering problem.

MOSEK is a software package for the solution of linear, mixed-integer linear, quadratic, mixed-integer quadratic, quadratically constraint, conic and convex nonlinear mathematical optimization problems. The emphasis in MOSEK is on solving large scale sparse problems. Particularly the interior-point optimizer for linear, conic quadratic and semi-definite problems is very efficient. A special feature of the MOSEK interior-point optimizer is that it is based on the so-called homogeneous model which implies MOSEK can reliably detect a primal and/or dual infeasible status as documented in several published papers.

In machine learning, hyperparameter optimization or tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value is used to control the learning process. By contrast, the values of other parameters are learned.

References

  1. Gilmour, A.R., Thompson, R. and Cullis, B.R. (1995). Average Information REML, an efficient algorithm for variance parameter estimation in linear mixed models. Biometrics 51, 1440-50
  2. Arthur Gilmour, Cargovale