Although the subject of sexual dimorphism is not in itself controversial, the measures by which it is assessed differ widely. Most of the measures assume that a random variable is being studied, so that probability distributions should be taken into account. In this review, a series of sexual dimorphism measures are discussed with regard to both their definition and the probability law on which they are based. Most of them are sample functions, or statistics, which account for only partial characteristics of the distribution involved, for example the mean or expected value. Further, the most widely used measure lacks inferential support.
It is widely known that sexual dimorphism is an important component of the morphological variation in biological populations (see, e.g., Klein and Cruz-Uribe, 1984; [1] Oxnard, 1987; [2] Kelley, 1993 [3] ). In higher primates, sexual dimorphism is also related to some aspects of social organization and behavior (Alexander et al., 1979; [4] Clutton-Brock, 1985 [5] ). Thus, it has been observed that the most dimorphic species tend toward polygyny and a social organization based on male dominance, whereas in the less dimorphic species, monogamy and family groups are more common. Fleagle et al. (1980) [6] and Kay (1982), [7] on the other hand, have suggested that the behavior of extinct species can be inferred on the basis of sexual dimorphism, and Plavcan and van Schaik (1992), [8] for example, think that sex differences in size among primate species reflect processes of an ecological and social nature. Some references on sexual dimorphism in human populations can be found in Lovejoy (1981), [9] Borgognini Tarli and Repetto (1986) [10] and Kappelman (1996). [11]
These biological facts do not appear to be controversial. However, they are based on a series of different sexual dimorphism measures, or indices. Sexual dimorphism, in most works, is measured on the assumption that a random variable is being taken into account. This means that there is a law which accounts for the behavior of the whole set of values that compose the domain of the random variable, a law which is called the distribution function. Because studies of sexual dimorphism aim at establishing differences between the sexes in some random variable, and the behavior of the random variable is accounted for by its distribution function, it follows that a sexual dimorphism study should be equivalent to a study whose main purpose is to determine to what extent the two distribution functions, one per sex, overlap (see shaded area in Fig. 1, where two normal distributions are represented).
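In Borgognini Tarli and Repetto (1986) an account of indices based on sample means can be seen. Perhaps the most widely used is the quotient, [10]
$$\frac{\bar{X}_m}{\bar{X}_f},$$
where $\bar{X}_m$ is the sample mean of one sex (e.g., male) and $\bar{X}_f$ the corresponding mean of the other. Nonetheless, for instance,
$$\log\frac{\bar{X}_m}{\bar{X}_f}, \qquad 100\,\frac{\bar{X}_m-\bar{X}_f}{\bar{X}_f}, \qquad 100\,\frac{\bar{X}_m-\bar{X}_f}{\bar{X}_m+\bar{X}_f}$$
have also been proposed.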
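As an illustration, mean-based indices of this kind can be computed in a few lines. The sketch below assumes the indices are the quotient of the two sample means, its logarithm, and two percentage differences; the function name `mean_based_indices` and the measurements are hypothetical and serve only as an example.

```python
import math
from statistics import fmean

def mean_based_indices(males, females):
    """Return several dimorphism indices built only from the two sample means."""
    xm, xf = fmean(males), fmean(females)
    return {
        "quotient": xm / xf,
        "log_quotient": math.log(xm / xf),
        "pct_diff_over_female": 100 * (xm - xf) / xf,
        "pct_diff_over_sum": 100 * (xm - xf) / (xm + xf),
    }

# Hypothetical measurements (arbitrary units), for illustration only
males = [10.0, 12.0, 14.0]
females = [9.0, 10.0, 11.0]
indices = mean_based_indices(males, females)
```

Note that none of these quantities uses the spread of either sample, so two pairs of samples with the same means but very different variances receive identical index values.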
Going over the works where these indices are used, the reader misses any reference to their parametric counterpart (see reference above). In other words, if we suppose that the quotient of two sample means is considered, no work can be found where, in order to make inferences, the way in which the quotient is used as a point estimate of
$$\frac{\mu_m}{\mu_f}$$
is discussed.
If differences between populations are the object of analysis, the use of quotients of sample means implies that the only feature of these populations considered to be of interest is the mean parameter. However, a population also has a variance, as well as a shape which is defined by its distribution function (notice that, in general, this function depends on parameters such as means or variances).
Marini et al. (1999) [12] have illustrated that it is a good idea to consider something other than sample means when sexual dimorphism is analyzed. Possibly, the main reason is that the intrasexual variability influences both the manifestation of dimorphism and its interpretation.
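It is likely that, within this type of index, the one used the most is the well-known statistic with Student's t distribution (see, for instance, Green, 1989 [13] ). Marini et al. (1999) [12] have observed that variability among females seems to be lower than among males, so that it appears advisable to use the form of the Student's t statistic with degrees of freedom given by the Welch-Satterthwaite approximation,
$$T = \frac{\bar{X}_m-\bar{X}_f}{\sqrt{\dfrac{S_m^2}{n_m}+\dfrac{S_f^2}{n_f}}}, \qquad \nu = \frac{\left(\dfrac{S_m^2}{n_m}+\dfrac{S_f^2}{n_f}\right)^2}{\dfrac{(S_m^2/n_m)^2}{n_m-1}+\dfrac{(S_f^2/n_f)^2}{n_f-1}},$$
where $S_i^2$ and $n_i$, $i = m, f$, are sample variances and sample sizes, respectively.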
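A minimal sketch of the unequal-variance t statistic and its Welch-Satterthwaite degrees of freedom follows. The function name `welch_t` and the data are hypothetical; the sample variances use the usual $n-1$ denominator.

```python
import math
from statistics import fmean, variance

def welch_t(sample_m, sample_f):
    """Return (t, df): Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_m, n_f = len(sample_m), len(sample_f)
    v_m = variance(sample_m) / n_m  # S_m^2 / n_m
    v_f = variance(sample_f) / n_f  # S_f^2 / n_f
    t = (fmean(sample_m) - fmean(sample_f)) / math.sqrt(v_m + v_f)
    df = (v_m + v_f) ** 2 / (v_m ** 2 / (n_m - 1) + v_f ** 2 / (n_f - 1))
    return t, df

# Hypothetical measurements, for illustration only
t_stat, dof = welch_t([52.1, 49.8, 55.3, 51.0, 53.7], [47.2, 48.1, 46.9, 47.8])
```

When both samples have the same size $n$ and the same sample variance, the degrees of freedom reduce to $2(n-1)$, the value used by the pooled-variance t test.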
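It is important to point out that the use of this statistic presupposes two normal populations, one per sex, from which two independent random samples are drawn, and that the inference concerns the difference between the two expected values.
However, in sexual dimorphism analyses, it does not appear reasonable (see Ipiña and Durand, 2000 [14] ) to assume that two independent random samples have been selected. On the contrary, when we sample we select some random observations, making up a single sample, that sometimes correspond to one sex and sometimes to the other.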
Chakraborty and Majumder (1982) [15] have proposed an index of sexual dimorphism that is the overlapping area (to be precise, its complement) of two normal density functions (see Fig. 1). Therefore, it is a function of four parameters, $\mu_m, \mu_f$ and $\sigma_m^2, \sigma_f^2$ (expected values and variances, respectively), and takes the shape of the two normals into account. Inman and Bradley (1989) [16] have discussed this overlapping area as a measure to assess the distance between two normal densities.
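The overlapping area of two normal densities can be evaluated with Python's standard library, whose `statistics.NormalDist.overlap` method returns that area as a value between 0 and 1. The sketch below takes the complement as the index, treating the parameters as known; the function name `overlap_complement` is hypothetical, and the method expects standard deviations rather than variances.

```python
from statistics import NormalDist

def overlap_complement(mu_m, sigma_m, mu_f, sigma_f):
    """Return 1 minus the overlapping area of two normal densities."""
    overlap = NormalDist(mu_m, sigma_m).overlap(NormalDist(mu_f, sigma_f))
    return 1.0 - overlap

# Hypothetical parameter values, for illustration only
index = overlap_complement(mu_m=175.0, sigma_m=7.0, mu_f=162.0, sigma_f=6.0)
```

Identical densities give an index of 0, and the index approaches 1 as the two densities separate.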
Regarding inferences, Chakraborty and Majumder proposed a sample function constructed by considering the de Moivre-Laplace theorem (an application of the central limit theorem to binomial laws). According to these authors, the variance of such a statistic is
$$\operatorname{var}(D) = \frac{\hat{p}_m(1-\hat{p}_m)}{n_m} + \frac{\hat{p}_f(1-\hat{p}_f)}{n_f},$$
where $D$ is the statistic, and $\hat{p}_i$ and $n_i$, $i = m, f$ (male, female), stand for the estimate of the probability of observing the measurement of an individual of sex $i$ in some interval of the real line, and the sample size of sex $i$, respectively. Notice that this implies that two independent random variables with binomial distributions have to be considered. One such variable is the number of individuals of sex $f$ in a sample of size $n_f$ composed of individuals of sex $f$, which seems nonsensical.
Authors such as Josephson et al. (1996) [17] believe that the two sexes to be analyzed form a single population whose probabilistic behavior is described by a mixture of two normal populations. Thus, if $X$ is a random variable which is normally distributed among the females of a population and likewise normally distributed among the males of the population, then
$$f(x) = \pi_f f_f(x) + \pi_m f_m(x)$$
is the density of the mixture with two normal components, where $f_i$ and $\pi_i$, $i = f, m$, are the normal densities and the mixing proportions of both sexes, respectively. See an example in Fig. 2, where the thicker curve represents the mixture whereas the thinner curves are the $\pi_i f_i$ functions.
It is from a population modelled in this way that a random sample with individuals of both sexes can be selected. Note that tests based on the normality assumption cannot be applied to such a sample since a mixture of two normal components does not, in general, have a normal density.
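The mixture model can be sketched as follows: the density is a weighted sum of two normal densities, and a single random sample is drawn by first choosing a sex with the mixing proportion and then drawing from the corresponding normal component. The function names `mixture_pdf` and `sample_mixture` and all parameter values are hypothetical.

```python
import random
from statistics import NormalDist

def mixture_pdf(x, pi_f, female, male):
    """Density of a two-component normal mixture with female proportion pi_f."""
    return pi_f * female.pdf(x) + (1.0 - pi_f) * male.pdf(x)

def sample_mixture(n, pi_f, female, male, rng):
    """Draw n observations; the sex of each one is itself random."""
    draws = []
    for _ in range(n):
        component = female if rng.random() < pi_f else male
        draws.append(rng.gauss(component.mean, component.stdev))
    return draws

# Hypothetical parameter values, for illustration only
female, male = NormalDist(162.0, 6.0), NormalDist(175.0, 7.0)
sample = sample_mixture(500, 0.5, female, male, random.Random(0))
```

Unlike the two-independent-samples design, the number of females in such a sample is not fixed in advance but is itself a random quantity governed by the mixing proportion.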
Josephson et al. limited themselves to considering mixtures of two normals with the same component variances and mixing proportions. [17] As a consequence, their proposed measure of sexual dimorphism is the difference between the mean parameters of the two normals involved. To estimate these central parameters, Josephson et al. used Pearson's method of moments. [17] Nowadays, the expectation-maximization (EM) algorithm (see McLachlan and Basford, 1988 [18] ) and the Bayesian Markov chain Monte Carlo (MCMC) procedure (see Gilks et al., 1996 [19] ) are the two main competitors for estimating mixture parameters.
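For illustration, a bare-bones EM iteration for a two-component normal mixture is sketched below. It is not the procedure of any of the cited authors: the function name `em_two_normals`, the starting values and the fixed number of iterations are assumptions, and a practical implementation would add a convergence check and guard against degenerate components.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def em_two_normals(data, mu1, mu2, sigma1, sigma2, pi1=0.5, n_iter=100):
    """Fit a two-component normal mixture by EM from the given starting values."""
    n = len(data)
    for _ in range(n_iter):
        # E-step: posterior probability that each observation is from component 1
        resp = []
        for x in data:
            a = pi1 * normal_pdf(x, mu1, sigma1)
            b = (1.0 - pi1) * normal_pdf(x, mu2, sigma2)
            resp.append(a / (a + b))
        # M-step: weighted means, standard deviations and mixing proportion
        w1 = sum(resp)
        w2 = n - w1
        mu1 = sum(r * x for r, x in zip(resp, data)) / w1
        mu2 = sum((1.0 - r) * x for r, x in zip(resp, data)) / w2
        sigma1 = math.sqrt(sum(r * (x - mu1) ** 2 for r, x in zip(resp, data)) / w1)
        sigma2 = math.sqrt(sum((1.0 - r) * (x - mu2) ** 2 for r, x in zip(resp, data)) / w2)
        pi1 = w1 / n
    return mu1, mu2, sigma1, sigma2, pi1

# Simulated data from a hypothetical mixture, for illustration only
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) if rng.random() < 0.5 else rng.gauss(6.0, 1.0)
        for _ in range(1000)]
mu1, mu2, s1, s2, p1 = em_two_normals(data, mu1=1.0, mu2=5.0, sigma1=1.0, sigma2=1.0)
```

The estimated difference `mu2 - mu1` then plays the role of the difference between the two mean parameters.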
Possibly the main difference between considering two independent normal populations and a mixture model with two normal components lies in the mixing proportions, which is the same as saying that in the model of two independent normal populations the interaction between sexes is ignored. This, in turn, implies that the probabilistic properties change (see Ipiña and Durand, 2000 [14] ).
Ipiña and Durand (2000, [14] 2004 [20] ) have proposed a measure of sexual dimorphism called $MI$. This proposal computes the overlapping area between the $\pi_f f_f$ and $\pi_m f_m$ functions, which represent the contribution of each sex to the mixture of two normal components (see shaded area in Fig. 2). Thus, $MI$ can be written
$$MI = \int_{\mathbb{R}} \min\left[\pi_f f_f(x),\ \pi_m f_m(x)\right]\,dx,$$
$\mathbb{R}$ being the real line.
The smaller the overlapping area, the greater the gap between the two functions $\pi_f f_f$ and $\pi_m f_m$, in which case the sexual dimorphism is greater. This index is a function of the five parameters that characterize a mixture of two normal components, $(\mu_f, \mu_m, \sigma_f^2, \sigma_m^2, \pi_f)$. Its range is the interval $(0, 0.5]$, since the area cannot exceed the smaller of the two mixing proportions, and the interested reader can see, in the work of the authors who proposed the index, the way in which an interval estimate is constructed.
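The overlapping area between the two weighted component densities can be approximated numerically. The sketch below uses a simple midpoint rule over a wide finite interval and treats the five mixture parameters as known; the function name `mixture_overlap`, the integration limits of ten standard deviations and the parameter values are all assumptions, and this is not the estimation procedure of the cited authors.

```python
from statistics import NormalDist

def mixture_overlap(pi_f, female, male, n_steps=20000):
    """Approximate the integral of min(pi_f * f_f, pi_m * f_m) over the real line."""
    pi_m = 1.0 - pi_f
    lo = min(female.mean - 10 * female.stdev, male.mean - 10 * male.stdev)
    hi = max(female.mean + 10 * female.stdev, male.mean + 10 * male.stdev)
    h = (hi - lo) / n_steps
    total = 0.0
    for k in range(n_steps):
        x = lo + (k + 0.5) * h  # midpoint of the k-th subinterval
        total += min(pi_f * female.pdf(x), pi_m * male.pdf(x))
    return total * h

# Hypothetical parameter values, for illustration only
area = mixture_overlap(0.5, NormalDist(162.0, 6.0), NormalDist(175.0, 7.0))
```

With identical components the area equals the smaller mixing proportion, which is why the value never exceeds 0.5.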
Marini et al. (1999) [12] have suggested the Kolmogorov-Smirnov distance as a measure of sexual dimorphism. The authors use the following form of the statistic,
$$\max_x \left|F_1(x) - F_2(x)\right|,$$
with $F_i$, $i = 1, 2$, being sample cumulative distributions corresponding to two independent random samples.
Such a distance has the advantage of being applicable whatever the form of the distributions of the random variables concerned, provided that they are continuous. The use of this distance assumes that two populations are involved. Further, the Kolmogorov-Smirnov distance is a sample function whose aim is to test whether the two samples under analysis have been selected from a single distribution. If the null hypothesis is accepted, then there is no sexual dimorphism; otherwise, there is.
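The distance itself is simple to compute from two samples: since both empirical distribution functions are step functions that jump only at observed values, the maximum absolute difference is attained at one of the data points. The sketch below computes only the distance, not the associated p-value; the function name `ks_distance` and the data are hypothetical.

```python
from bisect import bisect_right

def ks_distance(sample_a, sample_b):
    """Maximum absolute difference between two empirical distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        f_a = bisect_right(a, x) / len(a)  # proportion of sample_a <= x
        f_b = bisect_right(b, x) / len(b)  # proportion of sample_b <= x
        d = max(d, abs(f_a - f_b))
    return d

# Hypothetical measurements, for illustration only
distance = ks_distance([52.1, 49.8, 55.3, 51.0, 53.7], [47.2, 48.1, 46.9, 47.8])
```

A value of 0 means the two empirical distributions coincide, and a value of 1 means the samples do not overlap at all.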