Gy's sampling theory

Gy's sampling theory is a theory of the sampling of materials, developed by Pierre Gy from the 1950s to the early 2000s [1] in a series of articles and books.

The abbreviation "TOS" (Theory of Sampling) is also used to denote Gy's sampling theory. [2]

Gy's sampling theory uses a model in which sample taking is represented by an independent Bernoulli trial for every particle in the parent population from which the sample is drawn. Each Bernoulli trial has two possible outcomes: (1) the particle is selected or (2) the particle is not selected. The selection probability may differ from one particle to the next. The model used by Gy is mathematically equivalent to Poisson sampling. [3] Using this model, Gy derived the following equation for the variance of the sampling error in the mass concentration of a sample:

$$V = \frac{1}{\left(\sum_{i=1}^{N} q_i m_i\right)^2} \sum_{i=1}^{N} q_i \left(1 - q_i\right) m_i^2 \left(a_i - \frac{\sum_{j=1}^{N} q_j a_j m_j}{\sum_{j=1}^{N} q_j m_j}\right)^2$$

in which $V$ is the variance of the sampling error, $N$ is the number of particles in the population (before the sample was taken), $q_i$ is the probability of including the $i$th particle of the population in the sample (i.e. the first-order inclusion probability of the $i$th particle), $m_i$ is the mass of the $i$th particle of the population and $a_i$ is the mass concentration of the property of interest in the $i$th particle of the population.
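The general equation can be evaluated directly once the inclusion probabilities, particle masses and particle concentrations are known. The Python sketch below is a plain transcription of that equation; the function name `gy_variance` and the toy inputs are illustrative assumptions, not part of Gy's published work.

```python
from typing import Sequence


def gy_variance(q: Sequence[float], m: Sequence[float], a: Sequence[float]) -> float:
    """Gy's linearized variance of the sampling error (hypothetical helper).

    q[i]: first-order inclusion probability of particle i
    m[i]: mass of particle i
    a[i]: mass concentration of the property of interest in particle i
    """
    if not (len(q) == len(m) == len(a)):
        raise ValueError("q, m and a must have the same length")
    # Expected sample mass: sum_j q_j m_j
    expected_mass = sum(qi * mi for qi, mi in zip(q, m))
    # Ratio of the expected mass of the property of interest to the expected sample mass
    centre = sum(qi * ai * mi for qi, ai, mi in zip(q, a, m)) / expected_mass
    # Bernoulli variances q_i (1 - q_i), weighted by m_i^2 (a_i - centre)^2
    total = sum(
        qi * (1.0 - qi) * mi**2 * (ai - centre) ** 2
        for qi, mi, ai in zip(q, m, a)
    )
    return total / expected_mass**2


print(gy_variance([0.5, 0.5], [1.0, 1.0], [0.0, 1.0]))
```

For this toy input the expected sample mass is 1.0, the inner weighted mean is 0.5, and the function returns 0.125. Two particles are far too few for the approximation to be meaningful in practice; the input is only meant to make the arithmetic easy to follow by hand.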

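This equation for the variance of the sampling error is an approximation: it is based on a linearization of the mass concentration of the sample, which is a ratio of two random quantities (the mass of the property of interest in the sample and the total mass of the sample).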
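Because the equation is a linearization, one way to judge how close it is for a particular population is to simulate the Bernoulli-trial (Poisson sampling) model directly and compare the empirical variance of the sample concentration with the formula. The sketch below does this for a made-up population of 200 particles; the population, the function names and the number of repetitions are all assumptions chosen for illustration. Draws in which no particle is selected are discarded, because the sample concentration is undefined for them.

```python
import random
import statistics
from typing import List, Tuple


def linearized_variance(q: List[float], m: List[float], a: List[float]) -> float:
    """Gy's linearized variance of the sampling error (general q_i)."""
    s = sum(qi * mi for qi, mi in zip(q, m))
    centre = sum(qi * ai * mi for qi, ai, mi in zip(q, a, m)) / s
    return sum(qi * (1 - qi) * mi**2 * (ai - centre) ** 2
               for qi, mi, ai in zip(q, m, a)) / s**2


def simulated_variance(q: List[float], m: List[float], a: List[float],
                       reps: int = 5000, seed: int = 1) -> float:
    """Monte Carlo variance of the sample concentration under Poisson sampling."""
    rng = random.Random(seed)
    concentrations = []
    while len(concentrations) < reps:
        # One independent Bernoulli trial per particle: select i with probability q[i]
        picked = [i for i in range(len(q)) if rng.random() < q[i]]
        mass = sum(m[i] for i in picked)
        if mass == 0:
            continue  # nothing selected: concentration undefined, so redraw
        concentrations.append(sum(m[i] * a[i] for i in picked) / mass)
    return statistics.variance(concentrations)


def hypothetical_population(n: int = 200) -> Tuple[List[float], List[float], List[float]]:
    """Made-up population for illustration only (not taken from Gy's work)."""
    q = [0.2 if i % 2 == 0 else 0.4 for i in range(n)]   # unequal inclusion probabilities
    m = [0.5 + 0.25 * (i % 5) for i in range(n)]         # particle masses
    a = [1.0 if i % 10 == 0 else 0.0 for i in range(n)]  # one particle in ten carries the property
    return q, m, a


q, m, a = hypothetical_population()
print("linearized:", linearized_variance(q, m, a))
print("simulated: ", simulated_variance(q, m, a))
```

With roughly 60 particles expected in each sample, the two numbers can be expected to be close but not identical: the simulated value carries Monte Carlo noise and also includes the higher-order effects that the linearization drops. With far fewer particles per sample the gap can be expected to grow.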

In Gy's theory, correct sampling is defined as a sampling scenario in which all particles have the same probability of being included in the sample. This implies that $q_i$ no longer depends on $i$ and can therefore be replaced by the symbol $q$. Gy's equation for the variance of the sampling error then becomes:

$$V = \frac{1 - q}{q \, M_\text{batch}^2} \sum_{i=1}^{N} m_i^2 \left(a_i - a_\text{batch}\right)^2$$

where $a_\text{batch}$ is the concentration of the property of interest in the population from which the sample is to be drawn and $M_\text{batch}$ is the mass of that population. It has been noted that a similar equation had already been derived in 1935 by Kassel and Guy. [4] [5]
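Under correct sampling the general and simplified equations should coincide: with a constant $q$, the expected sample mass $\sum_i q m_i$ becomes $q M_\text{batch}$ and the weighted mean inside the general equation becomes $a_\text{batch}$. The Python sketch below checks this numerically. It assumes that $a_\text{batch}$ is the mass-weighted mean $\sum_i m_i a_i / M_\text{batch}$ (total mass of the property of interest divided by the batch mass); the function names and the three-particle batch are hypothetical.

```python
from typing import Sequence


def variance_general(q: Sequence[float], m: Sequence[float], a: Sequence[float]) -> float:
    """Gy's linearized variance for arbitrary inclusion probabilities q[i]."""
    s = sum(qi * mi for qi, mi in zip(q, m))
    centre = sum(qi * ai * mi for qi, ai, mi in zip(q, a, m)) / s
    return sum(qi * (1 - qi) * mi**2 * (ai - centre) ** 2
               for qi, mi, ai in zip(q, m, a)) / s**2


def variance_correct_sampling(q: float, m: Sequence[float], a: Sequence[float]) -> float:
    """Simplified equation when every particle has the same inclusion probability q."""
    m_batch = sum(m)
    # Assumption: a_batch is the mass-weighted mean concentration of the batch
    a_batch = sum(mi * ai for mi, ai in zip(m, a)) / m_batch
    spread = sum(mi**2 * (ai - a_batch) ** 2 for mi, ai in zip(m, a))
    return (1 - q) / (q * m_batch**2) * spread


m = [1.0, 2.0, 3.0]  # hypothetical particle masses
a = [0.1, 0.5, 0.2]  # hypothetical particle concentrations
q = 0.25
print(variance_general([q] * len(m), m, a))
print(variance_correct_sampling(q, m, a))
```

Both calls should print approximately 0.02366 (exactly 1022/43200 before floating-point rounding). The factor $(1 - q)/q$ also shows that the variance vanishes when $q = 1$, i.e. when the whole batch is taken as the sample.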

Two books covering the theory and practice of sampling are available: the third edition of an advanced monograph [6] and an introductory text. [7]


References

  1. Gy, P. (2004). Chemometrics and Intelligent Laboratory Systems. 74: 61–70.
  2. Esbensen, K. H. (28 November 2004). "50 years of Pierre Gy's 'Theory of Sampling'—WCSB1: a tribute". Chemometrics and Intelligent Laboratory Systems. 74 (1): 36.
  3. Geelhoed, B.; Glass, H. J. (2004). "Comparison of theories for the variance caused by the sampling of random mixtures of non-identical particles". Geostandards and Geoanalytical Research. 28 (2): 263–276. doi:10.1111/j.1751-908X.2004.tb00742.x.
  4. Kassel, L. S.; Guy, T. W. (1935). "Determining the correct weight of sample in coal sampling". Industrial & Engineering Chemistry Analytical Edition. 7 (2): 112–115. doi:10.1021/ac50094a013.
  5. Cheng, H.; Geelhoed, B.; Bode, P. (2011). "A Markov Chain Monte Carlo comparison of variance estimators for the sampling of particulate mixtures". Applied Stochastic Models in Business and Industry. 29 (3): 187–198. doi:10.1002/asmb.878.
  6. Pitard, Francis (2019). Theory of sampling and sampling practice (Third ed.). Boca Raton, FL: Chapman and Hall/CRC. ISBN 978-1-351-10592-7. OCLC 1081315442.
  7. Esbensen, Kim (2020). Introduction to the Theory and Practice of Sampling. Chichester, UK: IM Publications Open. ISBN 978-1-906715-29-8.