Probability-proportional-to-size sampling

Last updated September 04, 2025

In survey methodology, probability-proportional-to-size (pps) sampling is a sampling process in which each element of the population (of size N) has some (independent) chance $p_{i}$ of being selected into the sample on a single draw. This $p_{i}$ is proportional to some known quantity $x_{i}$, so that $p_{i}={\frac {x_{i}}{\sum _{j=1}^{N}x_{j}}}$.^[1]^: 97^[2]

One case in which this arises, as developed by Hansen and Hurwitz in 1943,^[3] is when there are several clusters of units, each containing a different number of units that is known upfront. Each cluster can then be selected with a probability proportional to the number of units inside it.^[4]^: 250 For example, with 3 clusters of 10, 20 and 30 units, the chance of selecting the first cluster is 1/6, the second 1/3, and the third 1/2.
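The cluster example above can be reproduced in a few lines. This is only an illustrative sketch: the function name `pps_probabilities` is hypothetical, and integer sizes are assumed so that exact fractions can be used.

```python
from fractions import Fraction


def pps_probabilities(sizes):
    """Return the single-draw selection probabilities p_i = x_i / sum(x).

    Assumes integer sizes (an assumption made here for exact arithmetic).
    """
    total = sum(sizes)
    if total <= 0 or any(x < 0 for x in sizes):
        raise ValueError("sizes must be non-negative with a positive total")
    return [Fraction(x, total) for x in sizes]


cluster_sizes = [10, 20, 30]
probs = pps_probabilities(cluster_sizes)
print(probs)  # [Fraction(1, 6), Fraction(1, 3), Fraction(1, 2)]
```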

pps sampling results in a fixed sample size n (as opposed to Poisson sampling, which is similar but results in a random sample size with expected value n). When selecting items with replacement, the procedure is simply to draw one item at a time (like taking n draws from a multinomial distribution over N elements, each with its own selection probability $p_{i}$). When sampling without replacement, the scheme can become more complex.^[1]^: 93
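The contrast between the two schemes can be sketched with Python's standard library. The function names below are hypothetical. The Poisson variant assumes the common choice of inclusion probabilities $n x_{i}/\sum _{j}x_{j}$, which is an assumption of this sketch rather than something stated above, and it rejects inputs where that quantity exceeds 1.

```python
import random


def pps_with_replacement(units, sizes, n, rng):
    """Draw exactly n units, one at a time, each draw with p_i = x_i / sum(x)."""
    return rng.choices(units, weights=sizes, k=n)


def poisson_sample(units, sizes, n, rng):
    """Include each unit independently; the sample size is random with mean n.

    Assumes inclusion probabilities n * x_i / sum(x), all of which must be <= 1.
    """
    total = sum(sizes)
    inclusion = [n * x / total for x in sizes]
    if max(inclusion) > 1:
        raise ValueError("n * x_i / sum(x) exceeds 1 for some unit")
    return [u for u, p in zip(units, inclusion) if rng.random() < p]


rng = random.Random(0)
units = ["A", "B", "C"]
sizes = [10, 20, 30]

fixed = pps_with_replacement(units, sizes, n=5, rng=rng)
print(len(fixed))  # 5

varying = poisson_sample(units, sizes, n=1, rng=rng)
print(len(varying))  # somewhere between 0 and 3, depending on the random draws
```

With replacement, the same unit may appear more than once in `fixed`; in the Poisson variant each unit appears at most once.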

A related method is weighted reservoir sampling ('Weighted random sampling with a reservoir'), an algorithm for drawing a weighted random sample of size m from a population of n weighted items, where m⩽n, in one pass over a population of unknown size.^[5]
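A minimal sketch of one such reservoir scheme is given below. It assumes the key-based variant usually attributed to Efraimidis and Spirakis, in which each item receives the key $u^{1/w}$ for a uniform random $u$ and its weight $w$, and the m items with the largest keys are kept. The function name and the `(item, weight)` stream format are illustrative choices, not taken from the cited paper.

```python
import heapq
import random


def weighted_reservoir_sample(stream, m, rng=None):
    """One-pass weighted sample of up to m items, without replacement.

    `stream` yields (item, weight) pairs with weight > 0; its length need
    not be known in advance.
    """
    rng = rng or random.Random()
    if m <= 0:
        return []
    heap = []  # min-heap of (key, arrival index, item); smallest key on top
    for index, (item, weight) in enumerate(stream):
        if weight <= 0:
            raise ValueError("weights must be positive")
        key = rng.random() ** (1.0 / weight)
        entry = (key, index, item)  # index breaks ties without comparing items
        if len(heap) < m:
            heapq.heappush(heap, entry)
        elif key > heap[0][0]:
            heapq.heapreplace(heap, entry)  # evict the current smallest key
    return [item for _, _, item in heap]


weighted_items = [("a", 1.0), ("b", 2.0), ("c", 3.0), ("d", 4.0)]
sample = weighted_reservoir_sample(iter(weighted_items), m=2, rng=random.Random(1))
print(len(sample))  # 2
```

Only m entries are held in memory at any time, so the stream can be arbitrarily long. For m greater than 1 the resulting inclusion probabilities are generally not exactly proportional to the weights.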

Distribution and properties

If observations from some distribution F with density f are sampled with probability proportional to their value, then the values in that sample follow a length-biased distribution, with the following density function:^[6]^: 2^[7]

$g(x)=xf(x)/E[x]$

The mean of the sampled values, taken under the density g, is: $E_{g}[x]=\int x\,g(x)\,dx=E[x^{2}]/E[x]$
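As a worked illustration, chosen here only for convenience, take the exponential density $f(x)=\lambda e^{-\lambda x}$ for $x\geq 0$, for which $E[x]=1/\lambda$ and $E[x^{2}]=2/\lambda ^{2}$. The length-biased density is then

$g(x)={\frac {x\,\lambda e^{-\lambda x}}{1/\lambda }}=\lambda ^{2}x\,e^{-\lambda x}$

which is a gamma density with shape 2 and rate $\lambda$. Its mean is

$\int _{0}^{\infty }x\,g(x)\,dx={\frac {2}{\lambda }}={\frac {2/\lambda ^{2}}{1/\lambda }}={\frac {E[x^{2}]}{E[x]}}$

so in this case the sampled values are on average twice as large as values drawn from the original distribution.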

Note that the length-biased form assumes that the pps sampling is done with replacement, or that the sample size is much smaller than the population size.
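The same relationship can be checked exactly for a single draw from a small finite population. This is only a sketch: the function name `size_biased_mean` and the three values are illustrative assumptions, and integer values are used so that the arithmetic is exact.

```python
from fractions import Fraction


def size_biased_mean(values):
    """Expected value of one draw when p_i = x_i / sum(x).

    Equals sum(x_i^2) / sum(x_i), the finite-population analogue of
    E[x^2] / E[x]. Assumes positive integer values.
    """
    total = sum(values)
    probs = [Fraction(x, total) for x in values]
    return sum(p * x for p, x in zip(probs, values))


values = [10, 20, 30]
mean_plain = Fraction(sum(values), len(values))
mean_pps = size_biased_mean(values)

print(mean_plain)  # 20
print(mean_pps)    # 70/3
```

The pps mean (about 23.3) exceeds the plain mean of 20 because larger values are more likely to be drawn.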

References

[1] Särndal, Carl-Erik; Swensson, Bengt; Wretman, Jan (1992). Model Assisted Survey Sampling. Springer. ISBN 978-0-387-97528-3.
[2] Skinner, Chris J. (2016). "Probability Proportional to Size (PPS) Sampling". Wiley StatsRef: Statistics Reference Online. pp. 1–5. doi:10.1002/9781118445112.stat03346.pub2. ISBN 978-1-118-44511-2.
[3] Hansen, Morris H.; Hurwitz, William N. (1943). "On the Theory of Sampling from Finite Populations". The Annals of Mathematical Statistics. 14 (4): 333–362. doi:10.1214/aoms/1177731356. JSTOR 2235923.
[4] Cochran, W. G. (1977). Sampling Techniques (3rd ed.). Nashville, TN: John Wiley & Sons. ISBN 978-0-471-16240-7.
[5] Efraimidis, Pavlos S.; Spirakis, Paul G. (March 2006). "Weighted random sampling with a reservoir". Information Processing Letters. 97 (5): 181–185. doi:10.1016/j.ipl.2005.11.003.
[6] Mustafa, Abdelfattah; Khan, M. I. (June 2022). "The length-biased power hazard rate distribution: Some properties and applications". Statistics in Transition. New Series. 23 (2): 1–16. doi:10.2478/stattrans-2022-0013. hdl:10419/266304.
[7] Lee, Kyeongjun (30 November 2024). "Estimation of length biased exponential distribution based on progressive hybrid censoring". Communications for Statistical Applications and Methods. 31 (6): 661–675. doi:10.29220/CSAM.2024.31.6.661.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
