Survey sampling

Last updated March 30, 2024

In statistics, survey sampling describes the process of selecting a sample of elements from a target population to conduct a survey. The term "survey" may refer to many different types or techniques of observation. In survey sampling it most often involves a questionnaire used to measure the characteristics and/or attitudes of people. The different ways of contacting members of a sample once they have been selected are the subject of survey data collection. The purpose of sampling is to reduce the cost and/or the amount of work that it would take to survey the entire target population. A survey that measures the entire target population is called a census. A sample refers to a group or section of a population from which information is to be obtained.

Survey samples can be broadly divided into two types: probability samples and non-probability samples. Probability-based samples implement a sampling plan with specified probabilities (perhaps adapted probabilities specified by an adaptive procedure). Probability-based sampling allows design-based inference about the target population. The inferences are based on a known objective probability distribution that was specified in the study protocol. Inferences from probability-based surveys may still suffer from many types of bias.

Surveys that are not based on probability sampling have greater difficulty measuring their bias or sampling error.^[1] Surveys based on non-probability samples often fail to represent the people in the target population.^[2]

In academic and government survey research, probability sampling is a standard procedure. In the United States, the Office of Management and Budget's "List of Standards for Statistical Surveys" states that federally funded surveys must be performed:

selecting samples using generally accepted statistical methods (e.g., probabilistic methods that can provide estimates of sampling error). Any use of nonprobability sampling methods (e.g., cut-off or model-based samples) must be justified statistically and be able to measure estimation error.^[3]

Random sampling and design-based inference are supplemented by other statistical methods, such as model-assisted sampling and model-based sampling.^[4]^[5]

For example, many surveys have substantial amounts of nonresponse. Even though the units are initially chosen with known probabilities, the nonresponse mechanisms are unknown. For surveys with substantial nonresponse, statisticians have proposed statistical models with which the data sets are analyzed.

Issues related to survey sampling are discussed in several sources, including Salant and Dillman (1994).^[6]

Probability sampling

In a probability sample (also called a "scientific" or "random" sample), each member of the target population has a known and non-zero probability of inclusion in the sample.^[7] A survey based on a probability sample can in theory produce statistical measurements of the target population that are either unbiased, because the expected value of the sample mean equals the population mean, E(ȳ) = μ, or subject to a measurable sampling error, which can be expressed as a confidence interval or margin of error.^[8]^[9]
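The two properties named above can be checked with a small simulation. The following is a minimal sketch, assuming a simple random sample drawn without replacement from a synthetic population; the function name `srs_mean_and_margin`, the population values, and the 95% normal multiplier `z=1.96` are illustrative assumptions, not part of any cited source.

```python
import random
import statistics

def srs_mean_and_margin(population, n, rng, z=1.96):
    """Draw a simple random sample without replacement and
    return (sample mean, approximate margin of error)."""
    sample = rng.sample(population, n)
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)
    # Finite population correction: sampling without replacement
    # from a finite frame reduces the variance of the mean.
    fpc = (1 - n / len(population)) ** 0.5
    margin = z * fpc * s / n ** 0.5
    return mean, margin

rng = random.Random(0)
# Synthetic (hypothetical) population of 10,000 values.
population = [rng.gauss(50, 10) for _ in range(10_000)]
mu = statistics.fmean(population)

# Repeat the survey many times to approximate E(y-bar).
results = [srs_mean_and_margin(population, 200, rng) for _ in range(2000)]
avg_estimate = statistics.fmean([m for m, _ in results])
# Share of replicates whose interval (mean +/- margin) contains mu.
coverage = sum(abs(m - mu) <= moe for m, moe in results) / len(results)

print(f"population mean: {mu:.3f}")
print(f"average of sample means: {avg_estimate:.3f}")
print(f"interval coverage: {coverage:.3f}")
```

Under these assumptions the average of the sample means should sit very close to the population mean, and roughly 95% of the intervals should contain it.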

A probability-based survey sample is created by constructing a list of the target population, called the sampling frame; choosing a randomized process for selecting units from the sampling frame, called a selection procedure; and choosing a method of contacting selected units so that they can complete the survey, called a data collection method or mode.^[10] For some target populations this process may be easy; for example, sampling the employees of a company by using payroll lists. However, in large, disorganized populations simply constructing a suitable sampling frame is often a complex and expensive task.

Common methods of conducting a probability sample of the household population in the United States are Area Probability Sampling, Random Digit Dial telephone sampling, and more recently, Address-Based Sampling.^[11]

Within probability sampling, there are specialized techniques such as stratified sampling and cluster sampling that improve the precision or efficiency of the sampling process without altering the fundamental principles of probability sampling.

Stratification is the process of dividing members of the population into homogeneous subgroups before sampling, based on auxiliary information about each sample unit. The strata should be mutually exclusive: every element in the population must be assigned to only one stratum. The strata should also be collectively exhaustive: no population element can be excluded. Then methods such as simple random sampling or systematic sampling can be applied within each stratum. Stratification often improves the representativeness of the sample by reducing sampling error.
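As an illustration of the procedure described above, the following sketch partitions a frame into strata and then draws a simple random sample within each stratum using proportional allocation. The function `stratified_sample`, the `region` field, and the frame itself are hypothetical; proportional allocation is only one of several possible allocation rules.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(frame, stratum_of, fraction, rng):
    """Select the same sampling fraction within every stratum."""
    strata = defaultdict(list)
    for unit in frame:
        # Each unit is assigned to exactly one stratum, so the strata
        # are mutually exclusive and collectively exhaustive.
        strata[stratum_of(unit)].append(unit)
    sample = []
    for label in sorted(strata):
        members = strata[label]
        k = max(1, round(fraction * len(members)))
        # Simple random sampling without replacement inside the stratum.
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical frame: 1,000 units with a known auxiliary variable (region).
frame = (
    [{"id": i, "region": "urban"} for i in range(600)]
    + [{"id": i, "region": "suburban"} for i in range(600, 900)]
    + [{"id": i, "region": "rural"} for i in range(900, 1000)]
)

rng = random.Random(1)
sample = stratified_sample(frame, lambda u: u["region"], 0.1, rng)
counts = Counter(u["region"] for u in sample)
print(counts)
```

Because the sampling fraction is fixed within each stratum, the sample reproduces the regional composition of the frame exactly, which a single simple random sample of the same size would do only on average.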

Bias in probability sampling

Bias in surveys is undesirable, but often unavoidable. The major types of bias that may occur in the sampling process are:

Non-response bias: When individuals or households selected in the survey sample cannot or will not complete the survey there is the potential for bias to result from this non-response. Nonresponse bias occurs when the observed value deviates from the population parameter due to differences between respondents and nonrespondents.^[12]
Response bias: This is not the opposite of non-response bias, but instead relates to a possible tendency of respondents to give inaccurate or untruthful answers for various reasons.
Selection bias: Selection bias occurs when some units have a differing probability of selection that is unaccounted for by the researcher. For example, some households have multiple phone numbers, making them more likely to be selected in a telephone survey than households with only one phone number. This selection bias can be corrected by applying to each household a survey weight equal to 1/(number of phone numbers).
Self-selection bias: A type of bias in which individuals voluntarily select themselves into a group, thereby potentially biasing the response of that group.
Participation bias: Bias that arises due to the characteristics of those who choose to participate in a survey or poll.
Coverage bias: Coverage bias can occur when population members do not appear in the sampling frame (undercoverage). Coverage bias occurs when the observed value deviates from the population parameter due to differences between covered and non-covered units. Telephone surveys suffer from a well-known source of coverage bias because they cannot include households without telephones.
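The telephone-survey weight mentioned under selection bias can be sketched as follows. The household records and income figures are invented for illustration, and `weighted_mean` is a hypothetical helper; the sketch assumes that a household's chance of selection is proportional to its number of phone numbers.

```python
def weighted_mean(values, weights):
    """Weighted average: sum(w * v) / sum(w)."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical respondents reached by dialing random phone numbers.
households = [
    {"phones": 1, "income": 40},
    {"phones": 2, "income": 80},
    {"phones": 1, "income": 50},
    {"phones": 4, "income": 120},
]

# Weight = 1 / (number of phone numbers), the inverse of the
# household's relative chance of being dialed.
weights = [1 / h["phones"] for h in households]
incomes = [h["income"] for h in households]

unweighted = sum(incomes) / len(incomes)    # 72.5
weighted = weighted_mean(incomes, weights)  # 160 / 2.75, about 58.18

print(unweighted, round(weighted, 2))
```

In this made-up data, households with more phone numbers also report higher incomes, so the unweighted mean overstates the average and the weighted mean pulls it back down.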

Non-probability sampling

Many surveys are not based on probability samples, but rather on finding a suitable collection of respondents to complete the survey. Some common examples of non-probability sampling are:^[13]

Judgement Samples: A researcher decides which population members to include in the sample based on his or her judgement. The researcher may provide some alternative justification for the representativeness of the sample. The underlying assumption is that the investigator will select units that are characteristic of the population. This method is subject to the researcher's biases and perceptions.^[14]
Snowball Samples: Often used when a target population is rare. Members of the target population recruit other members of the population for the survey.
Quota Samples: The sample is designed to include a designated number of people with certain specified characteristics, for example 100 coffee drinkers. This type of sampling is common in non-probability market research surveys.
Convenience Samples: The sample is composed of whatever persons can be most easily accessed to fill out the survey.
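To make the contrast with probability sampling concrete, here is a minimal sketch of how a quota sample might be filled. The function `quota_sample`, the stream of arrivals, and the quota sizes are hypothetical. Respondents are accepted in the order they happen to turn up, so no unit has a known probability of inclusion.

```python
from collections import Counter

def quota_sample(arrivals, category_of, quotas):
    """Accept respondents in arrival order until every quota is full."""
    remaining = dict(quotas)
    sample = []
    for person in arrivals:
        category = category_of(person)
        if remaining.get(category, 0) > 0:
            sample.append(person)
            remaining[category] -= 1
        if not any(remaining.values()):
            break  # all quotas filled; later arrivals are never considered
    return sample

# Hypothetical stream of people passing an interviewer:
# every third person (i = 0, 3, 6, ...) drinks tea, the rest drink coffee.
arrivals = [(f"p{i}", "coffee" if i % 3 else "tea") for i in range(30)]

sample = quota_sample(arrivals, lambda p: p[1], {"coffee": 5, "tea": 3})
counts = Counter(category for _, category in sample)
print(counts)
```

The quotas are met exactly, but only the first eight arrivals are ever interviewed, which is why the result says little about how the sample relates to the wider population.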

In non-probability samples the relationship between the target population and the survey sample is immeasurable and potential bias is unknowable. Sophisticated users of non-probability survey samples tend to view the survey as an experimental condition, rather than a tool for population measurement, and examine the results for internally consistent relationships.

References

[1] "Non-Probability Sampling - AAPOR". www.aapor.org. Retrieved 2020-05-24.
[2] Weisberg, Herbert F. (2005), The Total Survey Error Approach, University of Chicago Press: Chicago. p. 231.
[3] "Archived copy" (PDF). Office of Management and Budget. Retrieved 2009-06-17 via National Archives.
[4] Lohr. Brewer. Swedes
[5] Valliant, Richard; Dorfman, Alan H.; Royall, Richard M. (2000), Finite Population Sampling and Inference: A Prediction Approach, Wiley: New York. p. 19.
[6] Salant, Priscilla; Dillman, Don A. (1994), How to Conduct Your Own Survey.
[7] Kish, L. (1965), Survey Sampling, Wiley: New York. p. 20.
[8] Kish, L. (1965), Survey Sampling, Wiley: New York. p. 59.
[9] "Why Sampling Works - AAPOR".
[10] Groves et al., Survey Methodology, Wiley: New York.
[11] Link, Michael W.; Battaglia, Michael P.; Frankel, Martin R.; Osborn, Larry; Mokdad, Ali H. (Spring 2008), "A Comparison of Address-Based Sampling (ABS) Versus Random-Digit Dialing (RDD) for General Population Surveys", Public Opinion Quarterly, 72: 6-27.
[12] "Glossary - NCES Statistical Standards". nces.ed.gov.
[13] "Survey Sampling Methods". www.statpac.com.
[14] Statistics Canada (28 January 2009). "Learning resources: Statistics: Power from data! Non-probability sampling". www150.statcan.gc.ca.

External links

CRAN Task View Survey Methodology
What is a Survey? Booklet published by National Opinion Research Center and The American Statistical Association
Journal of Information Technology Learning and Performance article Organizational Research: Determining Sample Size in Survey Research
Sample Design and Confidence Intervals
Survey Sampling Methods
Non-probability sampling

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
