Nonprobability sampling

Sampling is the use of a subset of the population to represent the whole population or to inform about (social) processes that are meaningful beyond the particular cases, individuals, or sites studied. Probability sampling, or random sampling, is a sampling technique in which the probability of obtaining any particular sample can be calculated. Nonprobability sampling does not meet this criterion: its techniques are not intended to support inference from the sample to the general population in statistical terms. In cases where external validity is not of critical importance to the study's goals or purpose, researchers might therefore prefer nonprobability sampling. Instead of statistical generalization, for example, grounded theory can be produced through iterative nonprobability sampling until theoretical saturation is reached (Strauss and Corbin, 1990).

Thus, one cannot draw the same kinds of conclusions from a nonprobability sample as from a probability sample. The grounds for drawing generalizations (e.g., proposing new theory or policy) from studies based on nonprobability samples rest on the notions of "theoretical saturation" and "analytical generalization" (Yin, 2014) rather than on statistical generalization.

Researchers working with the notion of purposive sampling assert that while probability methods are suitable for large-scale studies concerned with representativeness, nonprobability approaches are more suitable for in-depth qualitative research in which the focus is often to understand complex social phenomena (e.g., Marshall 1996; Small 2009). One advantage of nonprobability sampling is its lower cost compared to probability sampling. Moreover, the in-depth analysis of a small-N purposive sample or a case study enables the "discovery" and identification of patterns and causal mechanisms that do not rest on time- and context-free assumptions.

Nonprobability sampling is often not appropriate in statistical quantitative research, though, as these assertions raise some questions: how can one understand a complex social phenomenon by taking only the most convenient expressions of that phenomenon into consideration? What assumption about homogeneity in the world must one make to justify such assertions? By contrast, the view that research can only be based on statistical inference focuses on the problems of bias linked to nonprobability sampling and acknowledges only one situation in which a nonprobability sample can be appropriate: if one is interested only in the specific cases studied (for example, the Battle of Gettysburg), one does not need to draw a probability sample from similar cases (Lucas 2014a).

Nonprobability sampling is, however, widely used in qualitative research. Examples of nonprobability sampling include:

  - Convenience sampling
  - Quota sampling
  - Purposive (judgmental) sampling
  - Snowball sampling
  - Self-selection (volunteer) sampling

Studies intended to use probability sampling sometimes end up using nonprobability samples because of characteristics of the sampling method. For example, using a sample of people in the paid labor force to analyze the effect of education on earnings means using a nonprobability sample of the persons who could be in the paid labor force. Because the education people obtain can determine their likelihood of being in the paid labor force, the sample of those in the paid labor force is a nonprobability sample for the question at issue. In such cases, the results are biased.
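The labor-force example can be illustrated with a small simulation. This is a toy sketch: the population, the effect sizes, and the response probabilities are all made-up numbers chosen only to show the selection mechanism, not estimates from any real study.

```python
import random

random.seed(0)

# Hypothetical population: education raises both earnings potential and
# the probability of being in the paid labor force (assumed numbers).
population = []
for _ in range(100_000):
    educ = random.randint(8, 20)  # years of schooling
    in_labor_force = random.random() < 0.3 + 0.03 * educ
    population.append((educ, in_labor_force))

# Sampling only people in the paid labor force over-represents the highly
# educated: a nonprobability sample for the question at issue.
employed = [p for p in population if p[1]]
mean_educ_all = sum(p[0] for p in population) / len(population)
mean_educ_emp = sum(p[0] for p in employed) / len(employed)
print(f"mean education, full population: {mean_educ_all:.2f}")
print(f"mean education, employed only:   {mean_educ_emp:.2f}")  # higher
```

Because selection into the sample depends on the explanatory variable itself, any earnings regression run on the employed subset alone would be estimated on a distorted education distribution.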

The statistical model one uses can also render the data a nonprobability sample. For example, Lucas (2014b) notes that several published studies that use multilevel modeling have been based on samples that are probability samples in general, but nonprobability samples for one or more of the levels of analysis in the study. Evidence indicates that in such cases the bias is poorly behaved, such that inferences from such analyses are unjustified.

These problems occur in the academic literature, but they may be more common in non-academic research. For example, in public opinion polling by private companies (or other organizations unable to require response), the sample can be self-selected rather than random. This often introduces an important type of error, self-selection bias, in which a potential participant's willingness to volunteer for the sample may be determined by characteristics such as submissiveness or availability. The samples in such surveys should be treated as nonprobability samples of the population, and the validity of the findings based on them is unknown and cannot be established.
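Self-selection bias of this kind can also be sketched in a few lines. The population, the 0-10 satisfaction scale, and the assumed response rule (dissatisfied people volunteer more readily) are all hypothetical, chosen only to make the mechanism visible.

```python
import random
from statistics import fmean

random.seed(1)

# Hypothetical population: satisfaction scores on a 0-10 scale.
population = [min(10.0, max(0.0, random.gauss(5.0, 2.0)))
              for _ in range(50_000)]

def volunteers(pop):
    # Assumed behaviour: willingness to respond grows with dissatisfaction,
    # so the resulting sample is self-selected, not random.
    return [x for x in pop if random.random() < 0.05 + 0.06 * max(0.0, 5.0 - x)]

random_sample = random.sample(population, 2_000)
self_selected = volunteers(population)

print(f"population mean:    {fmean(population):.2f}")
print(f"random sample mean: {fmean(random_sample):.2f}")
print(f"self-selected mean: {fmean(self_selected):.2f}")  # pulled downward
```

The random sample tracks the population mean, while the self-selected poll systematically understates satisfaction; with real survey data the direction and size of such bias are unknown, which is exactly why its validity cannot be established.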

Related Research Articles

Cluster sampling

In statistics, cluster sampling is a sampling plan used when mutually homogeneous yet internally heterogeneous groupings are evident in a statistical population. It is often used in marketing research. In this sampling plan, the total population is divided into these groups and a simple random sample of the groups is selected. The elements in each cluster are then sampled. If all elements in each sampled cluster are sampled, then this is referred to as a "one-stage" cluster sampling plan. If a simple random subsample of elements is selected within each of these groups, this is referred to as a "two-stage" cluster sampling plan. A common motivation for cluster sampling is to reduce the total number of interviews and costs given the desired accuracy. For a fixed sample size, the expected random error is smaller when most of the variation in the population is present internally within the groups, and not between the groups.
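A two-stage cluster plan as described above can be sketched as follows. The villages, their sizes, and the stage sizes are invented for illustration.

```python
import random

random.seed(2)

# Hypothetical population: 50 villages (clusters) of 40 households each.
villages = {v: [f"household_{v}_{h}" for h in range(40)] for v in range(50)}

# Stage 1: simple random sample of clusters.
sampled_villages = random.sample(sorted(villages), 10)

# Stage 2: simple random subsample of elements within each sampled cluster.
# (Interviewing every household in each sampled village would instead be a
# "one-stage" cluster sampling plan.)
sample = []
for v in sampled_villages:
    sample.extend(random.sample(villages[v], 8))

print(len(sample))  # 10 villages x 8 households = 80 interviews
```

Visiting 10 villages instead of all 50 is what drives the cost savings mentioned above, at the price of larger random error when villages differ from one another.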

Statistics

Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied. Populations can be diverse groups of people or objects such as "all people living in a country" or "every atom composing a crystal". Statistics deals with every aspect of data, including the planning of data collection in terms of the design of surveys and experiments.

Statistical inference

Statistical inference is the process of using data analysis to infer properties of an underlying distribution of probability. Inferential statistical analysis infers properties of a population, for example by testing hypotheses and deriving estimates. It is assumed that the observed data set is sampled from a larger population.

In statistics, survey sampling describes the process of selecting a sample of elements from a target population to conduct a survey. The term "survey" may refer to many different types or techniques of observation. In survey sampling it most often involves a questionnaire used to measure the characteristics and/or attitudes of people. Different ways of contacting members of a sample once they have been selected are the subject of survey data collection. The purpose of sampling is to reduce the cost and/or the amount of work that it would take to survey the entire target population. A survey that measures the entire target population is called a census. A sample refers to a group or section of a population from which information is to be obtained.

A statistical hypothesis test is a method of statistical inference used to decide whether the data at hand sufficiently support a particular hypothesis. Hypothesis testing allows us to make probabilistic statements about population parameters.
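A minimal hypothesis test can be written with the standard library alone. This sketch uses the normal approximation to test whether a coin is fair; the function name and the example counts are hypothetical, chosen for illustration.

```python
from math import erf, sqrt

def two_sided_p_value(successes, n, p0):
    """Normal-approximation z-test of H0: the true proportion equals p0."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    # Two-sided p-value from the standard normal CDF.
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# 570 heads in 1000 flips of a supposedly fair coin.
p = two_sided_p_value(570, 1000, 0.5)
print(f"p-value = {p:.6f}")  # far below 0.05: reject fairness at the 5% level
```

Note that this logic presumes the flips are a probability sample of the coin's behavior; applied to a nonprobability sample, the p-value has no such guarantee.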

Randomization is the process of making something random; in various contexts this involves, for example, generating random permutations (such as shuffling cards), randomly allocating experimental units to treatment groups, or selecting a random sample from a population.

Sampling (statistics)

In statistics, quality assurance, and survey methodology, sampling is the selection of a subset of individuals from within a statistical population to estimate characteristics of the whole population. Statisticians attempt to collect samples that are representative of the population in question. Sampling has lower costs and faster data collection than measuring the entire population and can provide insights in cases where it is infeasible to measure an entire population.
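The cost argument above is easy to demonstrate: a modest simple random sample usually estimates a population characteristic closely. The population below (100,000 incomes with a skewed, log-normal-like shape) is entirely made up for illustration.

```python
import random
from statistics import fmean

random.seed(3)

# Hypothetical population of 100,000 incomes (skewed distribution).
population = [random.lognormvariate(10, 0.5) for _ in range(100_000)]

# A simple random sample of 1,000 units: every unit has the same known
# probability of selection, so this is a probability sample.
sample = random.sample(population, 1_000)

rel_error = abs(fmean(sample) - fmean(population)) / fmean(population)
print(f"relative error of the sample mean: {rel_error:.1%}")
```

Measuring 1% of the population typically lands within a few percent of the true mean, which is the trade-off between cost and precision that sampling exploits.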

Quantitative marketing research is the application of quantitative research techniques to the field of marketing research. It has roots in both the positivist view of the world, and the modern marketing viewpoint that marketing is an interactive process in which both the buyer and seller reach a satisfying agreement on the "four Ps" of marketing: Product, Price, Place (location) and Promotion.

Social research

Social research is research conducted by social scientists following a systematic plan. Social research methodologies can be classified as quantitative or qualitative.

Inductive reasoning is a method of reasoning in which a body of observations is considered to derive a general principle. It consists of making broad generalizations based on specific observations. Inductive reasoning is distinct from deductive reasoning. If the premises are correct, the conclusion of a deductive argument is certain; in contrast, the truth of the conclusion of an inductive argument is probable, based upon the evidence given.

A statistical syllogism is a non-deductive syllogism. It argues, using inductive reasoning, from a generalization true for the most part to a particular case.

Mathematical statistics

Mathematical statistics is the application of probability theory, a branch of mathematics, to statistics, as opposed to techniques for collecting statistical data. Specific mathematical techniques which are used for this include mathematical analysis, linear algebra, stochastic analysis, differential equations, and measure theory.

Sample size determination is the act of choosing the number of observations or replicates to include in a statistical sample. The sample size is an important feature of any empirical study in which the goal is to make inferences about a population from a sample. In practice, the sample size used in a study is usually determined based on the cost, time, or convenience of collecting the data, and the need for it to offer sufficient statistical power. In complicated studies there may be several different sample sizes: for example, in a stratified survey there would be different sizes for each stratum. In a census, data is sought for an entire population, hence the intended sample size is equal to the population. In experimental design, where a study may be divided into different treatment groups, there may be different sample sizes for each group.
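For estimating a proportion, a standard closed-form rule gives the minimum sample size for a desired margin of error. The helper name below is hypothetical; the formula itself, n = z²p(1−p)/m², is the usual normal-approximation result, with p = 0.5 as the conservative worst case.

```python
from math import ceil

def sample_size_for_proportion(margin, confidence_z=1.96, p=0.5):
    """Minimum n so a proportion estimate has the given margin of error
    at the stated confidence level (p=0.5 is the conservative worst case)."""
    return ceil(confidence_z**2 * p * (1 - p) / margin**2)

print(sample_size_for_proportion(0.03))  # 1068 for a +/-3% margin at 95%
print(sample_size_for_proportion(0.05))  # 385 for a +/-5% margin at 95%
```

This is why national opinion polls typically quote samples of roughly a thousand respondents; in practice the figure is then adjusted for cost, expected nonresponse, and design effects such as stratification or clustering.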

External validity is the validity of applying the conclusions of a scientific study outside the context of that study. In other words, it is the extent to which the results of a study can be generalized to and across other situations, people, stimuli, and times. In contrast, internal validity is the validity of conclusions drawn within the context of a particular study. Because general conclusions are almost always a goal in research, external validity is an important property of any study. Mathematical analysis of external validity concerns a determination of whether generalization across heterogeneous populations is feasible, and devising statistical and computational methods that produce valid generalizations.

This glossary of statistics and probability is a list of definitions of terms and concepts used in the mathematical sciences of statistics and probability, their sub-disciplines, and related fields. For additional related terms, see Glossary of mathematics and Glossary of experimental design.

Confounding

In statistics, a confounder is a variable that influences both the dependent variable and independent variable, causing a spurious association. Confounding is a causal concept, and as such, cannot be described in terms of correlations or associations. The existence of confounders is an important quantitative explanation why correlation does not imply causation.
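A spurious association induced by a confounder can be simulated directly. In this made-up setup, a common cause z drives both x and y while x has no causal effect on y at all.

```python
import random
from statistics import fmean

random.seed(5)

# Hypothetical data: confounder z influences both x and y; x does not
# cause y, yet the two end up correlated through z.
z = [random.gauss(0, 1) for _ in range(10_000)]
x = [zi + random.gauss(0, 1) for zi in z]
y = [zi + random.gauss(0, 1) for zi in z]

def corr(a, b):
    # Pearson correlation coefficient, computed from scratch.
    ma, mb = fmean(a), fmean(b)
    cov = fmean([(ai - ma) * (bi - mb) for ai, bi in zip(a, b)])
    var_a = fmean([(ai - ma) ** 2 for ai in a])
    var_b = fmean([(bi - mb) ** 2 for bi in b])
    return cov / (var_a ** 0.5 * var_b ** 0.5)

print(f"corr(x, y) = {corr(x, y):.2f}")  # near 0.5 despite no causal link
```

The observed correlation is entirely attributable to z, illustrating why correlation does not imply causation.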

In statistics, resampling is any of a variety of methods for doing one of the following:

  1. Estimating the precision of sample statistics by using subsets of available data (jackknifing) or drawing randomly with replacement from a set of data points (bootstrapping)
  2. Performing significance tests by exchanging labels on data points (permutation tests, which are exact tests)
  3. Validating models by using random subsets
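The bootstrap (item 1 above) can be sketched in a few lines with the standard library; the data here are simulated draws with made-up parameters, used only to show the resampling loop.

```python
import random
from statistics import fmean, pstdev

random.seed(4)

# Hypothetical sample of 200 measurements.
data = [random.gauss(50, 10) for _ in range(200)]

# Bootstrap: repeatedly resample with replacement from the observed data,
# then use the spread of the replicate means as an estimate of the
# standard error of the sample mean, without a distributional formula.
boot_means = [fmean(random.choices(data, k=len(data))) for _ in range(2_000)]
boot_se = pstdev(boot_means)

print(f"bootstrap SE of the mean: {boot_se:.2f}")
```

For comparison, the textbook value here would be about sigma/sqrt(n) = 10/sqrt(200), roughly 0.71; the bootstrap recovers a figure of that order directly from the data.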

With the application of probability sampling in the 1930s, surveys became a standard tool for empirical research in the social sciences, marketing, and official statistics. Survey data collection can use any of a number of methods to gather information from a sample of individuals in a systematic way. First there was the change from traditional paper-and-pencil interviewing (PAPI) to computer-assisted interviewing (CAI). Now, face-to-face surveys (CAPI), telephone surveys (CATI), and mail surveys are increasingly being replaced by web surveys.

Convenience sampling is a type of non-probability sampling that involves the sample being drawn from that part of the population that is close to hand. This type of sampling is most useful for pilot testing.
