Sampling frame

Last updated

In statistics, a sampling frame is the source material or device from which a sample is drawn. [1] It is a list of all those within a population who can be sampled, and may include individuals, households or institutions. [1]

Contents

Importance of the sampling frame is stressed by Jessen [2] and Salant and Dillman. [3]

In many practical situations the frame is a matter of choice to the survey planner, and sometimes a critical one. [...] Some very worthwhile investigations are not undertaken at all because of the lack of an apparent frame; others, because of faulty frames, have ended in a disaster or in cloud of doubt.

Raymond James Jessen

A slightly more general concept of sampling frame includes area sampling frames, whose elements have a geographic nature. Area sampling frames can be useful for example in agricultural statistics when a suitable and updated agricultural census is not available. In environmental surveys, area sampling frames may be the only option.

Obtaining and organizing a sampling frame

In the most straightforward cases, such as when dealing with a batch of material from a production run, or using a census, it is possible to identify and measure every single item in the population and to include any one of them in our sample; this is known as direct element sampling. [1] However, in many other cases this is not possible; either because it is cost-prohibitive (reaching every citizen of a country) or impossible (reaching all humans alive).

Having established the frame, there are a number of ways for organizing it to improve efficiency and effectiveness. It's at this stage that the researcher should decide whether the sample is in fact to be the whole population and would therefore be a census.

This list should also facilitate access to the selected sampling units. A frame may also provide additional 'auxiliary information' about its elements; when this information is related to variables or groups of interest, it may be used to improve survey design. While not necessary for simple sampling, a sampling frame used for more advanced sample techniques, such as stratified sampling, may contain additional information (such as demographic information). [1] For instance, an electoral register might include name and sex; this information can be used to ensure that a sample taken from that frame covers all demographic categories of interest. (Sometimes the auxiliary information is less explicit; for instance, a telephone number may provide some information about location.

Sampling frame qualities

An ideal sampling frame will have the following qualities: [1]

Types of sampling frames

The most straightforward type of frame is a list of elements of the population (preferably the entire population) with appropriate contact information. For example, in an opinion poll, possible sampling frames include an electoral register or a telephone directory. Other sampling frames can include employment records, school class lists, patient files in a hospital, organizations listed in a thematic database, and so on. [1] [5] On a more practical levels, sampling frames have the form of computer files. [1]

Not all frames explicitly list population elements; some list only 'clusters'. For example, a street map can be used as a frame for a door-to-door survey; although it doesn't show individual houses, we can select streets from the map and then select houses on those streets. This offers some advantages: such a frame would include people who have recently moved and are not yet on the list frames discussed above, and it may be easier to use because it doesn't require storing data for every unit in the population, only for a smaller number of clusters.

Sampling frames problems

The sampling frame must be representative of the population and this is a question outside the scope of statistical theory demanding the judgment of experts in the particular subject matter being studied. All the above frames omit some people who will vote at the next election and contain some people who will not; some frames will contain multiple records for the same person. People not in the frame have no prospect of being sampled.

Because a cluster-based frame contains less information about the population, it may place constraints on the sample design, possibly requiring the use of less efficient sampling methods and/or making it harder to interpret the resulting data.

Statistical theory tells us about the uncertainties in extrapolating from a sample to the frame. It should be expected that sample frames, will always contain some mistakes. [5] In some cases, this may lead to sampling bias. [1] Such bias should be minimized, and identified, although avoiding it completely in a real world is nearly impossible. [1] One should also not assume that sources which claim to be unbiased and representative are such. [1]

In defining the frame, practical, economic, ethical, and technical issues need to be addressed. The need to obtain timely results may prevent extending the frame far into the future. The difficulties can be extreme when the population and frame are disjoint. This is a particular problem in forecasting where inferences about the future are made from historical data. In fact, in 1703, when Jacob Bernoulli proposed to Gottfried Leibniz the possibility of using historical mortality data to predict the probability of early death of a living man, Gottfried Leibniz recognized the problem in replying: [6]

Nature has established patterns originating in the return of events but only for the most part. New illnesses flood the human race, so that no matter how many experiments you have done on corpses, you have not thereby imposed a limit on the nature of events so that in the future they could not vary.

Gottfried Leibniz

Leslie Kish posited four basic problems of sampling frames: [7]

  1. Missing elements: Some members of the population are not included in the frame.
  2. Foreign elements: The non-members of the population are included in the frame.
  3. Duplicate entries: A member of the population is surveyed more than once.
  4. Groups or clusters: The frame lists clusters instead of individuals.

Problems like those listed can be identified by the use of pre-survey tests and pilot studies.

Related Research Articles

<span class="mw-page-title-main">Cluster sampling</span> Sampling methodology in statistics

In statistics, cluster sampling is a sampling plan used when mutually homogeneous yet internally heterogeneous groupings are evident in a statistical population. It is often used in marketing research.

<span class="mw-page-title-main">Census</span> Acquiring and recording information about the members of a given population

A census is the procedure of systematically acquiring, recording and calculating population information about the members of a given population. This term is used mostly in connection with national population and housing censuses; other common censuses include censuses of agriculture, traditional culture, business, supplies, and traffic censuses. The United Nations (UN) defines the essential features of population and housing censuses as "individual enumeration, universality within a defined territory, simultaneity and defined periodicity", and recommends that population censuses be taken at least every ten years. UN recommendations also cover census topics to be collected, official definitions, classifications and other useful information to co-ordinate international practices.

In statistics, survey sampling describes the process of selecting a sample of elements from a target population to conduct a survey. The term "survey" may refer to many different types or techniques of observation. In survey sampling it most often involves a questionnaire used to measure the characteristics and/or attitudes of people. Different ways of contacting members of a sample once they have been selected is the subject of survey data collection. The purpose of sampling is to reduce the cost and/or the amount of work that it would take to survey the entire target population. A survey that measures the entire target population is called a census. A sample refers to a group or section of a population from which information is to be obtained

<span class="mw-page-title-main">Frame Relay</span> Wide area network technology

Frame Relay is a standardized wide area network (WAN) technology that specifies the physical and data link layers of digital telecommunications channels using a packet switching methodology. Originally designed for transport across Integrated Services Digital Network (ISDN) infrastructure, it may be used today in the context of many other network interfaces.

High-Level Data Link Control (HDLC) is a bit-oriented code-transparent synchronous data link layer protocol developed by the International Organization for Standardization (ISO). The standard for HDLC is ISO/IEC 13239:2002.

<span class="mw-page-title-main">Sampling (statistics)</span> Selection of data points in statistics.

In statistics, quality assurance, and survey methodology, sampling is the selection of a subset or a statistical sample of individuals from within a statistical population to estimate characteristics of the whole population. Statisticians attempt to collect samples that are representative of the population. Sampling has lower costs and faster data collection compared to recording data from the entire population, and thus, it can provide insights in cases where it is infeasible to measure an entire population.

In survey methodology, systematic sampling is a statistical method involving the selection of elements from an ordered sampling frame. The most common form of systematic sampling is an equiprobability method.

Questionnaire construction refers to the design of a questionnaire to gather statistically useful information about a given topic. When properly constructed and responsibly administered, questionnaires can provide valuable data about any given subject.

Survey methodology is "the study of survey methods". As a field of applied statistics concentrating on human-research surveys, survey methodology studies the sampling of individual units from a population and associated techniques of survey data collection, such as questionnaire construction and methods for improving the number and accuracy of responses to surveys. Survey methodology targets instruments or procedures that ask one or more questions that may or may not be answered.

An opinion poll, often simply referred to as a survey or a poll, is a human research survey of public opinion from a particular sample. Opinion polls are usually designed to represent the opinions of a population by conducting a series of questions and then extrapolating generalities in ratio or within confidence intervals. A person who conducts polls is referred to as a pollster.

FrameNet is a group of online lexical databases based upon the theory of meaning known as Frame semantics, developed by linguist Charles J. Fillmore. The project's fundamental notion is simple: most words' meanings may be best understood in terms of a semantic frame, which is a description of a certain kind of event, connection, or item and its actors.

In the theory of finite population sampling, a sampling design specifies for every possible sample its probability of being drawn.

In statistics, non-sampling error is a catch-all term for the deviations of estimates from their true values that are not a function of the sample chosen, including various systematic errors and random errors that are not due to sampling. Non-sampling errors are much harder to quantify than sampling errors.

In statistics, a simple random sample is a subset of individuals chosen from a larger set in which a subset of individuals are chosen randomly, all with the same probability. It is a process of selecting a sample in a random way. In SRS, each subset of k individuals has the same probability of being chosen for the sample as any other subset of k individuals. Simple random sampling is a basic type of sampling and can be a component of other more complex sampling methods.

In survey methodology, the design effect is a measure of the expected impact of a sampling design on the variance of an estimator for some parameter. It is calculated as the ratio of the variance of an estimator based on a sample from an (often) complex sampling design, to the variance of an alternative estimator based on a simple random sample (SRS) of the same number of elements. The can be used to adjust the variance of an estimator in cases where the sample is not drawn using simple random sampling. It may also be useful in sample size calculations and for quantifying the representativeness of a sample. The term "design effect" was coined by Leslie Kish in 1965.

<span class="mw-page-title-main">Coverage error</span>

Coverage error is a type of non-sampling error that occurs when there is not a one-to-one correspondence between the target population and the sampling frame from which a sample is drawn. This can bias estimates calculated using survey data. For example, a researcher may wish to study the opinions of registered voters by calling residences listed in a telephone directory. Undercoverage may occur if not all voters are listed in the phone directory. Overcoverage could occur if some voters have more than one listed phone number. Bias could also occur if some phone numbers listed in the directory do not belong to registered voters. In this example, undercoverage, overcoverage, and bias due to inclusion of unregistered voters in the sampling frame are examples of coverage error.

In survey sampling, Total Survey Error includes all forms of survey error including sampling variability, interviewer effects, frame errors, response bias, and non-response bias. Total Survey Error is discussed in detail in many sources including Salant and Dillman.

With the application of probability sampling in the 1930s, surveys became a standard tool for empirical research in social sciences, marketing, and official statistics. The methods involved in survey data collection are any of a number of ways in which data can be collected for a statistical survey. These are methods that are used to collect information from a sample of individuals in a systematic way. First there was the change from traditional paper-and-pencil interviewing (PAPI) to computer-assisted interviewing (CAI). Now, face-to-face surveys (CAPI), telephone surveys (CATI), and mail surveys are increasingly replaced by web surveys. In addition, remote interviewers could possibly keep the respondent engaged while reducing cost as compared to in-person interviewers.

Online content analysis or online textual analysis refers to a collection of research techniques used to describe and make inferences about online material through systematic coding and interpretation. Online content analysis is a form of content analysis for analysis of Internet-based communication.

An area sampling frame is an alternative to the most traditional type of sampling frames.

References

  1. 1 2 3 4 5 6 7 8 9 10 Carl-Erik Särndal; Bengt Swensson; Jan Wretman (2003). Model assisted survey sampling. Springer. pp. 9–12. ISBN   978-0-387-40620-6 . Retrieved 2 January 2011.
  2. Raymond James Jessen (1978). Statistical survey techniques . Wiley. ISBN   9780471442608 . Retrieved 2 January 2011.[ page needed ]
  3. Salant, Priscilla, and Don A. Dillman. "How to Conduct your own Survey: Leading professional give you proven techniques for getting reliable results" (1995)
  4. Turner, Anthony G. "Sampling frames and master samples" (PDF). United Nations Secretariat. Retrieved December 11, 2012.
  5. 1 2 Roger Sapsford; Victor Jupp (29 March 2006). Data collection and analysis. SAGE. pp. 28–. ISBN   978-0-7619-4363-1 . Retrieved 2 January 2011.
  6. Peter L. Bernstein (1998). Against the gods: the remarkable story of risk . John Wiley and Sons. pp.  118–. ISBN   978-0-471-29563-1 . Retrieved 2 January 2011.
  7. Leslie Kish (1995). Survey sampling. Wiley. ISBN   978-0-471-10949-5 . Retrieved 11 January 2011.[ page needed ]