Coverage error

All colored circles are included in the target population. Green and orange circles are included in the sampling frame. Green circles are a randomly generated sample from the sampling frame. The sampling frame includes overcoverage because John and Jack are the same person, who is listed more than once in the frame. The sampling frame includes undercoverage because not all of the target population is included in it.

Coverage error is a type of non-sampling error [1] that occurs when there is not a one-to-one correspondence between the target population and the sampling frame from which a sample is drawn. [2] This can bias estimates calculated using survey data. [3] For example, a researcher may wish to study the opinions of registered voters (target population) by calling residences listed in a telephone directory (sampling frame). Undercoverage may occur if not all voters are listed in the phone directory. Overcoverage could occur if some voters have more than one listed phone number. Bias could also occur if some phone numbers listed in the directory do not belong to registered voters. [4] Undercoverage, overcoverage, and bias due to the inclusion of unregistered voters in the sampling frame are all forms of coverage error.

Discussion

Coverage error is one component of total survey error that can occur in survey sampling. In survey sampling, a sampling frame is the list of sampling units from which samples of a target population are drawn. [3] Coverage error results when there are differences between the target population and the sampling frame. [5]

For example, suppose a researcher is using Twitter to determine the opinion of U.S. voters on a recent action taken by the U.S. President. Although the researcher's target population is U.S. voters, she is using a list of Twitter users as her sampling frame. Because not all voters are Twitter users, and not all Twitter users are voters, the target population and the sampling frame are misaligned. This could bias the survey results, because the demographics and opinions of Twitter-using voters might not be representative of the target population of voters. [4]
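The size of such a bias can be illustrated with a little arithmetic. The sketch below compares the mean over the covered part of a population with the mean over the whole target population; the difference equals the uncovered share multiplied by the gap between the covered and uncovered means. The function name and every number in it are hypothetical and are not taken from the cited sources.

```python
def coverage_bias(n_covered, mean_covered, n_uncovered, mean_uncovered):
    """Bias of the covered-population mean relative to the target-population mean."""
    n_total = n_covered + n_uncovered
    # Mean over the whole target population (covered and uncovered members).
    target_mean = (n_covered * mean_covered + n_uncovered * mean_uncovered) / n_total
    # A frame that reaches only the covered members can at best estimate mean_covered.
    return mean_covered - target_mean


# Hypothetical figures: 200 voters are in the frame and 60% of them approve;
# 800 voters are missing from the frame and 45% of them approve.
bias = coverage_bias(200, 0.60, 800, 0.45)
print(round(bias, 3))
```

With these made-up figures the frame overstates approval, even if every listed voter responds, because the error comes from the frame rather than from sampling.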

Undercoverage occurs when the sampling frame does not include all members of the target population. In the previous example, voters are undercovered because not all voters are Twitter users. Overcoverage, on the other hand, results when some members of the target population are overrepresented in the sampling frame. In the previous example, some users may have more than one Twitter account and are therefore more likely to be included in the poll than users with only one account. [4]
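When both the target population and the frame can be listed, these categories can be checked mechanically. The following is a minimal sketch, assuming each frame entry (here an account handle) can be linked to a person identifier; all handles, names, and the `coverage_report` helper are hypothetical.

```python
from collections import Counter

# Hypothetical target population of voters, identified by name.
target_population = {"ann", "bob", "cara", "dev", "john"}

# Hypothetical sampling frame: account handle -> person who owns the account.
frame_listings = {
    "@ann_votes": "ann",
    "@bob1980": "bob",
    "@john": "john",
    "@jack": "john",      # second account owned by the same person
    "@zed_abroad": "zed",  # account owner is not in the target population
}


def coverage_report(target, listings):
    """Classify coverage problems by comparing a frame with its target population."""
    counts = Counter(listings.values())  # number of frame entries per person
    covered = set(counts)
    return {
        # Target members with no frame entry (undercoverage).
        "undercovered": sorted(target - covered),
        # Target members listed more than once (overcoverage).
        "duplicated": sorted(p for p, c in counts.items() if c > 1 and p in target),
        # Frame entries for people outside the target population.
        "ineligible": sorted(covered - target),
    }


report = coverage_report(target_population, frame_listings)
print(report)
```

In practice the link between frame entries and people is rarely known exactly, so a report like this is usually only available for small validation samples.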

Longitudinal studies are particularly susceptible to undercoverage, since the population being studied in a longitudinal survey can change over time. [6] For example, a researcher might want to study the relationship between the letter grades received by third graders in a particular school district and the wages that these same children earn when they become adults. In this case, the target population is all third graders in the district who grow up to be adults, and the sampling frame might be a list of third graders in the school district. Over time, the researcher will likely lose track of some of the children in the original study, so that her sampling frame of adults no longer matches the sampling frame of children used in the study.

Ways to Quantify Coverage Error

Many different methods have been used to quantify and correct for coverage error. Often, the methods employed are unique to specific agencies and organizations. For example, the United States Census Bureau has used the U.S. Postal Service's Delivery Sequence File, IRS 1040 address data, commercially available foreclosure counts, and other data to develop models capable of predicting undercount by census block. The Census Bureau has reported some success fitting such models using zero-inflated negative binomial or zero-inflated Poisson (ZIP) distributions. [7]
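For readers unfamiliar with the term, a zero-inflated Poisson distribution mixes an extra point mass at zero with an ordinary Poisson count, which suits data in which many units (for example, census blocks) have a count of exactly zero. The sketch below only evaluates that probability mass function. It is not the Census Bureau's model, and the parameter names `pi_zero` and `lam` and the example values are assumptions made for illustration.

```python
import math


def zip_pmf(k, pi_zero, lam):
    """Probability of observing count k under a zero-inflated Poisson distribution.

    pi_zero: probability of a structural (always-zero) outcome, between 0 and 1.
    lam:     mean of the Poisson component, greater than 0.
    """
    poisson = math.exp(-lam) * lam**k / math.factorial(k)
    if k == 0:
        # A zero can come from the structural part or from the Poisson part.
        return pi_zero + (1 - pi_zero) * poisson
    return (1 - pi_zero) * poisson


# Hypothetical parameters: 30% structural zeros, Poisson mean of 2.
for k in range(4):
    print(k, round(zip_pmf(k, 0.3, 2.0), 4))
```

Setting `pi_zero` to 0 recovers the ordinary Poisson distribution, which is one way to sanity-check the function.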

Another method for quantifying coverage error employs mark-and-recapture methodology. [8] In mark-and-recapture methodology, a sample is taken directly from the population, marked, and reintroduced into the population. At a later date, another sample is taken from the population (recapture), and the proportion of previously marked individuals in it is used to estimate the actual population size. This method can be extended to assessing the validity of a sampling frame by taking one sample directly from the target population and another from the sampling frame in order to estimate undercoverage. [9] For example, after a census has been completed, random samples could be drawn from the frame and counted again. [8]
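The basic mark-and-recapture calculation can be sketched with the Lincoln-Petersen estimator, a common textbook form of the method; the sources cited here do not specify which estimator is used, so treat it as an assumption. The function name and the counts are hypothetical.

```python
def lincoln_petersen(n_marked, n_second_sample, n_recaptured):
    """Estimate population size from a two-sample mark-and-recapture study.

    n_marked:        individuals captured and marked in the first sample
    n_second_sample: individuals captured in the second sample
    n_recaptured:    individuals in the second sample that carry a mark
    """
    if n_recaptured <= 0:
        raise ValueError("at least one marked individual must be recaptured")
    # The marked share of the second sample estimates n_marked / population size.
    return n_marked * n_second_sample / n_recaptured


# Hypothetical counts: 1000 people on the first list, 500 in the follow-up
# sample, 400 of whom also appear on the first list.
estimate = lincoln_petersen(1000, 500, 400)
coverage = 1000 / estimate  # estimated share of the population on the first list
print(estimate, coverage)
```

The estimate relies on the two samples being independent and on individuals being matched correctly between them; both assumptions are hard to satisfy fully in a real census setting.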

Ways to Reduce Coverage Error

One way to reduce coverage error is to rely on multiple sources to either build a sample frame or to solicit information. This is called a mixed-mode approach. For example, Washington State University students conducted Student Survey Experience Surveys by building a sample frame using both street addresses and email addresses. [5]

In another example of a mixed-mode approach, the 2010 U.S. Census relied primarily on residential mail responses and then deployed field interviewers to interview nonrespondents. That way, field interviewers could determine whether a particular address still existed or was still occupied. This approach had the added benefit of reducing costs, as the majority of people responded by mail and did not require a field visit. [8] [5]

Example: 2010 Census

The U.S. Census Bureau prepares and maintains a Master Address File of some 144.9 million addresses that it uses as a sampling frame for the U.S. Decennial Census and other surveys. Despite the efforts of some 111,105 field representatives and an expenditure of nearly half a billion dollars, the Census Bureau still found a significant number of addresses that had not found their way into the Master Address File. [7]

Coverage Follow-Up (CFU) and Field Verification (FV) were Census Bureau operations conducted to improve the 2010 census using 2000 census data as a base. These operations were intended to address the following types of coverage error: not counting someone who should have been counted; counting someone who should not have been counted; and counting someone who should have been counted, but whose identified location was in error. Coverage errors in the U.S. Census can leave some groups of people underrepresented by the government. Of particular concern are "differential undercounts", which are underestimates of targeted demographic groups. Although the efforts of the CFU and FV improved the accuracy of the 2010 Census, more study was recommended to address the question of differential undercounts. [10]

References

  1. Salant, Priscilla; Dillman, Don A. (1995). How to Conduct Your Own Survey: Leading professionals give you proven techniques for getting reliable results.
  2. NOAA Fisheries (2019-02-21). "Survey Statistics Overview | NOAA Fisheries". www.fisheries.noaa.gov. Retrieved 2019-02-24.
  3. Scheaffer, Richard L. (1996). Section 5 of "Teaching Survey Sampling", by Ronald S. Fecso, William D. Kalsbeek, Sharon L. Lohr, Richard L. Scheaffer, Fritz J. Scheuren, Elizabeth A. Stasny. The American Statistician 50:4 (Nov. 1996), pp. 335–337.
  4. Scheaffer, Richard L. (2012). Elementary survey sampling (7th, student ed.). Boston, MA: Brooks/Cole. ISBN 978-0840053619. OCLC 732960076.
  5. Dillman, Don A.; Smyth, Jolene D.; Christian, Leah Melani (6 August 2014). Internet, phone, mail, and mixed-mode surveys: the tailored design method (Fourth ed.). Hoboken. ISBN 9781118921302. OCLC 878301194.
  6. Lynn, Peter (2009). Methodology of longitudinal surveys. Chichester, UK: John Wiley & Sons. ISBN 9780470743911. OCLC 317116422.
  7. "Selection of Predictors to Model Coverage Errors". www.census.gov. United States Census Bureau. Retrieved 2019-02-24.
  8. Biemer, Paul P.; de Leeuw, Edith Desirée; Eckman, Stephanie; Edwards, Brad; Kreuter, Frauke; Lyberg, Lars, eds. (6 February 2017). Total survey error in practice. Hoboken, New Jersey. ISBN 9781119041689. OCLC 971891428.
  9. "Coverage Error Models for Census and Survey Data". www.census.gov. United States Census Bureau. Retrieved 2019-02-24.
  10. 2010 census: follow-up should reduce coverage errors, but effects on demographic groups need to be determined: report to congressional requesters. U.S. Govt. Accountability Office. 2010. OCLC 721261877.