Coverage error

All colored circles are included in the target population. Green and orange circles are included in the sampling frame. Green circles are a randomly generated sample from the sampling frame. The sampling frame includes overcoverage because John and Jack are the same person, who is listed more than once in the frame. The sampling frame includes undercoverage because not all of the target population is included in it.

Coverage error is a type of non-sampling error [1] that occurs when there is not a one-to-one correspondence between the target population and the sampling frame from which a sample is drawn. [2] This can bias estimates calculated using survey data. [3] For example, a researcher may wish to study the opinions of registered voters (target population) by calling residences listed in a telephone directory (sampling frame). Undercoverage may occur if not all voters are listed in the phone directory. Overcoverage could occur if some voters have more than one listed phone number. Bias could also occur if some phone numbers listed in the directory do not belong to registered voters. [4] In this example, undercoverage, overcoverage, and bias due to inclusion of unregistered voters in the sampling frame are examples of coverage error.


Discussion

Coverage error is one type of total survey error that can occur in survey sampling. In survey sampling, a sampling frame is the list of sampling units from which samples of a target population are drawn. [3] Coverage error results when there are differences between the target population and the sampling frame. [5]

For example, suppose a researcher is using Twitter to determine the opinion of U.S. voters on a recent action taken by the U.S. President. Although the researcher's target population is U.S. voters, she is using a list of Twitter users as her sampling frame. Because not all voters are Twitter users, and not all Twitter users are voters, the target population and the sampling frame are misaligned. This could bias the survey results, because the demographics and opinions of Twitter-using voters might not be representative of the target population of voters. [4]
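The size of such a bias can be illustrated with a small calculation. The sketch below uses made-up numbers, not figures from any cited source, and relies on the standard identity that the bias of a mean computed only from covered units equals the uncovered share of the population times the difference between the covered and uncovered means. The function name and arguments are hypothetical.

```python
def coverage_bias(n_covered, mean_covered, n_uncovered, mean_uncovered):
    """Bias of the covered-units mean relative to the full target-population mean."""
    n_total = n_covered + n_uncovered
    # Mean over the whole target population (covered plus uncovered units).
    target_mean = (n_covered * mean_covered + n_uncovered * mean_uncovered) / n_total
    return mean_covered - target_mean


# Hypothetical figures: 30 voters are in the frame and 60% of them approve;
# 70 voters are outside the frame and 40% of them approve.
bias = coverage_bias(30, 0.60, 70, 0.40)

# The same value via the identity: uncovered share * (covered mean - uncovered mean).
identity = (70 / 100) * (0.60 - 0.40)
print(round(bias, 4), round(identity, 4))
```

With these assumed figures the frame-based estimate overstates approval by about 14 percentage points. The bias shrinks if either the uncovered share or the gap between the two groups shrinks.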

Undercoverage occurs when the sampling frame does not include all members of the target population. In the previous example, voters are undercovered because not all voters are Twitter users. On the other hand, overcoverage results when some members of the target population are overrepresented in the sampling frame. In the previous example, it is possible that some users have more than one Twitter account, and are more likely to be included in the poll than Twitter users with only one account. [4]
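When both the target population and the frame can be listed, these categories can be separated with simple set operations. The following is a minimal sketch under strong assumptions: every unit has a unique, reliable identifier (which real frames often lack, since one person may appear under two names), and the names `coverage_report`, `voters`, and `accounts` are hypothetical.

```python
from collections import Counter


def coverage_report(target_population, sampling_frame):
    """Classify units into undercovered, ineligible, and duplicated groups."""
    target = set(target_population)
    frame_counts = Counter(sampling_frame)  # how often each unit is listed
    listed = set(frame_counts)
    return {
        # In the target population but missing from the frame.
        "undercovered": sorted(target - listed),
        # In the frame but not part of the target population.
        "ineligible": sorted(listed - target),
        # Target units listed more than once in the frame.
        "duplicated": sorted(
            unit for unit, count in frame_counts.items()
            if count > 1 and unit in target
        ),
    }


voters = ["ana", "ben", "cho", "dee"]       # hypothetical target population
accounts = ["ana", "ana", "ben", "eve"]     # hypothetical frame entries
report = coverage_report(voters, accounts)
print(report)
```

Here "cho" and "dee" are undercovered, "eve" is listed but is not a voter, and "ana" is listed twice and so has a higher chance of selection.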

Longitudinal studies are particularly susceptible to undercoverage, since the population being studied in a longitudinal survey can change over time. [6] For example, a researcher might want to study the relationship between the letter grades received by third graders in a particular school district and the wages that these same children earn when they become adults. In this case, the researcher is interested in all third graders in the district who grow up to be adults (target population). Her sampling frame might be a list of third graders in the school district. Over time, it is likely that the researcher will lose track of some of the children included in the original study, so that her sampling frame of adults no longer matches the sampling frame of children used at the start of the study.

Ways to Quantify Coverage Error

Many different methods have been used to quantify and correct for coverage error. Often, the methods employed are unique to specific agencies and organizations. For example, the United States Census Bureau has used the U.S. Postal Service's Delivery Sequence File, IRS 1040 address data, commercially available foreclosure counts, and other data to develop models capable of predicting undercount by census block. The Census Bureau has reported some success fitting such models using zero-inflated negative binomial or zero-inflated Poisson (ZIP) distributions. [7]
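For readers unfamiliar with zero-inflated count models, the sketch below shows only the textbook probability mass function of a zero-inflated Poisson distribution: with probability `zero_prob` a count is a "structural" zero, and otherwise it follows an ordinary Poisson distribution. It is not the Census Bureau's model, whose predictors and fitting procedure are not described here; the function and parameter names are assumptions made for illustration.

```python
import math


def zip_pmf(k, rate, zero_prob):
    """P(K = k) under a zero-inflated Poisson distribution.

    rate      -- mean of the underlying Poisson component
    zero_prob -- probability of a structural (extra) zero
    """
    if k < 0:
        raise ValueError("k must be a non-negative integer")
    poisson = math.exp(-rate) * rate ** k / math.factorial(k)
    if k == 0:
        # Zeros come from both the structural part and the Poisson part.
        return zero_prob + (1 - zero_prob) * poisson
    return (1 - zero_prob) * poisson


# Hypothetical parameters: many blocks have no missed addresses at all.
p_zero = zip_pmf(0, rate=2.0, zero_prob=0.3)
total = sum(zip_pmf(k, rate=2.0, zero_prob=0.3) for k in range(60))
print(round(p_zero, 4), round(total, 6))
```

Setting `zero_prob` to 0 recovers the ordinary Poisson distribution, which is why the zero-inflated form is useful when observed counts contain more zeros than a Poisson model would predict.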

Another method for quantifying coverage error employs mark-and-recapture methodology. [8] In this approach, a sample is taken directly from the population, marked, and reintroduced into the population. At a later date, another sample is taken from the population (recapture), and the proportion of previously marked members in it is used to estimate the actual population size. This method can be extended to assess the validity of a sampling frame by taking a sample directly from the target population and then taking another sample from the sampling frame in order to estimate undercoverage. [9] For example, after a census has been completed, random samples from the frame could be drawn and counted again. [8]
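The arithmetic behind mark-and-recapture can be sketched with the Lincoln-Petersen estimator, the simplest estimator of this kind. The cited sources do not say which estimator they use, so this is an illustration rather than their method, and all numbers and names below are hypothetical. It assumes a closed population and that every member is equally likely to be captured in each sample.

```python
def lincoln_petersen(first_sample_size, second_sample_size, recaptured):
    """Estimate population size as n1 * n2 / m.

    first_sample_size  -- n1, members marked in the first sample
    second_sample_size -- n2, members drawn in the second sample
    recaptured         -- m, members of the second sample already marked
    """
    if recaptured <= 0:
        raise ValueError("at least one recaptured member is required")
    return first_sample_size * second_sample_size / recaptured


# Hypothetical: 200 units marked, 150 drawn later, 30 of those already marked.
estimated_population = lincoln_petersen(200, 150, 30)

# If a frame lists 900 units, its estimated coverage rate follows directly.
frame_size = 900
coverage_rate = frame_size / estimated_population
print(estimated_population, coverage_rate)
```

In this made-up case the estimated population is 1,000, so a frame of 900 units would cover roughly 90 percent of it, implying about 10 percent undercoverage.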

Ways to Reduce Coverage Error

One way to reduce coverage error is to rely on multiple sources, either to build a sampling frame or to solicit information. This is called a mixed-mode approach. For example, Washington State University students conducted Student Survey Experience Surveys by building a sampling frame using both street addresses and email addresses. [5]

In another example of a mixed-mode approach, the 2010 U.S. Census relied primarily on residential mail responses and then deployed field interviewers to interview non-responders. That way, field interviewers could determine whether a particular address still existed or was still occupied. This approach had the added benefit of reducing cost, as the majority of people responded by mail and did not require a field visit. [8] [5]

Example: 2010 Census

The U.S. Census Bureau prepares and maintains a Master Address File of some 144.9 million addresses that it uses as a sampling frame for the U.S. Decennial Census and other surveys. Despite the efforts of some 111,105 field representatives and an expenditure of nearly half a billion dollars, the Census Bureau still found a significant number of addresses that were missing from the Master Address File. [7]

Coverage Follow-Up (CFU) and Field Verification (FV) were Census Bureau operations conducted to improve the 2010 census using 2000 census data as a base. These operations were intended to address the following types of coverage error: not counting someone who should have been counted; counting someone who should not have been counted; and counting someone who should have been counted, but at the wrong location. Coverage errors in the U.S. Census can leave some groups of people underrepresented by the government. Of particular concern are "differential undercounts", which are underestimates of targeted demographic groups. Although the CFU and FV operations improved the accuracy of the 2010 Census, more study was recommended to address the question of differential undercounts. [10]

References

  1. Salant, Priscilla; Dillman, Don A. (1995). How to Conduct Your Own Survey: Leading Professionals Give You Proven Techniques for Getting Reliable Results.
  2. NOAA Fisheries (2019-02-21). "Survey Statistics Overview | NOAA Fisheries". www.fisheries.noaa.gov. Retrieved 2019-02-24.
  3. Scheaffer, Richard L. (1996). Section 5 of "Teaching Survey Sampling", by Ronald S. Fecso, William D. Kalsbeek, Sharon L. Lohr, Richard L. Scheaffer, Fritz J. Scheuren, Elizabeth A. Stasny. The American Statistician 50:4 (Nov. 1996), pp. 335–337.
  4. Scheaffer, Richard L. (2012). Elementary survey sampling (7th, student ed.). Boston, MA: Brooks/Cole. ISBN 978-0840053619. OCLC 732960076.
  5. Dillman, Don A.; Smyth, Jolene D.; Christian, Leah Melani (6 August 2014). Internet, phone, mail, and mixed-mode surveys: the tailored design method (4th ed.). Hoboken. ISBN 9781118921302. OCLC 878301194.
  6. Lynn, Peter (2009). Methodology of longitudinal surveys. Chichester, UK: John Wiley & Sons. ISBN 9780470743911. OCLC 317116422.
  7. US Census Bureau. "Selection of Predictors to Model Coverage Errors". www.census.gov. Retrieved 2019-02-24.
  8. Biemer, Paul P.; de Leeuw, Edith Desirée; Eckman, Stephanie; Edwards, Brad; Kreuter, Frauke; Lyberg, Lars, eds. (6 February 2017). Total survey error in practice. Hoboken, New Jersey. ISBN 9781119041689. OCLC 971891428.
  9. US Census Bureau. "Coverage Error Models for Census and Survey Data". www.census.gov. Retrieved 2019-02-24.
  10. 2010 census: follow-up should reduce coverage errors, but effects on demographic groups need to be determined: report to congressional requesters. U.S. Govt. Accountability Office. 2010. OCLC 721261877.