Self-selection bias

In statistics, self-selection bias arises in any situation in which individuals select themselves into a group, producing a biased, nonprobability sample. The term commonly describes situations where the characteristics that lead people to select themselves into the group create abnormal or undesirable conditions in that group. It is closely related to non-response bias, which arises when the people who respond differ in their responses from those who do not.

Self-selection bias is a major problem in research in sociology, psychology, economics and many other social sciences. [1] In such fields, a poll suffering from such bias is termed a self-selected listener opinion poll or "SLOP". [2] The term is also used in criminology to describe the process by which specific predispositions may lead an offender to choose a criminal career and lifestyle.

While the effects of self-selection bias are closely related to those of selection bias, the problem arises for rather different reasons. Self-selection bias may reflect purposeful intent on the part of respondents, whereas other types of selection bias may arise more inadvertently, possibly as the result of mistakes by those designing a study.

Explanation

Self-selection makes determination of causation more difficult. For example, when attempting to assess the effect of a test preparation course on participants' test scores, significantly higher test scores might be observed among students who choose to take the course. Because of self-selection, the people who choose to take the course may differ from those who do not in several ways, such as motivation, socioeconomic status, or prior test-taking experience. Such differences could produce a significant difference in mean test scores between the two groups independent of any ability of the course to affect test scores: those who elect to take the preparation course might have achieved higher scores in the actual test anyway. If the study measures the effect of the course by comparing absolute test scores, the estimate may therefore be skewed upward. A relative measure of 'improvement' might improve the reliability of the study somewhat, but only partially.
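
The test preparation example can be illustrated with a small simulation. The sketch below is not drawn from any study: the "motivation" trait, the score scale, the logistic enrolment rule, and the true course effect of 2 points are all assumed values chosen for illustration. It compares a naive difference in mean scores under self-selection with the same difference under random assignment.

```python
import math
import random
from statistics import mean


def simulate(n=20000, true_effect=2.0, seed=0):
    """Return (naive, randomized) estimates of a hypothetical course effect.

    All numbers here are illustrative assumptions, not empirical values.
    """
    rng = random.Random(seed)
    self_selected = {True: [], False: []}
    randomized = {True: [], False: []}
    for _ in range(n):
        # Assumed unobserved trait that raises scores AND the chance of enrolling.
        motivation = rng.gauss(0.0, 1.0)
        baseline = 50.0 + 10.0 * motivation + rng.gauss(0.0, 5.0)

        # Self-selection: enrolment probability grows with motivation.
        enrols = rng.random() < 1.0 / (1.0 + math.exp(-2.0 * motivation))
        self_selected[enrols].append(baseline + (true_effect if enrols else 0.0))

        # Random assignment: a coin flip unrelated to motivation.
        assigned = rng.random() < 0.5
        randomized[assigned].append(baseline + (true_effect if assigned else 0.0))

    naive = mean(self_selected[True]) - mean(self_selected[False])
    rct = mean(randomized[True]) - mean(randomized[False])
    return naive, rct


if __name__ == "__main__":
    naive, rct = simulate()
    print(f"naive difference under self-selection: {naive:.2f}")
    print(f"difference under random assignment:    {rct:.2f}")
```

Under these assumptions the self-selected comparison attributes to the course much of the score gap that is actually due to motivation, while the randomized comparison stays close to the assumed true effect.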

Self-selection bias causes problems for research about programs or products. In particular, self-selection affects evaluation of whether or not a given program has some effect, and complicates interpretation of market research.

The Roy model provides one of the earliest academic illustrations of the self-selection problem.

References

  1. Ziliak, S.T., McCloskey, D.N. (2008). The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives. University of Michigan Press. ISBN 0-472-05007-9.
  2. Lenskyj, Helen Jefferson (2008). Olympic Industry Resistance: Challenging Olympic Power and Propaganda. State University of New York Press. p. 56. ISBN 978-0-7914-7479-2.