Randomized response

Randomized response is a research method used in structured survey interviews. It was first proposed by S. L. Warner in 1965 and later modified by B. G. Greenberg and coauthors in 1969. [1] [2] It allows respondents to answer questions on sensitive issues (such as criminal behavior or sexuality) while maintaining confidentiality. A chance device, whose outcome is hidden from the interviewer, decides whether the respondent answers the question truthfully or simply answers "yes" regardless of the truth.

For example, social scientists have used it to ask people whether they use drugs, whether they have illegally installed telephones, or whether they have evaded paying taxes. Before abortions were legal, social scientists used the method to ask women whether they had had abortions. [3]

The concept is somewhat similar to plausible deniability. Plausible deniability allows the subject to credibly say that they did not make a statement, while the randomized response technique allows the subject to credibly say that they had not been truthful when making a statement.

Example

With a coin

A person is asked whether they had sex with a prostitute this month. Before answering, they flip a coin. They are instructed to answer "yes" if the coin comes up tails, and to answer truthfully if it comes up heads. Only they know whether their answer reflects the coin toss or their true experience. The method rests on the assumption that people who get heads answer truthfully; without it, the surveyor cannot draw any inference from the responses.

In a large sample, about half of the respondents get tails and about half get heads when they flip the coin. The tails half answer "yes" regardless of whether they have done it, and the heads half answer truthfully according to their experience. Every "no" therefore comes from a truthful respondent. Because the two halves of a large randomized sample should be nearly alike, the true proportion who did not have sex with a prostitute is double the proportion who said "no". For example, if 20% of those surveyed said "no", then the true fraction that did not have sex with a prostitute is 40%, and the fraction that did is 60%.
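The doubling rule above can be written as a short calculation. This is a minimal sketch, assuming every respondent follows the instructions and the coin is fair. The function name `forced_yes_estimate` and its parameters are illustrative and do not come from the cited papers.

```python
from fractions import Fraction


def forced_yes_estimate(no_share, p_truth=Fraction(1, 2)):
    """Estimate trait prevalence from the share of "no" answers.

    Assumption: with probability p_truth (heads) the respondent answers
    truthfully, otherwise (tails) they say "yes". A "no" can then only
    come from a truthful respondent without the trait, so
        P(no) = p_truth * (share without the trait).
    """
    no_share = Fraction(no_share)
    p_truth = Fraction(p_truth)
    without_trait = no_share / p_truth   # for a fair coin: double the "no" share
    with_trait = 1 - without_trait
    return without_trait, with_trait


# The example from the text: 20% of respondents said "no".
without_trait, with_trait = forced_yes_estimate(Fraction(20, 100))
print(without_trait, with_trait)  # 2/5 3/5
```

In a real sample, the observed share of "no" answers fluctuates around its expected value, so the estimate is only approximate and can fall slightly outside the range from 0 to 1 in small samples.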

With cards

The same question can be asked with three cards which are unmarked on one side, and bear a question on the other side. The cards are randomly mixed, and laid in front of the subject. The subject takes one card, turns it over, and answers the question on it truthfully with either "yes" or "no".

The researcher does not know which question has been asked.

Under the assumption that the "yes" and "no" answers to the control questions cancel each other out, the number of subjects who have had sex with a prostitute is triple the number of "yes" answers in excess of the "no" answers.

Original version

Warner's original version (1965) is slightly different. The sensitive question is worded as two dichotomous alternatives, and chance decides, unknown to the interviewer, which one is to be answered honestly. The interviewer receives a "yes" or "no" without knowing which of the two questions it answers. For mathematical reasons, the chance device cannot be "fair" (½ and ½). Let $p$ be the probability of answering the sensitive question and $\mathrm{EP}$ the true proportion of those interviewed bearing the embarrassing property. Then the proportion of "yes" answers, $\mathrm{YA}$, is composed as follows:

$$\mathrm{YA} = p \cdot \mathrm{EP} + (1 - p)(1 - \mathrm{EP})$$

Transformed to yield EP:

$$\mathrm{EP} = \frac{\mathrm{YA} + p - 1}{2p - 1}$$

The denominator is zero when $p = \tfrac{1}{2}$, which is why a fair chance device cannot be used.

Example

The interviewed are asked to secretly throw a die and answer the first question only if they throw a 6, otherwise the second question ($p = \tfrac{1}{6}$). The "yes" answers are now composed of consumers who have thrown a 6 and non-consumers who have thrown a different number. Let the result be 75 "yes" answers out of 100 interviewed ($\mathrm{YA} = \tfrac{3}{4}$). Inserted into the formula, this gives

$$\mathrm{EP} = \frac{\tfrac{3}{4} + \tfrac{1}{6} - 1}{2 \cdot \tfrac{1}{6} - 1} = \frac{-\tfrac{1}{12}}{-\tfrac{2}{3}} = \frac{1}{8}$$

If all interviewed have answered honestly, then the true proportion of consumers is 1/8 (= 12.5%).
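The die example can be checked with exact arithmetic. The sketch below assumes Warner's relation: the share of "yes" answers equals p times EP plus (1 − p) times (1 − EP), where p is the probability of being directed to the sensitive statement and EP is the true proportion with the property. Solving for EP gives (YA + p − 1) / (2p − 1). The function name `warner_estimate` and its parameter names are illustrative.

```python
from fractions import Fraction


def warner_estimate(yes_share, p):
    """Solve yes_share = p*EP + (1 - p)*(1 - EP) for EP.

    yes_share: observed proportion of "yes" answers (YA).
    p: probability that a respondent is directed to the sensitive statement.
    """
    yes_share = Fraction(yes_share)
    p = Fraction(p)
    if p == Fraction(1, 2):
        # With a fair device the "yes" share is 1/2 whatever EP is,
        # so EP cannot be recovered (the denominator 2p - 1 is zero).
        raise ValueError("p must differ from 1/2")
    return (yes_share + p - 1) / (2 * p - 1)


# 75 "yes" answers out of 100, sensitive statement answered on a six.
ep = warner_estimate(Fraction(75, 100), Fraction(1, 6))
print(ep)         # 1/8
print(float(ep))  # 0.125
```

Using `Fraction` rather than floating point keeps the result exactly 1/8, matching the hand calculation.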

References

  1. Warner, S. L. (March 1965). "Randomized response: a survey technique for eliminating evasive answer bias". Journal of the American Statistical Association. Taylor & Francis. 60 (309): 63–69. doi:10.1080/01621459.1965.10480775. JSTOR 2283137. PMID 12261830. S2CID 35435339.
  2. Greenberg, B. G.; et al. (June 1969). "The Unrelated Question Randomized Response Model: Theoretical Framework". Journal of the American Statistical Association. Taylor & Francis. 64 (326): 520–39. doi:10.2307/2283636. JSTOR 2283636.
  3. Abernathy, James R.; et al. (February 1970). "Estimates of induced abortion in urban North Carolina". Demography. 7 (1): 19–29. doi:10.2307/2060019. JSTOR 2060019. PMID 5524615.

Further reading