Discrimination testing is a technique employed in sensory analysis to determine whether there is a detectable difference between two or more products. The test uses a group of assessors (panellists), with a degree of training appropriate to the complexity of the test, to discriminate one product from another through one of a variety of experimental designs. Though useful, these tests typically do not quantify or describe any differences; a more specifically trained panel, under a different study design, is required to describe the differences and assess their significance.
The statistical principle behind any discrimination test should be to reject a null hypothesis (H0) that states there is no detectable difference between two (or more) products. If there is sufficient evidence to reject H0 in favor of the alternative hypothesis, HA: There is a detectable difference, then a difference can be recorded. However, failure to reject H0 should not be assumed to be sufficient evidence to accept it. H0 is formulated on the premise that all of the assessors guessed when they made their response. The statistical test chosen should give a probability value that the result was arrived at through pure guesswork. If this probability is sufficiently low (usually below 0.05 or 5%) then H0 can be rejected in favor of HA.
Tests used to decide whether or not to reject H0 include the binomial test, the χ² (chi-squared) test, and the t-test.
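As a minimal sketch of the binomial approach (not part of the original article; the function name and panel numbers are illustrative), the guessing-only null hypothesis can be tested with an exact one-tailed binomial computation using only the Python standard library:

```python
from math import comb

def binomial_pvalue(correct: int, n: int, p0: float) -> float:
    """One-tailed exact binomial p-value: the probability of observing
    `correct` or more right answers out of `n` responses if every
    assessor guessed, with per-trial success probability `p0`."""
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k)
               for k in range(correct, n + 1))

# Illustrative example: 11 of 20 assessors pick the odd sample in a
# triangle test, where the guessing probability is 1/3.
p = binomial_pvalue(11, 20, 1 / 3)
print(f"p = {p:.4f}")  # reject H0 if p < 0.05
```

The guessing probability `p0` is set by the test design: 1/2 for paired comparison and duo-trio, 1/3 for the triangle test.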
A number of tests can be classified as discrimination tests: any test designed to detect a difference is a discrimination test. The type of test determines the number of samples presented to each panellist and the question(s) they are asked to answer.
Schematically, these tests may be described as follows: A and B are used for knowns, X and Y are used for different unknowns, and (AB) means that the order of presentation is unknown.
In this type of test the assessors are presented with two products and are asked to state which product fulfils a certain condition. This condition will usually be some attribute such as sweetness, sourness, intensity of flavor, etc. The probability for each assessor arriving at a correct response by guessing is 1 in 2 (50%).
Requires the minimum number of samples. Most straightforward approach when the question is "Which sample is more ____?"
The attribute that is likely to change must be known in advance. Not statistically powerful; relatively large panel sizes are required to obtain sufficient confidence.
The assessors are presented with three products, one of which is identified as the control. Of the other two, one is identical to the control, the other is the test product. The assessors are asked to state which product more closely resembles the control.
The probability for each assessor arriving at a correct response by guessing is 1 in 2 (50%).
Quick to set up and execute. No need to have prior knowledge of nature of difference.
Not statistically powerful; therefore relatively large panel sizes are required to obtain sufficient confidence.
The assessors are presented with three products, two of which are identical and the other one different. The assessors are asked to state which product they believe is the odd one out. [1]
The probability for each assessor arriving at a correct response by guessing is 1 in 3 (33%).
Can be quick to execute and offers greater power than paired comparison or duo-trio.
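The power advantage comes from the lower guessing probability (1/3 versus 1/2): fewer correct responses are needed to reach significance. A hypothetical sketch (not from the original article) that finds the smallest significant count of correct responses for a given panel size, assuming an exact one-tailed binomial test:

```python
from math import comb

def critical_count(n: int, p0: float, alpha: float = 0.05) -> int:
    """Smallest number of correct responses out of n whose one-tailed
    binomial p-value, under guessing probability p0, is at most alpha."""
    for c in range(n + 1):
        # tail = P(X >= c) for X ~ Binomial(n, p0)
        tail = sum(comb(n, k) * p0**k * (1 - p0) ** (n - k)
                   for k in range(c, n + 1))
        if tail <= alpha:
            return c
    return n + 1  # no count reaches significance at this panel size

n = 30  # illustrative panel size
print("triangle (p0 = 1/3):", critical_count(n, 1 / 3))
print("paired / duo-trio (p0 = 1/2):", critical_count(n, 1 / 2))
```

For the same panel, the triangle design needs fewer correct answers than a paired-comparison or duo-trio design to reject H0, which is what "greater power" means in practice.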
Various errors can occur in the conduct of the test, for example through the order of presentation or non-uniform samples. Randomization, control and professional conduct of the experiment are therefore essential for obtaining the most accurate results.
Uses
Triangle tests are used to assist research and development in formulating and reformulating products, by determining whether a particular ingredient change, or a change in processing, creates a detectable difference in the final product. Triangle taste testing is also used in quality control to determine whether a particular production run (or production from different factories) meets the quality-control standard, i.e., is not different from the product standard in a triangle taste test using discriminators.
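Under the common model in which a "discriminator" always identifies the odd sample and everyone else guesses (correctly 1/3 of the time), the observed proportion of correct answers pc relates to the proportion of discriminators pd by pc = pd + (1 − pd)/3. A minimal sketch of inverting this relation (illustrative code, not from the original article):

```python
def discriminator_proportion(p_correct: float) -> float:
    """Estimate the proportion of true discriminators in a triangle test,
    assuming non-discriminators guess correctly with probability 1/3:
        P(correct) = pd + (1 - pd) / 3   =>   pd = (P(correct) - 1/3) / (2/3)
    Clamped at 0 for observed proportions below chance."""
    return max(0.0, (p_correct - 1 / 3) / (2 / 3))

# If 15 of 30 assessors answer correctly (50%), the estimated
# proportion of discriminators is 25%.
print(discriminator_proportion(15 / 30))
```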
The assessors are presented with three products, two of which are identified as reference A and alternative B, the third is unknown X, and identical to either A or B. The assessors are asked to state which of A and B the unknown is; the test may also be described as "matching-to-sample", or "duo-trio in balanced reference mode" (both knowns are presented as reference, rather than only one).
ABX testing is widely used in comparison of audio compression algorithms, but less used in food science.
ABX testing differs from the other listed tests in that subjects are given two known different samples, and thus are able to compare them with an eye towards differences – there is an "inspection phase". While this may be hypothesized to make discrimination easier, no advantage has been observed in discrimination performance in ABX testing compared with other testing methods. [2]
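The trial mechanics can be sketched as follows (an illustrative simulation, not from the original article; the `respond` callback stands in for a real subject's answer):

```python
import random

def run_abx_trials(respond, n_trials: int, seed: int = 0) -> int:
    """Administer n_trials ABX trials. Each trial, X is randomly chosen
    to be A or B, and respond(x) returns the subject's answer
    ('A' or 'B') without seeing which X actually is.
    Returns the number of correct identifications."""
    rng = random.Random(seed)  # seeded for a reproducible presentation order
    correct = 0
    for _ in range(n_trials):
        x = rng.choice("AB")
        if respond(x) == x:
            correct += 1
    return correct

# A subject who cannot hear any difference and always answers 'A'
# is right only on the trials where X happens to be A.
score = run_abx_trials(lambda x: "A", 100)
print(score, "/ 100 correct")
```

The resulting count of correct answers is then compared against the guessing distribution (binomial with p0 = 1/2) to decide whether H0 can be rejected.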
Like triangle testing, but the third sample is known not to be the odd one out. This design is intermediate between ABX testing (where the identity of the two knowns – which is the control and which is the proposed alternative – is stated) and the triangle test (where any of the three samples could be the odd one out).
See also: Design of experiments; Statistics; Statistical hypothesis testing; Likelihood function; Statistical power; Odds ratio; Item response theory; Psychophysics; Student's t-test; Receiver operating characteristic; Sample size determination; Glossary of probability and statistics; Rasch model; Sign test; Statistical distance; Bootstrapping (statistics); ABX test; Differential item functioning; Frequentist inference; Plot (graphics).