Alternative hypothesis

In statistical hypothesis testing, the alternative hypothesis is one of the propositions proposed in the hypothesis test. In general, the goal of a hypothesis test is to demonstrate that, under the given conditions, there is sufficient evidence supporting the credibility of the alternative hypothesis rather than the mutually exclusive proposition in the test (the null hypothesis). [1] The alternative hypothesis is usually consistent with the research hypothesis because it is constructed from the literature review, previous studies, etc. However, the research hypothesis is sometimes consistent with the null hypothesis.

In statistics, the alternative hypothesis is often denoted Ha or H1. Hypotheses are formulated so that they can be compared in a statistical hypothesis test.

In the domain of inferential statistics, two rival hypotheses can be compared by explanatory power and predictive power.

Basic definition

The alternative hypothesis and null hypothesis are types of conjectures used in statistical tests, which are formal methods of reaching conclusions or making judgments on the basis of data. In statistical hypothesis testing, the null hypothesis and alternative hypothesis are two mutually exclusive statements.

"The statement being tested in a test of statistical significance is called the null hypothesis. The test of significance is designed to assess the strength of the evidence against the null hypothesis. Usually, the null hypothesis is a statement of 'no effect' or 'no difference'." [2] Null hypothesis is often denoted as H0.

The statement that is being tested against the null hypothesis is the alternative hypothesis. [2] The alternative hypothesis is often denoted Ha or H1.

In statistical hypothesis testing, to establish support for the alternative hypothesis, it must be shown that the data are inconsistent with the null hypothesis; that is, there must be sufficient evidence against the null hypothesis to support the alternative hypothesis.

Example

One example is where water quality in a stream has been observed over many years, and a test is made of the null hypothesis that "there is no change in quality between the first and second halves of the data", against the alternative hypothesis that "the quality is poorer in the second half of the record".
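As an illustration only, the sketch below tests this null hypothesis against the one-sided alternative with a two-sample t-test in SciPy. The data, the assumption that higher measurement values mean better quality, and the variable names are all invented for the example.

```python
# A minimal sketch of the stream example (hypothetical data), assuming that
# higher measurement values mean better water quality, so "poorer in the
# second half" corresponds to a smaller mean in the second half.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
first_half = rng.normal(loc=8.0, scale=1.0, size=40)   # earlier years
second_half = rng.normal(loc=7.4, scale=1.0, size=40)  # later years

# H0: no change in mean quality.  Ha: mean quality is lower in the second half.
t_stat, p_value = stats.ttest_ind(second_half, first_half, alternative='less')
print(f"t = {t_stat:.3f}, one-sided p-value = {p_value:.4f}")
```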

If the statistical hypothesis testing is thought of as a judgement in a court trial, the null hypothesis corresponds to the position of the defendant (the defendant is innocent) while the alternative hypothesis is in the rival position of prosecutor (the defendant is guilty). The defendant is innocent until proven guilty, so likewise in a hypothesis test, the null hypothesis is initially presumed to be true. To prove the statement of the prosecutor, evidence must be convincing enough to convict the defendant; this is analogous to sufficient statistical significance in a hypothesis test.

In court, only legally admissible evidence can be considered as the foundation for the trial. In hypothesis testing, a reasonable test statistic must be chosen to measure the statistical significance of the evidence against the null hypothesis. The evidence supports the alternative hypothesis if the null hypothesis is rejected at a chosen significance level. However, this does not necessarily mean that the alternative hypothesis is true, because of the possibility of a type I error. To quantify the statistical significance, the test statistic is assumed to follow a certain probability distribution, such as the normal distribution or t-distribution, which determines the probability of obtaining test results at least as extreme as the results actually observed under the assumption that the null hypothesis is correct; this probability is the p-value. [3] [4] If the p-value is smaller than the chosen significance level (α), it can be claimed that the observed data are sufficiently inconsistent with the null hypothesis and hence the null hypothesis may be rejected. After testing, a valid claim would be: "at the significance level α, the null hypothesis is rejected, supporting the alternative hypothesis instead". In the metaphor of the trial, the announcement would be: "with tolerance for a probability α of an incorrect conviction, the defendant is guilty."
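A hedged sketch of that decision rule follows: compute a test statistic, obtain the p-value from its assumed null distribution, and compare it with a pre-chosen α. The data, the null value, and the use of a z-statistic with known standard deviation are assumptions made only for illustration.

```python
# Sketch: one-sample z-test with known standard deviation (an assumption
# made so the null distribution of the statistic is exactly normal).
import numpy as np
from scipy import stats

alpha = 0.05          # chosen significance level (tolerated type I error rate)
mu_0 = 100.0          # value of the mean under the null hypothesis
sigma = 15.0          # population standard deviation, assumed known

sample = np.array([108, 112, 97, 104, 110, 101, 115, 99, 106, 109], dtype=float)
z = (sample.mean() - mu_0) / (sigma / np.sqrt(sample.size))

# Two-sided alternative: p-value is the probability, under H0, of a statistic
# at least as extreme (in absolute value) as the one observed.
p_value = 2 * stats.norm.sf(abs(z))

if p_value <= alpha:
    print(f"p = {p_value:.4f} <= {alpha}: reject H0 in favour of Ha")
else:
    print(f"p = {p_value:.4f} > {alpha}: fail to reject H0")
```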

History

The concept of an alternative hypothesis in testing was devised by Jerzy Neyman and Egon Pearson, and it is used in the Neyman–Pearson lemma. It forms a major component in modern statistical hypothesis testing. However, it was not part of Ronald Fisher's formulation of statistical hypothesis testing, and he opposed its use. [5] In Fisher's approach to testing, the central idea is to assess whether the observed dataset could have resulted from chance if the null hypothesis were assumed to hold, notionally without preconceptions about what other models might hold.[citation needed] Modern statistical hypothesis testing accommodates this type of test since the alternative hypothesis can be just the negation of the null hypothesis.

Types

In the case of a scalar parameter θ with null value θ0, there are four principal types of alternative hypothesis: a point alternative, in which the parameter takes a single specified value θ = θ1 different from θ0; a one-sided (one-tailed) alternative of the form θ > θ0; a one-sided alternative of the form θ < θ0; and a two-sided (two-tailed) alternative, θ ≠ θ0.
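The sketch below illustrates, with invented data and SciPy's one-sample t-test, how the choice among directional and non-directional alternatives is expressed through the `alternative` argument; the null value and the data are assumptions made for illustration.

```python
# Same data and null value mu_0, three different alternative hypotheses.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=5.3, scale=1.0, size=30)
mu_0 = 5.0  # hypothesized value of the parameter under H0

for alt in ("two-sided", "greater", "less"):
    res = stats.ttest_1samp(x, popmean=mu_0, alternative=alt)
    print(f"Ha: mu {alt:>9}  ->  t = {res.statistic:.3f}, p = {res.pvalue:.4f}")
```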

Related Research Articles

A statistical hypothesis test is a method of statistical inference used to decide whether the data at hand sufficiently support a particular hypothesis. Hypothesis testing allows us to make probabilistic statements about population parameters.

In statistics, the likelihood-ratio test assesses the goodness of fit of two competing statistical models, specifically one found by maximization over the entire parameter space and another found after imposing some constraint, based on the ratio of their likelihoods. If the constraint is supported by the observed data, the two likelihoods should not differ by more than sampling error. Thus the likelihood-ratio test tests whether this ratio is significantly different from one, or equivalently whether its natural logarithm is significantly different from zero.
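As a hedged illustration (not taken from the article), the sketch below applies a likelihood-ratio test to coin-flip data: the constrained model fixes the success probability at 0.5, the unconstrained model uses the maximum-likelihood estimate, and twice the difference in log-likelihoods is compared with a chi-squared distribution with one degree of freedom. The counts are invented.

```python
# Likelihood-ratio test sketch: H0: p = 0.5 versus Ha: p != 0.5 for binomial data.
import numpy as np
from scipy import stats

n, k = 100, 61          # hypothetical: 61 successes in 100 trials
p_hat = k / n           # MLE under the unconstrained (alternative) model

loglik_null = stats.binom.logpmf(k, n, 0.5)    # constrained likelihood
loglik_alt = stats.binom.logpmf(k, n, p_hat)   # maximized likelihood

lr_stat = 2 * (loglik_alt - loglik_null)       # -2 log(likelihood ratio)
p_value = stats.chi2.sf(lr_stat, df=1)         # asymptotic null distribution
print(f"LR statistic = {lr_stat:.3f}, approximate p-value = {p_value:.4f}")
```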

In statistical hypothesis testing, a result has statistical significance when a result at least as "extreme" would be very infrequent if the null hypothesis were true. More precisely, a study's defined significance level, denoted by α, is the probability of the study rejecting the null hypothesis, given that the null hypothesis is true; and the p-value of a result, p, is the probability of obtaining a result at least as extreme, given that the null hypothesis is true. The result is statistically significant, by the standards of the study, when p ≤ α. The significance level for a study is chosen before data collection, and is typically set to 5% or much lower, depending on the field of study.

In scientific research, the null hypothesis is the claim that no relationship exists between two sets of data or variables being analyzed. The null hypothesis is that any experimentally observed difference is due to chance alone, and an underlying causative relationship does not exist, hence the term "null". In addition to the null hypothesis, an alternative hypothesis is also developed, which claims that a relationship does exist between two variables.

In statistics, the power of a binary hypothesis test is the probability that the test correctly rejects the null hypothesis when a specific alternative hypothesis is true. It is commonly denoted by 1 − β, and represents the chances of a true positive detection conditional on the actual existence of an effect to detect. Statistical power ranges from 0 to 1, and as the power of a test increases, the probability of making a type II error by wrongly failing to reject the null hypothesis decreases.
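A hedged way to see this numerically is a small simulation: the estimated power is the fraction of simulated samples, drawn under a specific alternative, in which the test rejects H0. The effect size, sample size, and number of simulations below are assumptions for illustration.

```python
# Monte Carlo estimate of power for a one-sample t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
alpha, n, n_sim = 0.05, 25, 10_000
mu_0, mu_true, sigma = 0.0, 0.5, 1.0   # specific alternative: mu = 0.5

rejections = 0
for _ in range(n_sim):
    sample = rng.normal(loc=mu_true, scale=sigma, size=n)
    p = stats.ttest_1samp(sample, popmean=mu_0).pvalue
    rejections += (p <= alpha)

power = rejections / n_sim             # estimate of 1 - beta
print(f"Estimated power ~= {power:.3f}, estimated type II error rate ~= {1 - power:.3f}")
```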

In null-hypothesis significance testing, the p-value is the probability of obtaining test results at least as extreme as the result actually observed, under the assumption that the null hypothesis is correct. A very small p-value means that such an extreme observed outcome would be very unlikely under the null hypothesis. Even though reporting p-values of statistical tests is common practice in academic publications of many quantitative fields, misinterpretation and misuse of p-values is widespread and has been a major topic in mathematics and metascience. In 2016, the American Statistical Association (ASA) made a formal statement that "p-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone" and that "a p-value, or statistical significance, does not measure the size of an effect or the importance of a result" or provide "evidence regarding a model or hypothesis." That said, an ASA task force convened in 2019 issued a statement on statistical significance and replicability, concluding: "p-values and significance tests, when properly applied and interpreted, increase the rigor of the conclusions drawn from data."

In statistics, the Neyman–Pearson lemma was introduced by Jerzy Neyman and Egon Pearson in a paper in 1933. The Neyman–Pearson lemma is part of the Neyman–Pearson theory of statistical testing, which introduced concepts like errors of the second kind, the power function, and inductive behavior. The previous Fisherian theory of significance testing postulated only one hypothesis. By introducing a competing hypothesis, the Neyman–Pearsonian flavor of statistical testing allows investigating the two types of errors. The trivial cases where one always rejects or accepts the null hypothesis are of little interest, but they do prove that one must not relinquish control over one type of error while calibrating the other. Neyman and Pearson accordingly proceeded to restrict their attention to the class of all level-α tests while subsequently minimizing the type II error, traditionally denoted by β. Their seminal paper of 1933, including the Neyman–Pearson lemma, comes at the end of this endeavor, not only showing the existence of tests with the most power that retain a prespecified level of type I error, but also providing a way to construct such tests. The Karlin–Rubin theorem extends the Neyman–Pearson lemma to settings involving composite hypotheses with monotone likelihood ratios.
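As an illustrative sketch only (the two simple hypotheses and all numbers are invented), the code below follows the Neyman–Pearson recipe for two simple normal hypotheses with known variance: the most powerful level-α test rejects when the sample mean exceeds a cut-off chosen so that the type I error is exactly α, and its power against the alternative can then be computed.

```python
# Neyman-Pearson sketch: H0: mu = 0 vs H1: mu = 1, known sigma, n observations.
# For this pair of simple hypotheses the likelihood ratio is increasing in the
# sample mean, so the most powerful level-alpha test rejects for large means.
import numpy as np
from scipy import stats

alpha, n, sigma = 0.05, 16, 1.0
mu0, mu1 = 0.0, 1.0
se = sigma / np.sqrt(n)

cutoff = mu0 + stats.norm.ppf(1 - alpha) * se        # P(mean > cutoff | H0) = alpha
power = stats.norm.sf(cutoff, loc=mu1, scale=se)     # P(mean > cutoff | H1)
print(f"Reject H0 when the sample mean exceeds {cutoff:.3f}; power = {power:.4f}")
```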

In statistics, the binomial test is an exact test of the statistical significance of deviations from a theoretically expected distribution of observations into two categories using sample data.
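A minimal sketch, assuming SciPy's `binomtest` (SciPy 1.7 or later) and made-up counts: testing whether an observed proportion is consistent with a hypothesized success probability of 0.5.

```python
# Exact binomial test: H0: success probability = 0.5.
from scipy import stats

result = stats.binomtest(k=14, n=20, p=0.5, alternative='two-sided')
print(f"Exact p-value = {result.pvalue:.4f}")
```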

In statistical significance testing, a one-tailed test and a two-tailed test are alternative ways of computing the statistical significance of a parameter inferred from a data set, in terms of a test statistic. A two-tailed test is appropriate if the estimated value is greater or less than a certain range of values, for example, whether a test taker may score above or below a specific range of scores. This method is used for null hypothesis testing and if the estimated value exists in the critical areas, the alternative hypothesis is accepted over the null hypothesis. A one-tailed test is appropriate if the estimated value may depart from the reference value in only one direction, left or right, but not both. An example can be whether a machine produces more than one-percent defective products. In this situation, if the estimated value exists in one of the one-sided critical areas, depending on the direction of interest, the alternative hypothesis is accepted over the null hypothesis. Alternative names are one-sided and two-sided tests; the terminology "tail" is used because the extreme portions of distributions, where observations lead to rejection of the null hypothesis, are small and often "tail off" toward zero as in the normal distribution, colored in yellow, or "bell curve", pictured on the right and colored in green.
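For a symmetric null distribution such as the standard normal, the relation between one- and two-tailed p-values for the same observed statistic can be sketched as follows; the observed z value is an assumption for illustration.

```python
# One-tailed vs two-tailed p-values for the same observed z statistic.
from scipy import stats

z = 1.9  # hypothetical observed test statistic

p_upper = stats.norm.sf(z)               # Ha: parameter is greater (right tail)
p_lower = stats.norm.cdf(z)              # Ha: parameter is smaller (left tail)
p_two_sided = 2 * stats.norm.sf(abs(z))  # Ha: parameter differs in either direction

print(f"right-tailed p = {p_upper:.4f}, left-tailed p = {p_lower:.4f}, "
      f"two-tailed p = {p_two_sided:.4f}")
```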

A test statistic is a statistic used in statistical hypothesis testing. A hypothesis test is typically specified in terms of a test statistic, considered as a numerical summary of a data-set that reduces the data to one value that can be used to perform the hypothesis test. In general, a test statistic is selected or defined in such a way as to quantify, within observed data, behaviours that would distinguish the null from the alternative hypothesis, where such an alternative is prescribed, or that would characterize the null hypothesis if there is no explicitly stated alternative hypothesis.

In statistics, an exact (significance) test is a test such that if the null hypothesis is true, then all assumptions made during the derivation of the distribution of the test statistic are met. Using an exact test provides a significance test that maintains the type I error rate of the test at the desired significance level of the test. For example, an exact test at a significance level of α, when repeated over many samples where the null hypothesis is true, will reject at most a proportion α of the time. This is in contrast to an approximate test in which the desired type I error rate is only approximately maintained, while this approximation may be made as close to α as desired by making the sample size sufficiently large.

In statistics, the family-wise error rate (FWER) is the probability of making one or more false discoveries, or type I errors, when performing multiple hypothesis tests.
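As a hedged illustration, the Bonferroni correction is one simple way to control the FWER: each of the m individual tests is carried out at level α/m. The p-values below are invented.

```python
# Bonferroni control of the family-wise error rate at level alpha.
alpha = 0.05
p_values = [0.003, 0.041, 0.12, 0.007, 0.38]  # hypothetical p-values from m tests
m = len(p_values)

for i, p in enumerate(p_values, start=1):
    decision = "reject H0" if p <= alpha / m else "fail to reject H0"
    print(f"test {i}: p = {p:.3f} vs alpha/m = {alpha / m:.3f} -> {decision}")
```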

In statistical hypothesis testing, a type I error is the mistaken rejection of an actually true null hypothesis, while a type II error is the failure to reject a null hypothesis that is actually false. Much of statistical theory revolves around the minimization of one or both of these errors, though the complete elimination of either is a statistical impossibility if the outcome is not determined by a known, observable causal process. By selecting a low threshold (cut-off) value and modifying the alpha (α) level, the quality of the hypothesis test can be increased. The knowledge of type I errors and type II errors is widely used in medical science, biometrics and computer science.

Omnibus tests are a kind of statistical test. They test whether the explained variance in a set of data is significantly greater than the unexplained variance, overall. One example is the F-test in the analysis of variance. There can be legitimate significant effects within a model even if the omnibus test is not significant. For instance, in a model with two independent variables, if only one variable exerts a significant effect on the dependent variable and the other does not, then the omnibus test may be non-significant. This fact does not affect the conclusions that may be drawn from the one significant variable. In order to test effects within an omnibus test, researchers often use contrasts.
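A minimal sketch of such an omnibus test, using SciPy's one-way ANOVA F-test on invented group data:

```python
# Omnibus F-test: H0: all group means are equal; Ha: at least one differs.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
group_a = rng.normal(10.0, 2.0, size=20)
group_b = rng.normal(11.0, 2.0, size=20)
group_c = rng.normal(10.2, 2.0, size=20)

f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```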

In statistics, Fisher's method, also known as Fisher's combined probability test, is a technique for data fusion or "meta-analysis" (analysis of analyses). It was developed by and named for Ronald Fisher. In its basic form, it is used to combine the results from several independent tests bearing upon the same overall hypothesis (H0).
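A hedged sketch of combining several independent p-values bearing on the same null hypothesis with Fisher's method, using SciPy; the p-values are invented.

```python
# Fisher's combined probability test on independent p-values.
from scipy import stats

p_values = [0.08, 0.12, 0.05, 0.20]  # hypothetical p-values from independent tests
statistic, combined_p = stats.combine_pvalues(p_values, method='fisher')
print(f"chi-squared statistic = {statistic:.3f}, combined p-value = {combined_p:.4f}")
```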

In statistics, the multiple comparisons, multiplicity or multiple testing problem occurs when one considers a set of statistical inferences simultaneously or infers a subset of parameters selected based on the observed values.

Statistics is a scientific field that deals with data. It utilizes data to solve practical problems and analyzes collected data in order to draw conclusions. Different approaches to analyzing the same data can lead to different conclusions. For example, weather forecasts can vary among different forecasting agencies that use different forecasting algorithms and techniques. Conclusions drawn from statistical analysis often involve uncertainty as they represent the probability of an event occurring. For instance, a weather forecast indicating a 90% probability of rain would classify it as likely to rain, while a 5% probability would suggest it is unlikely to rain. The actual outcome, whether it rains or not, can only be determined afterwards.

Frequentist inference is a type of statistical inference based on frequentist probability, which treats "probability" as equivalent to "frequency" and draws conclusions from sample data by emphasizing the frequency or proportion of findings in the data. Frequentist inference underlies frequentist statistics, in which the well-established methodologies of statistical hypothesis testing and confidence intervals are founded.

Misuse of p-values is common in scientific research and scientific education. p-values are often used or interpreted incorrectly; the American Statistical Association states that p-values can indicate how incompatible the data are with a specified statistical model. From a Neyman–Pearson hypothesis testing approach to statistical inference, comparing the p-value to a chosen significance level yields one of two results: either the null hypothesis is rejected, or the null hypothesis cannot be rejected at that significance level. From a Fisherian significance testing approach to statistical inference, a low p-value means either that the null hypothesis is true and a highly improbable event has occurred, or that the null hypothesis is false.

In statistical hypothesis testing, the error exponent of a hypothesis testing procedure is the rate at which the probabilities of type I and type II errors decay exponentially with the size of the sample used in the test. For example, if the probability of error of a test decays as e^(−nβ), where n is the sample size, the error exponent is β.

References

  1. Carlos Cortinhas; Ken Black (23 September 2014). Statistics for Business and Economics. Wiley. p. 314. ISBN 978-1-119-94335-8.
  2. Moore, David S. (2003). Introduction to the Practice of Statistics. George P. McCabe (Fourth ed.). New York. ISBN 0-7167-9657-0. OCLC 49751157.
  3. "Which scientists can winningly explain a flame, time, sleep, color, or sound to 11-year-olds?". Physics Today. 2015-11-24. doi:10.1063/pt.5.8150. ISSN 1945-0699.
  4. Wasserstein, Ronald L.; Lazar, Nicole A. (2016-04-02). "The ASA Statement on p-Values: Context, Process, and Purpose". The American Statistician. 70 (2): 129–133. doi:10.1080/00031305.2016.1154108. ISSN 0003-1305. S2CID 124084622.
  5. Cohen, J. (1990). "Things I have learned (so far)". American Psychologist. 45 (12): 1304–1312. doi:10.1037/0003-066X.45.12.1304. S2CID 7180431.
