Predictive probability of success

Predictive probability of success (PPOS) is a statistics concept commonly used in the pharmaceutical industry, including by health authorities, to support decision making. In clinical trials, PPOS is the probability of observing a success in the future based on existing data. It is one type of probability of success. In the Bayesian framework, PPOS is obtained by integrating the probability of a future success over the posterior distribution of the parameter, that is, by averaging over the possible future responses. [1]

Types of PPOS

By the source of the data:

  1. Cross trial PPOS: using data from one trial to predict another trial
  2. Within trial PPOS: using data at an interim analysis to predict the same trial at its final analysis

By the endpoint used for prediction:

  1. 1 to 1 PPOS: using one endpoint to predict the same endpoint
  2. 1 to 1* PPOS: using one endpoint to predict a different but correlated endpoint

Relationship with conditional power and predictive power

Conditional power is the probability of observing statistical significance, assuming the parameter equals a specific value. [2] More specifically, those parameters could be the treatment and placebo event rates, which are held fixed for future observations. [3] Conditional power is a frequentist statistical power. It is often criticized for assuming the parameter equals a specific value that is not known to be true; if the true value of the parameter were known, there would be no need to run an experiment.

Predictive power addresses this issue by assuming instead that the parameter follows a specific distribution. Predictive power is a Bayesian power: in the Bayesian setting a parameter is a random variable, and since predictive power is computed from the randomly observed data, it is itself a random variable.

Both conditional power and predictive power use statistical significance as the success criterion. However, statistical significance alone is often not enough to define success. For example, health authorities often require the magnitude of the treatment effect, not just statistical significance, to support a registration decision.

To address this issue, predictive power can be extended to the concept of PPOS, whose success criterion is not restricted to statistical significance: it can be something else, such as a clinically meaningful result. PPOS is a conditional probability conditioned on a random variable (the observed data), and is therefore itself a random variable; the observed value is just one realization of that random variable. [4]
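The contrast between conditional power and predictive power can be sketched for a one-sided z-test with known variance. All numbers below (sample size, effect-size posterior) are illustrative assumptions, not values from the article: conditional power plugs in one fixed effect size, while predictive power averages conditional power over a posterior distribution for the effect.

```python
# Conditional power vs. predictive power for a one-sided z-test
# (illustrative sketch; all numbers are assumed, not from the article).
import numpy as np
from scipy.stats import norm

n, sigma, alpha = 100, 1.0, 0.025
z_alpha = norm.ppf(1 - alpha)          # significance threshold for the z-test

def conditional_power(theta):
    # Power of the test if the true effect is assumed to equal the fixed value theta.
    return norm.sf(z_alpha - theta * np.sqrt(n) / sigma)

# Predictive power: average conditional power over an assumed
# Normal(0.2, 0.1**2) posterior for theta obtained from earlier data.
rng = np.random.default_rng(seed=2)
theta_draws = rng.normal(0.2, 0.1, size=50_000)
predictive_power = conditional_power(theta_draws).mean()
```

Because `theta_draws` is random, different posteriors for the effect give different predictive power, illustrating that predictive power is itself a random quantity rather than a single fixed number.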

Relationship with posterior probability of success

Posterior probability of success is calculated from the posterior distribution, while PPOS is calculated from the predictive distribution. The posterior distribution summarizes the uncertainty about the parameter, whereas the predictive distribution carries not only the uncertainty about the parameter but also the sampling variability of the future data. The two distributions have the same mean, but the posterior distribution has the smaller variance.
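A simple conjugate example makes the same-mean, larger-variance relationship concrete. The numbers below are hypothetical: a Beta(13, 19) posterior for a response rate p (for example, 12 responders in 30 patients under a flat prior) compared with the predictive distribution of the response proportion in a future sample of 20 patients.

```python
# Posterior vs. predictive distribution: equal means, predictive variance larger
# (hypothetical Beta(13, 19) posterior for a response rate p).
import numpy as np

rng = np.random.default_rng(seed=0)
a, b = 13, 19        # posterior Beta(13, 19), e.g. 12/30 responses with a flat prior
m = 20               # size of a hypothetical future sample

p = rng.beta(a, b, size=100_000)        # draws from the posterior of p
phat = rng.binomial(m, p) / m           # predictive draws of the future sample proportion

print(p.mean(), phat.mean())            # means agree up to simulation error
print(p.var(), phat.var())              # predictive variance is clearly larger
```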

Common issues in current practice of PPOS

PPOS is a conditional probability conditioned on randomly observed data, and hence is a random variable itself. Current practice commonly uses only its point estimate in applications, which can be misleading: for any random quantity, the amount of uncertainty is an important part of the story. To address this issue, Tang [5] introduced a PPOS credible interval to quantify that uncertainty, and advocates using both the PPOS point estimate and its credible interval in applications such as decision making and clinical trial design. Another common issue is the interchangeable use of posterior probability of success and PPOS. As described in the previous section, the two statistics are measured on different metrics, so comparing them is like comparing apples and oranges.
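One way such a credible interval could be computed is a two-level simulation: treat the interim data as random, regenerate replicate interim data sets from their posterior predictive distribution, recompute the PPOS for each replicate, and take quantiles. This is a sketch under assumed trial settings (single-arm binary endpoint, 12/30 responders at interim, 50 patients planned, flat prior, success defined as a final posterior probability Pr(p > 0.3) above 0.95), not necessarily Tang's exact algorithm.

```python
# Sketch of a PPOS point estimate plus a credible interval via two-level
# simulation. All trial settings are illustrative assumptions.
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(seed=3)
n_int, x_int, n_fin = 30, 12, 50   # interim: 12/30 responders; 50 planned
a0 = b0 = 1.0                      # flat Beta(1, 1) prior on the response rate p

def ppos(x, n_sims=500):
    """Monte Carlo PPOS given x responders at the interim analysis."""
    p = rng.beta(a0 + x, b0 + n_int - x, size=n_sims)      # posterior draws of p
    x_tot = x + rng.binomial(n_fin - n_int, p)             # completed data sets
    post = beta.sf(0.3, a0 + x_tot, b0 + n_fin - x_tot)    # Pr(p > 0.3 | final data)
    return (post > 0.95).mean()                            # proportion of successes

point = ppos(x_int, n_sims=5_000)                          # PPOS point estimate

# Credible interval: replicate the interim data from its posterior predictive
# distribution, recompute PPOS for each replicate, and take the 2.5%/97.5% quantiles.
p_draws = rng.beta(a0 + x_int, b0 + n_int - x_int, size=200)
x_reps = rng.binomial(n_int, p_draws)
ppos_dist = np.array([ppos(x) for x in x_reps])
lo, hi = np.quantile(ppos_dist, [0.025, 0.975])
```

A wide interval signals that the interim PPOS is driven by too little information, which is exactly the situation the credible-interval requirement in the designs below is meant to rule out.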

Applications in clinical trial design

PPOS can be used to design a futility interim analysis for a large confirmatory trial, or to design a pilot trial.

Pilot trial design using PPOS

Traditional pilot trial design typically controls the type I error rate and the power for detecting a specific parameter value. The goal of a pilot trial, such as a phase II trial, is usually not to support registration, so it does not make sense to control the type I error rate, especially the large type I error rates typically allowed in phase II trials. A pilot trial usually provides evidence to support a Go/No Go decision for a confirmatory trial, so it makes more sense to base the design on PPOS. To support a No Go decision, traditional methods require only that the PPOS be small. However, the PPOS can be small purely by chance. To solve this issue, we can additionally require the PPOS credible interval to be tight, so that the PPOS calculation is supported by sufficient information and a small PPOS is not merely a chance finding. Finding an optimal design is then equivalent to solving the following two equations:

  1. PPOS = PPOS1
  2. upper bound of the PPOS credible interval = PPOS2

where PPOS1 and PPOS2 are user-defined cutoff values. The first equation ensures that the PPOS is small, so that not too many trials are prevented from entering the next stage, guarding against false negatives; it also ensures that the PPOS is not too small, so that not too many trials enter the next stage, guarding against false positives. The second equation ensures that the PPOS credible interval is tight, so that the PPOS calculation is supported by sufficient information; it also ensures that the interval is not too tight, so that the design does not demand excessive resources.

Futility interim design using PPOS

PPOS can also be used at an interim analysis to determine whether a clinical trial should be continued. PPOS can serve this purpose because its value indicates whether there is already convincing evidence to either reject or fail to reject the null hypothesis given the presently available data. [1] PPOS can also be used in the assessment of futility, [1] futility being the situation in which a clinical trial shows no sign of reaching its objective, i.e. of providing enough evidence to reach a conclusion about the null hypothesis. [6]

A traditional futility interim analysis is designed based on beta spending. However, beta spending has no intuitive interpretation, which makes it difficult to communicate with non-statistician colleagues. Since PPOS has an intuitive interpretation, it makes more sense to design the futility interim using PPOS. To declare futility, we require the PPOS to be small and the PPOS calculation to be supported by sufficient information. Finding the optimal design is equivalent to solving the following two equations:

  1. PPOS = PPOS1
  2. upper bound of the PPOS credible interval = PPOS2

Calculating PPOS using simulations

At an interim analysis, the predictive probability of success can also be calculated by simulation, using the following method: [1]

  1. Sample the parameter of interest from the posterior distribution obtained from the currently available data.
  2. Complete the data set by sampling the values not yet observed at the interim analysis from the predictive distribution given that parameter.
  3. Use the completed data set to compute the success criteria (e.g. p-values, posterior probabilities) and classify the simulated trial as a success or a failure.
  4. Repeat the three steps above a total of n times. The PPOS is the proportion of simulated trials that are successes.
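The four steps above can be sketched for a hypothetical single-arm trial with a binary endpoint. All numbers and cutoffs (12 responders out of 30 patients at interim, 50 patients planned, flat Beta(1, 1) prior, success defined as a final posterior probability Pr(p > 0.3) above 0.95) are illustrative assumptions, not values from the article.

```python
# PPOS by simulation for a hypothetical single-arm binary-endpoint trial
# (all trial settings are illustrative assumptions).
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(seed=1)
n_int, x_int, n_fin = 30, 12, 50   # interim: 12/30 responders; 50 planned
a0 = b0 = 1.0                      # flat Beta(1, 1) prior on the response rate p
n_sims = 10_000

successes = 0
for _ in range(n_sims):
    # Step 1: sample p from the posterior given the interim data.
    p = rng.beta(a0 + x_int, b0 + n_int - x_int)
    # Step 2: complete the data set by sampling the unobserved patients.
    x_tot = x_int + rng.binomial(n_fin - n_int, p)
    # Step 3: apply the success criterion to the completed data set.
    post_prob = beta.sf(0.3, a0 + x_tot, b0 + n_fin - x_tot)  # Pr(p > 0.3 | final)
    successes += post_prob > 0.95
# Step 4: PPOS = proportion of simulated trials that meet the success criterion.
ppos = successes / n_sims
```

The beta-binomial setup keeps each iteration cheap; for endpoints with less tractable distributions, the same four-step loop applies with the conjugate updates replaced by whatever posterior and predictive sampling the model requires.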

Using simulation to calculate PPOS makes it possible to handle test statistics with complex distributions, since it avoids the analytical complexity that would otherwise be required. [3]

References

  1. Saville, Benjamin R.; Connor, Jason T.; Ayers, Gregory D.; Alvarez, JoAnn (2014). "The utility of Bayesian predictive probabilities for interim monitoring of clinical trials". Clinical Trials. 11 (4): 485–493. doi:10.1177/1740774514531352. ISSN 1740-7745. PMC 4247348. PMID 24872363.
  2. Ankerst, J.; Ankerst, D. Handbook of Statistics in Clinical Oncology (2nd ed.). p. 232.
  3. Trzaskoma, Benjamin; Sashegyi, Andreas (2007). "Predictive Probability of Success and the Assessment of Futility in Large Outcomes Trials". Journal of Biopharmaceutical Statistics. 17 (1): 45–63. doi:10.1080/10543400601001485. ISSN 1054-3406. PMID 17219755.
  4. Tang, Z. (2015). "PPOS design". SlideShare.
  5. Tang, Z. (2015). "Optimal futility interim design: a predictive probability of success approach with time to event end point". Journal of Biopharmaceutical Statistics. 25 (6): 1312–1319. doi:10.1080/10543406.2014.983646. PMID 25379701.
  6. Snapinn, Steven; Chen, Mon-Gy; Jiang, Qi; Koutsoukos, Tony (2006). "Assessment of futility in clinical trials". Pharmaceutical Statistics. 5 (4): 273–281. doi:10.1002/pst.216. ISSN 1539-1604. PMID 17128426.