Recurrent event analysis

Recurrent event analysis is a branch of survival analysis that analyzes the time until recurrences occur, such as recurrences of traits or diseases. Recurrent events are often analyzed in the social sciences and in medical studies, for example recurring infections, depressive episodes, or cancer recurrences. Recurrent event analysis attempts to answer certain questions, such as: how many recurrences occur on average within a certain time interval? Which factors are associated with a higher or lower risk of recurrence?

The processes that generate events repeatedly over time are referred to as recurrent event processes, and they differ from the processes studied in time-to-event analysis: whereas time-to-event analysis focuses on the time to a single terminal event, in recurrent event analysis individuals may remain at risk for subsequent events after the first, until they are censored.

Introduction

Objectives of recurrent event analysis include describing the event occurrence process within individuals, estimating the expected number of recurrences over a given time interval, comparing the frequency of recurrences between groups, and assessing how fixed or time-varying covariates are associated with the risk of recurrence. [1]

Notation and frameworks

For a single recurrent event process starting at $t = 0$, let $0 < T_1 < T_2 < \cdots$ denote the event times, where $T_k$ is the time of the $k$th event. The associated counting process $\{N(t),\, t \ge 0\}$ records the cumulative number of events generated by the process; specifically, $N(t)$ is the number of events occurring over the time interval $(0, t]$.

Models for recurrent events can be specified by considering the probability distribution for the number of recurrences in short intervals $[t, t + \Delta t)$, given the history of event occurrence before time $t$. The intensity function describes the instantaneous probability of an event occurring at time $t$, conditional on the process history, and characterizes the process mathematically. Defining the process history as $H(t) = \{ N(s) : 0 \le s < t \}$, the intensity is formally defined as

$$\lambda(t \mid H(t)) = \lim_{\Delta t \downarrow 0} \frac{\Pr\{\Delta N(t) = 1 \mid H(t)\}}{\Delta t},$$

where $\Delta N(t) = N(t + \Delta t^{-}) - N(t^{-})$ is the number of events in $[t, t + \Delta t)$. When a heterogeneous group of individuals or processes is considered, the assumption of a common event intensity is no longer plausible. Greater generality can be achieved by incorporating fixed or time-varying covariates in the intensity function.
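
As an illustration of how an intensity function determines a recurrent event process, the following sketch simulates one process by thinning (the Lewis–Shedler/Ogata algorithm). The particular intensity function, the bound lambda_max, and all numerical values are illustrative assumptions, not part of any standard model.

```python
import numpy as np

rng = np.random.default_rng(0)

def intensity(t, history):
    """Illustrative intensity: baseline rate 1.0, elevated to 1.5 for one
    time unit after each event (a simple form of history dependence)."""
    recently = len(history) > 0 and (t - history[-1]) < 1.0
    return 1.5 if recently else 1.0

def simulate(t_max, lambda_max):
    """Thinning: propose candidate times from a Poisson process of rate
    lambda_max and accept each with probability intensity / lambda_max.
    Valid as long as intensity(t, history) <= lambda_max for all t."""
    t, events = 0.0, []
    while True:
        t += rng.exponential(1.0 / lambda_max)          # next candidate time
        if t > t_max:
            return np.array(events)
        if rng.uniform() < intensity(t, events) / lambda_max:
            events.append(t)                            # accepted: a recurrence at time t

event_times = simulate(t_max=10.0, lambda_max=1.5)
print(len(event_times), np.round(event_times, 2))
```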

Description of recurrent event data

As a counterpart of the Kaplan–Meier curve, which describes the time to a single terminal event, recurrent event data can be described using the mean cumulative function (MCF): the average cumulative number of events experienced per individual in the study at each point in time since the start of follow-up.
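
A minimal sketch of this estimator, assuming recurrent event data in long format (one row per recurrence) plus an end-of-follow-up time per subject; all column names and the toy values are hypothetical:

```python
import numpy as np
import pandas as pd

def mean_cumulative_function(events: pd.DataFrame, followup: pd.Series) -> pd.DataFrame:
    """At each observed event time, add (number of events at that time)
    divided by (number of subjects still under observation)."""
    times = np.sort(events["time"].unique())
    values, cumulative = [], 0.0
    for t in times:
        at_risk = int((followup >= t).sum())    # subjects still being followed at t
        d = int((events["time"] == t).sum())    # recurrences observed at t
        if at_risk > 0:
            cumulative += d / at_risk
        values.append(cumulative)
    return pd.DataFrame({"time": times, "mcf": values})

# Toy data: three subjects, the third censored early.
events = pd.DataFrame({"id": [1, 1, 2, 3], "time": [2.0, 5.0, 3.0, 1.0]})
followup = pd.Series({1: 6.0, 2: 6.0, 3: 1.5})   # end of follow-up per subject
print(mean_cumulative_function(events, followup))
```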

Statistical models for recurrent event data

Poisson model

The Poisson model is a popular model for recurrent event data that models the number of recurrences that have occurred. Poisson regression assumes that the number of recurrences follows a Poisson distribution with a fixed rate of recurrence over time, and models the logarithm of the expected number of recurrences as a linear combination of explanatory variables. When follow-up times differ between subjects, the logarithm of the follow-up time is usually included as an offset.
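
A minimal sketch of such a model on simulated data, using the generalized linear model interface in statsmodels; the covariate, the column names, and the effect size are illustrative assumptions, and the logarithm of the follow-up time enters as an offset:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "treatment": rng.integers(0, 2, n),           # hypothetical binary covariate
    "followup_years": rng.uniform(0.5, 3.0, n),   # length of observation per subject
})
true_rate = np.exp(-0.5 * df["treatment"])        # events per year, rate ratio exp(-0.5)
df["n_events"] = rng.poisson((true_rate * df["followup_years"]).to_numpy())

# log E[count] = log(follow-up time) + beta0 + beta1 * treatment
X = sm.add_constant(df[["treatment"]])
model = sm.GLM(df["n_events"], X,
               family=sm.families.Poisson(),
               offset=np.log(df["followup_years"]))
result = model.fit()
print(result.summary())                           # exp(beta1) estimates the rate ratio
```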

Marginal means/rates model

The marginal means/rates model considers all recurrent events of the same subject as a single counting process and does not require time-varying covariates to reflect the past history of the process, which makes it a more flexible model. [2] Instead, the full history of the counting process may influence the mean function of recurrent events.

Multi-state model

In multi-state models, the recurrent event processes of individuals are described by different states. The different states may describe the recurrence number, or whether the subject is at risk of recurrence. A change of state is called a transition (or an event) and is central in this framework, which is fully characterized through estimation of transition probabilities between states and transition intensities that are defined as instantaneous hazards of progression to one state, conditional on occupying another state. [2]
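
As a minimal sketch, assume a time-homogeneous Markov model with three illustrative states (no recurrence, first recurrence, second recurrence). Transition intensities can then be estimated crudely as observed transition counts divided by the person-time at risk in the originating state, and transition probabilities follow from the matrix exponential of the resulting generator; all numbers below are made up.

```python
import numpy as np
from scipy.linalg import expm

# Counts of observed i -> j transitions and person-time spent in each state
# (illustrative values for states 0 = no recurrence, 1 = first, 2 = second).
n_transitions = np.array([[0.0, 40.0, 0.0],
                          [0.0, 0.0, 15.0],
                          [0.0, 0.0, 0.0]])
person_time = np.array([500.0, 120.0, 60.0])

Q = n_transitions / person_time[:, None]      # off-diagonal transition intensities
np.fill_diagonal(Q, -Q.sum(axis=1))           # rows of a generator matrix sum to zero

t = 2.0
P = expm(Q * t)                               # transition probability matrix over t years
print(np.round(P[0], 3))                      # state occupation probabilities at time t,
                                              # starting from "no recurrence"
```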

Extended Cox proportional hazards (PH) models

Extensions of the Cox proportional hazards model are popular in the social sciences and in medical research for assessing associations between covariates and the risk of recurrence, or for predicting recurrent event outcomes. Many extensions of survival models based on the Cox proportional hazards approach have been proposed to handle recurrent event data. These models can be characterized by four model components: the risk interval (the time scale over which a subject is considered at risk for each event), the baseline hazard, the risk set used for each event, and the adjustment for within-subject correlation. [3]

Well-known examples of Cox-based recurrent event models are the Andersen and Gill model, [4] the Prentice, Williams and Peterson model [5] and the Wei–Lin–Weissfeld model. [6]
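
A minimal sketch of an Andersen–Gill-style fit on a counting-process (start, stop] data layout, here using the time-varying Cox fitter from the lifelines package as one possible implementation; the data frame, its column names, and the values are toy examples, and in practice the fit would be combined with a within-subject correlation adjustment such as a robust variance (see below).

```python
import pandas as pd
from lifelines import CoxTimeVaryingFitter

# Counting-process layout: each row is an interval (start, stop] during which
# the subject is at risk; event = 1 if a recurrence occurs at `stop`.
df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 3],
    "start": [0, 4, 9, 0, 6, 0],
    "stop":  [4, 9, 12, 6, 10, 8],
    "event": [1, 1, 0, 1, 0, 0],
    "treat": [0, 0, 0, 1, 1, 1],   # baseline covariate, repeated on every row
})

ctv = CoxTimeVaryingFitter()
ctv.fit(df, id_col="id", event_col="event", start_col="start", stop_col="stop")
ctv.print_summary()                # log hazard ratio for `treat`
```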

Correlated event times within subjects

Time to recurrence is often correlated within subjects, as some subjects are more prone (frail) to experiencing recurrences than others. If the correlated nature of the data is ignored, the confidence intervals (CI) for the estimated rates can be artificially narrow, which may lead to false-positive findings.

Robust variance

It is possible to use robust 'sandwich' estimators for the variance of regression coefficients. Robust variance estimators are based on a jackknife-type estimate; they anticipate correlation within subjects and provide standard errors that remain valid even when this dependence is not modelled explicitly.
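
A minimal sketch, assuming a long-format data set with several follow-up intervals per subject; the shared per-subject frailty in the simulation induces within-subject correlation, and the cluster-robust ('sandwich') covariance option in statsmodels is one way to obtain standard errors that allow for it (all column names and values are illustrative).

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n_subjects, n_intervals = 100, 4
frailty = rng.gamma(shape=2.0, scale=0.5, size=n_subjects)     # subject-specific risk, mean 1

long_df = pd.DataFrame({
    "id": np.repeat(np.arange(n_subjects), n_intervals),
    "treatment": np.repeat(rng.integers(0, 2, n_subjects), n_intervals),
    "interval_years": 1.0,
})
rate = frailty[long_df["id"].to_numpy()] * np.exp(-0.5 * long_df["treatment"].to_numpy())
long_df["n_events"] = rng.poisson(rate * long_df["interval_years"].to_numpy())

X = sm.add_constant(long_df[["treatment"]])
model = sm.GLM(long_df["n_events"], X,
               family=sm.families.Poisson(),
               offset=np.log(long_df["interval_years"]))
naive = model.fit()                                            # treats rows as independent
robust = model.fit(cov_type="cluster",
                   cov_kwds={"groups": long_df["id"]})         # sandwich / cluster-robust
print(pd.DataFrame({"naive_se": naive.bse, "robust_se": robust.bse}))
```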

Frailty models

In frailty models, a random effect is included in the recurrent event model which describes the individual excess risk that cannot be explained by the included covariates. The frailty term induces dependence among the recurrence times within subjects.
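
The induced dependence can be seen in a small simulation: assuming a gamma-distributed frailty with mean one, the event counts of the same subject in two follow-up periods are positively correlated, whereas under a common rate they are independent (all values are illustrative).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
frailty = rng.gamma(shape=2.0, scale=0.5, size=n)     # subject-specific frailty, mean 1

# Event counts in two consecutive follow-up periods of unit length.
with_frailty = rng.poisson(frailty[:, None], size=(n, 2))   # same frailty shared by both periods
no_frailty = rng.poisson(1.0, size=(n, 2))                  # common rate for everyone

print(np.corrcoef(with_frailty[:, 0], with_frailty[:, 1])[0, 1])   # clearly positive
print(np.corrcoef(no_frailty[:, 0], no_frailty[:, 1])[0, 1])       # approximately zero
```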

References

  1. Cook, Richard J.; Lawless, Jerald F. (2007). The Statistical Analysis of Recurrent Events. Statistics for Biology and Health. New York: Springer. doi:10.1007/978-0-387-69810-6. ISBN 978-0-387-69809-0.
  2. Amorim, Leila D. A. F.; Cai, Jianwen (2014). "Modelling recurrent events: a tutorial for analysis in epidemiology". International Journal of Epidemiology. 44 (1): 324–333. doi:10.1093/ije/dyu222. ISSN 1464-3685. PMC 4339761. PMID 25501468.
  3. Kelly, Patrick J.; Lim, Lynette L-Y. (2000). "Survival analysis for recurrent event data: an application to childhood infectious diseases". Statistics in Medicine. 19 (1): 13–33. doi:10.1002/(sici)1097-0258(20000115)19:1<13::aid-sim279>3.0.co;2-5. ISSN 0277-6715. PMID 10623910.
  4. Andersen, P. K.; Gill, R. D. (1982). "Cox's Regression Model for Counting Processes: A Large Sample Study". The Annals of Statistics. 10 (4): 1100–1120. doi:10.1214/aos/1176345976. ISSN 0090-5364.
  5. Prentice, R. L.; Williams, B. J.; Peterson, A. V. (1981). "On the regression analysis of multivariate failure time data". Biometrika. 68 (2): 373–379. doi:10.1093/biomet/68.2.373.
  6. Wei, L. J.; Lin, D. Y.; Weissfeld, L. (1989). "Regression Analysis of Multivariate Incomplete Failure Time Data by Modeling Marginal Distributions". Journal of the American Statistical Association. 84 (408): 1065–1073. doi:10.1080/01621459.1989.10478873. ISSN 0162-1459.