Competing risk analysis refers to a special type of survival analysis that aims to correctly estimate the marginal probability of an event in the presence of competing events. Conventional methods for describing the survival process, such as the Kaplan-Meier product-limit method, are not designed to accommodate the competing nature of multiple causes of the same event, so they tend to produce inaccurate estimates of the marginal probability of cause-specific events. As a workaround, the cumulative incidence function (CIF) was proposed, which estimates the marginal probability of a given event as a function of its cause-specific probability and the overall survival probability. This method combines the product-limit approach with the idea of competing causal pathways, and it gives a more interpretable estimate of the survival experience of a group of subjects exposed to multiple competing events. Like many analyses, competing risk analysis includes a non-parametric method, which uses a modified chi-square test to compare CIF curves between groups, and a parametric approach, which models the CIF based on a subdistribution hazard function.
1. What Are Competing Events and Competing Risks?
Standard survival data assume that subjects experience only one type of event during follow-up, such as death from breast cancer. In real life, however, subjects can potentially experience more than one type of a given event. For example, if mortality is of research interest, our observations - elderly patients in an oncology department - could die of a heart attack, of breast cancer, or even in a traffic accident. When only one of these different types of events can occur, we refer to them as competing events, in the sense that they compete with each other to deliver the event of interest, and the occurrence of one type of event prevents the others from occurring. Accordingly, we call the probabilities of these events competing risks, in the sense that the probability of each competing event is somehow regulated by the other competing events, an interpretation that suits a survival process determined by several types of events.
Consider the following examples to better understand the competing event scenario:
1) A patient can die of breast cancer or stroke, but not both;
3) A soldier can die in combat or in a traffic accident.
In the examples above, there is more than one pathway by which a subject can fail, but the failure, whether death or infection, can occur only once for each subject (setting recurrent events aside). The failures caused by different pathways are therefore mutually exclusive, and they are called competing events. The analysis of such data requires special considerations.
2. Why shouldn't we use the Kaplan-Meier Estimator?
As in standard survival analysis, the analytical goal for competing event data is to estimate the probability that one event, among the many possible events, occurs over time, while allowing subjects to fail from competing events. In the examples above, we may want to estimate the breast cancer death rate over time and to know whether it differs between two or more treatment groups, with or without covariate adjustment. In standard survival analysis, these questions can be answered using the Kaplan-Meier product-limit method to obtain the event probability over time and the Cox proportional hazards model to predict that probability. Similarly, the typical approach to competing event data uses the KM estimator to estimate the probability of each type of event separately, while the other competing events are treated as censored, in addition to the subjects censored through loss to follow-up or withdrawal. This way of estimating the event probability is based on the cause-specific hazard function, which is expressed mathematically as:
$$h_c(t) = \lim_{\Delta t \to 0} \frac{P(t \le T_c < t + \Delta t \mid T_c \ge t)}{\Delta t}$$
The random variable $T_c$ denotes the time to failure from event type c, so the cause-specific hazard function $h_c(t)$ gives the instantaneous failure rate at time t from event type c, given that no failure from event type c has occurred by time t.
Accordingly, there is a cause-specific hazard model based on the Cox proportional hazards model, of the form:
$$h_c(t, X) = h_{0c}(t) \exp\left(\sum_{i=1}^{p} \beta_{ic} X_i\right)$$
This proportional hazards model for event type c at time t allows the effects of the covariates to differ by event type, as the subscript c on the beta coefficients suggests.
Using these methods, one can estimate the failure rate for each of the competing events separately. In our breast cancer mortality example, if death from breast cancer is the event of interest, then death from heart attack and from all other causes should be treated as censored, in addition to conventionally censored observations. This allows us to estimate the cause-specific hazard for breast cancer mortality and to fit a cause-specific hazard model to it. The same procedure applies to death from heart attack when that becomes the event of interest.
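As a rough illustration of this recoding, the sketch below treats every status other than the cause of interest as censored and then applies a plain Kaplan-Meier estimator. The toy data, the status codes (0, 1, 2) and the function names are assumptions made for this example only; they do not come from any statistics package.

```python
# Hypothetical toy data: (follow-up time, status)
# status 0 = censored, 1 = death from breast cancer, 2 = death from heart attack
records = [(2, 1), (3, 2), (4, 0), (5, 1), (6, 2), (7, 1), (8, 0), (9, 2)]


def recode_for_cause(records, cause):
    """Keep failures from `cause` as events (1); treat everything else as censored (0)."""
    return [(time, 1 if status == cause else 0) for time, status in records]


def kaplan_meier(data):
    """Kaplan-Meier survival estimate at each distinct event time."""
    survival = 1.0
    curve = {}
    for t in sorted({time for time, event in data if event == 1}):
        at_risk = sum(1 for time, _ in data if time >= t)
        events = sum(1 for time, event in data if time == t and event == 1)
        survival *= 1 - events / at_risk
        curve[t] = survival
    return curve


# Cause-specific analysis for breast cancer: heart attack deaths become censored.
breast_cancer = recode_for_cause(records, cause=1)
km_breast_cancer = kaplan_meier(breast_cancer)

for t, s in km_breast_cancer.items():
    print(f"t={t}: cause-specific KM survival = {s:.3f}")
```

Calling `recode_for_cause(records, cause=2)` instead would give the corresponding analysis for death from heart attack.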
A major caveat with the cause-specific approach is that it still assumes independent censoring for subjects who were not actually censored but failed from competing events, just as for standard censoring such as loss to follow-up. If this assumption holds, then when one focuses on the cause-specific death rate from breast cancer, every censored subject at time t would have the same breast cancer death rate, regardless of whether the reason for censoring was cardiovascular disease, another cause of death, or loss to follow-up. This assumption is equivalent to saying that the competing events are independent, which is the basis for the validity of KM-type analysis. However, there is no way to explicitly test whether this assumption holds for a given data set. For example, we can never determine whether a patient who died from a heart attack would have died from breast cancer had the heart attack not occurred, because the potential cancer death of patients who died from a heart attack cannot be observed. Estimates from the cause-specific hazard function therefore lack an informative interpretation, since they rely heavily on the independent censoring assumption.
3. What is the solution?
Currently, the most popular alternative approach to analyzing competing event data is the cumulative incidence function (CIF), which estimates the marginal probability of each competing event. The marginal probability is defined as the probability that subjects actually develop the event of interest, regardless of whether they were censored or failed from other competing events. In the simplest case, when there is only one event of interest, the CIF equals the (1-KM) estimate. With competing events, however, the marginal probability of each competing event can be estimated from the CIF, which is derived from the cause-specific hazard discussed earlier. By definition, the marginal probability does not assume independence of the competing events, and its interpretation is more relevant to clinicians in cost-benefit analyses, where the risk probability is used to assess treatment benefit.
3.1 Cumulative Incidence Function (CIF)
Constructing a CIF is as straightforward as the KM estimate. It is built from the product of two estimates:
1) The estimate of the hazard at the ordered failure time $t_f$ for the event type of interest, expressed as:
$$\hat{h}_c(t_f) = \frac{m_{cf}}{n_f}$$
Here, $m_{cf}$ denotes the number of events from risk c at time $t_f$ and $n_f$ the number of subjects at risk at that time.
2) The estimate of the overall probability of surviving the previous time ($t_{f-1}$):
$$\hat{S}(t_{f-1})$$
where $S(t)$ denotes the overall survival function rather than the cause-specific survival function. The reason for using overall survival is simple but important: a subject must have survived all other competing events in order to fail from event type c at time $t_f$.
With these two estimates we can calculate the estimated incidence probability of failing from event type c at time $t_f$ as:
$$\hat{I}_c(t_f) = \hat{S}(t_{f-1}) \times \hat{h}_c(t_f)$$
The equation is self-explanatory: the probability of failing from event type c at time $t_f$ is simply the product of the probability of surviving the previous time periods and the cause-specific hazard at time $t_f$.
The CIF for event type c at time $t_f$ is then the cumulative sum up to time $t_f$ (i.e., from f' = 1 to f' = f) of these incidence probabilities over all failure times of event type c, expressed as:
$$\mathrm{CIF}_c(t_f) = \sum_{f'=1}^{f} \hat{I}_c(t_{f'}) = \sum_{f'=1}^{f} \hat{S}(t_{f'-1}) \, \hat{h}_c(t_{f'})$$
As mentioned earlier, the CIF is equivalent to the 1-KM estimator when there is no competing event. When there is a competing event, the CIF differs from the 1-KM estimator in that it uses the overall survival function $S(t)$, which counts failures from competing events in addition to the event of interest, whereas the 1-KM estimator uses the event-type-specific survival function $S_c(t)$, which treats failures from competing events as censored.
By using the overall survival function, the CIF avoids the need to make unverifiable assumptions about independent censoring by competing events. Because $S(t)$ is never greater than $S_c(t)$, the CIF from competing event data is never greater than the 1-KM estimate, which means that 1-KM tends to overestimate the probability of failure from the event type of interest. Another benefit is that the CIFs of the competing events partition the overall failure probability $1 - S(t)$, so the individual risks of all competing events sum to the total risk. This property of the CIF makes it possible to decompose the overall risk, which has more practical interpretations.
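The construction described in this section can be sketched in a few lines. The example below accumulates the product of the overall survival just before each failure time and the cause-specific hazard at that time, and compares the result with 1-KM for the same cause. The toy data, status codes and function names are assumptions for illustration; real analyses would normally rely on an established package.

```python
# Hypothetical toy data: (follow-up time, status)
# status 0 = censored, 1 = death from breast cancer, 2 = death from heart attack
records = [(2, 1), (3, 2), (4, 0), (5, 1), (6, 2), (7, 1), (8, 0), (9, 2)]


def cumulative_incidence(records, cause):
    """CIF for `cause`: running sum of S(t_{f-1}) * m_cf / n_f over ordered failure times."""
    overall_survival = 1.0  # overall survival just before the current failure time
    cif = 0.0
    curve = {}
    for t in sorted({time for time, status in records if status != 0}):
        n_f = sum(1 for time, _ in records if time >= t)  # subjects at risk
        m_cf = sum(1 for time, s in records if time == t and s == cause)
        m_all = sum(1 for time, s in records if time == t and s != 0)
        cif += overall_survival * m_cf / n_f
        overall_survival *= 1 - m_all / n_f  # failures from any cause reduce S(t)
        curve[t] = cif
    return curve


def one_minus_km(records, cause):
    """1-KM for `cause`, with failures from competing events treated as censored."""
    survival = 1.0
    curve = {}
    for t in sorted({time for time, status in records if status == cause}):
        n_f = sum(1 for time, _ in records if time >= t)
        m_cf = sum(1 for time, s in records if time == t and s == cause)
        survival *= 1 - m_cf / n_f
        curve[t] = 1 - survival
    return curve


cif_cancer = cumulative_incidence(records, cause=1)
cif_heart = cumulative_incidence(records, cause=2)
naive_cancer = one_minus_km(records, cause=1)

print(f"CIF for breast cancer at t=7:  {cif_cancer[7]:.3f}")
print(f"1-KM for breast cancer at t=7: {naive_cancer[7]:.3f}")
print(f"Sum of both CIFs at t=9:       {cif_cancer[9] + cif_heart[9]:.3f}")
```

In this toy data set the 1-KM value for breast cancer exceeds the CIF at the same time point, and the two CIFs add up to the overall failure probability at the last failure time.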
3.2 Nonparametric Analysis
Gray (1988) proposed a non-parametric test to compare two or more CIFs. The test is analogous to the log-rank test for comparing KM curves and uses a modified chi-square test statistic. It does not require the independent censoring assumption. Please see the original article for details on how the test statistic is constructed.
3.3 Parametric Analysis
Fine and Gray (1999) proposed a proportional hazards model that aims to model the CIF with covariates by treating the CIF curve as a subdistribution function. The subdistribution model is analogous to the Cox proportional hazards model, except that it models a hazard function (known as the subdistribution hazard) derived from the CIF. The Fine and Gray subdistribution hazard function for event type c can be expressed as:
$$h_{c,\mathrm{CIF}}(t) = \lim_{\Delta t \to 0} \frac{P\big(t \le T_c < t + \Delta t \mid T_c \ge t \ \text{or}\ (T_{c'} \le t,\ c' \ne c)\big)}{\Delta t}$$
The function above gives the hazard rate for event type c at time t based on the risk set that remains at time t after accounting for all previously occurring event types, including competing events.
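To make the difference in risk sets concrete, the sketch below contrasts the cause-specific risk set with the subdistribution risk set, in which subjects who already failed from a competing event stay "at risk". The toy data deliberately contain no censoring; with censored data, Fine and Gray additionally apply inverse-probability-of-censoring weights, which are not shown here. All names and values are assumptions for illustration.

```python
# Hypothetical toy data with NO censoring: (failure time, cause)
records = [(2, 1), (3, 2), (5, 1), (6, 2), (7, 1), (9, 2)]


def cause_specific_risk_set(records, t):
    """Subjects who have not failed from any cause before time t."""
    return [r for r in records if r[0] >= t]


def subdistribution_risk_set(records, t, cause):
    """Subjects still event-free at t, plus those who failed earlier from a competing cause."""
    return [r for r in records if r[0] >= t or r[1] != cause]


def cif_from_subdistribution_hazard(records, cause):
    """CIF at the last failure time, built as 1 - prod(1 - d / n*) over the subdistribution risk sets."""
    survival = 1.0
    for t in sorted({time for time, c in records if c == cause}):
        n_star = len(subdistribution_risk_set(records, t, cause))
        d = sum(1 for time, c in records if time == t and c == cause)
        survival *= 1 - d / n_star
    return 1 - survival


print(len(cause_specific_risk_set(records, 7)))       # size of the cause-specific risk set at t=7
print(len(subdistribution_risk_set(records, 7, 1)))   # size of the subdistribution risk set at t=7
print(cif_from_subdistribution_hazard(records, 1))    # compare with the observed share of cause-1 failures
```

Without censoring, the value returned by `cif_from_subdistribution_hazard` matches the simple proportion of subjects who failed from the cause of interest, which is why the subdistribution hazard maps directly onto the CIF.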
The CIF-based proportional hazards model is then defined as:
$$h_{c,\mathrm{CIF}}(t, X) = h_{0c,\mathrm{CIF}}(t) \exp\left(\sum_{i=1}^{p} \gamma_i X_i\right)$$
This model satisfies the proportional hazards assumption for the subdistribution hazard being modeled, which means that the general hazard ratio formula is essentially the same as for the Cox model, apart from the minor cosmetic difference that the betas in the Cox model are replaced by gammas in Fine and Gray's model. We should therefore interpret the gammas in a similar way to the betas estimated from a Cox model, except that they estimate the effect of given covariates in the presence of competing events. The Fine and Gray model can also be extended to accommodate time-dependent covariates.
Today, analysis of competing risk data using either non-parametric or parametric methods is available in major statistical packages, including R, Stata and SAS.
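One way to see what a gamma means on the probability scale: under a proportional subdistribution hazards model, the CIF for a covariate pattern X relates to the baseline CIF through 1 - CIF(t | X) = (1 - CIF0(t)) ^ exp(gamma X). The sketch below applies that relation; the baseline CIF value, the coefficient and the function name are made-up values for illustration, not estimates from any data.

```python
import math


def cif_given_covariates(baseline_cif, gammas, covariates):
    """CIF at a fixed time for covariates X, given the baseline CIF at that time.

    Uses 1 - CIF(t | X) = (1 - CIF0(t)) ** exp(sum(gamma_i * x_i)).
    """
    linear_predictor = sum(g * x for g, x in zip(gammas, covariates))
    return 1 - (1 - baseline_cif) ** math.exp(linear_predictor)


# Assumed values: baseline CIF of 0.20 and a subdistribution hazard ratio of 2.
gamma = math.log(2)
unexposed = cif_given_covariates(0.20, [gamma], [0])
exposed = cif_given_covariates(0.20, [gamma], [1])

print(f"CIF, unexposed: {unexposed:.3f}")
print(f"CIF, exposed:   {exposed:.3f}")
```

Note that a subdistribution hazard ratio of 2 does not double the cumulative incidence itself; the effect on the CIF depends on the baseline level.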
Textbooks & Chapters
J. D. Kalbfleisch and Ross L. Prentice, "Competing Risks and Multistate Models", in The Statistical Analysis of Failure Time Data (Hoboken, N.J.: J. Wiley, 2002), pp. 247-77.
The idea of the CIF was first suggested in this book. It gives a compelling explanation of why competing risk data cannot be analyzed with the Kaplan-Meier method.
David G. Kleinbaum and Mitchel Klein, "Competing Risks Survival Analysis", in Survival Analysis: A Self-Learning Text (New York: Springer, 2012), pp. 425-95.
This page borrows heavily from this great chapter by Kleinbaum & Klein; I highly recommend it! P.S. In general, I warmly recommend all of Kleinbaum's statistical textbooks.
Bob Gray (2013). cmprsk: Subdistribution Analysis of Competing Risks. R package version 2.2-6. http://CRAN.R-project.org/package=cmprsk
This is the user manual for the R package cmprsk; it provides a user-friendly guide on how to use its functions.
stcrreg - Competing-risks regression. StataCorp. 2013. Stata 13 Base Reference Manual. College Station, TX: Stata Press.
This is the Stata user manual. I know very little about it, but it seems informative for experienced Stata users.
Proportional Subdistribution Hazards Model for Competing-Risks Data, SAS Institute Inc. 2013. SAS/STAT® 13.1 User's Guide: pp. 5991-5995. Cary, NC: SAS Institute Inc.
This is one of the SAS documents that describe how competing risks are analyzed with PROC PHREG in SAS. Very detailed and useful.
Prentice, Ross L., et al. The analysis of failure times in the presence of competing risks. Biometrics (1978): 541-554.
This paper is very similar to the book chapter by Kalbfleisch and Prentice, probably the same material.
Gray, Robert J. A class of K-sample tests for comparing the cumulative incidence of a competing risk. The Annals of Statistics (1988): 1141-1154.
This article proposed the modified chi-square test for comparing two or more CIFs. Epic!
Fine, Jason P., and Robert J. Gray. A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association 94.446 (1999): 496-509.
This paper proposed the subdistribution hazard function and the proportional hazards model for the CIF. Epic!
Latouche, Aurélien, et al. Misspecified regression model for the subdistribution hazard of a competing risk. Statistics in Medicine 26.5 (2007): 965-974.
This paper criticized the misuse of the subdistribution hazard function in published papers. It is helpful because it points out some common mistakes made when using this method.
Lau, Bryan, Stephen R. Cole, and Stephen J. Gange. Competing risk regression models for epidemiologic data. American Journal of Epidemiology 170.2 (2009): 244-256.
This paper provides an excellent summary of the CIF and competing risk regression with descriptive graphics. It also applies the method to real data. Very useful for epidemiologists.
Zhou, Bingqing, et al. Competing risks regression for stratified data. Biometrics 67.2 (2011): 661-670.
This paper extended Gray's methods to the analysis of stratified data.
Zhou, Bingqing, et al. Competing risks regression for clustered data. Biostatistics 13.3 (2012): 371-383.
This paper extended Gray's methods to the analysis of clustered data.
Andersen, Per Kragh, et al. Competing risks in epidemiology: possibilities and pitfalls. International Journal of Epidemiology 41.3 (2012): 861-870.
A good summary and review of Gray's methods.
Wolbers, Marcel, et al. Prognostic models with competing risks: methods and application to coronary risk prediction. Epidemiology 20.4 (2009): 555-561.
This paper compared Fine and Gray's model to the standard Cox model in analyzing mortality from coronary artery disease and showed that the Cox model overestimated the risk.
Wolbers, Marcel, et al. Competing risks analyses: objectives and approaches. European Heart Journal (2014): ehu131.
This paper, also by Wolbers et al., provides a more detailed review of Gray's method and an example analysis of the effectiveness of implantable cardioverter-defibrillators.
Grover, Gurprit, Prafulla Kumar Swain, and Vajala Ravi. A competing risk approach with censoring to estimate the probability of death of HIV/AIDS patients on antiretroviral therapy in the presence of covariates. Statistics Research Letters 3.1 (2014).
A classic application in HIV treatment research.
Dignam, James J., Qiang Zhang, and Masha Kocherginsky. The use and interpretation of competing risks regression models. Clinical Cancer Research 18.8 (2012): 2301-2308.
This article used sample data from a Radiation Therapy Oncology Group clinical trial for prostate cancer to show that different hazard models can lead to very different conclusions about the same predictor.
Scrucca, L., A. Santucci, and F. Aversa. Competing risk analysis using R: an easy guide for clinicians. Bone Marrow Transplantation 40.4 (2007): 381-387.
A very nice tutorial on estimating the CIF in R for non-statisticians.
Scrucca, L., A. Santucci, and F. Aversa. Regression modeling of competing risk using R: an in depth guide for clinicians. Bone Marrow Transplantation 45.9 (2010): 1388-1395.
A very nice tutorial on fitting competing risk regression in R for non-statisticians.
Scheike, Thomas H., and Mei-Jie Zhang. Analyzing competing risk data using the R timereg package. Journal of Statistical Software 38.2 (2011).
An introduction to timereg, another R package besides cmprsk for competing risk data analysis.
Coviello, Vincenzo, and May Boggess. Cumulative incidence estimation in the presence of competing risks. Stata Journal 4 (2004): 103-112.
Lin, Guixian, Ying So, and Gordon Johnston. Analyzing survival data with competing risks using SAS software. SAS Global Forum. Vol. 2102. 2012.
Sally R. Hinchliffe. Competing Risks - What, Why, When, and How? Survival Analysis for Junior Researchers, Department of Health Sciences, University of Leicester, 2012
A great talk on competing risk analysis with lots of graphs to help you understand the method.
Bernhard Haller. Analysis of competing risks data and simulation of data following predefined subdistribution hazards, Research Seminar, Institute for Medical Statistics and Epidemiology, Technical University of Munich, 2013
Teaches you how to simulate competing risk data; it is a little difficult to follow.
Roberto G. Gutiérrez. Competing Risk Regression, 2009 Australian and New Zealand Stata Users Group Meeting. StataCorp LP, 2009
A talk about using Stata to analyze competing risk data.
Zaixing Shi. Competing Risk Analysis - Epi VI presentation, lecture presentation, spring semester 2014.
These are my presentation slides!