Survival analysis


Stata's survival analysis capabilities include data summary, Kaplan–Meier product-limit survivor function estimates, testing equality of survivor functions, Nelson–Aalen estimators, incidence rate comparisons, estimating Cox proportional hazards models, estimating exponential, Weibull, Gompertz, lognormal, log-logistic and gamma models, and more. All of the above support stratified analysis (up to 5 variables), time-varying data, left truncation (delayed entry), gaps (exits and reentry), right censoring, and recurring events. In the case of model estimation, both conventional and robust estimates of variance are available. In the case of the Cox model, tests of the proportional hazards assumption are available. Also available for all estimation models are Cox–Snell, martingale, Schoenfeld, deviance, and score residuals.

The input data for the survival analysis commands are duration records: each observation records a span of time over which the subject was observed along with an outcome at the end of the period. There can be one record per subject or, if covariates vary over time, multiple records.

One can obtain simple descriptions:

. stdes

                                   |-------------- per subject --------------|
Category                   total        mean         min     median        max
------------------------------------------------------------------------------
no. of subjects               48   
no. of records                48           1           1          1          1

(first) entry time                         0           0          0          0
(final) exit time                       15.5           1       12.5         39

subjects with gap              0   
time on gap if gap             0   
time at risk                 744        15.5           1       12.5         39

failures                      31    .6458333           0          1          1
------------------------------------------------------------------------------

. stsum, by(dose)


         |               incidence       no. of    |------ Survival time -----|
dose     | time at risk     rate        subjects        25%       50%       75%
---------+---------------------------------------------------------------------
 Control |          180   .1055556            20          4         8        12
    5 mg |          209   .0287081            14         13        22        23
   10 mg |          355   .0169014            14         25        33         .
---------+---------------------------------------------------------------------
   total |          744   .0416667            48          8        17        33
One can compare the survivor functions:
. sts list, by(dose) compare

                 -----Survivor Function------
dose            Control       5 mg      10 mg
---------------------------------------------
time       1     0.9000     1.0000     1.0000
           5     0.6000     1.0000     1.0000
           9     0.4500     0.8512     0.9286
          13     0.2250     0.7448     0.8571
          17     0.1125     0.6207     0.8571
          21     0.1125     0.6207     0.8571
          25          .     0.2069     0.6857
          29          .     0.2069     0.5878
          33          .          .     0.4408
          37          .          .     0.4408
          41          .          .          .
---------------------------------------------
Or review the complete life table:

. sts list, by(dose)

           Beg.          Net            Survivor      Std.
  Time    Total   Fail   Lost           Function     Error     [95% Conf. Int.]
-------------------------------------------------------------------------------
Control 
     1       20      2      0             0.9000    0.0671     0.6560    0.9740
     2       18      1      0             0.8500    0.0798     0.6038    0.9490
     3       17      1      0             0.8000    0.0894     0.5511    0.9198
     4       16      2      0             0.7000    0.1025     0.4505    0.8525
(output omitted)
    23        1      1      0             0.0000         .          .         .
5 mg 
     6       14      1      1             0.9286    0.0688     0.5908    0.9896
     7       12      1      0             0.8512    0.0973     0.5234    0.9607
     9       11      0      1             0.8512    0.0973     0.5234    0.9607
(output omitted)
    32        1      0      1             0.2069    0.1769     0.0104    0.5804
10 mg 
     6       14      1      0             0.9286    0.0688     0.5908    0.9896
    10       13      1      0             0.8571    0.0935     0.5394    0.9622
    17       12      0      1             0.8571    0.0935     0.5394    0.9622
(output omitted)
    39        1      0      1             0.4408    0.1673     0.1312    0.7187
-------------------------------------------------------------------------------
One can as easily obtain a graph

. sts graph, by(dose)
or test the equality of the survivor functions:

. sts test dose

Log-rank test for equality of survivor functions
------------------------------------------------

        |  Events
dose    |  observed       expected
--------+-------------------------
Control |        19           7.25
5 mg    |         6           8.20
10 mg   |         6          15.56
--------+-------------------------
Total   |        31          31.00

              chi2(2) =      30.19
              Pr>chi2 =     0.0000
We used the log-rank test, but we could have specified the Wilcoxon test. Both tests are available in stratified forms.

Stata can estimate Cox proportional hazards models, exponential, Weibull, Gompertz, lognormal, log-logistic and gamma models. In the case of the Cox proportional hazards model

  • Simple and stratified estimates are available
  • Right censoring and left truncation (delayed entry) are allowed and, in fact, so are intermediary gaps
  • Conventional and robust estimates of variance are available (Lin and Wei 1989)
The same is true of the parametric models except that they do not compute stratified estimates for which there is no statistical counterpart. For exponential and Weibull models, estimates are available in either the accelerated-time or hazard metric.

Here we will focus on the Cox proportional hazards model. Here is a model estimated on our dose-age data that we described above:

. stcox age dose5 dose10

Iteration 0:   log likelihood = -99.911448
Iteration 1:   log likelihood = -82.331523
Iteration 2:   log likelihood = -81.676487
Iteration 3:   log likelihood = -81.652584
Iteration 4:   log likelihood = -81.652567
Refining estimates:
Iteration 0:   log likelihood = -81.652567

Cox regression -- Breslow method for ties

No. of subjects =           48                     Number of obs   =        48
No. of failures =           31
Time at risk    =          744
                                                   LR chi2(3)      =     36.52
Log likelihood  =   -81.652567                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
      _t |
      _d | Haz. Ratio   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     age |   1.118334   .0409074      3.058   0.002       1.040963    1.201455
   dose5 |   .1805839   .0892742     -3.462   0.001       .0685292    .4758636
  dose10 |   .0520066    .034103     -4.508   0.000       .0143843    .1880305
------------------------------------------------------------------------------
By default, stcox uses Breslow's method for handling ties and presents results as hazard ratios — that is, exponentiated coefficients — but we can see the underlying coefficients if we wish:

. stcox, nohr

No. of subjects =           48                     Number of obs   =        48
No. of failures =           31
Time at risk    =          744
                                                   LR chi2(3)      =     36.52
Log likelihood  =   -81.652567                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
      _t |
      _d | Haz. Ratio   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     age |     .11184   .0365789      3.058   0.002       .0401467    .1835333
   dose5 |   -1.71156   .4943639     -3.462   0.001      -2.680495   -.7426241
  dose10 |  -2.956384   .6557432     -4.508   0.000      -4.241617   -1.671151
------------------------------------------------------------------------------
stcox can also handle ties using Efron's, the exact partial likelihood or the exact marginal likelihood methods.

stcox, like all Stata estimation commands, can redisplay previous estimation results.

Based on these estimates, we could test that the effect of dose5 (the 5 mg dose) is the same as dose10 using the standard Stata post-estimation commands:

. test dose5 = dose10

 ( 1)  dose5 - dose10 = 0.0

           chi2(  1) =    3.31
         Prob > chi2 =    0.0690
We can as easily estimate the model with robust estimates of variance and use the robust estimates as the basis for our test:

. stcox age dose5 dose10, robust

Iteration 0:   log likelihood = -99.911448
Iteration 1:   log likelihood = -82.331523
Iteration 2:   log likelihood = -81.676487
Iteration 3:   log likelihood = -81.652584
Iteration 4:   log likelihood = -81.652567
Refining estimates:
Iteration 0:   log likelihood = -81.652567

Cox regression — Breslow method for ties

No. of subjects =           48                     Number of obs   =        48
No. of failures =           31
Time at risk    =          744
                                                   Wald chi2(3)    =     32.39
Log likelihood  =   -81.652567                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
      _t |               Robust
      _d | Haz. Ratio   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     age |   1.118334   .0327643      3.817   0.000       1.055926     1.18443
   dose5 |   .1805839   .0773571     -3.995   0.000       .0779917    .4181288
  dose10 |   .0520066   .0349232     -4.403   0.000       .0139465    .1939333
------------------------------------------------------------------------------

. test dose5 = dose10

 ( 1)  dose5 - dose10 = 0.0

           chi2(  1) =    3.12
         Prob > chi2 =    0.0773
stcox will also let us obtain
  • the baseline hazard (and stratified baseline hazard if estimates are stratified)
  • the cumulative baseline hazard
  • the baseline survivor function (and stratified baseline survivor function if estimates are stratified)
  • the Cox–Snell residuals
  • the martingale residuals
  • the Schoenfeld residuals
  • the deviance residuals
  • the efficient score residuals
The data used above has censored observations but no time-varying covariates and no left truncation. The failure event — death — occurs only once. Had the data included time-varying covariates, left truncation, or recurring failure events, however, nothing would have changed in terms of what we type and, importantly, all the same features are available post estimation regardless of the characteristics of the data.

References

Lin, D. Y. and L. J. Wei. 1989.
The robust inference of the Cox proportional hazards model. Journal of the American Statistical Association 84: 1074--1078.


© Copyright 2005 Stata Corporation.