Lab 6: Survival Analysis

Introduction

All questions address whether treatment (DES) and index (a measure of disease severity) are prognostic of survival time in prostate cancer. The data are posted on the class web pages (prostaticcancer.dta, prostaticcancer.dat). We will consider four variables in this lab:

  • Time: Time to death or censoring (months)
  • Status: Indicator of death (status=0 if subject censored, status=1 if subject died)
  • Treatment: Two treatments are considered. Treatment 1 is placebo, Treatment 2 is DES
  • Index: Gleason index, a measure of disease severity

Perform analyses to determine whether the distribution of time to death differs across groups defined by treatment and index.

R functions

Some useful R functions

  • Surv
  • survfit
  • survdiff
  • ggsurvplot
  • coxph

Setup

Packages

Loading required package: Hmisc

Attaching package: 'Hmisc'
The following objects are masked from 'package:base':

    format.pval, units
Loading required package: ggpubr

Attaching package: 'survminer'
The following object is masked from 'package:survival':

    myeloma
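
The chunk that produced the startup messages above is not shown in the rendered lab. A minimal sketch of an equivalent setup chunk follows, assuming Hmisc, survival, and survminer are installed; the package list is inferred from the messages and from the functions this lab uses, so treat it as an assumption rather than the original chunk.

```r
# Assumed setup chunk (the original is not shown in the rendered lab)
library(Hmisc)      # stata.get() for reading the Stata file
library(survival)   # Surv(), survfit(), survdiff(), coxph()
library(survminer)  # ggsurvplot(); also attaches ggpubr
```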

Load data

cancer <- stata.get("https://biostat.app.vumc.org/wiki/pub/Main/CourseBios312/prostaticcancer.dta")
names(cancer)
[1] "patient"   "treatment" "time"      "status"    "age"       "shb"      
[7] "size"      "index"    
summary(cancer)
    patient        treatment          time           status      
 Min.   : 1.00   Min.   :1.000   Min.   : 2.00   Min.   :0.0000  
 1st Qu.:10.25   1st Qu.:1.000   1st Qu.:42.25   1st Qu.:0.0000  
 Median :19.50   Median :2.000   Median :56.00   Median :0.0000  
 Mean   :19.50   Mean   :1.526   Mean   :49.74   Mean   :0.1579  
 3rd Qu.:28.75   3rd Qu.:2.000   3rd Qu.:65.00   3rd Qu.:0.0000  
 Max.   :38.00   Max.   :2.000   Max.   :70.00   Max.   :1.0000  
      age             shb             size           index       
 Min.   :51.00   Min.   :10.70   Min.   : 2.00   Min.   : 6.000  
 1st Qu.:65.00   1st Qu.:13.43   1st Qu.: 4.00   1st Qu.: 8.000  
 Median :71.00   Median :13.85   Median : 7.50   Median : 9.000  
 Mean   :68.63   Mean   :13.94   Mean   :10.47   Mean   : 9.132  
 3rd Qu.:73.00   3rd Qu.:14.68   3rd Qu.:13.75   3rd Qu.:10.000  
 Max.   :77.00   Max.   :16.40   Max.   :37.00   Max.   :12.000  

Questions

1. Before looking at the data, we should decide whether we are going to use robust standard errors or not

  • What are the benefits of using robust standard errors rather than classical (model-based) standard errors in PH regression?

  • What are the benefits of using classical standard errors rather than robust standard errors?

2. Provide suitable descriptive statistics regarding the distribution of time to death according to treatment status.

  • Create and plot Kaplan-Meier estimate of the survival curves by treatment

  • What is the (approximate) survival estimate at 30 months for each treatment arm? What is it at 60 months? Obtain the estimate and 95% confidence interval for survival at 30 months. Interpret the estimate and CI.

  • What is the estimated median survival time in each treatment arm? Provide a 95% confidence interval for these estimates. Interpret the estimate and CI.
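
One possible way to approach the three items above is sketched here with the survival package. The helper names (km_by_treatment, km_at, km_median) are hypothetical and introduced only for illustration; the data frame passed in is assumed to have the columns time, status, and treatment described in the Introduction.

```r
library(survival)

# Kaplan-Meier curves by treatment arm (1 = placebo, 2 = DES).
# `dat` needs columns time, status (1 = died, 0 = censored), treatment.
km_by_treatment <- function(dat) {
  survfit(Surv(time, status) ~ treatment, data = dat)
}

# Survival estimates with 95% confidence limits at chosen times.
# extend = TRUE still reports a row when a requested time lies beyond
# the last observed time in an arm.
km_at <- function(fit, times = c(30, 60)) {
  summary(fit, times = times, extend = TRUE)
}

# Median survival per arm with 95% confidence limits.
# NA means the curve (or its confidence band) never reaches 0.5
# within the observed follow-up.
km_median <- function(fit) {
  quantile(fit, probs = 0.5)
}

# Usage with the lab data (assumes `cancer` was read in as in "Load data"):
#   fit <- km_by_treatment(cancer)
#   km_at(fit)        # estimates and CIs at 30 and 60 months
#   km_median(fit)    # median and CI per arm
#   plot(fit, col = 1:2, xlab = "Months", ylab = "Survival probability")
#   # or, with survminer attached:
#   ggsurvplot(fit, data = cancer, conf.int = TRUE)
```

The summary of status above has a mean of about 0.16, so relatively few deaths were observed. It would not be surprising if the median, or one of its confidence limits, is reported as NA in one or both arms.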

3. List the variables time and status for the subjects with treatment==2

  • Be able to interpret what each row indicates in terms of event/censoring time and event/censoring indicator. That is, which observations are events and which are censored? When did each occur?

  • Calculate by hand the Kaplan-Meier estimate of survivorship for the first few event times. Compare to the plot. At which time points does survivorship decrease? At which time points does it stay the same?
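
The hand calculation uses the product-limit formula: at each event time t_j, multiply the running survival estimate by (1 - d_j / n_j), where d_j is the number of deaths at t_j and n_j is the number still at risk just before t_j. The sketch below illustrates this on a small made-up data set (hypothetical values, not the lab data) and cross-checks against survfit. The helper name km_by_hand is an assumption introduced for illustration.

```r
library(survival)

# Product-limit estimate, one row per distinct event time.
# Subjects censored at t_j are counted as still at risk at t_j.
km_by_hand <- function(time, status) {
  event_times <- sort(unique(time[status == 1]))
  n_risk  <- vapply(event_times, function(t) sum(time >= t), numeric(1))
  n_event <- vapply(event_times,
                    function(t) sum(time == t & status == 1), numeric(1))
  data.frame(time = event_times, n.risk = n_risk, n.event = n_event,
             surv = cumprod(1 - n_event / n_risk))
}

# Hypothetical toy data (NOT the lab data): 1 = died, 0 = censored
toy <- data.frame(time   = c(5, 8, 12, 12, 20, 25),
                  status = c(1, 0, 1,  0,  1,  0))
by_hand <- km_by_hand(toy$time, toy$status)
print(by_hand)

# Cross-check: summary() of a survfit object lists event times by default
fit <- survfit(Surv(time, status) ~ 1, data = toy)
print(summary(fit))

# Usage with the lab data (assumes `cancer` was read in as in "Load data"):
#   des <- subset(cancer, treatment == 2, select = c(time, status))
#   des[order(des$time), ]             # list event and censoring times
#   km_by_hand(des$time, des$status)
```

In the toy output the estimate drops only at times 5, 12, and 20, where deaths occur. The censored observations at 8, 12, and 25 leave the curve flat but shrink the risk set for later event times.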

4. Perform an analysis comparing the instantaneous risk of death across groups defined by treatment status using the following approaches. Compare the inference obtained from each approach.

  • The log rank test (this is a score test)

  • Cox proportional hazards regression using classical standard errors. This will give both a Wald and Likelihood Ratio test

  • Cox proportional hazards regression using robust standard errors. This will give a Wald test

  • When you “compare the inference obtained…”, interpret the hazard ratio and corresponding 95% confidence intervals.
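
The three approaches can be sketched with survdiff and coxph as below. The helper names (compare_treatment, hazard_ratio) are hypothetical; treatment is recoded as a factor with placebo as the reference level, following the coding given in the Introduction, so the hazard ratio compares DES to placebo.

```r
library(survival)

# Fit the log-rank test and both Cox models on the same data.
# `dat` needs columns time, status (1 = died, 0 = censored),
# and treatment (1 = placebo, 2 = DES).
compare_treatment <- function(dat) {
  dat$treatment <- factor(dat$treatment, levels = c(1, 2),
                          labels = c("placebo", "DES"))
  list(
    logrank   = survdiff(Surv(time, status) ~ treatment, data = dat),
    classical = coxph(Surv(time, status) ~ treatment, data = dat),
    # robust = TRUE requests the sandwich variance estimate
    robust    = coxph(Surv(time, status) ~ treatment, data = dat,
                      robust = TRUE)
  )
}

# Hazard ratio (DES vs placebo) with a 95% Wald confidence interval.
# For a robust fit the interval is based on the robust variance.
hazard_ratio <- function(fit) {
  summary(fit)$conf.int[, c("exp(coef)", "lower .95", "upper .95"),
                        drop = FALSE]
}

# Usage with the lab data (assumes `cancer` was read in as in "Load data"):
#   res <- compare_treatment(cancer)
#   res$logrank                  # log-rank chi-square and p-value
#   summary(res$classical)       # Wald, likelihood ratio, and score tests
#   summary(res$robust)          # Wald test based on the robust variance
#   hazard_ratio(res$classical)
#   hazard_ratio(res$robust)
```

Both Cox fits give the same hazard ratio estimate; only the standard error, and therefore the confidence interval and Wald p-value, differ between them.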

5. Perform a proportional hazards regression comparing the instantaneous risk of death across groups defined by Gleason index (index). Compare the inference obtained from each approach.

  • Cox proportional hazards regression using classical standard errors.

  • Cox proportional hazards regression using robust standard errors.
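
A minimal sketch for this question follows. It assumes index enters the model as a continuous predictor, so the hazard ratio compares groups that differ by one unit of Gleason index; that modeling choice is an assumption, not something the lab specifies. The helper names (cox_index, compare_se) are hypothetical.

```r
library(survival)

# Cox model with Gleason index as a continuous predictor (assumed coding).
# use_robust = TRUE requests the sandwich (robust) variance estimate.
cox_index <- function(dat, use_robust = FALSE) {
  coxph(Surv(time, status) ~ index, data = dat, robust = use_robust)
}

# Side-by-side comparison of classical and robust standard errors
# for the log hazard ratio, alongside the hazard ratio itself.
compare_se <- function(dat) {
  classical <- cox_index(dat)
  robust    <- cox_index(dat, use_robust = TRUE)
  data.frame(
    hr           = exp(coef(classical)),
    se_classical = sqrt(diag(classical$var)),
    se_robust    = sqrt(diag(robust$var))
  )
}

# Usage with the lab data (assumes `cancer` was read in as in "Load data"):
#   summary(cox_index(cancer))                     # classical inference
#   summary(cox_index(cancer, use_robust = TRUE))  # robust inference
#   compare_se(cancer)
```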