This vignette shows how to use the package {icpack
}, using the drugusers
data set. A detailed description of the methods can be found in Putter, Gampe & Eilers (2023).
The drug users data are available after loading the package.
library(icpack)
library(survival)
There are three covariates (see the help file). Here is a summary of their frequencies and distribution.
table(drugusers$period)
##
## 1972-1980 1981-1985 1986-1991 1992-1997
## 173 374 306 87
table(drugusers$gender)
##
## male female
## 759 181
summary(drugusers$age)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.00 17.00 19.00 20.26 23.00 49.00
The time-to-event data are interval censored. Here is a first fit. The output of the function icfit
is an object of class icfit
.
icf <- icfit(Surv(left, right, type = "interval2") ~ period + gender + age, data = drugusers)
Information about the estimated coefficients and their significance is obtained by printing the object.
print(icf)
## Object of class 'icfit'
## Call:
## icfit(formula = Surv(left, right, type = "interval2") ~ period +
## gender + age, data = drugusers)
##
## Bins summary: tmin = 0, tmax = 241.4, number of bins = 100, bin width = 2.414
## P-splines summary: number of segments = 20, degree = 3, penalty order = 2
##
## Parameter estimates:
## coef SE HR lower upper pvalue
## period1981-1985 0.039268 0.114076 1.040050 0.831672 1.301 0.7307
## period1986-1991 -0.326614 0.129151 0.721362 0.560041 0.929 0.0114 *
## period1992-1997 -0.576174 0.270656 0.562044 0.330665 0.955 0.0333 *
## genderfemale 0.224810 0.109141 1.252084 1.010956 1.551 0.0394 *
## age -0.004659 0.010490 0.995352 0.975097 1.016 0.6569
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Effective baseline dimension: 1.043, log-likelihood: -2561, AIC: 5124
## Smoothness parameter lambda: 1e+06
## Number of iterations: 50
## n = 940, number of events = 597
The print method also gives basic information on the bins and splines used, as well as the effective dimension, log-likelihood, AIC, tuing parameter, number of iterations, number of subjects and evvents. A summary method is also available, which gives additional information on the iterations and on computation time.
A plot of the baseline hazard is obtained by plotting the object. The default option is to add shaded confidence intervals; they can be switched off if desired.
plot(icf, title = 'Drug users')
Plots of the cumulative hazard and the survival function are also available.
plot(icf, type = 'cumhazard', title = 'Drug users')
plot(icf, type = 'survival', title = 'Drug users')
The plot function above plots the baseline hazard. Suppose we are interested in the hazards and survival functions for a male aged 25 who started intravenous drug use in each of the four periods. We can use predict
with a newdata
object to obtain these hazards and survival functions.
newd <- data.frame(period=1:4, gender=1, age=25)
newd$period <- factor(newd$period, levels=1:4, labels=levels(drugusers$period))
newd$gender <- factor(newd$gender, levels=1:2, labels=levels(drugusers$gender))
picf <- predict(icf, newdata=newd)
The result is an object of class predict.icfit
, which is (here) a list with 4 items (one for each subject), each containing a data frame with time points and hazards, cumulative hazards, survival functions at these time points with associated standard errors and confidence bounds. The summary
method can be used to extract the hazards etc at specified time points.
summary(picf, times=seq(0, 60, by=12))
## [[1]]
## time haz sehaz lowerhaz upperhaz Haz seHaz
## 1 0 0.02076260 0.002615486 0.016220153 0.02657715 0.0000000 0.00000000
## 2 12 0.01854103 0.002271171 0.014583661 0.02357226 0.2355676 0.02923304
## 3 24 0.01655717 0.001993712 0.013076447 0.02096441 0.4459306 0.05467113
## 4 36 0.01478559 0.001771208 0.011691512 0.01869849 0.6337858 0.07697656
## 5 48 0.01320357 0.001592985 0.010423064 0.01672581 0.8015416 0.09669178
## 6 60 0.01179081 0.001449637 0.009265986 0.01500361 0.9513485 0.11425790
## lowerHaz upperHaz Surv seSurv lowerSurv upperSurv
## 1 0.0000000 0.0000000 1.0000000 0.00000000 1.0000000 1.0000000
## 2 0.1782718 0.2928633 0.7901262 0.02309683 0.7461298 0.8367173
## 3 0.3387771 0.5530840 0.6402324 0.03500122 0.5751790 0.7126440
## 4 0.4829145 0.7846571 0.5305826 0.04084164 0.4562805 0.6169848
## 5 0.6120292 0.9910540 0.4486390 0.04337919 0.3711880 0.5422510
## 6 0.7274071 1.1752898 0.3862211 0.04412852 0.3087310 0.4831611
##
## [[2]]
## time haz sehaz lowerhaz upperhaz Haz seHaz
## 1 0 0.02159413 0.002039439 0.01794505 0.02598524 0.0000000 0.00000000
## 2 12 0.01928359 0.001751007 0.01613973 0.02303985 0.2450019 0.02264613
## 3 24 0.01722028 0.001533398 0.01446252 0.02050389 0.4637899 0.04216152
## 4 36 0.01537774 0.001372813 0.01290933 0.01831815 0.6591686 0.05921658
## 5 48 0.01373236 0.001255969 0.01147873 0.01642845 0.8336430 0.07434621
## 6 60 0.01226303 0.001170780 0.01017025 0.01478645 0.9894496 0.08796911
## lowerHaz upperHaz Surv seSurv lowerSurv upperSurv
## 1 0.0000000 0.0000000 1.0000000 0.00000000 1.0000000 1.0000000
## 2 0.2006163 0.2893875 0.7827072 0.01772453 0.7487276 0.8182293
## 3 0.3811548 0.5464250 0.6289001 0.02651460 0.5790218 0.6830754
## 4 0.5431063 0.7752310 0.5172847 0.03063123 0.4606016 0.5809436
## 5 0.6879271 0.9793589 0.4344660 0.03230050 0.3755546 0.5026187
## 6 0.8170333 1.1618659 0.3717826 0.03270515 0.3129033 0.4417413
##
## [[3]]
## time haz sehaz lowerhaz upperhaz Haz seHaz
## 1 0 0.014977353 0.0015225662 0.012271671 0.01827959 0.0000000 0.00000000
## 2 12 0.013374800 0.0013212146 0.011020525 0.01623201 0.1699295 0.01699864
## 3 24 0.011943717 0.0011666245 0.009862723 0.01446379 0.3216774 0.03180116
## 4 36 0.010665763 0.0010493592 0.008795202 0.01293415 0.4571891 0.04485204
## 5 48 0.009524553 0.0009606996 0.007816056 0.01160651 0.5782018 0.05650514
## 6 60 0.008505446 0.0008929752 0.006923579 0.01044873 0.6862668 0.06703779
## lowerHaz upperHaz Surv seSurv lowerSurv upperSurv
## 1 0.0000000 0.0000000 1.0000000 0.00000000 1.0000000 1.0000000
## 2 0.1366128 0.2032463 0.8437265 0.01434177 0.8160803 0.8723094
## 3 0.2593483 0.3840066 0.7249345 0.02305328 0.6811302 0.7715560
## 4 0.3692807 0.5450975 0.6330627 0.02839376 0.5797879 0.6912328
## 5 0.4674538 0.6889498 0.5609075 0.03169388 0.5021049 0.6265968
## 6 0.5548751 0.8176585 0.5034529 0.03375020 0.4414652 0.5741446
##
## [[4]]
## time haz sehaz lowerhaz upperhaz Haz seHaz
## 1 0 0.011669501 0.002845395 0.007236062 0.01881925 0.0000000 0.00000000
## 2 12 0.010420883 0.002536051 0.006467762 0.01679017 0.1323994 0.03224049
## 3 24 0.009305865 0.002267178 0.005772706 0.01500148 0.2506328 0.06099565
## 4 36 0.008310156 0.002032865 0.005144992 0.01342251 0.3562157 0.08669707
## 5 48 0.007420990 0.001828060 0.004579098 0.01202662 0.4505019 0.10971568
## 6 60 0.006626960 0.001648454 0.004069848 0.01079072 0.5347000 0.13037079
## lowerHaz upperHaz Surv seSurv lowerSurv upperSurv
## 1 0.00000000 0.0000000 1.0000000 0.00000000 1.0000000 1.0000000
## 2 0.06920922 0.1955896 0.8759924 0.02824176 0.8223525 0.9331318
## 3 0.13108348 0.3701820 0.7783098 0.04747272 0.6906117 0.8771450
## 4 0.18629259 0.5261389 0.7003229 0.06071527 0.5908846 0.8300311
## 5 0.23546315 0.6655407 0.6373092 0.06992232 0.5139973 0.7902051
## 6 0.27917799 0.7902221 0.5858456 0.07637686 0.4537450 0.7564055
Finally, the hazards or cumulative hazards or survival functions can be plotted.
library(gridExtra)
plot(picf, type="surv", ylim=c(0, 1),
xlab = "Months since start intravenous drug use",
title = levels(drugusers$period))
## TableGrob (2 x 2) "arrange": 4 grobs
## z cells name grob
## 1 1 (1-1,1-1) arrange gtable[layout]
## 2 2 (1-1,2-2) arrange gtable[layout]
## 3 3 (2-2,1-1) arrange gtable[layout]
## 4 4 (2-2,2-2) arrange gtable[layout]
Putter, H., Gampe, J., Eilers, P. H. (2023). Smooth hazard regression for interval censored data. Submitted for publication.