How to Resolve the 95% Confidence Interval Mismatch in the Survival Function in R

I am about to write some functions to extract information from the survival analysis results, and I ran into a discrepancy between the extraction of the lower and upper survival times as defined by the 95% confidence interval and what is reported from the package itself as a summary.

I am using package survival

(v 2.37-7) in R (v 3.1.2).

So my problem is that sometimes my extraction of the lower and / or upper bounds of the 95% CI for the median survival time does not match what is returned when I simply evaluate the results survfit

. When I check the data I think the results are survfit

wrong, it seems to return a border value + 1 (again, sometimes). Here is some data that illustrates the problem.

# Fit my data stratified by gender of subject
survFit30Sex <- survfit(Surv(thirtyDaySuicides$daysFromInvestigation) ~ thirtyDaySuicides$Sex)

# Display median survival and confidence interval
survFit30Sex


Call: survfit(formula = Surv(thirtyDaySuicides$daysFromInvestigation) ~ 
thirtyDaySuicides$Sex)

                    records n.max n.start events median 0.95LCL 0.95UCL
thirtyDaySuicides$Sex=1      35    35      35     35     15       9      20
thirtyDaySuicides$Sex=2      93    93      93     93      9       6      13

      

survfit

defines the lower and upper bounds for Sex = 1

as 9 days and 20 days respectively, but when I check the data it seems like the upper bound should be 19 and not 20

Here is the actual data; I am just showing Sex=1

, since this is a mismatch, I have also stripped out the values ​​long before and after the critical region to make the data easier to read

Call: survfit(formula = Surv(thirtyDaySuicides$daysFromInvestigation) ~ 
    thirtyDaySuicides$Sex)

summary( thirtyDaySuicides$Sex=1 )
     time n.risk n.event survival std.err lower 95% CI upper 95% CI
    9     24       2   0.6286  0.0817      0.48725        0.811
   10     22       1   0.6000  0.0828      0.45780        0.786
   11     21       1   0.5714  0.0836      0.42890        0.761
   13     20       1   0.5429  0.0842      0.40055        0.736
   14     19       1   0.5143  0.0845      0.37272        0.710
   15     18       1   0.4857  0.0845      0.34541        0.683
   16     17       1   0.4571  0.0842      0.31861        0.656
   17     16       3   0.3714  0.0817      0.24138        0.572
   19     13       1   0.3429  0.0802      0.21673        0.542
   20     12       2   0.2857  0.0764      0.16921        0.482
   21     10       2   0.2286  0.0710      0.12437        0.420
   22      8       1   0.2000  0.0676      0.10310        0.388

      

As I understand it, the lower 95% CI for median survival time is 0.34541. Searching down the survival column until a value <0.34541 is found occurs in the series associated with survival time 19 (survival = 0.3429). Isn't this the upper limit? Why survfit

Top 20 Survival Time Returns? I have automated this algorithm, and most of the time I compare the outcome to survival, but not always.

This makes me think that there is either some strange bug in the package survival

(which I doubt) or I am finding the border wrong (most likely).

--------- Update

Unfortunately I don't know how to link the data file to my question, but the data is pretty short, so I can put it here. Note that I have eliminated stratification by Sex for simplicity, so this is just the data for women where I get the discrepancy.

It seems to me that I was approaching this the wrong way, perhaps 95% of the CI is being calculated from the standard error rather than looking how I think about it. But even with this idea, I have similar problems. More generally, the question is: how do you pull the Xth percentile survival time with its 95% CI in time units from the survivor object?

Here's the survival data via dput and then an unstructured copy below that.

structure(list(daysFromInvestigation = c(27L, 27L, 10L, 20L, 
15L, 21L, 27L, 1L, 9L, 22L, 29L, 14L, 4L, 19L, 7L, 3L, 2L, 7L, 
21L, 4L, 17L, 20L, 16L, 2L, 9L, 7L, 17L, 2L, 17L, 26L, 25L, 11L, 
3L, 13L, 27L), censored = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1)), class = "data.frame", row.names = c(NA, -35L), .Names = c("daysFromInvestigation", 
"censored"))

       daysFromInvestigation censored
1                     27        1
2                     27        1
3                     10        1
4                     20        1
5                     15        1
6                     21        1
7                     27        1
8                      1        1
9                      9        1
10                    22        1
11                    29        1
12                    14        1
13                     4        1
14                    19        1
15                     7        1
16                     3        1
17                     2        1
18                     7        1
19                    21        1
20                     4        1
21                    17        1
22                    20        1
23                    16        1
24                     2        1
25                     9        1
26                     7        1
27                    17        1
28                     2        1
29                    17        1
30                    26        1
31                    25        1
32                    11        1
33                     3        1
34                    13        1
35                    27        1

      

+3


source to share


1 answer


I have an answer to my question, at least a good rough answer, if not a better answer.

The main problem I ran into was not to use a weighted average. In my question, I was interested in the average survival time, so survival = 0.5. But my data did not lead to events at the exact median time, and therefore I had a survival probability of 14 days = 0.5143 and 15 days = 0.4857, which weighted averaged up to 15 days.

The second problem was a misunderstanding of the use of confidence intervals. To match what is reported in the survival report, to find the lower end of the median survival interval, one looks for the lower anchor vector to find the first value that is less than the median, and then calculates the time weighted average for the value just below the median and slightly above. Likewise, for the upper bound, the upper bound vector is searched to find the target interval, and then the weighted average is calculated. For my example, the upper bound for median survival occurs between 19 days and 20 days. Weighted average rounds up to 20 days.



I haven't traced the survival code deeply to confirm this is correct, but in my case I have about 50 specific survival combinations looking at different time periods and different moderators, and I agree the median output provided by the 100% survival package.

I hope that someone who comes across this question is helped by this summary, and if anyone wants to help fix / improve my understanding it is very much appreciated.

0


source







All Articles