Correlation in R: with use = "pairwise.complete.obs" I get the warning "standard deviation is zero"

Question


I'm trying to do some group correlation and have used this very helpful thread:

However, there are some NA values in my two variables and in my groups, so I get NA as the result for each group.

So I tried this:

j <- lapply(split(HTNPS, HTNPS$callcat), function(HTNPS) {
  cor(HTNPS$NPS_int, HTNPS$holdtime_int,
      use = "pairwise.complete.obs", method = "spearman")
})

But then, although I get more reasonable numbers, I get this warning: In cor(HTNPS$NPS_int, HTNPS$holdtime_int, use = "pairwise.complete.obs", : standard deviation is zero

As requested, here is the output of dput(head(HTNPS, 40)) for the relevant columns:

> dput(head(HTNPS[,20:24], 40))
structure(list(holdtime_int = structure(c(6, 11, 7, 7, 5, 7, 
6, 5, 3, 6, 3, 5, 6, 105, 7, 6, 353, 5, 6, 9, 6, 6, 12, 5, 5, 
5, 249, 5, 7, 11, 5, 7, 5, 290, 6, 6, 6, 6, 5, 6), .Dim = c(40L, 
1L)), NPS_int = structure(c(1, NA, NA, 3, NA, 1, 1, 2, NA, NA, 
NA, NA, 3, 2, 1, NA, 2, 4, 1, 2, NA, 3, 1, 1, 1, 1, 1, 1, 1, 
2, 1, 3, 1, 1, 1, 2, 4, 2, 1, 1), .Dim = c(40L, 1L)), HTnot0 = structure(c(6, 
11, 7, 7, 5, 7, 6, 5, 3, 6, 3, 5, 6, 105, 7, 6, 353, 5, 6, 9, 
6, 6, 12, 5, 5, 5, 249, 5, 7, 11, 5, 7, 5, 290, 6, 6, 6, 6, 5, 
6), .Dim = c(40L, 1L)), callcat = structure(c(NA, NA, "CARD", 
"CARD", "GENERAL", "LOAN", "CHANGE DETAILS", "GENERAL", "LOAN", 
"CHANGE DETAILS", "LOAN", "CARD", "FUNDS TRANSFER", "FEE", "BALANCE", 
NA, "CARD", NA, NA, "STATEMENT", "CARD", "CARD", "GENERAL", "CARD", 
"CARD", "TERM DEPOSIT", "CARD", "GENERAL", "CARD", "CARD", "GENERAL", 
NA, NA, NA, NA, "CARD", "CARD", "FUNDS TRANSFER", "GENERAL", 
"MyBusinessOverride"), .Dim = c(40L, 1L), .Dimnames = list(NULL, 
"callcat")), HTcat = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 1L, 1L, 12L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 9L, 1L, 1L, 1L, 1L, 1L, 1L, 10L, 1L, 1L, 
1L, 1L, 1L, 1L), .Dim = c(40L, 1L), .Dimnames = list(NULL, "HTcat"))), .Names = c("holdtime_int", 
"NPS_int", "HTnot0", "callcat", "HTcat"), row.names = c(NA, 40L
), class = "data.frame")


r warnings na correlation

Rnovice 26 Aug 14 at 5:48 am


1 answer

Joris meys · Answer 1 · 2014-08-26T09:50:53+0000

After this split, many of your groups contain only one complete observation once the NA values are removed. A correlation cannot be computed from a single observation.
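To see how small the groups really are, it can help to count the complete pairs per group before computing anything. The following is a minimal sketch on a made-up data frame named toy (a hypothetical stand-in, not the asker's full HTNPS data), reusing the column names from the question:

```r
# Hypothetical toy data standing in for HTNPS
toy <- data.frame(
  callcat      = c("CARD", "CARD", "CARD", "LOAN", "LOAN", "FEE"),
  holdtime_int = c(7, 5, 249, 7, 3, 105),
  NPS_int      = c(3, NA, 1, 1, NA, 2)
)

# For each group, count the rows where both variables are present
n_complete <- sapply(split(toy, toy$callcat), function(x) {
  sum(complete.cases(x[c("holdtime_int", "NPS_int")]))
})
n_complete
```

Groups with fewer than two complete pairs cannot yield a meaningful correlation, so a count like this shows which groups are worth looking at.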

The warning appears when one of the two variables contains only one distinct value within a group. In your example this happens, for instance, in the subset where callcat == "FUNDS TRANSFER": holdtime_int has only one value (6), so the standard deviation is 0 (hence the warning) and the resulting correlation is NA.
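The situation can be reproduced in isolation. This is a small sketch with made-up vectors x and y (hypothetical values, chosen so that x is constant like holdtime_int in that group):

```r
x <- c(6, 6, 6)   # constant, so its standard deviation is 0
y <- c(3, 2, 1)

sd(x)

# Capture the warning message instead of printing it
msg <- tryCatch(
  cor(x, y, method = "spearman"),
  warning = function(w) conditionMessage(w)
)
msg

# With the warning suppressed, the result itself is NA
res <- suppressWarnings(cor(x, y, method = "spearman"))
res
```

The correlation divides by the standard deviations of both variables, so a constant variable leaves it undefined, and R reports that as NA plus a warning.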

I don't know why you are looking at these correlations, but with the data you provided they make little sense to me. If you want to get rid of the warning, you can add a check, for example:

lapply(split(HTNPS,HTNPS$callcat), function(x){
  x <- na.exclude( x[c("holdtime_int","NPS_int")] )
  if(any(sapply(x, function(i)length(unique(i))) < 2 )){
    NA
  } else {
    cor(x[,1],x[,2], method="spearman")
  }
})

This should give you the same result, but without the warning. Note the use of na.exclude to get rid of the NA values.
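If the check is needed in more than one place, it could be wrapped in a small helper. The sketch below is one possible way to do that; the function name safe_spearman and the toy data are hypothetical and not part of the original answer:

```r
# Hypothetical helper: return NA quietly when the correlation is undefined
safe_spearman <- function(x, y) {
  d <- na.exclude(data.frame(x = x, y = y))   # keep complete pairs only
  too_few  <- nrow(d) < 2
  constant <- any(sapply(d, function(col) length(unique(col))) < 2)
  if (too_few || constant) {
    return(NA_real_)
  }
  cor(d$x, d$y, method = "spearman")
}

# Hypothetical toy data: group "B" has a constant holdtime_int
toy <- data.frame(
  callcat      = c("A", "A", "A", "B", "B", "B"),
  holdtime_int = c(5, 7, 11, 6, 6, 6),
  NPS_int      = c(1, NA, 3, 3, 2, NA)
)

res <- sapply(split(toy, toy$callcat), function(g) {
  safe_spearman(g$holdtime_int, g$NPS_int)
})
res
```

Returning NA_real_ rather than a bare NA keeps the result numeric, so sapply simplifies the per-group values into a single named numeric vector.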
