Replace unwanted factor level values with NA
I have a large data frame with a factor column that contains both blank values and NA. Running summary(factor(df$col)) gives me something like

    A     B           C  NA's
  123 50000 90000 26000 12476

(Note the level with a blank name between B and C.)
And sum(is.na(df$col)) is 12476, the same as the number of NAs, but I would like it to be the number of blanks plus the number of NAs.
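A minimal sketch of the count being asked for, assuming that "blank" means either an empty string or a string made only of whitespace. The factor f below is made-up example data, not the asker's data frame:

```r
# Hypothetical example factor: "", "  " (whitespace only) and NA mixed together
f <- factor(c("A", "", "B", NA, "  ", "C", NA))

# A value counts as missing if it is NA or is empty after trimming whitespace
is_blank_or_na <- is.na(f) | trimws(as.character(f)) == ""
sum(is_blank_or_na)  # 4
```

For the NA entries, trimws() returns NA, but `TRUE | NA` is TRUE, so the sum contains no NA.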
I tried to give the blank values their own level by doing
levels(df$col) <- c("A", "B", "Blank", "C")
And then tried df$col <- factor(df$col, exclude="Blank"), and it reported that NAs were generated, but my output is the same. Does anyone know how to turn a factor level into NA, or have a better way to replace the blank values? I think the problem might be that the blanks contain more than one space character, so they were not turned into NA, but I don't know how to confirm this.
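One way to check for hidden whitespace is to look at the level names with quotes and lengths visible. This is a sketch on a hypothetical factor whose "blank" level is really two spaces; on the real data, replace f with df$col:

```r
# Hypothetical factor whose "blank" level is actually two space characters
f <- factor(c("A", "  ", "B", NA, "  ", "C"))

# dput() prints the level names in quotes, so "  " is visibly different from ""
dput(levels(f))

# nchar() gives the length of each level name: 0 for "", 2 for "  "
nchar(levels(f))

# TRUE for every level that is empty or consists only of whitespace
grepl("^[[:space:]]*$", levels(f))
```

If a level shows up with a non-zero nchar() but is flagged by the grepl() check, the blanks are whitespace rather than empty strings.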
Try the following:
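# stringsAsFactors=TRUE is needed in R >= 4.0.0, where strings are no longer converted to factors by default
df <- data.frame(a=11:18, col=c("C", "", "A", NA, "A", "", "C", NA), stringsAsFactors=TRUE)
levels(df$col) # "" "A" "C"
sum(is.na(df$col)) # 2
# Re-create the factor with only the wanted levels; any value not among them becomes NA
df$col <- factor(df$col, levels=LETTERS[1:3])
levels(df$col) # "A" "B" "C"
sum(is.na(df$col)) # 4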
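Since the new levels do not include the empty string (""), all the blank values become NA. The same holds for any other value not listed in levels, including a string of several spaces.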
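A variant of the same idea, sketched here under the assumption that every level containing at least one non-whitespace character should be kept: instead of typing the wanted levels by hand, compute them from the existing ones. The data is the example above with one blank changed to two spaces:

```r
df <- data.frame(a = 11:18,
                 col = c("C", "", "A", NA, "A", "  ", "C", NA),
                 stringsAsFactors = TRUE)

# Keep only the levels that contain at least one non-whitespace character
keep <- levels(df$col)[!grepl("^[[:space:]]*$", levels(df$col))]

# Re-creating the factor with the reduced level set turns "" and "  " into NA
df$col <- factor(df$col, levels = keep)

levels(df$col)      # "A" "C"
sum(is.na(df$col))  # 4
```

Unlike levels=LETTERS[1:3], this does not add an unused level "B", and it needs no change when the real data has different level names.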