Replace unwanted factor level values with NA

I have a large data frame that contains both blank missing values and NA's. Running summary(factor(df$col)) gives me something like

A      123
B      50000
       90000
C      26000
NA     12476

(Note the blank level after 50000.) And sum(is.na(df$col)) is 12476, the same as the number of NA's, but I would like it to be the sum of the blanks and the NA's.
I tried to create a level for the blanks by doing levels(df$col) <- c("A", "B", "Blank", "C") and then df$col <- factor(df$col, exclude="Blank"). It says NA's were generated, but my output is the same. Does anyone know how to turn a factor level into NA, or have a better solution for replacing missing values? I think the problem might be that the blanks are more than one space character, so they didn't turn into NA, but I don't know how to confirm this.

1 answer


Try the following:

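# stringsAsFactors=TRUE is needed on R >= 4.0.0, where strings are no longer converted to factors by default
df <- data.frame(a=11:18, col=c("C", "", "A", NA, "A", "", "C", NA), stringsAsFactors=TRUE)
levels(df$col) # ""  "A" "C"
sum(is.na(df$col)) # 2

# Rebuild the factor with only the wanted levels; values not listed become NA
df$col <- factor(df$col, levels=LETTERS[1:3])
levels(df$col) # "A" "B" "C"
sum(is.na(df$col)) # 4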

      



Since the new levels do not include the empty string (""), all empty values become NA.
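
If the blank entries might contain one or more space characters, as the question suspects, listing the wanted levels still works, but it can help to inspect the levels first. The following is only a sketch on made-up data (the data frame and the `blank` variable are illustrative, not from the question): `dput()` prints the levels with quotes so that whitespace is visible, `nchar()` shows their lengths, and assigning NA to every level that is empty after `trimws()` converts all of them at once.

```r
# Toy data: one empty string and one whitespace-only string (assumed for illustration)
df <- data.frame(a = 1:8,
                 col = c("C", "", "A", NA, "A", "  ", "C", NA),
                 stringsAsFactors = TRUE)

dput(levels(df$col))   # quoted output makes empty and whitespace-only levels visible
nchar(levels(df$col))  # a blank-looking level with nchar > 0 contains spaces

# Mark every level that is empty once surrounding whitespace is trimmed
blank <- trimws(levels(df$col)) == ""

# Assigning NA to a level turns all of its values into NA and drops the level
levels(df$col)[blank] <- NA

levels(df$col)
sum(is.na(df$col))  # 4: two original NA's plus the two blank entries
```

This variant does not require listing the wanted levels by hand, which may be convenient when the column has many of them.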
