Counting matching values in columns with NA values in R
I am working in R and I have a matrix with values "A", "B" and "NA" and I would like to count the number of values "A" or "B" or NA in each column.
sum(mydata[, i] == "A") and sum(mydata[, i] == "B") work fine for columns without NA. For columns containing NA, I can count the NA values with sum(is.na(mydata[, i])), but in those columns sum(mydata[, i] == "A") returns NA instead of a number. How can I count the number of "A" values in columns containing NA values?
Thank you for your help!
Example:
> mydata
V1 V2 V3 V4
V2 "A" "A" "A" "A"
V3 "A" "A" "A" "A"
V4 "B" "B" NA NA
V5 "A" "A" "A" "A"
V6 "B" "A" "A" "A"
V7 "B" "A" "A" "A"
V8 "A" "A" "A" "A"
> sum(mydata[,2]=="A")
[1] 6
> sum(mydata[,3]=="A")
[1] NA
> sum(is.na(mydata[,3]))
[1] 1
The function sum (like many other math functions in R) takes an argument na.rm. If you set na.rm=TRUE, R removes all NA values before performing the calculation.
Try:
sum(mydata[,3]=="A", na.rm=TRUE)
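To illustrate, here is a minimal sketch that rebuilds the matrix from the question and applies na.rm to one column and then to all columns with colSums. The variable names n_a_col3, n_a_all and n_b_all are only for illustration.

```r
# Rebuild the matrix from the question (7 rows, 4 columns).
mydata <- matrix(c(rep("A", 8), rep("B", 2), rep(NA, 2), rep("A", 4),
                   rep(c("B", "A", "A", "A"), 2), rep("A", 4)),
                 ncol = 4, byrow = TRUE)

# Comparing NA with "A" gives NA, so the plain sum is NA for column 3.
sum(mydata[, 3] == "A")

# na.rm = TRUE drops those NA comparisons before summing.
n_a_col3 <- sum(mydata[, 3] == "A", na.rm = TRUE)

# The same idea applied to every column at once.
n_a_all <- colSums(mydata == "A", na.rm = TRUE)
n_b_all <- colSums(mydata == "B", na.rm = TRUE)
```

colSums saves writing a loop over the column index i when you want the count for every column.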
Not sure if this is what you want. I'm an R newbie too, so check that it works. The difference between the number of rows and the numbers below will tell you the number of NA elements in each column.
colSums(!is.na(mydata))
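As a rough sketch of that subtraction, assuming mydata is the matrix from the question (the names n_present and n_missing are made up here):

```r
# Rebuild the matrix from the question (7 rows, 4 columns).
mydata <- matrix(c(rep("A", 8), rep("B", 2), rep(NA, 2), rep("A", 4),
                   rep(c("B", "A", "A", "A"), 2), rep("A", 4)),
                 ncol = 4, byrow = TRUE)

# Number of non-NA entries in each column.
n_present <- colSums(!is.na(mydata))

# Rows minus non-NA entries leaves the NA count per column.
n_missing <- nrow(mydata) - n_present

# Counting the NA values directly should give the same result.
n_missing_direct <- colSums(is.na(mydata))
```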
To expand on the answer from @Andrie,
mydata <- matrix(c(rep("A", 8), rep("B", 2), rep(NA, 2), rep("A", 4),
rep(c("B", "A", "A", "A"), 2), rep("A", 4)), ncol = 4, byrow = TRUE)
myFun <- function(x) {
data.frame(n.A = sum(x == "A", na.rm = TRUE), n.B = sum(x == "B",
na.rm = TRUE), n.NA = sum(is.na(x)))
}
apply(mydata, 2, myFun)
Another possibility is to convert the column to a factor and then use the summary function. Example:
vec <- c("A", "B", "A", NA)
summary(as.factor(vec))
A quick way to do this is to get summary statistics for a variable:
summary(mydata$my_variable)
table(mydata$my_variable)
This will give you the number of missing values.
Hope it helps
You can use table to count all your values at once.
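One thing to watch: by default table leaves NA out of the result, so you need the useNA argument if you want the NA count as well. A possible sketch, again assuming the matrix from the question (col3_counts and counts are illustrative names):

```r
# Rebuild the matrix from the question (7 rows, 4 columns).
mydata <- matrix(c(rep("A", 8), rep("B", 2), rep(NA, 2), rep("A", 4),
                   rep(c("B", "A", "A", "A"), 2), rep("A", 4)),
                 ncol = 4, byrow = TRUE)

# One column: useNA = "ifany" adds an <NA> entry when NA values are present.
col3_counts <- table(mydata[, 3], useNA = "ifany")

# All columns: fixing the factor levels and using useNA = "always" makes every
# column return counts for A, B and NA in that order, so apply() can
# stack the results into a 3 x 4 matrix.
counts <- apply(mydata, 2, function(x) {
  table(factor(x, levels = c("A", "B")), useNA = "always")
})
counts
```

Fixing the levels is a design choice here: without it, a column with no "B" would return a shorter table and apply would give back a list instead of a matrix.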