Subset rows based on a specific threshold
I want to get a subset of the observations in each column of my data frame based on a threshold. Let me explain in a little more detail.
I have a data frame with methylation rates for 35 patients with lung adenocarcinoma. This is a subset of my data:
> df.met[1:5,1:5]
A2BP1 A2M A2ML1 A4GALT AAAS
paciente6 0.36184475 0.4555788 0.6422624 0.08051388 0.15013343
paciente7 0.47566878 0.7329827 0.4938048 0.45487573 0.10827520
paciente8 0.17455497 0.7528387 0.5686839 0.37018038 0.12423923
paciente9 0.04830471 0.5166676 0.8878207 0.08881092 0.11779075
paciente10 0.16757806 0.7896194 0.5408747 0.35315243 0.09234602
Now I need another object (with the same number of columns, but fewer rows, and a different number of rows in each column) containing only the values greater than 0.1 in each column of the original data frame.
My intention is to get an object like this (I don't know if it's possible ...):
                A2BP1       A2M     A2ML1     A4GALT       AAAS
paciente6  0.36184475 0.4555788 0.6422624            0.15013343
paciente7  0.47566878 0.7329827 0.4938048 0.45487573 0.10827520
paciente8  0.17455497 0.7528387 0.5686839 0.37018038 0.12423923
paciente9             0.5166676 0.8878207            0.11779075
paciente10 0.16757806 0.7896194 0.5408747 0.35315243
In other words, I want to drop the values that are less than 0.1 from my data frame.
Thank you very much!
To keep only the rows in which no value is at or below 0.1, you may need:
df.met[!rowSums(df.met <= 0.1),,drop=FALSE]
# A2BP1 A2M A2ML1 A4GALT AAAS
#paciente7 0.4756688 0.7329827 0.4938048 0.4548757 0.1082752
#paciente8 0.1745550 0.7528387 0.5686839 0.3701804 0.1242392
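To see why the one-liner works, it can be unpacked into steps. The following is only a sketch in base R that rebuilds the sample data from the question; the intermediate names `low`, `n_low`, `keep` and `kept` are illustrative and not part of the original answer.

```r
# Rebuild the sample data shown in the question.
df.met <- data.frame(
  A2BP1  = c(0.36184475, 0.47566878, 0.17455497, 0.04830471, 0.16757806),
  A2M    = c(0.4555788, 0.7329827, 0.7528387, 0.5166676, 0.7896194),
  A2ML1  = c(0.6422624, 0.4938048, 0.5686839, 0.8878207, 0.5408747),
  A4GALT = c(0.08051388, 0.45487573, 0.37018038, 0.08881092, 0.35315243),
  AAAS   = c(0.15013343, 0.1082752, 0.12423923, 0.11779075, 0.09234602),
  row.names = paste0("paciente", 6:10)
)

# Logical matrix: TRUE wherever a value is at or below the threshold.
low <- df.met <= 0.1

# Number of low values in each row (TRUE counts as 1, FALSE as 0).
n_low <- rowSums(low)

# Keep a row only if it has no low value; !n_low gives the same result,
# because negating a count is TRUE only when the count is 0.
keep <- n_low == 0

kept <- df.met[keep, , drop = FALSE]
rownames(kept)
# [1] "paciente7" "paciente8"
```

The `drop = FALSE` argument matters mainly when only one column is present: without it, the result would collapse to a plain vector.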
Update
Based on the edit to the question, you can set the values at or below 0.1 to NA:
is.na(df.met) <- df.met <= 0.1
df.met
# A2BP1 A2M A2ML1 A4GALT AAAS
#paciente6 0.3618447 0.4555788 0.6422624 NA 0.1501334
#paciente7 0.4756688 0.7329827 0.4938048 0.4548757 0.1082752
#paciente8 0.1745550 0.7528387 0.5686839 0.3701804 0.1242392
#paciente9 NA 0.5166676 0.8878207 NA 0.1177907
#paciente10 0.1675781 0.7896194 0.5408747 0.3531524 NA
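A data frame has to stay rectangular, which is why the low values become NA rather than disappearing. If a different number of observations per column is really what is needed, one possible alternative (a sketch, assuming a list of named vectors is an acceptable result; the name `above` is illustrative) is to filter each column separately, starting from the original, unmodified data:

```r
# Rebuild the original sample data (before any values were set to NA).
df.met <- data.frame(
  A2BP1  = c(0.36184475, 0.47566878, 0.17455497, 0.04830471, 0.16757806),
  A2M    = c(0.4555788, 0.7329827, 0.7528387, 0.5166676, 0.7896194),
  A2ML1  = c(0.6422624, 0.4938048, 0.5686839, 0.8878207, 0.5408747),
  A4GALT = c(0.08051388, 0.45487573, 0.37018038, 0.08881092, 0.35315243),
  AAAS   = c(0.15013343, 0.1082752, 0.12423923, 0.11779075, 0.09234602),
  row.names = paste0("paciente", 6:10)
)

# For each column, attach the patient names and keep values above 0.1.
# simplify = FALSE makes sapply() return a list named after the columns.
above <- sapply(names(df.met), function(col) {
  x <- setNames(df.met[[col]], rownames(df.met))
  x[x > 0.1]
}, simplify = FALSE)

# Each element can now have a different length.
sapply(above, length)
#  A2BP1    A2M  A2ML1 A4GALT   AAAS
#      4      5      5      3      4

names(above$A4GALT)
# [1] "paciente7"  "paciente8"  "paciente10"
```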
Using data.table
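library(data.table) # v1.9.5+
# Convert to a data.table by reference; row names go into a new first column "rn"
setDT(df.met, keep.rownames=TRUE)[]
# Column 1 is "rn", so loop over the numeric columns starting at 2
for(j in 2:ncol(df.met)){
  set(df.met, i=which(df.met[[j]] <= 0.1), j=j, value=NA)
}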
df.met
# rn A2BP1 A2M A2ML1 A4GALT AAAS
#1: paciente6 0.3618447 0.4555788 0.6422624 NA 0.1501334
#2: paciente7 0.4756688 0.7329827 0.4938048 0.4548757 0.1082752
#3: paciente8 0.1745550 0.7528387 0.5686839 0.3701804 0.1242392
#4: paciente9 NA 0.5166676 0.8878207 NA 0.1177907
#5: paciente10 0.1675781 0.7896194 0.5408747 0.3531524 NA
data
df.met <- structure(list(A2BP1 = c(0.36184475, 0.47566878, 0.17455497,
0.04830471, 0.16757806), A2M = c(0.4555788, 0.7329827, 0.7528387,
0.5166676, 0.7896194), A2ML1 = c(0.6422624, 0.4938048, 0.5686839,
0.8878207, 0.5408747), A4GALT = c(0.08051388, 0.45487573, 0.37018038,
0.08881092, 0.35315243), AAAS = c(0.15013343, 0.1082752, 0.12423923,
0.11779075, 0.09234602)), .Names = c("A2BP1", "A2M", "A2ML1",
"A4GALT", "AAAS"), class = "data.frame", row.names = c("paciente6",
"paciente7", "paciente8", "paciente9", "paciente10"))
To match your desired output (values <= 0.1 replaced with an empty field), you can do:
library(dplyr)
df.met %>%
add_rownames("pacientes") %>%
mutate_each(funs(replace(., . <= 0.1, "")))
Which gives:
# Source: local data frame [5 x 6]
#
#    pacientes      A2BP1       A2M     A2ML1     A4GALT       AAAS
# 1  paciente6 0.36184475 0.4555788 0.6422624            0.15013343
# 2  paciente7 0.47566878 0.7329827 0.4938048 0.45487573  0.1082752
# 3  paciente8 0.17455497 0.7528387 0.5686839 0.37018038 0.12423923
# 4  paciente9            0.5166676 0.8878207            0.11779075
# 5 paciente10 0.16757806 0.7896194 0.5408747 0.35315243
Note: this will convert all columns to character. To avoid that, replace the values with NA instead:
df.met %>%
add_rownames("pacientes") %>%
mutate_each(funs(replace(., . <= 0.1, NA)))
This will keep your original data structure (all columns stay numeric).
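`add_rownames()`, `mutate_each()` and `funs()` have since been deprecated in dplyr. A rough equivalent for more recent versions might look like the following sketch, which assumes dplyr 1.0.0 or later and the tibble package are installed; the result name `res` is illustrative.

```r
library(dplyr)
library(tibble)

# Rebuild the sample data shown in the question.
df.met <- data.frame(
  A2BP1  = c(0.36184475, 0.47566878, 0.17455497, 0.04830471, 0.16757806),
  A2M    = c(0.4555788, 0.7329827, 0.7528387, 0.5166676, 0.7896194),
  A2ML1  = c(0.6422624, 0.4938048, 0.5686839, 0.8878207, 0.5408747),
  A4GALT = c(0.08051388, 0.45487573, 0.37018038, 0.08881092, 0.35315243),
  AAAS   = c(0.15013343, 0.1082752, 0.12423923, 0.11779075, 0.09234602),
  row.names = paste0("paciente", 6:10)
)

res <- df.met %>%
  # Move the row names into a regular column called "pacientes".
  rownames_to_column("pacientes") %>%
  # In every column except "pacientes", set values <= 0.1 to NA.
  mutate(across(-pacientes, ~ replace(.x, .x <= 0.1, NA)))

res
```

Excluding `pacientes` inside `across()` keeps the replacement away from the character column, so the numeric columns remain numeric.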