Subset rows based on a specific threshold

I want to subset the observations in each column of my data frame based on a threshold. Let me explain in a little more detail.

I have a data frame with the methylation rates of 35 patients with lung adenocarcinoma. This is a subset of my data:

> df.met[1:5,1:5]
                A2BP1       A2M     A2ML1     A4GALT       AAAS
paciente6  0.36184475 0.4555788 0.6422624 0.08051388 0.15013343
paciente7  0.47566878 0.7329827 0.4938048 0.45487573 0.10827520
paciente8  0.17455497 0.7528387 0.5686839 0.37018038 0.12423923
paciente9  0.04830471 0.5166676 0.8878207 0.08881092 0.11779075
paciente10 0.16757806 0.7896194 0.5408747 0.35315243 0.09234602

      

Now I need to get another object (with the same number of columns, but fewer values, and a different number in each column) that contains only the values greater than 0.1 from each column of the original data frame.

My intention is to get an object like this (I don't know if it's possible ...):

            A2BP1       A2M     A2ML1     A4GALT       AAAS
paciente6  0.36184475 0.4555788 0.6422624            0.15013343
paciente7  0.47566878 0.7329827 0.4938048 0.45487573 0.10827520
paciente8  0.17455497 0.7528387 0.5686839 0.37018038 0.12423923
paciente9             0.5166676 0.8878207            0.11779075
paciente10 0.16757806 0.7896194 0.5408747 0.35315243 

      

In other words, I want to exclude the values in my data frame that are less than 0.1.

Thank you very much!


2 answers


You may need the following, which keeps only the rows in which no value is at or below 0.1:

df.met[!rowSums(df.met <= 0.1),,drop=FALSE]
#           A2BP1       A2M     A2ML1    A4GALT      AAAS
#paciente7 0.4756688 0.7329827 0.4938048 0.4548757 0.1082752
#paciente8 0.1745550 0.7528387 0.5686839 0.3701804 0.1242392
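
To see why the one-liner works, here is a minimal sketch that breaks it into steps, using the sample data from the question. The intermediate names (below, n_below, keep, res) are only for illustration and are not part of the original answer.

```r
# Sample data from the question
df.met <- data.frame(
  A2BP1  = c(0.36184475, 0.47566878, 0.17455497, 0.04830471, 0.16757806),
  A2M    = c(0.4555788, 0.7329827, 0.7528387, 0.5166676, 0.7896194),
  A2ML1  = c(0.6422624, 0.4938048, 0.5686839, 0.8878207, 0.5408747),
  A4GALT = c(0.08051388, 0.45487573, 0.37018038, 0.08881092, 0.35315243),
  AAAS   = c(0.15013343, 0.1082752, 0.12423923, 0.11779075, 0.09234602),
  row.names = c("paciente6", "paciente7", "paciente8", "paciente9", "paciente10")
)

# Logical matrix: TRUE where a value is at or below the threshold
below <- df.met <= 0.1

# Number of at-or-below-threshold values in each row
n_below <- rowSums(below)

# A row is kept only if that count is zero; !n_below gives the same result,
# because 0 is coerced to FALSE and any other number to TRUE
keep <- n_below == 0

# drop = FALSE keeps the result a data frame even if one column remains
res <- df.met[keep, , drop = FALSE]
```

Only paciente7 and paciente8 have a count of zero, so they are the only rows left in res.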

      

Update

Based on the edit to the question, you can instead set the values at or below 0.1 to NA:

is.na(df.met) <- df.met <= 0.1
df.met
#              A2BP1       A2M     A2ML1    A4GALT      AAAS
#paciente6  0.3618447 0.4555788 0.6422624        NA 0.1501334
#paciente7  0.4756688 0.7329827 0.4938048 0.4548757 0.1082752
#paciente8  0.1745550 0.7528387 0.5686839 0.3701804 0.1242392
#paciente9         NA 0.5166676 0.8878207        NA 0.1177907
#paciente10 0.1675781 0.7896194 0.5408747 0.3531524        NA
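
As a rough sketch of two related options (not part of the original answer): the same replacement can be written with logical-matrix indexing on a copy, and if a different number of values per column is really wanted, a list can hold them. The names df.na, kept and above are hypothetical.

```r
# Sample data from the question
df.met <- data.frame(
  A2BP1  = c(0.36184475, 0.47566878, 0.17455497, 0.04830471, 0.16757806),
  A2M    = c(0.4555788, 0.7329827, 0.7528387, 0.5166676, 0.7896194),
  A2ML1  = c(0.6422624, 0.4938048, 0.5686839, 0.8878207, 0.5408747),
  A4GALT = c(0.08051388, 0.45487573, 0.37018038, 0.08881092, 0.35315243),
  AAAS   = c(0.15013343, 0.1082752, 0.12423923, 0.11779075, 0.09234602),
  row.names = c("paciente6", "paciente7", "paciente8", "paciente9", "paciente10")
)

# Work on a copy so the original data frame is left untouched
df.na <- df.met
df.na[df.na <= 0.1] <- NA

# Number of values that remain in each column
kept <- colSums(!is.na(df.na))

# A data frame needs equal-length columns, so ragged per-column
# subsets have to live in a list (one named vector per column)
above <- lapply(df.met, function(col) {
  names(col) <- rownames(df.met)
  col[col > 0.1]
})
```

Here kept shows how many patients stay above the threshold for each gene, and above[["A4GALT"]] returns only the three patients whose value exceeds 0.1.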

      



Using data.table

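library(data.table)  # v1.9.5+
# Convert to a data.table by reference, keeping the row names in column "rn"
setDT(df.met, keep.rownames=TRUE)[]

# Column 1 is "rn", so start at column 2 and set values <= 0.1 to NA by reference
for (j in 2:ncol(df.met)) {
  set(df.met, i=which(df.met[[j]] <= 0.1), j=j, value=NA)
}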

 df.met
 #          rn     A2BP1       A2M     A2ML1    A4GALT      AAAS
 #1:  paciente6 0.3618447 0.4555788 0.6422624        NA 0.1501334
 #2:  paciente7 0.4756688 0.7329827 0.4938048 0.4548757 0.1082752
 #3:  paciente8 0.1745550 0.7528387 0.5686839 0.3701804 0.1242392
 #4:  paciente9        NA 0.5166676 0.8878207        NA 0.1177907
 #5: paciente10 0.1675781 0.7896194 0.5408747 0.3531524        NA

      

data

df.met <- structure(list(A2BP1 = c(0.36184475, 0.47566878, 0.17455497, 
0.04830471, 0.16757806), A2M = c(0.4555788, 0.7329827, 0.7528387, 
0.5166676, 0.7896194), A2ML1 = c(0.6422624, 0.4938048, 0.5686839, 
0.8878207, 0.5408747), A4GALT = c(0.08051388, 0.45487573, 0.37018038, 
0.08881092, 0.35315243), AAAS = c(0.15013343, 0.1082752, 0.12423923, 
0.11779075, 0.09234602)), .Names = c("A2BP1", "A2M", "A2ML1", 
"A4GALT", "AAAS"), class = "data.frame", row.names = c("paciente6", 
"paciente7", "paciente8", "paciente9", "paciente10"))

      


To match your desired output (values <= 0.1 replaced with an empty field), you can do:

library(dplyr)
df.met %>% 
  add_rownames("pacientes") %>%
  mutate_each(funs(replace(., . <= 0.1, "")))

      

Which gives:

# Source: local data frame [5 x 6]
#
#    pacientes      A2BP1       A2M     A2ML1     A4GALT       AAAS
# 1  paciente6 0.36184475 0.4555788 0.6422624            0.15013343
# 2  paciente7 0.47566878 0.7329827 0.4938048 0.45487573  0.1082752
# 3  paciente8 0.17455497 0.7528387 0.5686839 0.37018038 0.12423923
# 4  paciente9            0.5166676 0.8878207            0.11779075
# 5 paciente10 0.16757806 0.7896194 0.5408747 0.35315243

      



Note: this will convert all columns to character. To avoid that, replace the values with NA instead:

df.met %>% 
  add_rownames("pacientes") %>%
  mutate_each(funs(replace(., . <= 0.1, NA)))   

      

This will keep your original data structure (all columns remain numeric).
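
Note that add_rownames(), mutate_each() and funs() have since been deprecated in dplyr. A minimal sketch of the same NA replacement with the newer verbs, assuming dplyr 1.0.0 or later and the tibble package are installed (the result name out is only for illustration):

```r
library(dplyr)

# Sample data from the question
df.met <- data.frame(
  A2BP1  = c(0.36184475, 0.47566878, 0.17455497, 0.04830471, 0.16757806),
  A2M    = c(0.4555788, 0.7329827, 0.7528387, 0.5166676, 0.7896194),
  A2ML1  = c(0.6422624, 0.4938048, 0.5686839, 0.8878207, 0.5408747),
  A4GALT = c(0.08051388, 0.45487573, 0.37018038, 0.08881092, 0.35315243),
  AAAS   = c(0.15013343, 0.1082752, 0.12423923, 0.11779075, 0.09234602),
  row.names = c("paciente6", "paciente7", "paciente8", "paciente9", "paciente10")
)

out <- df.met %>%
  # Move the row names into a regular column called "pacientes"
  tibble::rownames_to_column("pacientes") %>%
  # Apply the replacement to every column except "pacientes"
  mutate(across(-pacientes, ~ replace(.x, .x <= 0.1, NA)))
```

Excluding the pacientes column explicitly avoids comparing the patient labels with the threshold, and the remaining columns stay numeric.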
