Subset rows based on a specific threshold

Question

I want to subset the observations in each column of my data frame based on a threshold. Let me explain in a little more detail.

I have a data frame with methylation rates for 35 patients with lung adenocarcinoma. This is a subset of my data:

> df.met[1:5,1:5]
                A2BP1       A2M     A2ML1     A4GALT       AAAS
paciente6  0.36184475 0.4555788 0.6422624 0.08051388 0.15013343
paciente7  0.47566878 0.7329827 0.4938048 0.45487573 0.10827520
paciente8  0.17455497 0.7528387 0.5686839 0.37018038 0.12423923
paciente9  0.04830471 0.5166676 0.8878207 0.08881092 0.11779075
paciente10 0.16757806 0.7896194 0.5408747 0.35315243 0.09234602

Now I need another object that keeps only the values greater than 0.1 in every column of the original data frame. It would have the same number of columns, but fewer observations, and the number kept would differ from column to column.

My intention is to get an object like this (I don't know if it's possible ...):

            A2BP1       A2M     A2ML1     A4GALT       AAAS
paciente6  0.36184475 0.4555788 0.6422624            0.15013343
paciente7  0.47566878 0.7329827 0.4938048 0.45487573 0.10827520
paciente8  0.17455497 0.7528387 0.5686839 0.37018038 0.12423923
paciente9             0.5166676 0.8878207            0.11779075
paciente10 0.16757806 0.7896194 0.5408747 0.35315243

In other words, I want to drop the values in my data frame that are less than 0.1.

Thank you very much!

+3

r subset

Dani, June 21 '15 at 16:52

2 answers

To match your desired output (values <= 0.1 replaced with an empty field), you can do:

library(dplyr)
df.met %>% 
  add_rownames("pacientes") %>%
  mutate_each(funs(replace(., . <= 0.1, "")))

Which gives:

# Source: local data frame [5 x 6]
#
#    pacientes      A2BP1       A2M     A2ML1     A4GALT       AAAS
# 1  paciente6 0.36184475 0.4555788 0.6422624            0.15013343
# 2  paciente7 0.47566878 0.7329827 0.4938048 0.45487573  0.1082752
# 3  paciente8 0.17455497 0.7528387 0.5686839 0.37018038 0.12423923
# 4  paciente9            0.5166676 0.8878207            0.11779075
# 5 paciente10 0.16757806 0.7896194 0.5408747 0.35315243

Note: this converts all columns to character. To avoid that, replace the values with NA instead:

df.met %>% 
  add_rownames("pacientes") %>%
  mutate_each(funs(replace(., . <= 0.1, NA)))

This keeps your original data structure (all columns stay numeric).
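
The snippets above use the dplyr API as it was in 2015; `add_rownames()`, `mutate_each()` and `funs()` have since been deprecated. A minimal sketch of the same NA replacement in newer dplyr follows, assuming dplyr 1.0.0 or later (for `across()`) and the tibble package (for `rownames_to_column()`). The data frame is a three-column excerpt of the example data, and the name `res` is illustrative only.

```r
library(dplyr)   # assumes dplyr >= 1.0.0, which provides across()
library(tibble)  # provides rownames_to_column()

# Three-column excerpt of the example data from the question
df.met <- data.frame(
  A2BP1  = c(0.36184475, 0.47566878, 0.17455497, 0.04830471, 0.16757806),
  A4GALT = c(0.08051388, 0.45487573, 0.37018038, 0.08881092, 0.35315243),
  AAAS   = c(0.15013343, 0.10827520, 0.12423923, 0.11779075, 0.09234602),
  row.names = c("paciente6", "paciente7", "paciente8", "paciente9", "paciente10")
)

res <- df.met %>%
  rownames_to_column("pacientes") %>%
  # apply the replacement to every column except the patient labels
  mutate(across(-pacientes, ~ replace(.x, .x <= 0.1, NA)))

res
```

Because the replacement value is NA rather than an empty string, the measurement columns remain numeric.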

+2

Steven Beaupré, June 21 '15 at 17:53

akrun · Accepted Answer · 2015-06-21T16:58:39+0000

You may need the following, which keeps only the rows in which no value is <= 0.1:

df.met[!rowSums(df.met <= 0.1),,drop=FALSE]
#           A2BP1       A2M     A2ML1    A4GALT      AAAS
#paciente7 0.4756688 0.7329827 0.4938048 0.4548757 0.1082752
#paciente8 0.1745550 0.7528387 0.5686839 0.3701804 0.1242392
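
To make the one-liner above easier to follow, here is the same row filter split into steps. This is only a sketch; the intermediate names `below`, `n_below`, `keep` and `filtered` are illustrative and not part of the original answer.

```r
# Example data from the question
df.met <- data.frame(
  A2BP1  = c(0.36184475, 0.47566878, 0.17455497, 0.04830471, 0.16757806),
  A2M    = c(0.4555788, 0.7329827, 0.7528387, 0.5166676, 0.7896194),
  A2ML1  = c(0.6422624, 0.4938048, 0.5686839, 0.8878207, 0.5408747),
  A4GALT = c(0.08051388, 0.45487573, 0.37018038, 0.08881092, 0.35315243),
  AAAS   = c(0.15013343, 0.10827520, 0.12423923, 0.11779075, 0.09234602),
  row.names = c("paciente6", "paciente7", "paciente8", "paciente9", "paciente10")
)

below   <- df.met <= 0.1     # logical matrix: TRUE where a value is at or below the threshold
n_below <- rowSums(below)    # number of such values in each row
keep    <- n_below == 0      # equivalent to !rowSums(df.met <= 0.1)

# drop = FALSE keeps the result a data frame even if only one column remained
filtered <- df.met[keep, , drop = FALSE]
filtered
```

The negation in the original works because `!` applied to a count treats 0 as FALSE and any positive number as TRUE, so only rows with a count of zero are kept.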

Update

Based on the edit to the question, you can set the values <= 0.1 to NA instead:

is.na(df.met) <- df.met <= 0.1
df.met
#              A2BP1       A2M     A2ML1    A4GALT      AAAS
#paciente6  0.3618447 0.4555788 0.6422624        NA 0.1501334
#paciente7  0.4756688 0.7329827 0.4938048 0.4548757 0.1082752
#paciente8  0.1745550 0.7528387 0.5686839 0.3701804 0.1242392
#paciente9         NA 0.5166676 0.8878207        NA 0.1177907
#paciente10 0.1675781 0.7896194 0.5408747 0.3531524        NA
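
As a sketch of two related base R idioms: indexing with a logical matrix gives the same NA masking as `is.na<-`, and `lapply()` can return the surviving values of each column as vectors of different lengths, which is one possible reading of "fewer rows and different in each column" in the question. The names `masked` and `kept` are illustrative only.

```r
# Example data from the question
df.met <- data.frame(
  A2BP1  = c(0.36184475, 0.47566878, 0.17455497, 0.04830471, 0.16757806),
  A2M    = c(0.4555788, 0.7329827, 0.7528387, 0.5166676, 0.7896194),
  A2ML1  = c(0.6422624, 0.4938048, 0.5686839, 0.8878207, 0.5408747),
  A4GALT = c(0.08051388, 0.45487573, 0.37018038, 0.08881092, 0.35315243),
  AAAS   = c(0.15013343, 0.10827520, 0.12423923, 0.11779075, 0.09234602),
  row.names = c("paciente6", "paciente7", "paciente8", "paciente9", "paciente10")
)

# Work on a copy so the original data frame is left untouched
masked <- df.met
masked[masked <= 0.1] <- NA          # logical-matrix indexing, same effect as is.na<-

colSums(is.na(masked))               # how many values were masked in each column
colMeans(masked, na.rm = TRUE)       # summaries need na.rm = TRUE once NAs are present

# A rectangular data frame cannot hold columns of different lengths,
# so the per-column survivors are returned as a list instead
kept <- lapply(df.met, function(x) x[x > 0.1])
lengths(kept)
```

The list form loses the patient labels, so the NA-masked data frame is usually the more convenient structure when rows need to stay aligned across genes.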

Using data.table

library(data.table) # v1.9.5+
setDT(df.met, keep.rownames=TRUE)[]

for(j in 2:ncol(df.met)){
  set(df.met, i=which(df.met[[j]] <= 0.1), j=j, value=NA)
}

df.met
#           rn     A2BP1       A2M     A2ML1    A4GALT      AAAS
#1:  paciente6 0.3618447 0.4555788 0.6422624        NA 0.1501334
#2:  paciente7 0.4756688 0.7329827 0.4938048 0.4548757 0.1082752
#3:  paciente8 0.1745550 0.7528387 0.5686839 0.3701804 0.1242392
#4:  paciente9        NA 0.5166676 0.8878207        NA 0.1177907
#5: paciente10 0.1675781 0.7896194 0.5408747 0.3531524        NA

Data

df.met <- structure(list(A2BP1 = c(0.36184475, 0.47566878, 0.17455497, 
0.04830471, 0.16757806), A2M = c(0.4555788, 0.7329827, 0.7528387, 
0.5166676, 0.7896194), A2ML1 = c(0.6422624, 0.4938048, 0.5686839, 
0.8878207, 0.5408747), A4GALT = c(0.08051388, 0.45487573, 0.37018038, 
0.08881092, 0.35315243), AAAS = c(0.15013343, 0.1082752, 0.12423923, 
0.11779075, 0.09234602)), .Names = c("A2BP1", "A2M", "A2ML1", 
"A4GALT", "AAAS"), class = "data.frame", row.names = c("paciente6", 
"paciente7", "paciente8", "paciente9", "paciente10"))
