Unique frequency

I'm trying to find, for each gene, the number of unique ids it occurs with, together with the matching p-values.

df1 <-  read.table(text="
        Gene        id           Seg.mean    pValue    CNA
         Nfib       8410          0.3108     1.381913 gain
         Mycl       8410          2.7320     1.182842 gain
         Mycl       8410          2.7320     1.846275 gain
         Nfib       8411          0.5920     1.381913 gain
         Nfib       8411          1.3090     1.381913 gain
         Mycl       8412          1.6150     5.765442 gain
         Mycl       8411          1.6150     1.846275 gain
",header=TRUE)

      

Expected output:

Gene    ID           Freq. of id   pValue
Nfib    8410,8411        2           1.381913
Mycl    8410,8411,8412   3           1.182842,1.846275,5.765442

      



3 answers


solution:

library(dplyr)

df1 %>% 
  group_by(Gene) %>% 
  summarise(ID = paste0(unique(id), collapse=", "),
            pval = paste0(unique(pValue),collapse=", "), 
            n = n_distinct(id))

      

result:



  Gene               ID                         pval n
1 Mycl 8410, 8412, 8411 1.182842, 1.846275, 5.765442 3
2 Nfib       8410, 8411                     1.381913 2

      

breakdown:

  • We want one row per Gene (the unit of analysis), hence group_by(Gene).
  • Then create new variables by collapsing each column's unique values into one string with paste0(unique(var), collapse=", "). This is done within each Gene.
  • Finally, count the number of distinct ids with n_distinct(id), again within each Gene.
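To see what each step does, here is a rough base R sketch of the same operations applied by hand to a single gene. The names mycl, mycl_ids and mycl_n are just illustrative, and the sort() call is my own addition (not part of the dplyr code above), assuming you want the ids in ascending order as in the expected output.

```r
# Rebuild the question's data so the sketch is self-contained.
df1 <- read.table(text = "
Gene id Seg.mean pValue CNA
Nfib 8410 0.3108 1.381913 gain
Mycl 8410 2.7320 1.182842 gain
Mycl 8410 2.7320 1.846275 gain
Nfib 8411 0.5920 1.381913 gain
Nfib 8411 1.3090 1.381913 gain
Mycl 8412 1.6150 5.765442 gain
Mycl 8411 1.6150 1.846275 gain
", header = TRUE)

# Step 1: isolate one group (group_by(Gene) does this for every gene).
mycl <- df1[df1$Gene == "Mycl", ]

# Step 2: collapse the unique ids into a single string.
# sort() is an assumption here, used only to get ascending order.
mycl_ids <- paste0(sort(unique(mycl$id)), collapse = ", ")

# Step 3: count the distinct ids (base R counterpart of n_distinct(id)).
mycl_n <- length(unique(mycl$id))

print(mycl_ids)
print(mycl_n)
```

Without sort(), the ids come out in order of first appearance, which is why the result above shows 8410, 8412, 8411.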


I think you can use data.table to get closer to the result you want to achieve:



library(data.table)

df1<-data.table(df1)
df1[,
list(ID = paste(unique(id), collapse=','),
     "Freq. of id"=length(unique(id)), 
     pValue=paste(unique(pValue), collapse=",")),
keyby=list(Gene)]

      



library(plyr)

ddply(data.frame(df1), .(Gene), summarise,
      ID = paste(unique(id), collapse=","),
      pValue = paste(unique(pValue), collapse=","),
      Freq = length(unique(id)))

result:

  Gene             ID                     pValue Freq
1 Mycl 8410,8412,8411 1.182842,1.846275,5.765442    3
2 Nfib      8410,8411                   1.381913    2

      
