Unique frequency
I'm trying to find, for each Gene, the number of unique ids it is found with, together with the matching pValue(s).
df1 <- read.table(text="
Gene id Seg.mean pValue CNA
Nfib 8410 0.3108 1.381913 gain
Mycl 8410 2.7320 1.182842 gain
Mycl 8410 2.7320 1.846275 gain
Nfib 8411 0.5920 1.381913 gain
Nfib 8411 1.3090 1.381913 gain
Mycl 8412 1.6150 5.765442 gain
Mycl 8411 1.6150 1.846275 gain
",header=TRUE)
expected output
Gene  ID              Freq. of id  pValue
Nfib  8410,8411       2            1.381913
Mycl  8410,8411,8412  3            1.182842,1.846275,5.765442
+3
Kryo
3 answers
solution:
library(dplyr)
df1 %>%
  group_by(Gene) %>%
  summarise(ID = paste0(unique(id), collapse = ", "),
            pval = paste0(unique(pValue), collapse = ", "),
            n = n_distinct(id))
result:
  Gene  ID                pval                          n
1 Mycl  8410, 8412, 8411  1.182842, 1.846275, 5.765442  3
2 Nfib  8410, 8411        1.381913                      2
breakdown:
- we want to summarise by Gene (the unit of analysis), hence group_by(Gene).
- then create new variables that collapse the unique values of a column into a single string with paste0(unique(var), collapse=", "). This is applied per Gene.
- count the number of distinct ids with n_distinct(id). This is again applied per Gene.
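The same three steps can also be sketched in base R, without any packages, which may make it easier to see what the grouped summary is doing. This is only an illustrative sketch and not part of the answer above: the names pieces and res are hypothetical, and the output column names are chosen to resemble the expected output in the question.

```r
df1 <- read.table(text = "
Gene id Seg.mean pValue CNA
Nfib 8410 0.3108 1.381913 gain
Mycl 8410 2.7320 1.182842 gain
Mycl 8410 2.7320 1.846275 gain
Nfib 8411 0.5920 1.381913 gain
Nfib 8411 1.3090 1.381913 gain
Mycl 8412 1.6150 5.765442 gain
Mycl 8411 1.6150 1.846275 gain
", header = TRUE)

# Step 1: split the rows into one data frame per Gene.
# (pieces is a hypothetical name used only for this sketch)
pieces <- split(df1, df1$Gene)

# Steps 2 and 3: for each Gene, collapse the unique values into
# one string and count the distinct ids, then stack the rows.
res <- do.call(rbind, lapply(names(pieces), function(g) {
  d <- pieces[[g]]
  data.frame(
    Gene   = g,
    ID     = paste(unique(d$id), collapse = ","),
    Freq   = length(unique(d$id)),
    pValue = paste(unique(d$pValue), collapse = ","),
    stringsAsFactors = FALSE
  )
}))
res
```

As in the answers here, the ids are listed in order of first appearance (8410, 8412, 8411 for Mycl) rather than sorted; wrapping unique(d$id) in sort() would be one way to get the ordering shown in the question.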
+2
npjc
I think you can use data.table to get closer to the result you want to achieve:
library(data.table)
df1 <- data.table(df1)
df1[,
    list(ID = paste(unique(id), collapse = ","),
         "Freq. of id" = length(unique(id)),
         pValue = paste(unique(pValue), collapse = ",")),
    keyby = list(Gene)]
+1
mucio
library(plyr)
ddply(data.frame(df1), .(Gene), summarise,
      ID = paste(unique(id), collapse = ","),
      pValue = paste(unique(pValue), collapse = ","),
      Freq = length(unique(id)))
  Gene  ID              pValue                      Freq
1 Mycl  8410,8412,8411  1.182842,1.846275,5.765442  3
2 Nfib  8410,8411       1.381913                    2
+1
RUser