Automatically extract p-value from data.frame

Question

Automatically extract p-value from data.frame

I want to compare protein expression values (n = 465 proteins) for two patient groups (resistant and sensitive).

I have 11 resistant patients and 8 sensitive patients. I would like to compare (ttest) the expression values of protein 1 of the resistant group ( A res

before K res

) with those of the sensitive group ( L sens

before S sens

), protein 2 (resistant) against protein 2 (sensitive), etc. As an output, I only want proteins where the p-value is <0.05.

I tried to do this (see below) but something is wrong and I can't figure out what.

        X Protein.1 Protein.2 Protein.3 Protein.4 Protein.5 Protein.6
1   A res      4127     16886      1785      1636       407       135
2   B res     10039     32414      3144      1543       601       154
3   C res       527      1059      1637       317       229       107
4   D res       553      3848      7357      1168      1549       441
5   E res      2351      2272      5868      2606       517       159
6   F res       822      1767      2110       818       293        75
7   G res       673      1887       511       471       214        NA
8   H res      5769      2206      2041       517       355       298
9   I res      1660      4221      1921       629       383       104
10  J res      3281      1804      2400       225       268        52
11  K res      3383      1882      1935       185        NA        NA
12 L sens     10810     20136      2350      1143       527       160
13 M sens      5941     14873      3550       943       308        NA
14 N sens      1100      2325      1359       561       542       284
15 O sens        85       587       619       364        85        52
16 P sens      2321      6335      6494       994        NA        NA
17 Q sens    103810      7102      7986      1464       439       187
18 R sens      1174      2076      1423       340       186        70
19 S sens      1829       973      1343       380       453       221

data <- read.csv("ProteinDataResSens.csv", sep=";", na.strings="weak", header=TRUE)

res <- data.frame(data[1:11, ], row.names=NULL)
colnames(res) <- paste("res", 1:length(res), sep="_")
sens <- data.frame(data[12:19, ], row.names=NULL)
colnames(sens) <- paste("sens", 1:length(sens), sep="_")
com <- combn(c(colnames(res), colnames(sens)), 2)

p <- apply(com, 2, function(x) t.test(data[, x[1]], data[, x[2]])$p.val)
data.frame(comparison=paste(com[1, ], com[2, ],sep=" vs."), p.value=p)

Thanks a lot for any help!

+3

r extraction dataframe p-value

Martin Neumann 08 dec. 14 at 11:14

source to share

1 answer

akrun · Accepted Answer · 2014-12-08T11:38:25+0000

If you want to compare res

with sens

for columnsProtein

grp <- sub(".* ", "", df$X)
Pvals <- mapply(function(x,y) t.test(x[grp=='res'], 
                x[grp=='sens'])$p.value, df[,-1], list(grp))

Pvals[Pvals < 0.05]

Or using data.table

library(data.table)
setDT(df)[, grp:= sub('.* ', "", X)][, lapply(.SD, 
      function(x) t.test(x[grp=='res'], x[grp=='sens'])$p.value),
                                              .SDcols=2:(ncol(df)-1)]

data

 df <- structure(list(X = c("A res", "B res", "C res", "D res", "E res", 
 "F res", "G res", "H res", "I res", "J res", "K res", "L sens", 
 "M sens", "N sens", "O sens", "P sens", "Q sens", "R sens", "S sens"
 ), Protein.1 = c(4127L, 10039L, 527L, 553L, 2351L, 822L, 673L, 
 5769L, 1660L, 3281L, 3383L, 10810L, 5941L, 1100L, 85L, 2321L, 
 103810L, 1174L, 1829L), Protein.2 = c(16886L, 32414L, 1059L, 
 3848L, 2272L, 1767L, 1887L, 2206L, 4221L, 1804L, 1882L, 20136L, 
 14873L, 2325L, 587L, 6335L, 7102L, 2076L, 973L), Protein.3 = c(1785L, 
 3144L, 1637L, 7357L, 5868L, 2110L, 511L, 2041L, 1921L, 2400L, 
 1935L, 2350L, 3550L, 1359L, 619L, 6494L, 7986L, 1423L, 1343L), 
 Protein.4 = c(1636L, 1543L, 317L, 1168L, 2606L, 818L, 471L, 
 517L, 629L, 225L, 185L, 1143L, 943L, 561L, 364L, 994L, 1464L, 
 340L, 380L), Protein.5 = c(407L, 601L, 229L, 1549L, 517L, 
 293L, 214L, 355L, 383L, 268L, NA, 527L, 308L, 542L, 85L, 
 NA, 439L, 186L, 453L), Protein.6 = c(135L, 154L, 107L, 441L, 
 159L, 75L, NA, 298L, 104L, 52L, NA, 160L, NA, 284L, 52L, 
 NA, 187L, 70L, 221L)), .Names = c("X", "Protein.1", "Protein.2", 
 "Protein.3", "Protein.4", "Protein.5", "Protein.6"), class =
 "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8",
 "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19"))

Automatically extract p-value from data.frame

data

More articles: