How do I make grepl match only the exact gene ID?
My data frame is shown below. I need to extract the data for specific rows, one by one, based on the values in the "geneID" column. I am using the function grepl.
#Data frame:geneDf
geneID=c("EGFR","Her2","PTENPP","PTEN")
patient1=c(12,23,56,23)
patient2=c(23,34,11,6)
patient3=c(56,44,32,45)
patient4=c(23,64,45,23)
geneDf=data.frame(patient1,patient2,patient3,patient4,geneID)
geneDf
  patient1 patient2 patient3 patient4 geneID
1       12       23       56       23   EGFR
2       23       34       44       64   Her2
3       56       11       32       45 PTENPP
4       23        6       45       23   PTEN
Extracting any of the first three rows works as expected. For example, for row 1:
targetGene<-subset(geneDf,grepl(geneDf$geneID[1],geneDf$geneID))
targetGene
  patient1 patient2 patient3 patient4 geneID
1       12       23       56       23   EGFR
But when I fetch the data for the 4th row, I get two rows:
targetGene<-subset(geneDf,grepl(geneDf$geneID[4],geneDf$geneID))
targetGene
  patient1 patient2 patient3 patient4 geneID
3       56       11       32       45 PTENPP
4       23        6       45       23   PTEN
It looks like another row is matched as well: the geneID in the third row ("PTENPP") contains the geneID of the fourth row ("PTEN"). What is wrong with my command? How can I extract exactly one specific row each time?
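A minimal sketch of what is going on, rebuilding the example data with stringsAsFactors = FALSE (an assumption added here so that geneID is a character column in every R version) and looking at the logical vector grepl() returns for the 4th gene ID:

```r
# Rebuild the example data; stringsAsFactors = FALSE keeps geneID a
# character column in every R version (the default changed in R 4.0.0)
geneDf <- data.frame(
  patient1 = c(12, 23, 56, 23),
  patient2 = c(23, 34, 11, 6),
  patient3 = c(56, 44, 32, 45),
  patient4 = c(23, 64, 45, 23),
  geneID   = c("EGFR", "Her2", "PTENPP", "PTEN"),
  stringsAsFactors = FALSE
)

# grepl() treats geneID[4] ("PTEN") as a regular expression and reports
# TRUE wherever that pattern occurs anywhere inside a string
hits <- grepl(geneDf$geneID[4], geneDf$geneID)
print(hits)         # [1] FALSE FALSE  TRUE  TRUE

# Rows 3 ("PTENPP") and 4 ("PTEN") both contain "PTEN", hence two rows
print(which(hits))  # [1] 3 4
```

In other words, grepl() does substring (pattern) matching, not whole-value comparison, so any longer ID that starts with or contains "PTEN" is selected too.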
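You may need word boundaries (\\b in an R string) or anchors (^ for the start and $ for the end of the string), so that the pattern has to match the whole gene ID rather than any part of it. With anchors: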
subset(geneDf, grepl(paste0('^', geneID[4], '$'), geneID))
#  patient1 patient2 patient3 patient4 geneID
#4       23        6       45       23   PTEN
or, with word boundaries:
subset(geneDf, grepl(paste0('\\b', geneID[4], '\\b'), geneID))
#  patient1 patient2 patient3 patient4 geneID
#4       23        6       45       23   PTEN
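Both patterns above are still regular expressions, so they can behave differently from an exact match on other IDs. The sketch below uses made-up IDs ("PTEN-AS1", "ABC.1" and "ABCx1" are illustrative assumptions, not part of the question's data) to show that \\b also accepts an ID where the target is followed by a hyphen, that a "." inside an ID acts as a wildcard even when anchored, and that a plain == comparison sidesteps regular expressions entirely:

```r
# Hypothetical gene IDs, chosen only to show where the approaches differ
ids <- c("PTEN", "PTEN-AS1", "PTENPP", "ABC.1", "ABCx1")
target <- "PTEN"

# Word boundaries: "-" is not a word character, so "PTEN-AS1" matches too
boundary <- grepl(paste0("\\b", target, "\\b"), ids)

# Anchors: only the whole string "PTEN" matches
anchored <- grepl(paste0("^", target, "$"), ids)

# The pasted pattern is a regex, so "." matches any character and the
# anchored pattern built from "ABC.1" also accepts "ABCx1"
dotAnchored <- grepl(paste0("^", "ABC.1", "$"), ids)

# Plain equality compares whole strings literally, with no regex at all
dotExact <- ids == "ABC.1"

# The same equality idea applied to the question's data
geneDf <- data.frame(
  patient1 = c(12, 23, 56, 23),
  patient2 = c(23, 34, 11, 6),
  patient3 = c(56, 44, 32, 45),
  patient4 = c(23, 64, 45, 23),
  geneID   = c("EGFR", "Her2", "PTENPP", "PTEN"),
  stringsAsFactors = FALSE
)
row4 <- subset(geneDf, geneID == geneID[4])
print(row4)
```

If the IDs are only ever compared as whole values, == (or %in% for several IDs at once) is the simplest choice; the regex versions are useful when you actually want pattern matching.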
@akrun answered your specific question, but if you want to split your data into subsets according to the values of another variable, you might also be interested in the function split:
split(geneDf, geneDf$geneID)
## $EGFR
##   patient1 patient2 patient3 patient4 geneID
## 1       12       23       56       23   EGFR
##
## $Her2
##   patient1 patient2 patient3 patient4 geneID
## 2       23       34       44       64   Her2
##
## $PTEN
##   patient1 patient2 patient3 patient4 geneID
## 4       23        6       45       23   PTEN
##
## $PTENPP
##   patient1 patient2 patient3 patient4 geneID
## 3       56       11       32       45 PTENPP
##
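To go through the genes one by one, the list returned by split() can be indexed by gene ID or looped over. A minimal sketch, assuming the same geneDf as in the question (rebuilt with stringsAsFactors = FALSE); the names byGene, pten and patientCols are illustrative:

```r
geneDf <- data.frame(
  patient1 = c(12, 23, 56, 23),
  patient2 = c(23, 34, 11, 6),
  patient3 = c(56, 44, 32, 45),
  patient4 = c(23, 64, 45, 23),
  geneID   = c("EGFR", "Her2", "PTENPP", "PTEN"),
  stringsAsFactors = FALSE
)

# One data frame per distinct geneID, named by that ID
# (the names follow the sorted factor levels of geneID)
byGene <- split(geneDf, geneDf$geneID)

# Exact lookup by name: no pattern matching, so "PTEN" cannot pick up "PTENPP"
pten <- byGene[["PTEN"]]

# Visit each gene in turn and print its patient values
patientCols <- c("patient1", "patient2", "patient3", "patient4")
for (g in names(byGene)) {
  values <- unlist(byGene[[g]][, patientCols])
  cat(g, ":", values, "\n")
}
```

Because list elements are selected by exact name with [[, this avoids the partial-match problem from the question without writing any regular expression.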