Finding patterns in data.table strings in R
I am trying to find patterns across the rows of a data.table while keeping the data associated with each row. Here's an example:
Row ID Value
1 C 1000
2 A 500
3 T -200
4 B 5000
5 T -900
6 A 300
I would like to find every instance of a pattern such as "ATB" in the ID column across consecutive rows and output the corresponding integers from the Value column. Ideally, I would also like to count the number of instances. The output table would look like this:
String Frequency Value1 Value2 Value3
ATB 1 500 -200 5000
CAT 1 1000 500 -200
Since the data.table package seems to be geared toward column-wise and row-wise operations, I thought this should be possible. However, I have no idea where to start. Any pointers in the right direction would be greatly appreciated.
Thanks!
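library("plyr")
library("stringr")

# Example data, extended so that "CAT" occurs twice
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "Row ID Value
1 C 1000
2 A 500
3 T -200
4 B 5000
5 T -900
6 A 300
7 C 200
8 A 700
9 T -500")

sought <- c("ATB", "CAT", "NOT")

# Collapse the ID column into one string, e.g. "CATBTACAT"
ids <- paste(df$ID, collapse = "")

ldply(sought, function(id) {
  # Start/end positions of every match of this pattern
  found <- str_locate_all(ids, id)
  if (nrow(found[[1]]) > 0) {
    # One row per match, one column per offset (0, 1, 2) from the start
    vals <- outer(found[[1]][, "start"], 0:2, function(x, y) df$Value[x + y])
  } else {
    # No match: a single row of NAs
    vals <- as.list(rep(NA, 3))
  }
  data.frame(ID = id, Count = str_count(ids, id),
             setNames(as.data.frame(vals), paste0("Value", 1:3)))
})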
Here's a solution using stringr and plyr. The IDs are collapsed into a single string, all instances of each target are located, and then a data frame is built with the corresponding columns.
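The same idea can also be sketched in base R, without extra packages, by sliding a fixed-length window over the rows. This is only a sketch under a few assumptions: every sought pattern has exactly three characters (the window length `k` is a name introduced here for illustration), each ID is a single character, and patterns with no match are simply left out rather than returned as a row of NAs. Because every window is checked, it would also pick up overlapping matches, which a regex scan over the collapsed string does not.

```r
df <- data.frame(
  Row   = 1:9,
  ID    = c("C", "A", "T", "B", "T", "A", "C", "A", "T"),
  Value = c(1000, 500, -200, 5000, -900, 300, 200, 700, -500),
  stringsAsFactors = FALSE
)
sought <- c("ATB", "CAT", "NOT")

k <- 3                                # window length (assumed equal for all patterns)
starts <- seq_len(nrow(df) - k + 1)   # first row of each window

# Concatenate the IDs of each run of k consecutive rows
windows <- vapply(starts,
                  function(i) paste(df$ID[i:(i + k - 1)], collapse = ""),
                  character(1))

# Matching values: one row per window, one column per offset
vals <- t(vapply(starts,
                 function(i) df$Value[i:(i + k - 1)],
                 numeric(k)))
colnames(vals) <- paste0("Value", seq_len(k))

# Keep only the windows that match a sought pattern and count each pattern
hits    <- windows %in% sought
matched <- windows[hits]
freq    <- table(matched)

result <- data.frame(
  String    = matched,
  Frequency = as.vector(freq[matched]),
  vals[hits, , drop = FALSE],
  stringsAsFactors = FALSE
)
print(result)
```

With the example data this should give one row for "ATB" and two rows for "CAT", each carrying the three values from the rows where the match occurred.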