How to subset a data.frame?

Question

I have a dataset like this

a <- data.frame(var1 = c("patientA", "patientA", "patientA", "patientB", "patientB", "patientB", "patientB"),
                var2 = as.Date(c("2015-01-02","2015-01-04","2015-02-02","2015-02-06","2015-01-02","2015-01-07","2015-04-02")),
                var3 = c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)
                )
sequ <- rle(as.character(a$var1))
a$sequ <- sequence(sequ$lengths)

Output:

> a
      var1       var2  var3 sequ
1 patientA 2015-01-02 FALSE    1
2 patientA 2015-01-04  TRUE    2
3 patientA 2015-02-02 FALSE    3
4 patientB 2015-02-06 FALSE    1
5 patientB 2015-01-02 FALSE    2
6 patientB 2015-01-07  TRUE    3
7 patientB 2015-04-02 FALSE    4

How can I subset / filter this dataset to get, for each patient (var1), the row where var3 == TRUE plus all rows whose var2 is greater than var2 in that row? I tried

subset(a, (var3 == TRUE) & (var2 > var3))

but this does not give the correct result. The expected output is:

#       var1       var2  var3 sequ
# 1 patientA 2015-01-04  TRUE    2
# 2 patientA 2015-02-02 FALSE    3
# 3 patientB 2015-02-06 FALSE    1
# 4 patientB 2015-01-07  TRUE    3
# 5 patientB 2015-04-02 FALSE    4

+3

r dataframe subset

jrara May 04 '15 at 18:06

3 answers

I add a column holding the date at which var3 is TRUE, filter on it, then drop it at the end.

library(dplyr)

a %>% group_by(var1)%>%
    mutate(truedate = first(var2[var3])) %>%
    filter(var2 >= truedate) %>%
    select(-truedate)

# Source: local data frame [5 x 4]
# Groups: var1

#       var1       var2  var3 sequ
# 1 patientA 2015-01-04  TRUE    2
# 2 patientA 2015-02-02 FALSE    3
# 3 patientB 2015-02-06 FALSE    1
# 4 patientB 2015-01-07  TRUE    3
# 5 patientB 2015-04-02 FALSE    4

+4

Gregor May 04 '15 at 18:20

Base R solution: first, don't bother with your rle / sequ work. Sort your data instead:

a <- a[order(a$var1,a$var2),]

Find the rows to select:

myrows <- tapply(
  1:nrow(a),
  a$var1,
  function(ivec){
    istar <- ivec[a$var3[ivec]]
    ivec[ivec>=istar]
  })

Subset with a[unlist(myrows),].
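For reference, the steps above can be put together into one runnable sketch on the sample data from the question. It assumes each patient has exactly one row with var3 == TRUE; with zero or several flagged rows the comparison inside the function would not behave as intended. The name result is only used here for illustration.

```r
# Sample data from the question
a <- data.frame(
  var1 = c("patientA", "patientA", "patientA",
           "patientB", "patientB", "patientB", "patientB"),
  var2 = as.Date(c("2015-01-02", "2015-01-04", "2015-02-02", "2015-02-06",
                   "2015-01-02", "2015-01-07", "2015-04-02")),
  var3 = c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)
)

# Sort by patient, then by date, so later dates come after the flagged row
a <- a[order(a$var1, a$var2), ]

# For each patient, keep the row positions from the flagged row onward
myrows <- tapply(seq_len(nrow(a)), a$var1, function(ivec) {
  istar <- ivec[a$var3[ivec]]  # position of the row where var3 is TRUE
  ivec[ivec >= istar]
})

# 'result' is an illustrative name, not part of the original answer
result <- a[unlist(myrows), ]
print(result)
```

Because the data is sorted first, the rows for patientB come out in date order rather than in the original row order.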

+3

Frank May 04 '15 at 18:48

akrun · Accepted Answer · 2015-05-04T18:18:22+0000

You can try with data.table. Here we convert the "data.frame" to a "data.table" (setDT(a)). Grouped by "var1", we build a logical index of the "var2" elements that are greater than or equal to the "var2" element for which "var3" is TRUE, and use it to subset the Subset of Data.table (.SD).

library(data.table)
setDT(a)[,.SD[var2 >= var2[var3]], var1]
#       var1       var2  var3 sequ
#1: patientA 2015-01-04  TRUE    2
#2: patientA 2015-02-02 FALSE    3
#3: patientB 2015-02-06 FALSE    1
#4: patientB 2015-01-07  TRUE    3
#5: patientB 2015-04-02 FALSE    4

An option using base R (assuming the data is ordered by "var1" and each group has exactly one TRUE in "var3"):

a[with(a, var2>=rep(var2[var3], table(var1))),]
#      var1       var2  var3 sequ
#2 patientA 2015-01-04  TRUE    2
#3 patientA 2015-02-02 FALSE    3
#4 patientB 2015-02-06 FALSE    1
#6 patientB 2015-01-07  TRUE    3
#7 patientB 2015-04-02 FALSE    4
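
The base R one-liner above depends on the rows being ordered by "var1". If that can't be guaranteed, a lookup by patient name is one possible variant. This is only a sketch, not part of the original answer: it still assumes exactly one TRUE in "var3" per patient, and flag_date, threshold and result are illustrative names.

```r
# Sample data from the question
a <- data.frame(
  var1 = c("patientA", "patientA", "patientA",
           "patientB", "patientB", "patientB", "patientB"),
  var2 = as.Date(c("2015-01-02", "2015-01-04", "2015-02-02", "2015-02-06",
                   "2015-01-02", "2015-01-07", "2015-04-02")),
  var3 = c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)
)

# Named lookup: patient -> date of the row where var3 is TRUE
# (assumes one flagged row per patient)
flag_date <- setNames(a$var2[a$var3], as.character(a$var1[a$var3]))

# Look up each row's threshold date by patient name, regardless of row order
threshold <- unname(flag_date[as.character(a$var1)])

# Keep rows on or after the patient's flagged date
result <- a[a$var2 >= threshold, ]
print(result)
```

On the sample data this keeps rows 2, 3, 4, 6 and 7 in their original order.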
