Ranges surrounding values in a data frame (R, dplyr)

I have a data frame that looks something like this:

test <- data.frame(chunk = c(rep("a",27),rep("b",27)), x = c(1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1))

      

There is a column that I would like to group the data by using group_by() in dplyr, which in this example is called chunk. Within each chunk of test, I want to add another column called x1, so the resulting data frame looks like this:

test1 <- data.frame(test, x1 = c(0,0,0,0,0,0,0,1,1,1,1,1,2,2,2,2,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,2,2,2,2,1,1,1,1,1,0,0,0,0,0,0))

      

x1 identifies all occurrences of 0 in x, then marks the 5 rows before the first 0 and the 5 rows after the last 0 with an identifier. The id values don't matter, but in this example x1 is 1 for the surrounding range and 2 for the occurrences of 0 in x.

Thanks for any help!



1 answer


Here's how you can do it in dplyr:

Shorter version:

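library(dplyr)

n <- 1:5  # row offsets to flag on each side of the run of 0s
test %>%
  group_by(chunk) %>%
  # 1 = within 5 rows before the first 0 or after the last 0; 2 = a 0 itself
  mutate(x1 = ifelse((row_number() - min(which(x == 0))) %in% -n |
                     (row_number() - max(which(x == 0))) %in% n, 1,
                     ifelse(x == 0, 2, 0)))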

      



Longer (first) version:

test %>%
  group_by(chunk) %>%
  mutate(start = (row_number() - min(which(x == 0))) %in% -5:-1,
         end = (row_number() - max(which(x == 0))) %in% 1:5,
         x1 = ifelse(start | end, 1, ifelse(x == 0, 2, 0))) %>%
  select(-c(start, end))

Source: local data frame [54 x 3]
Groups: chunk

   chunk x x1
1      a 1  0
2      a 1  0
3      a 1  0
4      a 1  0
5      a 1  0
6      a 1  0
7      a 1  0
8      a 1  1
9      a 1  1
10     a 1  1
11     a 1  1
12     a 1  1
13     a 0  2
14     a 0  2
15     a 0  2
16     a 0  2
17     a 1  1
18     a 1  1
19     a 1  1
20     a 1  1
21     a 1  1
22     a 1  0
23     a 1  0
24     a 1  0
25     a 1  0
26     a 1  0
27     a 1  0
28     b 1  0
29     b 1  0
30     b 1  0
31     b 1  0
32     b 1  0
33     b 1  0
34     b 1  0
35     b 1  1
36     b 1  1
37     b 1  1
38     b 1  1
39     b 1  1
40     b 0  2
41     b 0  2
42     b 0  2
43     b 0  2
44     b 1  1
45     b 1  1
46     b 1  1
47     b 1  1
48     b 1  1
49     b 1  0
50     b 1  0
51     b 1  0
52     b 1  0
53     b 1  0
54     b 1  0

      

This approach assumes that there is only one run of 0s in each chunk (as in the sample data), because it only looks at the first and last 0 per group. Let me know if this is not the case in your actual data.
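If a chunk can contain several separate runs of 0s, one possible way around the single-run assumption is to find each run with rle() and flag the rows around every run. The following is only a sketch: the helper name flag_zero_runs and its width argument are hypothetical (they are not part of dplyr or of the answer above), and it assumes x contains no NA values. It uses base R only, with ave() doing the per-chunk grouping.

```r
# Hypothetical helper: 2 for every 0 in x, 1 for rows within `width` rows
# of any run of 0s, 0 otherwise. Assumes x has no NA values.
flag_zero_runs <- function(x, width = 5) {
  n <- length(x)
  is_zero <- x == 0
  r <- rle(is_zero)                  # consecutive runs of TRUE/FALSE
  ends <- cumsum(r$lengths)          # last index of each run
  starts <- ends - r$lengths + 1     # first index of each run
  out <- numeric(n)
  for (i in which(r$values)) {       # loop over the runs of 0s only
    lo <- max(1, starts[i] - width)  # clamp the range to the vector bounds
    hi <- min(n, ends[i] + width)
    out[lo:hi] <- 1
  }
  out[is_zero] <- 2                  # the 0s themselves override the range id
  out
}

# Same data as in the question, built more compactly
test <- data.frame(chunk = rep(c("a", "b"), each = 27),
                   x = rep(c(rep(1, 12), rep(0, 4), rep(1, 11)), 2))

# Apply the helper separately within each chunk
test$x1 <- ave(test$x, test$chunk, FUN = flag_zero_runs)

# A vector with two separate runs of 0s, flagged with a narrower range
two_runs <- flag_zero_runs(c(1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1), width = 2)
```

Since the helper takes a plain vector, it should also work inside the dplyr pipeline, e.g. test %>% group_by(chunk) %>% mutate(x1 = flag_zero_runs(x)). A chunk with no 0s simply gets 0 everywhere, whereas min(which(x == 0)) would produce a warning in that case.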
