Flagging ranges of rows surrounding values in a data frame (R, dplyr)
I have a data frame that looks something like this:
test <- data.frame(chunk = c(rep("a",27),rep("b",27)), x = c(1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1))
There is a column, called chunk in this example, that I would like to group the data by using group_by() in dplyr. I want to add another column called x1 to each chunk of test, so the resulting data frame looks like this:
test1 <- data.frame(test, x1 = c(0,0,0,0,0,0,0,1,1,1,1,1,2,2,2,2,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,2,2,2,2,1,1,1,1,1,0,0,0,0,0,0))
x1 identifies all occurrences of 0 in x, takes a range of 5 rows in each direction from the ends of the run of 0s, and adds an identifier. The id itself doesn't matter, but in this example the id in x1 is 1 for the surrounding range and 2 for the occurrences of 0 in x.
Thanks for any help!
Here's how you can do it in dplyr:
Shorter version:
library(dplyr)

n <- 1:5  # offsets: number of rows before/after the run of 0s to flag
test %>%
  group_by(chunk) %>%
  mutate(x1 = ifelse((row_number() - min(which(x == 0))) %in% -n |
                     (row_number() - max(which(x == 0))) %in% n,
                     1, ifelse(x == 0, 2, 0)))
Longer (first) version:
test %>%
  group_by(chunk) %>%
  mutate(start = (row_number() - min(which(x == 0))) %in% -5:-1,
         end = (row_number() - max(which(x == 0))) %in% 1:5,
         x1 = ifelse(start | end, 1, ifelse(x == 0, 2, 0))) %>%
  select(-c(start, end))
Source: local data frame [54 x 3]
Groups: chunk
chunk x x1
1 a 1 0
2 a 1 0
3 a 1 0
4 a 1 0
5 a 1 0
6 a 1 0
7 a 1 0
8 a 1 1
9 a 1 1
10 a 1 1
11 a 1 1
12 a 1 1
13 a 0 2
14 a 0 2
15 a 0 2
16 a 0 2
17 a 1 1
18 a 1 1
19 a 1 1
20 a 1 1
21 a 1 1
22 a 1 0
23 a 1 0
24 a 1 0
25 a 1 0
26 a 1 0
27 a 1 0
28 b 1 0
29 b 1 0
30 b 1 0
31 b 1 0
32 b 1 0
33 b 1 0
34 b 1 0
35 b 1 1
36 b 1 1
37 b 1 1
38 b 1 1
39 b 1 1
40 b 0 2
41 b 0 2
42 b 0 2
43 b 0 2
44 b 1 1
45 b 1 1
46 b 1 1
47 b 1 1
48 b 1 1
49 b 1 0
50 b 1 0
51 b 1 0
52 b 1 0
53 b 1 0
54 b 1 0
This approach assumes that there is only one run of 0s in each chunk (as in the sample data). Let me know if this is not the case in your actual data.
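If a chunk can contain more than one run of 0s, one possible way around the assumption is to measure each row's distance to its nearest 0 instead of looking only at the first and last 0. The following is a rough sketch rather than a tested drop-in replacement: the helper name flag_near_zero, its width argument, and the demo data are all made up for illustration, and it assumes x has no missing values.

```r
library(dplyr)

# Hypothetical helper (not part of the original answer): returns 2 for rows
# where x is 0, 1 for non-zero rows within `width` rows of any 0, and 0 otherwise.
flag_near_zero <- function(x, width = 5) {
  zero_pos <- which(x == 0)
  if (length(zero_pos) == 0) {
    return(rep(0, length(x)))  # no 0s in this group, so nothing to flag
  }
  # Distance from each row to its nearest 0
  nearest <- vapply(seq_along(x),
                    function(i) min(abs(i - zero_pos)),
                    numeric(1))
  ifelse(x == 0, 2, ifelse(nearest <= width, 1, 0))
}

# Made-up data: a single chunk with two separate runs of 0s
demo <- data.frame(chunk = "a",
                   x = c(1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1))

demo_flagged <- demo %>%
  group_by(chunk) %>%
  mutate(x1 = flag_near_zero(x, width = 2)) %>%
  ungroup()
```

On the data from the question, mutate(x1 = flag_near_zero(x)) inside the same group_by(chunk) should give the same x1 as the versions above, since with a single run of 0s the rows within 5 of any 0 are exactly the 5 rows before and the 5 rows after the run. The per-row distance calculation is quadratic in the worst case, so for very long chunks a run-length approach may be a better fit.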