How do I denote any range of values around a specific string in R?

Question

How do I denote any range of values around a specific string in R?

This is the next question of this .

Data

x <- data.frame(file.ID = "Car1", 
                frames = 1:15, 
                lane.change = c("no", "no", "no", "yes", 
                                "no", "no", "no", "no", 
                                "no", "yes", "no", "no", "no", "no", "no"))

Problem

I want to mark several lines above and several lines after the line lane.change=="yes"

in every band change for a given group file.ID

. The answers to the previous question work for sequential lines, but not for any number of lines. I have tried providing an argument n

in lead

and lag

, but it doesn’t give the desired results.

Desired output

Ideally, I want to be able to mark any number of lines before and after lane.change=="yes"

. In my original dataframe, I want to mark 800 lines before and after. But in the sample dataframe, x

I'm trying to put a label 2. So the desired output should be:

   file.ID frames lane.change range_LC
1     Car1      1          no        .
2     Car1      2          no      LC1
3     Car1      3          no      LC1
4     Car1      4         yes      LC1
5     Car1      5          no      LC1
6     Car1      6          no      LC1
7     Car1      7          no        .
8     Car1      8          no      LC2
9     Car1      9          no      LC2
10    Car1     10         yes      LC2
11    Car1     11          no      LC2
12    Car1     12          no      LC2
13    Car1     13          no        .
14    Car1     14          no        .
15    Car1     15          no        .

Please help me to get the desired result. Since the original data has multiple file.ID

s, I prefer the solution dplyr

because I can later use group_by

. Thank.

EDIT

I want to summarize the code for multiple file.ID

s. You can download a subset of the original dataframe that contains 2 file.ID

s here . I've tried following (thanks to @ G5W's solution):

library(tidyr)
by_file.ID <- c %>% 
  group_by(file.ID) %>% 
  nest()

library(purrr)
by_file.ID <- by_file.ID %>% 
  mutate(range_LC = map(data, ~ ".")) %>% 
  mutate(Changes = map(data, ~ tail(which(.$lane.change=="yes"),-1)))

Note that changing the 1st lane has a very small index in each case. So I'll skip it by executing tail(which(...), -1)

. Also note that in this data I want to use 800 lines before and after the band change line. So the code for the individual file.ID

should be something like this:

range_LC[t(outer(Changes, -800:800, '+'))] = rep(1:length(Changes), each=1601)

The line above is the main piece of code I'm not sure about how to access the file.ID

s groups . I thought about using for loop

with do.call()

, but it will probably be very slow due to a lot of stripe and file.ID

s changes .

Thanks for your time and effort to help me.

+3

r dataframe dplyr tidyr

umair durrani June 29. 17 at 17:57

source to share

3 answers

It just requires careful indexing in the array.

x$range_LC = "."
Changes = which(x$lane.change == "yes")
x$range_LC[t(outer(Changes, -2:2, '+'))] = rep(1:length(Changes), each=5)
x
   file.ID frames lane.change range_LC
1     Car1      1          no        .
2     Car1      2          no        1
3     Car1      3          no        1
4     Car1      4         yes        1
5     Car1      5          no        1
6     Car1      6          no        1
7     Car1      7          no        .
8     Car1      8          no        2
9     Car1      9          no        2
10    Car1     10         yes        2
11    Car1     11          no        2
12    Car1     12          no        2
13    Car1     13          no        .
14    Car1     14          no        .
15    Car1     15          no        .

+3

G5W June 29. 17 at 18:35

source to share

I am just posting this answer to let you know that @ ycw's answer is completely appropriate for this question. You just need to tweak it a bit:

x22 <- x %>%
  mutate(LC_ID = rleid(lane.change)/2) %>%
  mutate(LC_ID2 = ifelse(LC_ID %% 1 == 0, paste0("LC", LC_ID), NA)) %>%
  mutate(LC_ID3 = lag(LC_ID2), LC_ID4 = lead(LC_ID2)) %>% 
  mutate(LC_ID5 = lag(LC_ID3), LC_ID6 = lead(LC_ID4))

x33 <- mutate(x22, range_LC = coalesce(x22$LC_ID2, x22$LC_ID3, x22$LC_ID4,
                                      x22$LC_ID5, x22$LC_ID6, "."))

x44 <- x33 %>% select(file.ID, frames, lane.change, range_LC)

#output:
x44

#    file.ID frames lane.change range_LC 
# 1     Car1      1          no        . 
# 2     Car1      2          no      LC1 
# 3     Car1      3          no      LC1 
# 4     Car1      4         yes      LC1 
# 5     Car1      5          no      LC1 
# 6     Car1      6          no      LC1 
# 7     Car1      7          no        . 
# 8     Car1      8          no      LC2 
# 9     Car1      9          no      LC2 
# 10    Car1     10         yes      LC2 
# 11    Car1     11          no      LC2 
# 12    Car1     12          no      LC2 
# 13    Car1     13          no        . 
# 14    Car1     14          no        . 
# 15    Car1     15          no        .

+3

Masoud June 29. 17 at 18:40

source to share

www · Accepted Answer · 2017-06-30T13:54:50+0000

After further thinking and testing, I think this solution might work for the OP. This is an improved solution from mine and Masoud in this thread. To perform the fill

function from the package, tidyr

it is required to fill

fill NA

between the upper and lower boundaries of the ground change.

# Load packages
library(dplyr)
library(tidyr)
library(data.table)

I created a larger test case than OP. Now there are two file.ID

. I am doing this to check if the grouping can work on more than one car.

# Create example data frames
x <- data.frame(file.ID = c(rep("Car1", 20), rep("Car2", 20)),
                frames = 1:40, 
                lane.change = c(rep(c("no", "no", "no", "no", "no", "yes", 
                                "no", "no", "no", "no", "no", "no",
                                "no", "yes", "no", "no", "no", "no", "no", "no"), 2)))

The OP can set the amount of lead and circle here. Here I have used 3 as an example. Please note that it is the OP's responsibility to make sure they don't overlap.

# Set the lead and lag distance
Step <- 3

# Create LC_ID, uppber bound and lower bound of the lead lag difference
x2 <- x %>%
  group_by(file.ID) %>%
  mutate(LC_ID = rleid(lane.change)/2) %>%
  mutate(LC_ID2 = ifelse(LC_ID %% 1 == 0, paste0("LC", LC_ID), NA)) %>%
  mutate(LC_ID3 = lag(LC_ID2, Step), LC_ID4 = lead(LC_ID2, Step))

LC_groupID1

and LC_groupID2

are meant to be grouped to be able to use fill

.

# Create groups based on LC_ID, Group the data and apply fill for two directions
x3 <- x2 %>%
  mutate(LC_groupID1 = ifelse(LC_ID %% 1 == 0, LC_ID + 0.5, LC_ID), 
         LC_groupID2 = ifelse(LC_ID %% 1 == 0, LC_ID - 0.5, LC_ID)) %>%
  group_by(file.ID, LC_groupID1) %>%
  # Fill one direction based on LC_ID4
  fill(LC_ID4, .direction = "down") %>%
  ungroup() %>%
  # Fill the other direction based on LC_ID3
  group_by(file.ID, LC_groupID2) %>%
  fill(LC_ID3, .direction = "up") %>%
  ungroup()

# Coalesce all the columns
x4 <- mutate(x3, range_LC = coalesce(x3$LC_ID2, x3$LC_ID3, x3$LC_ID4,"."))

# Select the columns
x5 <- x4 %>% select(file.ID, frames, lane.change, range_LC)

x5

- final result.

How do I denote any range of values ​​around a specific string in R?

Data

Problem

Desired output

EDIT

More articles:

How do I denote any range of values around a specific string in R?