How to remove a specific repeating element after the first from a character vector

Question


I have a string of path steps, and there is one particular step whose consecutive repetitions I want to eliminate.

For example,

my_vec = "A > A > X > B > X > X > X > C > C"

If "X" is repeated consecutively, I want to keep only the first X of each run and preserve the order of all other elements. My desired result is:

my_vec = "A > A > X > B > X > C > C"

i.e., the consecutive duplicate Xs in the middle are removed.

I tried a for loop with if/else: whenever the current element and the previous element both contain "X", I replace the current element with NA, intending to remove the NA elements afterwards. This approach does not give the desired result.

I tried looking here and here, but those answers remove all duplicates, whereas I want to do this for one specific item only.

Here's my code:

library(stringr)

my_vec <- unlist(str_split(my_vec, '>'))

for (i in length(my_vec)) {
  if (grepl('X', my_vec[i]) & grepl('X', my_vec[i - 1])) {
    steps[i] <- NA
  } else {
    next()
  }
}
my_new_vec <- str_c(steps, collapse = '>')

However, the output is exactly the same as the input, and no elements are replaced with NA.
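For reference, the loop above has a few problems. `for (i in length(my_vec))` iterates only over the single value `length(my_vec)` instead of over every index. The NA is assigned to `steps` rather than `my_vec`, and the NA elements are never dropped before joining. Marking elements NA in place is also fragile, because `grepl('X', NA)` is `FALSE`: once the second X of a run is NA, the third X would no longer be recognized as a repeat. The following is a minimal base-R sketch of the same loop idea. It swaps the NA marking for a logical `keep` vector (a substitution, not the asker's method) and uses an exact `== "X"` comparison instead of `grepl`, on the assumption that "X" is a whole step label:

```r
my_vec <- "A > A > X > B > X > X > X > C > C"

# Split on the full separator so the elements carry no surrounding spaces
steps <- strsplit(my_vec, " > ", fixed = TRUE)[[1]]

# Decide what to keep by comparing the unmodified steps, so marking one
# element cannot hide a repeat further along (runs of three or more Xs)
keep <- rep(TRUE, length(steps))
for (i in seq_along(steps)[-1]) {  # start at 2: element 1 has no predecessor
  if (steps[i] == "X" && steps[i - 1] == "X") {
    keep[i] <- FALSE
  }
}

my_new_vec <- paste(steps[keep], collapse = " > ")
my_new_vec
#> [1] "A > A > X > B > X > C > C"
```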


Tags: regex, vector, r

Edgar, Jul 20 '17 at 15:54


4 answers

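We can use gsub with a Perl-compatible pattern. The \\K drops the leading "X > " from the match, so only the repeats that follow it are replaced with "":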

gsub("(?:X > )\\K(X > )\\1*", "", my_vec, perl = TRUE)
#[1] "A > A > X > B > X > C > C"
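A small sketch that wraps this pattern in a function for an arbitrary step label. The name `collapse_step` is hypothetical, and it assumes the label contains no regex metacharacters and does not appear as the tail of another label (e.g. "AX"). Because the pattern requires " > " after each repeat, a run at the very end of the string is left unchanged:

```r
# Hypothetical helper: keep only the first step of each consecutive run of `step`.
# Assumes `step` has no regex metacharacters.
collapse_step <- function(path, step) {
  pattern <- paste0("(?:", step, " > )\\K(", step, " > )\\1*")
  gsub(pattern, "", path, perl = TRUE)
}

my_vec <- "A > A > X > B > X > X > X > C > C"
collapse_step(my_vec, "X")
#> [1] "A > A > X > B > X > C > C"
collapse_step(my_vec, "A")
#> [1] "A > X > B > X > X > X > C > C"

# A trailing run has no " > " after its last X, so it is not collapsed
collapse_step("A > X > X", "X")
#> [1] "A > X > X"
```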


akrun, Jul 20 '17 at 16:03


Solution without regex. my_vec4 is the final result.

# Create example string
my_vec <- "A > A > X > B > X > X > X > C > C"

library(dplyr)

# Split my_vec by " > "
my_vec2 <- strsplit(my_vec, split = " > ")[[1]]

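# TRUE where an element equals the previous one and is "X"
# (default = "" avoids an NA at position 1 when the path starts with X)
X_logi <- my_vec2 == dplyr::lag(my_vec2, default = "") & my_vec2 %in% "X"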

# Subset my_vec2 if X_logi is false
my_vec3 <- my_vec2[!X_logi]

# Concatenate my_vec3
my_vec4 <- paste(my_vec3, collapse = " > ")
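The same lag-based idea can be sketched in base R without dplyr by building the "previous element" vector with `c("", head(steps, -1))`. The function name `drop_repeated_step` and its `sep` argument are hypothetical. Starting the lagged vector with "" rather than NA means a path that begins with X needs no special handling:

```r
# Hypothetical helper: base-R version of the lag-based approach (no dplyr).
drop_repeated_step <- function(path, step, sep = " > ") {
  steps <- strsplit(path, sep, fixed = TRUE)[[1]]
  # Previous element at each position; "" for the first, so no NA appears
  prev <- c("", head(steps, -1))
  # TRUE where this element is `step` and the one before it was too
  dup <- steps == step & prev == step
  paste(steps[!dup], collapse = sep)
}

drop_repeated_step("A > A > X > B > X > X > X > C > C", "X")
#> [1] "A > A > X > B > X > C > C"
drop_repeated_step("X > X > A", "X")
#> [1] "X > A"
```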


www, Jul 20 '17 at 16:13


let str = "A > A > X > B > X > X > X > C > C";
let result = str.replace(/(\s*X >)+/g, " X >");

console.log(result);  // A > A > X > B > X > C > C

Translated to R, this is: gsub("(\\s*X >)+", " X >", my_vec) (comment by G. Grothendieck)
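A short check of the R translation. Here `perl = TRUE` is used so that `\\s` is unambiguously a whitespace class. One assumption worth flagging: if the path starts with X, the replacement " X >" introduces a leading space. Wrapping the call in `trimws()` is one way to remove that space:

```r
my_vec <- "A > A > X > B > X > X > X > C > C"

# Collapse each run of "X >" tokens (with any leading whitespace) into one " X >"
res <- gsub("(\\s*X >)+", " X >", my_vec, perl = TRUE)
res
#> [1] "A > A > X > B > X > C > C"

# A path that starts with X gains a leading space; trimws() removes it
trimws(gsub("(\\s*X >)+", " X >", "X > X > A", perl = TRUE))
#> [1] "X > A"
```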


JBone, Jul 20 '17 at 16:30


G. Grothendieck · Accepted Answer · 2017-07-20T16:07:42+0000

1) gsub Replace any run of X, each followed by zero or more spaces and > characters, with the last match in that run. This also works if the run is at the end of the string. If we knew that the run was not at the end, as in the example in the question, then we could simplify the first argument to "(X > )*".

gsub("(X[> ]*)*", "\\1", my_vec)
## [1] "A > A > X > B > X > C > C"
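A quick comparison of the two patterns on a path that ends in a run of X. The full pattern absorbs the final "X", which has no trailing " > ", into the run. The simplified "(X > )*" stops before it, so the simplified form is only safe when the run is followed by more steps:

```r
# Full pattern: also collapses a run of X at the very end of the path
gsub("(X[> ]*)*", "\\1", "A > X > X")
#> [1] "A > X"

# Simplified pattern: fine when the run is followed by more steps ...
gsub("(X > )*", "\\1", "A > A > X > B > X > X > X > C > C")
#> [1] "A > A > X > B > X > C > C"

# ... but the final "X" (no trailing " > ") is not part of the matched run
gsub("(X > )*", "\\1", "A > X > X")
#> [1] "A > X > X"
```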

2) strsplit / rle If you prefer to use strsplit as in the code in the question, try it in combination with rle. We first strsplit, producing ss, and then apply rle to get r. Next, for each run of " X ", set its length to 1, and invert the runs to get the revised version of ss as s. Finally, convert s back to a string and remove the leading and trailing spaces.

ss <- strsplit(paste0(" ", my_vec, " "), ">")[[1]]
r <- rle(ss)
r$lengths[r$values == " X "] <- 1
s <- inverse.rle(r)
trimws(paste(s, collapse = ">"))
## [1] "A > A > X > B > X > C > C"
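To see what the rle step works on, here is the run-length encoding of the padded split for the example string. Each entry is one run of identical consecutive elements. The run of three " X " is the one whose length gets reset to 1:

```r
my_vec <- "A > A > X > B > X > X > X > C > C"
ss <- strsplit(paste0(" ", my_vec, " "), ">")[[1]]
r <- rle(ss)

# One value and one length per run of identical consecutive elements
r$values
#> [1] " A " " X " " B " " X " " C "
r$lengths
#> [1] 2 1 1 3 2
```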

(2a) Another approach using strsplit follows. The first and last lines of code here are the same as the first and last lines of code in (2).

ss <- strsplit(paste0(" ", my_vec, " "), ">")[[1]]
s <- ss[!c(FALSE, ss[-1] == ss[-length(ss)] & ss[-1] == " X ")]
trimws(paste(s, collapse = ">"))
## [1] "A > A > X > B > X > C > C"
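The subscript in (2a) compares each element with its predecessor by offsetting the vector by one (`ss[-1]` against `ss[-length(ss)]`). Computing that mask on its own shows which positions are dropped. For the example, these are the second and third X of the run:

```r
my_vec <- "A > A > X > B > X > X > X > C > C"
ss <- strsplit(paste0(" ", my_vec, " "), ">")[[1]]

# Position i (i >= 2) is dropped when it equals position i - 1 and is " X ";
# the leading FALSE keeps the first element, which has no predecessor
drop <- c(FALSE, ss[-1] == ss[-length(ss)] & ss[-1] == " X ")
which(drop)
#> [1] 6 7
```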

UPDATE: Handled the case where the run is at the end of the string, and added (2) and (2a).

