How to remove a specific repeating element after the first from a character vector

I have a vector of path steps and there is one particular path step that, if it repeats, I want to eliminate repetitions.

For example,

my_vec = "A > A > X > B > X > X > X > C > C"

      

Now, if "X" is repeated, I want to eliminate all repetitions of X except the first, while maintaining the order of the rest of the elements, so my desired result is:

my_vec = "A > A > X > B > X > C > C"

where duplicate Xs are excluded from the middle.

I tried a combination of a for loop and if-else: when the previous element of the vector also contains an "X", I replace the current element with NA, so that I can remove the NA elements afterwards. This approach does not give the desired result.

I looked here and here, but those answers just filter for unique items, whereas I want to do this for one specific item.

Here's my code:

my_vec <- unlist(str_split(my_vec, '>') )

for (i in length(my_vec)){
if (grepl('X', my_vec[i]) & grepl('X', my_vec[i-1])) {
    steps[i] <- NA

} else {
    next()
}}
my_new_vec <- str_c(steps, collapse = '>')

      

However, the output is exactly the same as the input, and nothing is changed to NA.
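A likely reason nothing changes, offered as a sketch rather than a definitive diagnosis: `for (i in length(my_vec))` runs only once, with `i` equal to the last index, and the loop writes to `steps` while the split result is stored in `my_vec`. The version below assumes base R only; `drop` is a name introduced here for illustration. It records flags in a separate vector instead of writing NA into the data, because comparing against an element that was already set to NA would make the `if` condition NA.

```r
my_vec <- "A > A > X > B > X > X > X > C > C"

# Split on ">" and strip the surrounding spaces from each step
steps <- trimws(strsplit(my_vec, ">", fixed = TRUE)[[1]])

# drop[i] becomes TRUE when step i is an "X" that directly follows another "X"
drop <- logical(length(steps))

# seq_along(steps)[-1] visits indices 2..n, so steps[i - 1] always exists
for (i in seq_along(steps)[-1]) {
  drop[i] <- steps[i] == "X" && steps[i - 1] == "X"
}

my_new_vec <- paste(steps[!drop], collapse = " > ")
my_new_vec
## [1] "A > A > X > B > X > C > C"
```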



4 answers


1) gsub Replace any repeated sequence of X followed by spaces and > characters with the last match in that sequence. This also works if the sequence is at the end of the string. If we knew that the sequence was not at the end, as in the example in the question, we could simplify the first argument to "(X > )*".

gsub("(X[> ]*)*", "\\1", my_vec)
## [1] "A > A > X > B > X > C > C"

      

2) strsplit / rle If you prefer to use strsplit, as in the code in the question, try it in combination with rle. We first apply strsplit, producing ss, and then apply rle to get r. Now, for each run of " X ", change its length to 1 and invert the runs, giving the reduced version of ss as s. Finally, convert to a string and remove leading and trailing spaces.

ss <- strsplit(paste0(" ", my_vec, " "), ">")[[1]]
r <- rle(ss)
r$lengths[r$values == " X "] <- 1
s <- inverse.rle(r)
trimws(paste(s, collapse = ">"))
##  "A > A > X > B > X > C > C"
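The same rle idea can be applied directly to a character vector of steps, which is the form the question's title asks about. The following is a minimal sketch; `collapse_runs` and its `target` argument are hypothetical names introduced here, not part of the answer above.

```r
# Hypothetical helper: shorten every run of `target` in `x` to a single element
collapse_runs <- function(x, target) {
  r <- rle(x)                              # runs of identical consecutive values
  r$lengths[r$values == target] <- 1L      # keep one element per run of target
  inverse.rle(r)                           # rebuild the vector from the runs
}

steps <- c("A", "A", "X", "B", "X", "X", "X", "C", "C")
collapse_runs(steps, "X")
## [1] "A" "A" "X" "B" "X" "C" "C"
```

Runs of other values, such as the repeated "A" and "C", are left untouched because only the lengths of runs whose value equals `target` are changed.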

      



(2a) Another approach using strsplit is the following. The first and last lines of code here are the same as the first and last lines of code in (2).

ss <- strsplit(paste0(" ", my_vec, " "), ">")[[1]]
s <- ss[!c(FALSE, ss[-1] == ss[-length(ss)] & ss[-1] == " X ")]
trimws(paste(s, collapse = ">"))
##  "A > A > X > B > X > C > C"

      

UPDATE: Handled the case where the sequence is at the end, and added (2) and (2a).



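We can use gsub with perl = TRUE: match one "X > ", use \K to keep it out of the match, and delete the repeats that follow it.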



gsub("(?:X > )\\K(X > )\\1*", "", my_vec, perl = TRUE)
#[1] "A > A > X > B > X > C > C"

      



A solution without regex; my_vec4 is the final result.

# Create example string
my_vec <- "A > A > X > B > X > X > X > C > C"

library(dplyr)

# Split my_vec by " > "
my_vec2 <- strsplit(my_vec, split = " > ")[[1]]

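# TRUE where an element is the same as the previous one and equal to X
# (default = "" avoids an NA in the first position if the vector starts with X)
X_logi <- my_vec2 == dplyr::lag(my_vec2, default = "") & my_vec2 %in% "X"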

# Subset my_vec2 if X_logi is false
my_vec3 <- my_vec2[!X_logi]

# Concatenate my_vec3
my_vec4 <- paste(my_vec3, collapse = " > ")

      



let str = "A > A > X > B > X > X > X > C > C";
let result = str.replace(/(\s*X >)+/g, " X >");

console.log(result);  // A > A > X > B > X > C > C

      

Translated to R, it would be: gsub("(\\s*X >)+", " X >", my_vec) - G. Grothendieck
