Remove some elements from a string

I have a character vector:

data <- c("Mark And (BD Marketing Da 1 Z _ 9793)",
          "Andre All (BD Marketing DA 1 Z _ 9794 (plus))",
          "Alli Inn (BD Sport Educ 1 C _ 9722 (plus))",
          "Alli Inn (BP Sport Educ 1 Z _ 9347)")

      

I need to remove the underscore and the number that follows it, as well as all parentheses except those around the word plus, so the result should be

Mark And BD Marketing Da 1 Z
Andre All BD Marketing DA 1 Z (plus)
Alli Inn BD Sport Educ 1 C (plus)
Alli Inn BP Sport Educ 1 Z

      

I used gsub("\\s*\\w*$", "", data)

and got

Alli Inn (BP Sport Educ 1 Z

      

But this is not correct: I need to remove the other parentheses as well, while keeping (plus) wherever it appears.

Then I tried this: gsub('\\((?!plus)|(?<!plus)\\)|.\\d+', '', rownames(data), perl=TRUE)

and got this: Alli Inn BP Sport Educ Z

but now the number 1 before the letter is missing.

2 answers


gsub('\\((?!plus)|(?<!plus)\\)|_ [0-9]*', '', data, perl=TRUE)
#[1] "Mark And BD Marketing Da 1 Z "        
#[2] "Andre All BD Marketing DA 1 Z  (plus)"
#[3] "Alli Inn BD Sport Educ 1 C  (plus)"   
#[4] "Alli Inn BP Sport Educ 1 Z " 
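
The pattern in the question loses the 1 because `.\\d+` matches any character followed by digits, so it also removes " 1"; tying the digits to the underscore, as above, avoids that. The output above still carries a leftover space where the underscore and number stood. A minimal sketch of a variant that also consumes the surrounding whitespace, assuming the number is always introduced by an underscore (`pattern` and `cleaned` are illustrative names only):

```r
data <- c("Mark And (BD Marketing Da 1 Z _ 9793)",
          "Andre All (BD Marketing DA 1 Z _ 9794 (plus))",
          "Alli Inn (BD Sport Educ 1 C _ 9722 (plus))",
          "Alli Inn (BP Sport Educ 1 Z _ 9347)")

# Three alternatives, all replaced by the empty string:
#   \\((?!plus)         an opening parenthesis not followed by "plus"
#   (?<!plus)\\)        a closing parenthesis not preceded by "plus"
#   \\s*_\\s*[0-9]+     the underscore, its digits and the whitespace around them
pattern <- "\\((?!plus)|(?<!plus)\\)|\\s*_\\s*[0-9]+"

# perl = TRUE is required for the lookahead and lookbehind
cleaned <- gsub(pattern, "", data, perl = TRUE)
print(cleaned)
#[1] "Mark And BD Marketing Da 1 Z"
#[2] "Andre All BD Marketing DA 1 Z (plus)"
#[3] "Alli Inn BD Sport Educ 1 C (plus)"
#[4] "Alli Inn BP Sport Educ 1 Z"
```

If stray whitespace could still appear at the ends for other inputs, wrapping the call in `trimws()` is a simple safeguard.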

      




With dplyr and stringr there is a quick and dirty approach that gets the job done:



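library(dplyr)    # loaded for the %>% pipe
library(stringr)
data %>% 
  # drop " _ <number>" and every parenthesis
  str_replace_all(" _ [1-9][0-9]{0,3}|\\(|\\)", "") %>% 
  # then put the parentheses back around "plus"
  str_replace_all("plus", "(plus)")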
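
The same two-step idea can be sketched in base R without extra packages. One assumption worth stating: restoring the parentheses by replacing plus only works if plus never appears as a word elsewhere in the strings; the word boundaries below at least keep it from matching inside longer words. `stripped` and `result` are illustrative names only.

```r
data <- c("Mark And (BD Marketing Da 1 Z _ 9793)",
          "Andre All (BD Marketing DA 1 Z _ 9794 (plus))",
          "Alli Inn (BD Sport Educ 1 C _ 9722 (plus))",
          "Alli Inn (BP Sport Educ 1 Z _ 9347)")

# Step 1: remove " _ <digits>" and every parenthesis
stripped <- gsub(" _ [0-9]+|[()]", "", data, perl = TRUE)

# Step 2: wrap the standalone word "plus" in parentheses again
result <- gsub("\\bplus\\b", "(plus)", stripped, perl = TRUE)
print(result)
#[1] "Mark And BD Marketing Da 1 Z"
#[2] "Andre All BD Marketing DA 1 Z (plus)"
#[3] "Alli Inn BD Sport Educ 1 C (plus)"
#[4] "Alli Inn BP Sport Educ 1 Z"
```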

      
