Remove some elements from a string
So, I have a character vector:
data<-c("Mark And (BD Marketing Da 1 Z _ 9793)",
"Andre All (BD Marketing DA 1 Z _ 9794 (plus))",
"Alli Inn (BD Sport Educ 1 C _ 9722 (plus))",
"Alli Inn (BP Sport Educ 1 Z _ 9347)")
Now I need to remove the underscore and the number after it, as well as every parenthesis except the ones around (plus), so the result should be:
Mark And BD Marketing Da 1 Z
Andre All BD Marketing DA 1 Z (plus)
Alli Inn BD Sport Educ 1 C (plus)
Alli Inn BP Sport Educ 1 Z
I used
gsub("\\s*\\w*$", "", data)
and got
Alli Inn (BP Sport Educ 1 Z
but this is not correct: I also need to remove the other parentheses and keep (plus) wherever it appears.
I tried this:
gsub('\\((?!plus)|(?<!plus)\\)|.\\d+', '', data, perl=TRUE)
and got this
Alli Inn BP Sport Educ Z
but now the number 1 before the letter is missing.
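A likely reason, shown as a small sketch: in `.\\d+` the `.` matches any character, so the pattern also catches the space and the standalone "1", not only the ID after the underscore. Listing what each pattern matches with base R `regmatches()`/`gregexpr()` makes the difference visible (the variable names `x`, `loose` and `anchored` are just for illustration).

```r
x <- "Alli Inn (BP Sport Educ 1 Z _ 9347)"

# `.` matches ANY character, so `.\\d+` also grabs the space + "1" before "Z"
loose <- regmatches(x, gregexpr(".\\d+", x))[[1]]
print(loose)     # matches: " 1" and " 9347"

# Tying the digits to the underscore only matches the trailing ID
anchored <- regmatches(x, gregexpr("_ \\d+", x))[[1]]
print(anchored)  # matches: "_ 9347"
```

So replacing `.\\d+` with something anchored on `_` should leave the "1" in place.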
Miha
2 answers
# lookarounds (?!...) and (?<!...) require perl=TRUE
gsub('\\((?!plus)|(?<!plus)\\)|_ [0-9]*', '', data, perl=TRUE)
#[1] "Mark And BD Marketing Da 1 Z "
#[2] "Andre All BD Marketing DA 1 Z  (plus)"
#[3] "Alli Inn BD Sport Educ 1 C  (plus)"
#[4] "Alli Inn BP Sport Educ 1 Z "
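The pattern above leaves a trailing space (and a doubled space before "(plus)"), because only `_` and the digits are removed, not the space in front of them. A minimal variant, assuming the ID is always written as " _ <digits>", lets `\\s*` consume that space too, which should reproduce the desired output exactly:

```r
# The question's input vector
data <- c("Mark And (BD Marketing Da 1 Z _ 9793)",
          "Andre All (BD Marketing DA 1 Z _ 9794 (plus))",
          "Alli Inn (BD Sport Educ 1 C _ 9722 (plus))",
          "Alli Inn (BP Sport Educ 1 Z _ 9347)")

# Same lookarounds as the answer; `\\s*` before `_` also removes the preceding
# space, so no trailing or doubled spaces remain. Lookarounds need perl = TRUE.
cleaned <- gsub("\\((?!plus)|(?<!plus)\\)|\\s*_ [0-9]+", "", data, perl = TRUE)

print(cleaned)
```

If you prefer to keep the answer's pattern unchanged, post-processing with `trimws()` and `gsub(" {2,}", " ", x)` is another way to tidy the spaces.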
Pierre lafortune
Using dplyr and stringr, this is quick and dirty, but it gets the job done:
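library(dplyr)    # only needed for the %>% pipe
library(stringr)
data %>%
  # drop " _ <number>" and every parenthesis
  str_replace_all(" _ [1-9][0-9]{0,3}|\\(|\\)", "") %>%
  # put the parentheses back around "plus"
  str_replace_all("plus", "(plus)")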
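One caveat worth checking: because this removes every parenthesis and then wraps every "plus" again, any name that happens to contain "plus" would also be wrapped. The sketch below reproduces the answer's two steps with base `gsub()` so it runs without extra packages, and compares them with a one-pass lookaround pattern. The entry "Bob Surplus" is a hypothetical name added only to show the difference.

```r
# Two entries from the question plus one HYPOTHETICAL name containing "plus"
data <- c("Mark And (BD Marketing Da 1 Z _ 9793)",
          "Andre All (BD Marketing DA 1 Z _ 9794 (plus))",
          "Bob Surplus (BD Sport Educ 1 Z _ 9000)")

# The answer's two steps in base R: strip the ID and all parentheses,
# then wrap every "plus" in parentheses again
two_pass <- gsub(" _ [1-9][0-9]{0,3}|\\(|\\)", "", data)
two_pass <- gsub("plus", "(plus)", two_pass, fixed = TRUE)

# One-pass alternative: lookarounds keep "(plus)" intact, so nothing is re-added
one_pass <- gsub("\\((?!plus)|(?<!plus)\\)|\\s*_ [0-9]+", "", data, perl = TRUE)

print(two_pass)  # third element becomes "Bob Sur(plus) BD Sport Educ 1 Z"
print(one_pass)  # third element stays "Bob Surplus BD Sport Educ 1 Z"
```

For the question's actual data both approaches give the same result; the difference only shows up if "plus" can appear outside the "(plus)" marker.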
uhlitz