Replace the leading @ of a word with ! in R

I am trying to do a string replacement on data in R. I need to find all words preceded by "@" (with no space, like @word) and change the "@" to "!" (for example, from @word to !word). At the same time, other instances of "@" must stay unchanged (for example, @ or @@ or @[@]). For example, this is my original data frame (the items to modify are @def, @jkl and @stu):

> df = data.frame(number = 1:4, text = c('abc @def ghi', '@jkl @ mno', '@[@] pqr @stu', 'vwx @@@ yz'))
> df
  number          text
1      1  abc @def ghi
2      2    @jkl @ mno
3      3 @[@] pqr @stu
4      4    vwx @@@ yz

      

And this is what I need the result to look like:

> df_result = data.frame(number = 1:4, text = c('abc !def ghi', '!jkl @ mno', '@[@] pqr !stu', 'vwx @@@ yz'))
> df_result
  number          text
1      1  abc !def ghi
2      2    !jkl @ mno
3      3 @[@] pqr !stu
4      4    vwx @@@ yz

      

I tried with

> gsub('@.+[a-z] ', '!', df$text)
[1] "abc !ghi"   "!@ mno"     "!@stu"      "vwx @@@ yz"

      

But this does not give the desired result. Any help is greatly appreciated.

Thanks.



2 answers


What about

gsub("(^| )@(\\w)", "\\1!\\2", df$text)
# [1] "abc !def ghi"  "!jkl @ mno"    "@[@] pqr !stu" "vwx @@@ yz"  

      

This matches an @ at the beginning of the string or after a space. It then captures the word character that follows the @ and replaces the @ with !.



Explanation, courtesy of regex101.com:

  • (^| ) - the first capturing group; ^ asserts the position at the start of the string; | stands for "or"; the space matches a space character literally
  • @ matches the character @ literally (case sensitive)
  • (\\w) - the second capturing group; \\w matches a word character (a letter, a digit or an underscore)

The replacement string \\1!\\2 replaces the regex match with the first capturing group (\\1), then !, followed by the second capturing group (\\2).
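As a minimal sketch, the substitution can be written back into the data frame from the question. The two capturing groups put back the space (or nothing, at the start of the string) and the first word character that the pattern consumed, so only the @ changes. The variable name pattern is used here only for illustration.

```r
# Data frame from the question
df <- data.frame(number = 1:4,
                 text = c('abc @def ghi', '@jkl @ mno', '@[@] pqr @stu', 'vwx @@@ yz'),
                 stringsAsFactors = FALSE)

# Group 1: start of string or a space; group 2: the first word character
pattern <- "(^| )@(\\w)"

# \\1 and \\2 re-insert what the groups consumed, so only @ is swapped for !
df$text <- gsub(pattern, "\\1!\\2", df$text)
df$text
# [1] "abc !def ghi"  "!jkl @ mno"    "@[@] pqr !stu" "vwx @@@ yz"
```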



You can use a positive lookahead (?=...)

gsub("@(?=[A-Za-z])", "!", df$text, perl = TRUE)
[1] "abc !def ghi"  "!jkl @ mno"    "@[@] pqr !stu" "vwx @@@ yz"  

      



From the documentation page "Regular Expressions as used in R":

Patterns (?=...) and (?!...) are zero-width positive and negative lookahead assertions: they match if an attempt to match the ... forward from the current position would succeed (or not), but use up no characters in the string being processed.
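The two patterns agree on the data in the question but are not equivalent in general. The sketch below runs both on a few made-up strings (the vector x is hypothetical, not from the question) to show where they differ: the capture-group pattern requires the @ to follow a space or start the string and accepts any word character after it, while the lookahead pattern only checks that a letter follows.

```r
# Hypothetical test strings, chosen to show where the two patterns differ
x <- c("abc @def", "me@example.com", "@123", "(@word)")

# Capture-group version: @ must follow a space or start the string,
# and \\w also accepts digits and underscores
by_group <- gsub("(^| )@(\\w)", "\\1!\\2", x)
by_group
# [1] "abc !def"       "me@example.com" "!123"           "(@word)"

# Lookahead version: only the character after @ is checked, and it must be a letter
by_lookahead <- gsub("@(?=[A-Za-z])", "!", x, perl = TRUE)
by_lookahead
# [1] "abc !def"       "me!example.com" "@123"           "(!word)"
```

Which behavior is right depends on the data: if the text may contain e-mail addresses, the capture-group version leaves them alone, whereas the lookahead version also handles an @word that follows punctuation.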
