How to split a string into runs of the same letter in R

I have a string like this:

s <- "aaehhhhhhhaannd"

      

How can I split a string into the following format using R?

c("aa", "e", "hhhhhhh", "aa", "nn", "d")

      

+3




2 answers


You can use base R strsplit with a PCRE regular expression based on lookarounds.

s <- "aaehhhhhhhaannd"
strsplit(s, "(?<=(.))(?!\\1)", perl=TRUE)
# [[1]]
# [1] "aa"      "e"       "hhhhhhh" "aa"      "nn"      "d"      

      

See the R demo online and the regex demo.

Regular expression details:

  • (?<=(.)) - a positive lookbehind ((?<=...)) that "looks" to the left and captures any char into group 1 with the (.) capturing group (this value can be referenced later in the pattern with the \1 backreference)
  • (?!\1) (written as \\1 inside the R string literal) - a negative lookahead that fails the match if the same value that was captured into group 1 appears immediately to the right of the current location.

Since lookarounds do not consume text, the split occurs at the empty position between two different characters.
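A minimal sketch to check that nothing is consumed by the split: pasting the pieces back together should give the original string. The helper name split_runs is hypothetical and only wraps the strsplit call shown above.

```r
# Hypothetical helper (not part of the original answer): wraps the strsplit() call
split_runs <- function(x) {
  strsplit(x, "(?<=(.))(?!\\1)", perl = TRUE)[[1]]
}

s <- "aaehhhhhhhaannd"
parts <- split_runs(s)

# The split points are zero-width, so the pieces rebuild the input exactly
identical(paste(parts, collapse = ""), s)
# [1] TRUE

# Another input, assumed only for illustration
split_runs("aabbbc")
# [1] "aa"  "bbb" "c"
```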

NOTE. If you want . to match a newline, add (?s) at the beginning of the pattern (in PCRE regex, . does not match a line break by default).
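As a rough illustration of the note, here is the same pattern with (?s) prepended, applied to a string that contains a run of newlines. The input s2 is an assumption made up for this example.

```r
s2 <- "aa\n\nbb"  # hypothetical input with two consecutive newlines

# With (?s), "." also matches "\n", so the newline run is split off
# like any other run of repeated characters
with_dotall <- strsplit(s2, "(?s)(?<=(.))(?!\\1)", perl = TRUE)[[1]]
with_dotall
# [1] "aa"   "\n\n" "bb"
```

Without (?s), the lookbehind cannot capture a newline, so no split point is found directly after a newline character.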

+2




You can use str_extract_all with the regex (.)\\1*, which uses a backreference to match a character followed by any number of repetitions of itself:



library(stringr)
str_extract_all("aaehhhhhhhaannd", "(.)\\1*")
#[[1]]
#[1] "aa"      "e"       "hhhhhhh" "aa"      "nn"      "d"
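If adding stringr is not an option, the same extraction idea can be sketched in base R. This is an alternative that is not part of the original answer: regmatches with gregexpr applies the same (.)\\1* pattern, and rle is a regex-free variant that counts runs over single characters.

```r
s <- "aaehhhhhhhaannd"

# Same pattern as above, extracted with base R instead of stringr
runs_regex <- regmatches(s, gregexpr("(.)\\1*", s, perl = TRUE))[[1]]
runs_regex
# [1] "aa"      "e"       "hhhhhhh" "aa"      "nn"      "d"

# Regex-free variant: run-length encode the characters, then rebuild each run
chars <- strsplit(s, "")[[1]]
r <- rle(chars)
runs_rle <- strrep(r$values, r$lengths)
runs_rle
# [1] "aa"      "e"       "hhhhhhh" "aa"      "nn"      "d"
```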

      

+2

