R: extracting words from a string

I am trying to extract all words containing two adjacent vowels in a given string.

x <- "The team sat next to each other all year and still failed."

      

The result should be "team", "each", "year", "failed".

So far I've tried using [aeiou][aeiou] with the help of regmatches, but that only gives me part of each word.

Thanks.

+3




4 answers


You can place \w* before and after the character class to match zero or more word characters on either side of the vowel pair.



x <- "The team sat next to each other all year and still failed."
regmatches(x, gregexpr('\\w*[aeiou]{2}\\w*', x))[[1]]
# [1] "team"   "each"   "year"   "failed"
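
To make the difference concrete, here is a minimal sketch comparing the bare character class with the padded pattern on the sentence from the question. The variable names pairs_only and whole_words are just illustrative.

```r
x <- "The team sat next to each other all year and still failed."

# Bare character class: only the vowel pair itself is matched
pairs_only <- regmatches(x, gregexpr("[aeiou]{2}", x))[[1]]
pairs_only
# [1] "ea" "ea" "ea" "ai"

# With \\w* on both sides the match grows to cover the whole word
whole_words <- regmatches(x, gregexpr("\\w*[aeiou]{2}\\w*", x))[[1]]
whole_words
# [1] "team"   "each"   "year"   "failed"
```

Because \w does not match spaces or punctuation, the match stops at the word boundary, which is why the trailing period is not included.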

      

+5




words <- unlist(strsplit(x, " "))
words[grepl("[aeiou]{2}", words)]
# [1] "team"    "each"    "year"    "failed."

      

If you want to strip the punctuation, split on punctuation characters as well:



words <- unlist(strsplit(x, "[[:punct:] ]"))
words[grepl("[aeiou]{2}", words)]
# [1] "team"   "each"   "year"   "failed"
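
One caveat that goes beyond the question: [aeiou] is case-sensitive, so a capitalised word such as "Each" would be skipped. A minimal sketch, assuming case should be ignored; the sentence y is made up for illustration and is not from the question.

```r
# Hypothetical test sentence with capitalised vowels
y <- "Each OAK tree stands apart."
words_y <- unlist(strsplit(y, "[[:punct:] ]"))

# Case-sensitive: "Each" and "OAK" are missed
strict <- words_y[grepl("[aeiou]{2}", words_y)]
strict
# [1] "tree"

# ignore.case = TRUE lets the class match uppercase vowels too
relaxed <- words_y[grepl("[aeiou]{2}", words_y, ignore.case = TRUE)]
relaxed
# [1] "Each" "OAK"  "tree"
```

The same ignore.case argument is accepted by gregexpr, so it should carry over to the regmatches approach as well.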

      

+4




\w*[aeiou][aeiou]\w*

      

Try this pattern; see the demo:

https://regex101.com/r/hJ3zB0/5

+1




The same approach with stringr:

library(stringr)
xx <- str_split(x, " ")[[1]]
xx[str_detect(xx, "[aeiou]{2}")]
## [1] "team"    "each"    "year"    "failed."

      

Edit

As @akrun showed, this can be simplified to

str_extract_all(x, "\\w*[aeiou]{2}\\w*")[[1]]
## [1] "team"   "each"   "year"   "failed"

      

+1

