Get a specific word and the next in a line
I would like to get the name of the Genus and species in a string. Example:
"He saw a Panthera leo in the savanna"
I want to get "Panthera leo" by searching for the name of the genus.
I tried to use the function word (package stringr):
my_sentence<-"He saw a Panthera leo in the savanna"
word(my_sentence,"Panthera",+1)
I know the problem is with the +1 argument. Do you have any clue?
Maybe I should use the gsub function?
Regex-Fu:
> m <- gregexpr('panthera\\W\\w\\w+', "He saw a Panthera leo in the savanna", ignore.case = TRUE)
> regmatches("He saw a Panthera leo in the savanna", m)
[[1]]
[1] "Panthera leo"
\W\w\w+ matches one non-word character, one word character, and then one or more further word characters. This means that whatever follows "Panthera" must be at least two characters long.
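To see the effect of that two-character minimum, here is a small base R sketch. The helper name extract_first and the second test sentence are made up for illustration; they are not part of the question.

```r
# Same pattern as above: "panthera", one non-word character,
# then at least two word characters.
pat <- "panthera\\W\\w\\w+"

# Hypothetical helper: return the first match in x, or character(0) if none.
extract_first <- function(x) {
  m <- regexpr(pat, x, ignore.case = TRUE)
  regmatches(x, m)
}

hit  <- extract_first("He saw a Panthera leo in the savanna")
miss <- extract_first("He saw a Panthera x in the savanna")

print(hit)   # the genus followed by a three-letter word matches
print(miss)  # a one-letter word after the genus does not match
```

If single-letter second words should also match, relaxing the tail of the pattern to \\w+ would be one option.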
In stringr it looks like this:
> library(stringr)
> s <- "He saw a Panthera leo in the savanna"
> pat <- regex('panthera\\W\\w\\w+', ignore_case = TRUE)
> str_extract(s, pat)
[1] "Panthera leo"
I think I prefer this.
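If you would rather stay with word(), note that its start and end arguments are numeric word positions, not a word to search for, which is why the call in the question fails. A minimal sketch, assuming the words are separated by single spaces and the genus appears exactly as written (the variable names are only illustrative):

```r
library(stringr)

my_sentence <- "He saw a Panthera leo in the savanna"

# Split on single spaces and find the position of the genus.
words <- strsplit(my_sentence, " ", fixed = TRUE)[[1]]
i <- match("Panthera", words)

# word() extracts the words from position i through i + 1.
result <- word(my_sentence, i, i + 1)
print(result)
```

This breaks if the genus is the last word or is missing from the sentence (match() then returns NA), so the regex version is probably the more robust choice.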