Get a specific word and the next in a line

I would like to extract the genus and species name from a string. Example:

"He saw a Panthera leo in the savanna"

I want to get "Panthera leo", given the name of the genus.

I tried to use the function word (package stringr):

my_sentence<-"He saw a Panthera leo in the savanna"
word(my_sentence,"Panthera",+1)

      

I know the problem is with the +1 argument. Do you have any clue?

Maybe I should use the gsub function?


2 answers


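my_sentence <- "He saw a Panthera leo in the savanna"
# strsplit() returns a list; take its first element to get the word vector
x <- strsplit(my_sentence, " ")[[1]]
# position of the genus among the words
index <- grep("Panthera", x)
# the genus and the word after it, joined back into one string
want <- paste(x[c(index, index + 1)], collapse = " ")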
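The same idea can probably be expressed with the word function from the question. As far as I know, word takes integer word positions for its start and end arguments rather than a search string, so the position of the genus has to be found first. A minimal sketch, assuming stringr is installed and that the genus appears as a separate, space-delimited word (the variable names words, idx and result are only illustrative):

```r
library(stringr)

my_sentence <- "He saw a Panthera leo in the savanna"

# Split on single spaces; str_split() returns a list, so take element 1
words <- str_split(my_sentence, fixed(" "))[[1]]

# Position of the first exact occurrence of the genus (assumed to be present)
idx <- which(words == "Panthera")[1]

# word() extracts the words from position idx to idx + 1
result <- word(my_sentence, idx, idx + 1)
print(result)
```

If the genus is the last word of the sentence, idx + 1 points past the end, so that case would need its own check.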

      





Regex-Fu:

> m <- gregexpr('panthera\\W\\w\\w+', "He saw a Panthera leo in the savanna", ignore.case = TRUE)
> regmatches("He saw a Panthera leo in the savanna", m)
[[1]]
[1] "Panthera leo"

      

\W\w\w+ matches one non-word character, then one word character, then one or more further word characters. This means that the word following "Panthera" must be at least two characters long.
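To see the effect of the two-character minimum, here is a small base R sketch. The extra test sentences are made up for illustration and are not from the question:

```r
# Hypothetical test sentences: only the first has a word of two or more
# characters directly after "Panthera"
sentences <- c("He saw a Panthera leo in the savanna",
               "Panthera x is too short to match",
               "No big cats here")

pat <- "panthera\\W\\w\\w+"

# regexpr() finds the first match in each element (-1 where there is none)
m <- regexpr(pat, sentences, ignore.case = TRUE)

# regmatches() keeps only the elements that actually matched
hits <- regmatches(sentences, m)
print(hits)
```

Only the first sentence should yield a match, since "x" is a single character and the third sentence does not mention the genus at all.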



In stringr it looks like this:

> library(stringr)
> s <- "He saw a Panthera leo in the savanna"
> pat <- regex('panthera\\W\\w\\w+', ignore_case = TRUE)
> str_extract(s, pat)
[1] "Panthera leo"

      

I think I prefer this.
