Selecting a word immediately after a keyword

I'm trying to extract the word immediately following a keyword using R. I don't have much experience with regex, so nothing I've found so far has helped much. If the function could return multiple instances, that would be perfect.

For example, if my keyword was "the" and my string was:

The yellow log is in the stream

it would return "yellow" and "stream".


I found this solution for C# and it looks like what I want, but I'm having trouble implementing it in R.


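Here's a non-regex solution:


mytext <- "The yellow log is in the stream"
mykey <- "the"

# split the sentence into words on single spaces
x <- unlist(strsplit(mytext, " "))
# keep the word after each (case-insensitive) occurrence of the keyword
x[which(tolower(x) == tolower(mykey)) + 1]
# [1] "yellow" "stream"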
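A minimal sketch that wraps the split-based idea in a reusable function. The name words_after_split is hypothetical. It splits on runs of whitespace (so double spaces don't produce empty "words"), compares case-insensitively, and skips a keyword that ends the string, which would otherwise index past the last word. Punctuation stays attached to words, so "the stream." would give "stream.".

```r
# Hypothetical helper: the word after each occurrence of `key` in one string
words_after_split <- function(text, key) {
  # split on runs of whitespace rather than a single space
  tokens <- unlist(strsplit(text, "\\s+"))
  # positions of the keyword, compared case-insensitively
  hits <- which(tolower(tokens) == tolower(key))
  # a keyword in last position has no following word, so drop it
  hits <- hits[hits < length(tokens)]
  tokens[hits + 1]
}

words_after_split("The yellow log is in the stream", "the")
# [1] "yellow" "stream"
words_after_split("Look at the", "the")
# character(0)
```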





You may try

library(stringr)
str_extract_all(str1, regex('(?<=\\b(?i)The )\\w+'))[[1]]
#[1] "yellow" "stream"


Or using stringi

library(stringi)
stri_extract_all_regex(str1, '(?<=\\b(?i)The )\\w+')[[1]]
#[1] "yellow" "stream"


EDIT: Modified based on @Roland's suggestion in the comments.


The input used above:

str1 <- 'The yellow log is in the stream'




Assign key to whatever string you want and use:

key <- 'the'
p <- "The yellow log is in the stream" 
regmatches(p, gregexpr(sprintf('(?i)(?<=%s\\s)\\w+', key), p, perl = TRUE))[[1]]
# [1] "yellow" "stream"
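Because key is pasted straight into the pattern, a keyword containing regex metacharacters (for example "c++") would be read as regex syntax and either fail or match the wrong thing. A minimal sketch, assuming PCRE (perl = TRUE), wraps the key in \Q...\E so it is matched literally. The helper name after_key is hypothetical, and a key that itself contains "\E" would still break this.

```r
# Hypothetical helper: words following a literal keyword, case-insensitive.
# \Q...\E makes PCRE treat the key as plain text, so "+" or "." in `key`
# are not interpreted. The leading \b assumes the key starts with a word
# character.
after_key <- function(text, key) {
  pattern <- sprintf('(?i)(?<=\\b\\Q%s\\E\\s)\\w+', key)
  regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
}

after_key("I write c++ code and C++ tests", "c++")
# [1] "code"  "tests"
```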


Or, as @Roland points out, it is safer to put a word boundary before your keyword, so that words merely ending in it (such as "absinthe") don't match:

key <- 'the'
p <- "The yellow log is in the stream drinking absinthe and beer"
regmatches(p, gregexpr(sprintf('(?i)(?<=%s\\s)\\w+', key), p, perl = TRUE))[[1]]
# [1] "yellow" "stream" "and"   

regmatches(p, gregexpr(sprintf('(?i)(?<=\\b%s )\\w+', key), p, perl = TRUE))[[1]]
# [1] "yellow" "stream"




Try this. Note that it returns each match as the keyword plus the following word, not "yellow" and "stream" on their own:

x <- "The yellow log is in the stream"

regmatches(x, gregexpr("(?:(?:T|t)he)\\s(\\w+)", x, perl = TRUE))[[1]]
## [1] "The yellow" "the stream"
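If you want only the following word, one option with PCRE is \K, which drops everything matched so far from the reported match. Unlike a lookbehind, the part before \K may vary in length, so \s+ can absorb several spaces. A minimal sketch (the doubled space before "stream" is deliberate):

```r
x <- "The yellow log is in the  stream"   # note the double space

# \\b(?i:the) matches "the" case-insensitively as a whole word,
# \\s+ allows any amount of whitespace, and \\K discards both from the match
res <- regmatches(x, gregexpr("\\b(?i:the)\\s+\\K\\w+", x, perl = TRUE))[[1]]
res
# [1] "yellow" "stream"
```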




The qdapRegex package that I maintain has a regular expression, after_, in the dictionary regex_supplement, which is perfect for this. You can use rm_ to create your own function, after_the:



library(qdapRegex)

x <- "The yellow log is in the stream"
after_the <- rm_(pattern = S("@after_", "[Tt]he"), extract = TRUE)
after_the(x)

## [[1]]
## [1] "yellow" "stream"


The function S is a wrapper around sprintf that lets you easily pass elements (e.g., "the" in this case) into the underlying regex, creating:

S("@after_", "the", "The")
## [1] "(?<=\\b(the|The)\\s)(\\w+)"
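To see what that generated pattern does on its own, you can pass the same string to base R's gregexpr with perl = TRUE (the lookbehind needs PCRE). This sketch does not require qdapRegex; the pattern is simply typed in as printed above.

```r
x <- "The yellow log is in the stream"

# lookbehind: "the" or "The" as a whole word followed by one whitespace
# character; then the word itself
pat <- "(?<=\\b(the|The)\\s)(\\w+)"

res <- regmatches(x, gregexpr(pat, x, perl = TRUE))[[1]]
res
# [1] "yellow" "stream"
```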




The same approach works for several keywords at once:

x <- c("The yellow log is in the stream", "I like the one box for a pack")
after_ <- rm_(extract = TRUE)

words <- c("the", "a", "one")

setNames(lapply(words, function(y){
    after_(x, pattern = S("@after_", y, TC(y)))
}), words)

## $the
## $the[[1]]
## [1] "yellow" "stream"
## $the[[2]]
## [1] "one"
## $a
## $a[[1]]
## [1] NA
## $a[[2]]
## [1] "pack"
## $one
## $one[[1]]
## [1] NA
## $one[[2]]
## [1] "box"
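For comparison, a base-R sketch that produces the same nested structure without qdapRegex, assuming a case-insensitive, whole-word keyword is what you want. When a string has no match, regmatches returns character(0); here it is replaced with NA to mirror the output above. The name after_words is hypothetical.

```r
x <- c("The yellow log is in the stream", "I like the one box for a pack")
words <- c("the", "a", "one")

# one list element per keyword, each holding one result per input string
after_words <- lapply(setNames(words, words), function(key) {
  pat <- sprintf("(?i)(?<=\\b%s\\s)\\w+", key)
  found <- regmatches(x, gregexpr(pat, x, perl = TRUE))
  # replace "no match" (character(0)) with NA
  lapply(found, function(w) if (length(w) == 0) NA_character_ else w)
})

after_words$one
```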




