Selecting a word immediately after a keyword

I'm trying to extract the word that immediately follows a keyword using R. I don't have much experience with regex, so nothing I've found so far has helped much. If the function could return multiple instances, that would be perfect.

For example, if my keyword was "the" and my string was:

The yellow log is in the stream

it would return "yellow" and "stream".

I found this solution for C# and it looks like what I want, but I'm having trouble implementing it in R.



5 answers


Here's a non-regex solution:



mytext <- "The yellow log is in the stream"
mykey <- "the"

x <- unlist(strsplit(mytext," "))

x[which(tolower(x)==mykey)+1]
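One caveat with the split approach: if the keyword happens to be the last word, the index runs past the end of the vector and R returns NA. A minimal sketch of a wrapper that drops those positions follows; the function name words_after is made up for illustration and it assumes a single input string with words separated by single spaces.

```r
# Hypothetical helper: words that directly follow `key` in `text` (no regex)
words_after <- function(text, key) {
  x <- unlist(strsplit(text, " ", fixed = TRUE))
  idx <- which(tolower(x) == tolower(key)) + 1
  # Drop positions past the end (the keyword was the last word)
  x[idx[idx <= length(x)]]
}

words_after("The yellow log is in the stream", "the")
words_after("in the stream is the", "the")
```

The second call shows the edge case: the trailing "the" has no following word, so only one result comes back.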

      



You may try

library(stringr)
# perl() is no longer needed in current stringr; lookbehind works with the default regex engine
str_extract_all(str1, '(?<=\\b(?i)The )\\w+')[[1]]
#[1] "yellow" "stream"

      

Or using stringi

library(stringi)
stri_extract_all_regex(str1, '(?<=\\b(?i)The )\\w+')[[1]]
 #[1] "yellow" "stream"

      



EDIT: Modified based on @Roland's suggestion in the comments.

data

str1 <- 'The yellow log is in the stream'

      



Assign key whatever string you want and use:

key <- 'the'
p <- "The yellow log is in the stream" 
regmatches(p, gregexpr(sprintf('(?i)(?<=%s\\s)\\w+', key), p, perl = TRUE))[[1]]
# [1] "yellow" "stream"

      

or, as @Roland points out, it would be safer to use a word boundary around your keyword to avoid this:

key <- 'the'
p <- "The yellow log is in the stream drinking absinthe and beer"
regmatches(p, gregexpr(sprintf('(?i)(?<=%s\\s)\\w+', key), p, perl = TRUE))[[1]]
# [1] "yellow" "stream" "and"   

regmatches(p, gregexpr(sprintf('(?i)(?<=\\b%s )\\w+', key), p, perl = TRUE))[[1]]
# [1] "yellow" "stream"
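The word-boundary version can be wrapped in a small function that also works on a vector of strings. This is only a sketch: the name extract_after is hypothetical, and quoting the keyword with \Q...\E is an addition here (it assumes the keyword itself does not contain "\E").

```r
# Hypothetical helper wrapping the word-boundary lookbehind pattern
extract_after <- function(text, key) {
  # \\Q...\\E quotes the keyword so regex metacharacters in it are literal
  pattern <- sprintf("(?i)(?<=\\b\\Q%s\\E\\s)\\w+", key)
  # Returns a list with one character vector per input string
  regmatches(text, gregexpr(pattern, text, perl = TRUE))
}

p <- c("The yellow log is in the stream drinking absinthe and beer",
       "no keyword here")
res <- extract_after(p, "the")
print(res)
```

Strings without a match yield an empty character vector, so the result always has the same length as the input.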

      



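Try this. Note that it returns the keyword together with the following word, because regmatches() returns the whole match rather than the capture group: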

x <- "The yellow log is in the stream"

regmatches(x, gregexpr("(?:(?:T|t)he)\\s(\\w+)", x, perl = TRUE))[[1]]
## [1] "The yellow" "the stream"
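Since each match here still contains the keyword, one way to keep only the following word is to strip the leading word from every match. A minimal sketch, using sub() as one possible post-processing step:

```r
x <- "The yellow log is in the stream"
m <- regmatches(x, gregexpr("(?:(?:T|t)he)\\s(\\w+)", x, perl = TRUE))[[1]]

# Remove everything up to and including the first whitespace character
after <- sub("^\\S+\\s", "", m)
print(after)
```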

      



The qdapRegex package that I maintain has a regular expression after_ in the dictionary regex_supplement which is perfect for this. You can use rm_ to create your own function after_the:

library(qdapRegex)

x<- "The yellow log is in the stream"
after_the <- rm_(pattern = S("@after_", "[Tt]he"), extract = TRUE)
after_the(x)

## [[1]]
## [1] "yellow" "stream"

      

The function S is a wrapper around sprintf that lets you easily pass elements (e.g. the word "the" in this case) into the underlying regex, creating:

S("@after_", "the", "The")
## [1] "(?<=\\b(the|The)\\s)(\\w+)"

      

EDIT

library(qdapRegex)

x <- c("The yellow log is in the stream", "I like the one box for a pack")
after_the(x)

# Generic extractor; the pattern is supplied at call time
after_ <- rm_(extract = TRUE)

words <- c("the", "a", "one")

setNames(lapply(words, function(y){
    after_(x, pattern = S("@after_", y, TC(y)))
}), words)


## $the
## $the[[1]]
## [1] "yellow" "stream"
## 
## $the[[2]]
## [1] "one"
## 
## 
## $a
## $a[[1]]
## [1] NA
## 
## $a[[2]]
## [1] "pack"
## 
## 
## $one
## $one[[1]]
## [1] NA
## 
## $one[[2]]
## [1] "box"

      
