Stringr: extract words containing a specific word
Consider this simple example:
dataframe <- data_frame(text = c('WAFF;WOFF;WIFF200;WIFF12',
'WUFF;WEFF;WIFF2;BIGWIFF'))
> dataframe
# A tibble: 2 x 1
text
<chr>
1 WAFF;WOFF;WIFF200;WIFF12
2 WUFF;WEFF;WIFF2;BIGWIFF
Here I want to extract the words containing WIFF, that is, I want to get a data frame like this:
> output
# A tibble: 2 x 1
text
<chr>
1 WIFF200;WIFF12
2 WIFF2;BIGWIFF
I tried to use
dataframe %>%
mutate( mystring = str_extract(text, regex('\bwiff\b', ignore_case=TRUE)))
but that only returns NA. Any ideas?
Thanks!
You seem to want to remove all words that do not contain WIFF, together with the trailing ; if there is one. Use
> library(stringr)
> dataframe <- data.frame(text = c('WAFF;WOFF;WIFF200;WIFF12', 'WUFF;WEFF;WIFF2;BIGWIFF'))
> dataframe$text <- str_replace_all(dataframe$text, "(?i)\\b(?!\\w*WIFF)\\w+;?", "")
> dataframe
text
1 WIFF200;WIFF12
2 WIFF2;BIGWIFF
The pattern (?i)\\b(?!\\w*WIFF)\\w+;? matches:
- (?i) - a case insensitive inline modifier
- \\b - a word boundary
- (?!\\w*WIFF) - a negative lookahead that fails the match if the word contains WIFF anywhere inside it
- \\w+ - 1 or more word characters
- ;? - an optional ; (? matches 1 or 0 occurrences of the pattern it modifies)
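To see the lookahead at work on individual words, here is a minimal sketch. The sample words and the names `words`, `pattern` and `removed` are made up for illustration and are not from the question; only the pattern itself is the one discussed above.

```r
library(stringr)

# Hypothetical sample words, chosen only to illustrate the pattern
words <- c("WAFF", "WIFF200", "BIGWIFF", "wiff2", "WOFF")

# Same pattern as above: a word is matched (and so would be removed)
# only when the negative lookahead finds no "WIFF" inside it
pattern <- "(?i)\\b(?!\\w*WIFF)\\w+;?"

# TRUE marks words the pattern matches, FALSE marks words it leaves alone
removed <- str_detect(words, pattern)
names(removed) <- words
removed
```

Words flagged TRUE are the ones str_replace_all would delete. Because of (?i), the lowercase wiff2 is treated like the uppercase variants and kept.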
If for some reason you want to use str_extract, note that your regex cannot work: \bWIFF\b matches WIFF only as a whole word, and there are no such words in your data frame. (Inside an R string literal the escape must also be written as \\b, because a single \b is read as a backspace character.) You can use "(?i)\\b\\w*WIFF\\w*\\b" to match any word with WIFF inside (case insensitive), use str_extract_all to get multiple occurrences, and then join the matches into a single string:
> df <- data.frame(text = c('WAFF;WOFF;WIFF200;WIFF12', 'WUFF;WEFF;WIFF2;BIGWIFF'))
> res <- str_extract_all(df$text, "(?i)\\b\\w*WIFF\\w*\\b")
> res
[[1]]
[1] "WIFF200" "WIFF12"
[[2]]
[1] "WIFF2" "BIGWIFF"
> df$text <- sapply(res, function(s) paste(s, collapse=';'))
> df
text
1 WIFF200;WIFF12
2 WIFF2;BIGWIFF
You can nest the str_extract_all call directly inside sapply; I have kept them separate for readability.
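For reference, the nested form could look roughly like this inside a dplyr pipeline, as in the question. This is a sketch, not the only way to write it: it assumes dplyr and stringr are installed, and it uses a plain data.frame in place of the question's tibble.

```r
library(dplyr)
library(stringr)

dataframe <- data.frame(
  text = c("WAFF;WOFF;WIFF200;WIFF12", "WUFF;WEFF;WIFF2;BIGWIFF"),
  stringsAsFactors = FALSE
)

# str_extract_all returns one character vector of matches per row;
# sapply then pastes each vector back into a single ";"-separated string
output <- dataframe %>%
  mutate(text = sapply(str_extract_all(text, "(?i)\\b\\w*WIFF\\w*\\b"),
                       paste, collapse = ";"))
output
```

A row with no WIFF word at all would come back as an empty string, since pasting an empty match vector with collapse yields "".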