Check if there are multiple words before the current word
I have lines of the following flavor:
Random Inc
A Non-Random Inc
I would like to remove the word Inc from all of these lines if there is more than one word in front of it. Result for the above two examples:
Random Inc
A Non-Random
What's the correct regex to pass to gsub for this? Specifically, how do you specify complete words in a regular expression? I thought it would be \w, but that matches a single word character, which doesn't seem right.
\w matches a single word character, but in this case you need to account for the hyphen and add a quantifier.
x <- c('Random Inc', 'A Non-Random Inc', 'Another Inc', 'A Random other Inc')
sub('[\\w-]+ [\\w-]+\\K *Inc', '', x, perl=TRUE)
# [1] "Random Inc" "A Non-Random" "Another Inc" "A Random other"
First, we match a word character or hyphen "one or more" times, followed by a space, followed by a word character or hyphen "one or more" times. The escape sequence \K resets the starting point of the reported match, so the characters consumed before it are no longer included in the match. We then match a space "zero or more" times followed by the word "Inc". Since we are using \K, we can use an empty replacement: the part matched before \K behaves like a zero-width assertion (similar to a lookbehind), so it is left in place.
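To see what \K does, the sketch below (using only base R's regexpr, regmatches and sub) prints the part of each string that the pattern actually reports as the match, and shows an equivalent capture-group version that does not rely on \K. The last two calls illustrate a caveat, under the assumption that Inc should only be removed as a separate word: " *Inc" has no word boundary, so it also cuts into a word such as "Incorporated". The hypothetical input y and the \\s+Inc\\b variant are illustrations, not part of the original answer.

```r
# Input vector from the answer above
x <- c('Random Inc', 'A Non-Random Inc', 'Another Inc', 'A Random other Inc')

# With \K, the reported match starts after the two words,
# so only the trailing " Inc" is matched (non-matching strings are dropped)
matched <- regmatches(x, regexpr('[\\w-]+ [\\w-]+\\K *Inc', x, perl = TRUE))
print(matched)
# [1] " Inc" " Inc"

# Equivalent without \K: capture the two words and put them back with \\1
res_k     <- sub('[\\w-]+ [\\w-]+\\K *Inc', '', x, perl = TRUE)
res_group <- sub('([\\w-]+ [\\w-]+) *Inc', '\\1', x, perl = TRUE)
print(identical(res_group, res_k))
# [1] TRUE

# Caveat (hypothetical input): " *Inc" has no word boundary
y <- 'A B Incorporated'
cut_word  <- sub('[\\w-]+ [\\w-]+\\K *Inc', '', y, perl = TRUE)
print(cut_word)
# [1] "A Borporated"

# Requiring at least one space and a trailing \b keeps "Incorporated" intact
kept_word <- sub('[\\w-]+ [\\w-]+\\K\\s+Inc\\b', '', y, perl = TRUE)
print(kept_word)
# [1] "A B Incorporated"
```

The capture-group form works with any PCRE-free engine as well, while \K keeps the replacement string empty.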
I think by a complete word you mean one or more non-whitespace characters. If so, you can use \S+.
> x <- c('Random Inc', 'A Non-Random Inc', 'Another Inc', 'A Random other Inc')
> sub("^\\S+(?:\\s+\\S+)?$(*SKIP)(*F)|\\s+Inc\\b", "", x, perl=TRUE)
[1] "Random Inc" "A Non-Random" "Another Inc" "A Random other"
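- ^\\S+(?:\\s+\\S+)?$ matches a string that consists of exactly one or two words.
- (*SKIP)(*F) makes that match fail and skips past it, so nothing in such a string is replaced.
- | means OR (i.e., the next alternative only gets a chance on strings that were not skipped).
- \\s+Inc\\b matches Inc as well as the one or more whitespace characters before it.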
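As a plainer alternative to (*SKIP)(*F) (a different technique, not the answerer's method), you can count the \S+ words directly and only strip Inc where more than two words exist. This sketch assumes Inc is always the last word of the string, hence the \\s+Inc$ anchor. It also shows why \S+ fits the "complete word" idea better than \w+ for hyphenated words.

```r
x <- c('Random Inc', 'A Non-Random Inc', 'Another Inc', 'A Random other Inc')

# \w+ splits "Non-Random" at the hyphen; \S+ keeps it as one word
w_words <- regmatches(x[2], gregexpr('\\w+', x[2], perl = TRUE))[[1]]
s_words <- regmatches(x[2], gregexpr('\\S+', x[2], perl = TRUE))[[1]]
print(w_words)
# [1] "A"      "Non"    "Random" "Inc"
print(s_words)
# [1] "A"          "Non-Random" "Inc"

# Count whitespace-separated words in each string (Inc included)
n_words <- lengths(regmatches(x, gregexpr('\\S+', x, perl = TRUE)))

# Assumption: Inc is the final word, so "more than one word before Inc"
# means more than two words in total
res <- ifelse(n_words > 2, sub('\\s+Inc$', '', x, perl = TRUE), x)
print(res)
# [1] "Random Inc"     "A Non-Random"   "Another Inc"    "A Random other"
```

Counting words separately keeps the regex trivial, at the cost of a second pass over each string.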