Check if there are multiple words before the current word

Question


I have lines of the following flavor:

Random Inc
A Non-Random Inc

I would like to remove the word Inc from all of these lines if there is more than one word in front of it. The desired result for the two examples above:

Random Inc
A Non-Random

What's the correct regex to plug into gsub for this? Specifically, how do you specify complete words in a regular expression? I thought it would be \w, but that matches a single word character, which doesn't seem right.
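To make the difference concrete, here is a small R sketch (not from the original thread) that extracts the "words" of one example line with three candidate patterns. Using perl = TRUE everywhere is an assumption, chosen so that \w is interpreted by PCRE both inside and outside bracket expressions.

```r
# One of the example lines from the question.
s <- "A Non-Random Inc"

# \w matches a single word character (letter, digit or underscore), so \w+
# stops at the hyphen and splits "Non-Random" into two pieces.
w_plain <- regmatches(s, gregexpr("\\w+", s, perl = TRUE))[[1]]

# Adding the hyphen to a character class keeps "Non-Random" together.
w_hyphen <- regmatches(s, gregexpr("[\\w-]+", s, perl = TRUE))[[1]]

# \S+ treats any run of non-whitespace characters as one word.
w_nonspace <- regmatches(s, gregexpr("\\S+", s, perl = TRUE))[[1]]

print(w_plain)     # c("A", "Non", "Random", "Inc")
print(w_hyphen)    # c("A", "Non-Random", "Inc")
print(w_nonspace)  # c("A", "Non-Random", "Inc")
```

This is why the answers below either add the hyphen to a character class or use \S+ instead of a bare \w.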


string regex r

Alex, Oct 23 '14 at 23:51


3 answers

You can use a regex like this:

([-\w]+\s+[-\w]+)\s+Inc
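The answer gives only the pattern. A minimal sketch of plugging it into sub() follows, assuming the two captured words are put back through the backreference \\1 and that perl = TRUE is used so PCRE interprets \w inside the bracket expression. The sample vector x is the one used in the other answers.

```r
x <- c("Random Inc", "A Non-Random Inc", "Another Inc", "A Random other Inc")

# Group 1 captures two hyphen-aware words; the whitespace and "Inc" after them
# are matched but not captured, so replacing with "\\1" drops " Inc".
result <- sub("([-\\w]+\\s+[-\\w]+)\\s+Inc", "\\1", x, perl = TRUE)
print(result)
# c("Random Inc", "A Non-Random", "Another Inc", "A Random other")
```

Note that this pattern has no word boundary after Inc, so a line such as "A B Incorporated" would become "A Borporated"; appending \\b to the pattern, as the other answers do, avoids that.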



Federico Piazza, Oct 24 '14 at 0:07


I think by a complete word you mean one or more non-space characters. If so, you can use \S+.

> x <- c('Random Inc', 'A Non-Random Inc', 'Another Inc', 'A Random other Inc')
> sub("^\\S+(?:\\s+\\S+)?$(*SKIP)(*F)|\\s+Inc\\b", "", x, perl=TRUE)
[1] "Random Inc"     "A Non-Random"   "Another Inc"    "A Random other"

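^\\S+(?:\\s+\\S+)?$ matches a string that consists of exactly one or two words.

(*SKIP)(*F) makes that match fail and moves the search past it, so strings of one or two words are left unchanged.

| separates the alternatives (i.e. only the remaining strings, those with three or more words, are considered for the next part).

\\s+Inc\\b matches Inc together with the one or more whitespace characters before it.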
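The first alternative is essentially a word count. As a rough cross-check (not part of the answer), the same rule can be written without backtracking-control verbs: count whitespace-separated tokens with strsplit() and run the substitution only on strings that have more than two. This sketch assumes the strings have no leading whitespace.

```r
x <- c("Random Inc", "A Non-Random Inc", "Another Inc", "A Random other Inc")

# Count whitespace-separated tokens per string. Leading whitespace would add
# an empty first token, so this assumes the strings are already trimmed.
n_words <- lengths(strsplit(x, "\\s+"))

# Only strings with three or more tokens lose their trailing " Inc"; this
# mirrors the "one or two words -> skip" branch of the regex.
result <- ifelse(n_words > 2, sub("\\s+Inc\\b", "", x, perl = TRUE), x)
print(result)
# c("Random Inc", "A Non-Random", "Another Inc", "A Random other")
```

The single-regex version is more compact; the explicit count may be easier to adjust if the threshold ever changes.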


Avinash Raj, Oct 24 '14 at 2:50


hwnd · Accepted Answer · 2014-10-24T00:06:59+0000

\w matches a single word character, but in this case you need to account for the hyphen and use a quantifier.

x <- c('Random Inc', 'A Non-Random Inc', 'Another Inc', 'A Random other Inc')
sub('[\\w-]+ [\\w-]+\\K *Inc', '', x, perl=TRUE)
# [1] "Random Inc"     "A Non-Random"   "Another Inc"    "A Random other"

First, we match any word character or hyphen "one or more" times, followed by a space, followed by any word character or hyphen "one or more" times. The escape sequence \K resets the starting point of the reported match, so none of the previously consumed characters are included in it. We then match a space "zero or more" times followed by the word "Inc". Because \K makes the part before it act like a zero-width assertion (it must match, but it is not part of the replaced text), we can use an empty replacement string.
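To see what \K does to the reported match, the sketch below (an illustration, not part of the answer) applies regexpr() and regmatches() with the same pattern. The reported match is only the " Inc" part, which is why an empty replacement leaves the two words in place.

```r
x <- c("Random Inc", "A Non-Random Inc", "Another Inc", "A Random other Inc")
pattern <- "[\\w-]+ [\\w-]+\\K *Inc"

# regexpr() records where each match starts and how long it is; with \K the
# recorded start is just after the two words.
m <- regexpr(pattern, x, perl = TRUE)

# regmatches() returns the matched text for the strings that matched only.
matched <- regmatches(x, m)
print(matched)  # c(" Inc", " Inc")

# Replacing that reported match with "" keeps the words before \K intact.
cleaned <- sub(pattern, "", x, perl = TRUE)
print(cleaned)
```

A lookbehind such as (?<=[\\w-]+ [\\w-]+) would express the same idea, but PCRE does not accept lookbehinds of unbounded length, which is why \K is the usual workaround here.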
