Regular expression to replace words separated by a pipe with just one part
I have text that looks like this:
I'm goin|going to be here because I hafta|have to
I would like to remove the substring before the pipe, along with the pipe itself, and keep only the substring after it. So the above line should look like
I'm going to be here because I have to
I know I can do it with a Python loop as shown below, but I need the speed of a regex
s = "I'm goin|going to be here because I hafta|have to"
for word in s.split():
    if '|' in word:
        word = word.split('|')[1]
    print(word)
I would like to use something like re.sub to handle this line.
You can use a regex that matches 1+ word characters followed by |:
import re
s = "I'm goin|going to be here because I hafta|have to"
s = re.sub(r'\w+\|\b', '', s)
print(s)
# => I'm going to be here because I have to
Since the | character is always followed by a word char, it is recommended to use \b (word boundary) after it. This way, you avoid matching a word| that is followed by a space or punctuation (if you prefer to keep those).
Pattern details:
- \w+ - 1 or more (due to the + quantifier) word characters (letters, digits, _)
- \| - a literal | character (if not escaped, it denotes an alternation operator)
- \b - a word boundary.
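To see what the trailing \b changes, here is a small sketch that runs the pattern with and without it. The sample string and variable names are made up for illustration and are not from the question.

```python
import re

# Hypothetical sample: in "a| b" the pipe is followed by a space, not a word char.
sample = "goin|going a| b hafta|have"

# With \b, the pipe must be directly followed by a word character to match.
with_boundary = re.sub(r'\w+\|\b', '', sample)

# Without \b, every "word|" prefix is removed, whatever follows the pipe.
without_boundary = re.sub(r'\w+\|', '', sample)

print(with_boundary)
# => going a| b have
print(without_boundary)
# => going  b have   (two spaces, because "a|" was removed)
```

If a pipe can never be followed by anything other than a word character in your data, both patterns give the same result.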
Something like this will work:
Code:
import re
RE_FRONT_HALF = re.compile(r'\w+\|')
sample = "I'm goin|going to be here because I hafta|have to"
print(RE_FRONT_HALF.sub('', sample))
How?
Find one or more word characters followed by a pipe |.
Results:
I'm going to be here because I have to
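One caveat worth checking against your own data: \w does not match an apostrophe, so a prefix such as can't| would lose only its last letter. Below is a minimal sketch of a looser variant, assuming a "word" is any run of characters that are neither whitespace nor |. The name RE_FRONT_HALF_LOOSE and the sample string are hypothetical.

```python
import re

# Pattern from the answer: word characters followed by a pipe.
RE_FRONT_HALF = re.compile(r'\w+\|')

# Hypothetical variant: any run of non-space, non-pipe characters, then a pipe.
RE_FRONT_HALF_LOOSE = re.compile(r'[^\s|]+\|')

sample = "I can't|cannot go"

strict = RE_FRONT_HALF.sub('', sample)
loose = RE_FRONT_HALF_LOOSE.sub('', sample)

print(strict)  # => I can'cannot go
print(loose)   # => I cannot go
```

Which pattern fits depends on what the left-hand side of the pipe can contain in the real text.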