Regular expression to replace words separated by a pipe into one part

I am working with text that looks like this:

I'm goin|going to be here because I hafta|have to

I would like to remove the substring before each pipe and keep only the substring after it. So the above line should become

I'm going to be here because I have to

I know I can do it with a Python loop as shown below, but I need the speed of a regex:

s = "I'm goin|going to be here because I hafta|have to"
for word in s.split():
    # For "before|after" tokens, keep only the part after the pipe
    if '|' in word:
        word = word.split('|')[1]
    print(word)

      

I would like to use something like re.sub to handle this line.



3 answers


You can use a regex that matches one or more word characters followed by |:

import re
s = "I'm goin|going to be here because I hafta|have to"
s = re.sub(r'\w+\|\b', '', s)
print(s)
# => I'm going to be here because I have to

      




Since the | character is always followed by a word character here, it is a good idea to add \b (a word boundary) after it. That way, you avoid matching a word| that is followed by whitespace or punctuation (if you prefer to keep those).

Pattern details:

  • \w+ - 1 or more (due to the + quantifier) word characters (letters, digits, _)
  • \| - a literal | character (if not escaped, it denotes the alternation operator)
  • \b - a word boundary.
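
To see what the trailing \b changes, here is a minimal sketch that runs both patterns on a made-up string (the sample text is an assumption for illustration, not from the question). The first pipe is followed by a space, so only the pattern without \b removes it:

```python
import re

# Hypothetical sample: the first pipe is followed by a space, the second by a word
text = "either| or goin|going"

# Without \b, every "word|" is removed, whatever follows the pipe
without_boundary = re.sub(r'\w+\|', '', text)

# With \b, "word|" is removed only when a word character follows the pipe
with_boundary = re.sub(r'\w+\|\b', '', text)

print(repr(without_boundary))  # ' or going'
print(repr(with_boundary))     # 'either| or going'
```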


Something like this will work:

Code:

import re
RE_FRONT_HALF = re.compile(r'\w+\|')

sample = "I'm goin|going to be here because I hafta|have to"
print(RE_FRONT_HALF.sub('', sample))

      

How?



Find one or more word characters followed by a pipe |.

Results:

I'm going to be here because I have to
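
Since the question is about processing text quickly, the compiled pattern can be reused across many lines. The following is only a sketch: the helper names keep_after_pipe and keep_after_pipe_loop and the lines list are hypothetical, and the loop version is the question's approach joined back into a single string for comparison.

```python
import re

# Compile once, reuse for every line
RE_FRONT_HALF = re.compile(r'\w+\|')

def keep_after_pipe(line):
    """Drop each 'word|' prefix, keeping the text after the pipe."""
    return RE_FRONT_HALF.sub('', line)

def keep_after_pipe_loop(line):
    """Loop-based equivalent of the question's code, rejoined with spaces."""
    return ' '.join(w.split('|')[1] if '|' in w else w for w in line.split())

# Hypothetical input lines
lines = [
    "I'm goin|going to be here because I hafta|have to",
    "no pipes in this line",
]

cleaned = [keep_after_pipe(line) for line in lines]
print(cleaned)
```

On single-space-separated input like this, both helpers return the same strings; the loop version would also collapse repeated whitespace, which the regex version leaves alone.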

      



Note that \w will also match the digits 0-9 (and the underscore). If you don't want to match digits in a word, you can use:

import re

s = "I'm goin|going to be here because I hafta|have to"

s = re.sub(r"[a-zA-Z]*\|", "", s)

print(s)
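
To illustrate how the two character classes differ on digits, here is a small sketch with a made-up string (the sample text is an assumption, not from the question). Note that because of the * quantifier, the letters-only pattern still removes a pipe that has no letter directly before it, just without the preceding digits:

```python
import re

# Hypothetical sample: the tokens before the pipes contain digits
text = "take 2|two or a4|four"

# \w+ includes digits, so "2|" and "a4|" are removed entirely
word_chars = re.sub(r"\w+\|", "", text)

# [a-zA-Z]* cannot cross a digit, so only the bare pipes are removed
letters_only = re.sub(r"[a-zA-Z]*\|", "", text)

print(word_chars)    # take two or four
print(letters_only)  # take 2two or a4four
```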

      
