Python regex: match lines that end with a colon, plus all text up to the next line that ends with a colon

Question

I have the following text:

Test 123:

This is a blue car

Test:

This car is not blue

This car is yellow

Hello:

This is not a test

I want to write a regex that finds every line that starts with Test or Hello, is optionally followed by a three-digit number, and ends with a colon, and that returns all content after that line up to the next line fitting the same description. So, for the text above, findall should return a list:

[("Test", "123", "\nThis is a blue car\n"),
 ("Test", "", "\nThis car is not blue\n\nThis car is yellow\n"),
 ("Hello", "", "\nThis is not a test")]

So far I got this:

r = re.findall(r'^(Test|Hello) *([^:]*):$', test, re.MULTILINE)

It matches each line as described, but I'm not sure how to grab the content up to the next line that ends with a colon. Any ideas?

python regex

mart1n 30 oct. '14 at 8:49

2 answers

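import re

# Python 3: the Python 2-only ur'' prefix is replaced by a plain raw string.
p = re.compile(r'(Test|Hello)\s*([^:]*):\n(\n.*?)(?=Test[^:]*:|Hello[^:]*:|$)', re.DOTALL | re.IGNORECASE)
test_str = "Test 123:\n\nThis is a blue car\n\nTest:\n\nThis car is not blue\n\nThis car is yellow\n\nHello:\n\nThis is not a test"

# Each tuple is (keyword, optional number, body up to the next Test/Hello header or the end of the text).
print(p.findall(test_str))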

You can try this. See demo.

http://regex101.com/r/eM1xP0/1

vks 30 oct. '14 at 9:14

Avinash Raj · Accepted Answer · 2014-10-30T08:59:43+0000

You can use the following regular expression, which relies on the DOTALL modifier (written as (?s) in the demo) so that . also matches newlines:

(?:^|\n)(Test|Hello) *([^:]*):\n(.*?)(?=\n(?:Test|Hello)|$)
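
To make the pieces of that pattern easier to follow, here is the same expression spelled out with re.VERBOSE. This is only an annotated sketch: the names SECTION, sample and sections are made up for illustration, and the literal space is written as [ ] because VERBOSE mode ignores bare spaces.

```python
import re

# The answer's pattern, one piece per line.
SECTION = re.compile(
    r"""
    (?:^|\n)                # start of the text, or the newline before a header line
    (Test|Hello)            # group 1: the keyword
    [ ]*                    # optional spaces; VERBOSE ignores bare spaces, hence [ ]
    ([^:]*)                 # group 2: whatever precedes the colon, e.g. 123 (may be empty)
    :\n                     # the colon ending the header line, plus its newline
    (.*?)                   # group 3: the body, matched lazily
    (?=\n(?:Test|Hello)|$)  # stop before the next header line or at the end of the text
    """,
    re.DOTALL | re.VERBOSE,
)

sample = (
    "Test 123:\n\nThis is a blue car\n\n"
    "Test:\n\nThis car is not blue\n\nThis car is yellow\n\n"
    "Hello:\n\nThis is not a test"
)

sections = SECTION.findall(sample)
for keyword, number, body in sections:
    print(repr(keyword), repr(number), repr(body))
```

Note that the lookahead only checks for a line beginning with Test or Hello, not for the trailing colon, so a body line that happens to start with one of those words would end the body early.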

DEMO

>>> import re
>>> s = """Test 123:
... 
... This is a blue car
... 
... Test:
... 
... This car is not blue
... 
... This car is yellow
... 
... Hello:
... 
... This is not a test"""
>>> re.findall(r'(?s)(?:^|\n)(Test|Hello) *([^:]*):\n(.*?)(?=\n(?:Test|Hello)|$)', s)
[('Test', '123', '\nThis is a blue car\n'), ('Test', '', '\nThis car is not blue\n\nThis car is yellow\n'), ('Hello', '', '\nThis is not a test')]
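
As an alternative to a lookahead, the header pattern from the question can be used with re.split, which returns the captured groups together with the text between matches. The sketch below is an assumption-laden variant and not part of either answer: HEADER and split_sections are hypothetical names, [^:\n]* is used instead of [^:]* so a header cannot span several lines, and every header line is assumed to be followed by a newline.

```python
import re

# A header is a whole line: keyword, optional spaces, optional text, then a colon.
# The newline before the header is consumed so it does not end up in the previous body.
HEADER = re.compile(r'(?:^|\n)(Test|Hello) *([^:\n]*):\n')

def split_sections(text):
    """Return (keyword, number, body) tuples, one per header line."""
    parts = HEADER.split(text)
    # parts[0] is any text before the first header; after that the list
    # repeats in groups of three: keyword, number, body.
    fields = parts[1:]
    return [tuple(fields[i:i + 3]) for i in range(0, len(fields), 3)]

sample = (
    "Test 123:\n\nThis is a blue car\n\n"
    "Test:\n\nThis car is not blue\n\nThis car is yellow\n\n"
    "Hello:\n\nThis is not a test"
)

result = split_sections(sample)
for section in result:
    print(section)
```

Because the split happens only on complete header lines (keyword at the start, colon at the end), a body line that merely begins with Test or Hello is left inside the body.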
