Python re.findall from file

I am trying to extract all the text between two keywords in a text file. The keywords appear multiple times in the file, so I'll have several blocks of good text.

The input.txt file looks like this:

bad bad keyword1 GOOD DATA keyword2 bad
bad bad bad keyword1 MORE 
GOOD DATA keyword2 bad bad 

      

This does not work:

import re

with open('input.txt', 'r') as f:
    trim = re.findall('keyword1(.+?)keyword2', f.read())
print(trim)

      

It returns an empty list:

[]

      



2 answers


import re

s = "bad bad keyword1 GOOD DATA " \
    "keyword2 bad bad bad bad " \
    "keyword1 MORE GOOD DATA " \
    "keyword2 bad bad"

for i in re.findall('keyword1(.*?)keyword2', s, re.DOTALL):
    print(i)
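
The same pattern can be applied to a file rather than a hard-coded string. This is only a minimal sketch: the helper name extract_blocks and its parameters are hypothetical, and it assumes the file is small enough to read into memory at once.

```python
import re

def extract_blocks(path, start="keyword1", end="keyword2"):
    """Return every chunk of text found between start and end in the file."""
    # re.escape guards against keywords that contain regex metacharacters.
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    # The with-block closes the file even if reading fails.
    with open(path, "r") as f:
        # re.DOTALL lets '.' match newlines, so blocks may span lines.
        return re.findall(pattern, f.read(), re.DOTALL)
```

Calling extract_blocks('input.txt') on the file from the question should return one string per keyword1 ... keyword2 block.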

      





If you want to capture all the data, including blocks that span several lines, you must use the re.DOTALL flag:

trim = re.findall('keyword1(.+?)keyword2', f.read(), re.DOTALL)

      

By default, the dot matches any character except a newline (\n). With the re.DOTALL flag, the dot also matches \n.



Output:

[' GOOD DATA ', ' MORE \nGOOD DATA ']
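
To see the effect of the flag in isolation, here is a small sketch that runs the same pattern with and without re.DOTALL. The sample string is made up for illustration (it is not the original file); the wanted text deliberately spans a line break.

```python
import re

# Hypothetical sample: the wanted text spans a line break.
text = "bad keyword1 MORE \nGOOD DATA keyword2 bad"

pattern = r"keyword1(.+?)keyword2"

# Default: '.' stops at '\n', so no match can span the two lines.
without_flag = re.findall(pattern, text)

# re.DOTALL: '.' also matches '\n', so the block is captured whole.
with_flag = re.findall(pattern, text, re.DOTALL)

print(without_flag)  # []
print(with_flag)     # [' MORE \nGOOD DATA ']
```

Without the flag the result is an empty list, because keyword1 and keyword2 never appear on the same line in this sample.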

      
