Python re.findall from file
I am trying to extract all the text between two keywords in a text file. The keywords appear multiple times in the file, so I'll have several blocks of good text.
The input.txt file looks like this:
bad bad keyword1 GOOD DATA keyword2 bad
bad bad bad keyword1 MORE
GOOD DATA keyword2 bad bad
This does not work:
import re
f = open('input.txt', 'r')
trim = re.findall('keyword1(.+?)keyword2', f.read())
print(trim)
It returns an empty list:
[]
2 answers
To capture blocks that span more than one line, pass the re.DOTALL flag:
trim = re.findall('keyword1(.+?)keyword2', f.read(), re.DOTALL)
By default, the dot matches any character except a newline (\n). With the re.DOTALL flag, the dot also matches \n, so a match can run across line breaks.
Output:
[' GOOD DATA ', ' MORE \nGOOD DATA ']
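To see the effect of the flag in isolation, here is a minimal sketch that uses an in-memory string instead of input.txt. The sample text and the helper name extract_blocks are assumptions made for illustration; the sample is chosen so that its only block crosses a line break.

```python
import re

# Assumed sample text: the single keyword1...keyword2 block spans two lines.
text = "bad bad bad keyword1 MORE\nGOOD DATA keyword2 bad bad\n"

def extract_blocks(s, flags=0):
    """Return every non-greedy match found between keyword1 and keyword2."""
    return re.findall(r"keyword1(.+?)keyword2", s, flags)

# Without re.DOTALL the dot stops at the newline, so nothing matches.
without_dotall = extract_blocks(text)

# With re.DOTALL the dot also matches "\n", so the block is captured.
with_dotall = extract_blocks(text, re.DOTALL)

print(without_dotall)  # []
print(with_dotall)     # [' MORE\nGOOD DATA ']
```

The non-greedy (.+?) matters as much as the flag: with a greedy (.+) and re.DOTALL, a file containing several blocks would be captured as one long match from the first keyword1 to the last keyword2.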