Python re.findall from file

Question

I am trying to extract all the text between two keywords in a text file. The keywords appear multiple times in the file, so I'll have several blocks of good text.

The input.txt file looks like this:

bad bad keyword1 GOOD DATA keyword2 bad
bad bad bad keyword1 MORE 
GOOD DATA keyword2 bad bad

This does not work:

import re

f = open('input.txt', 'r')
trim = re.findall('keyword1(.+?)keyword2', f.read())
print trim

It returns an empty list:

[]

python

Linda shaw May 20 '15 at 20:40

2 answers

To capture blocks that span more than one line, pass the re.DOTALL flag:

trim = re.findall('keyword1(.+?)keyword2', f.read(), re.DOTALL)

By default, the dot matches any character except a newline (\n). With the re.DOTALL flag, the dot matches \n as well, so a match can span several lines.

Output:

[' GOOD DATA ', ' MORE \nGOOD DATA ']
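To see the effect of the flag in isolation, here is a minimal sketch that runs the same pattern with and without re.DOTALL. It assumes the file contents are already in a string (the variable names text, single_line and multi_line are only illustrative), so it can be run without input.txt:

```python
import re

# Sample text mirroring input.txt from the question;
# the second block of good data spans two lines.
text = (
    "bad bad keyword1 GOOD DATA keyword2 bad\n"
    "bad bad bad keyword1 MORE \n"
    "GOOD DATA keyword2 bad bad\n"
)

pattern = r"keyword1(.+?)keyword2"

# Default behaviour: '.' stops at a newline, so only the
# block that sits on a single line is found.
single_line = re.findall(pattern, text)

# With re.DOTALL, '.' also matches '\n', so the block that
# is split across two lines is found as well.
multi_line = re.findall(pattern, text, re.DOTALL)

print(single_line)
print(multi_line)
```

Without the flag, only the block that fits on one line is returned; with it, both blocks come back.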

Rodrigo López May 20 '15 at 20:58

nullptr · Accepted Answer · 2015-05-20T20:58:47+0000

import re

s = "bad bad keyword1 GOOD DATA " \
    "keyword2 bad bad bad bad " \
    "keyword1 MORE GOOD DATA " \
    "keyword2 bad bad"

for i in re.findall('keyword1(.*?)keyword2', s, re.DOTALL):
    print(i)
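The same idea can be applied to a file rather than an in-memory string. The following is only a sketch under a few assumptions: extract_between is a hypothetical helper name, the markers are treated as literal text, and a temporary file stands in for input.txt so the example is self-contained:

```python
import os
import re
import tempfile


def extract_between(path, start="keyword1", end="keyword2"):
    """Return every block of text found between start and end in the file.

    Hypothetical helper, not part of the re module.
    """
    # re.escape keeps the markers literal even if they
    # contain regex metacharacters.
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    # The with statement closes the file once reading is done.
    with open(path, "r") as f:
        return re.findall(pattern, f.read(), re.DOTALL)


# Temporary file holding the sample data from the question.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(
        "bad bad keyword1 GOOD DATA keyword2 bad\n"
        "bad bad bad keyword1 MORE \n"
        "GOOD DATA keyword2 bad bad\n"
    )

# Strip the surrounding whitespace from each captured block.
blocks = [b.strip() for b in extract_between(tmp.name)]
os.remove(tmp.name)

print(blocks)
```

Reading the whole file with f.read() is what lets a match cross line boundaries; iterating over the file line by line would not find the second block, with or without re.DOTALL.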
