Python 2.7 re.MULTILINE problems

Question

I'm new to Python and I'm trying to port a PHP regex to Python, but I'm having trouble with this multi-line match. I have been searching the internet for the last two days and cannot figure it out, so any help would be great. Here is the regex I made:

mlsTagRegex = re.compile("<td\swidth=\"13%\"\sclass=\"TopHeader\">(.*?)</td>", re.MULTILINE)
tdTags = mlsTagRegex.findall(output.getvalue())
print tdTags

Here is the HTML I would like to find:

<td width="13%" class="TopHeader">

   <span class="red">I WANT THIS PART</span>

</td>

but it just gives me an empty list. I'm pretty sure I'm missing something simple, but as I said, I'm new to Python. Can anyone help? Thanks!

PS: the string passed to findall is the output of pycurl, and the HTML shown above is in it.

+3

python python-2.7 regex pycurl

classyhobo 18 March '12 at 3:43

2 answers

You need to use re.DOTALL to make . match newlines:

mlsTagRegex = re.compile(r'<td width="13%" class="TopHeader">(.*?)</td>', re.DOTALL)
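To see the difference between the two flags, here is a minimal sketch that runs the question's pattern against the question's sample markup. The variable `html` is an assumption standing in for `output.getvalue()`, and the `strip()` call is only there to trim the surrounding blank lines.

```python
import re

# Sample markup copied from the question; `html` stands in for output.getvalue().
html = '''<td width="13%" class="TopHeader">

   <span class="red">I WANT THIS PART</span>

</td>'''

pattern = r'<td\swidth="13%"\sclass="TopHeader">(.*?)</td>'

# re.MULTILINE only changes where ^ and $ match; '.' still stops at a newline,
# so the group can never reach </td> on a later line.
multiline_hits = re.findall(pattern, html, re.MULTILINE)

# re.DOTALL lets '.' match newlines too, so the group can span several lines.
dotall_hits = re.findall(pattern, html, re.DOTALL)

print(multiline_hits)                          # []
print([hit.strip() for hit in dotall_hits])    # ['<span class="red">I WANT THIS PART</span>']
```

The captured group still contains the `<span>` markup, so a second step would be needed if only the text is wanted.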

That said, you should avoid parsing HTML with regex; use BeautifulSoup or lxml instead.
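As a rough illustration of the parser-based approach, the sketch below uses the standard library's HTMLParser rather than BeautifulSoup or lxml, purely so that it runs without extra installs; those libraries would make the same task shorter. The class name `TopHeaderText` and the variable `html` are made up for this example, and the code assumes the `TopHeader` cells are not nested inside each other.

```python
try:
    from html.parser import HTMLParser   # Python 3
except ImportError:
    from HTMLParser import HTMLParser    # Python 2.7


class TopHeaderText(HTMLParser):
    """Collect the text found inside each <td class="TopHeader"> cell."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == 'td' and dict(attrs).get('class') == 'TopHeader':
            self.in_cell = True
            self.cells.append('')

    def handle_endtag(self, tag):
        if tag == 'td':
            self.in_cell = False

    def handle_data(self, data):
        # Text inside nested tags such as <span> also arrives here.
        if self.in_cell:
            self.cells[-1] += data


# Sample markup copied from the question; stands in for output.getvalue().
html = '''<td width="13%" class="TopHeader">

   <span class="red">I WANT THIS PART</span>

</td>'''

parser = TopHeaderText()
parser.feed(html)
parser.close()

cell_texts = [text.strip() for text in parser.cells]
print(cell_texts)   # ['I WANT THIS PART']
```

Unlike the regex, this returns the text of the cell without the `<span>` tags, and it does not depend on the exact order or spacing of the attributes.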

+2

zeekay 18 March at 3:58 am

Ceramic Pot · Accepted Answer · 2012-03-18T03:57:08+0000

Use re.DOTALL so that the '.' character matches any character, including a newline. re.MULTILINE only changes how ^ and $ behave; it does not affect '.'.
