Python 2.7 re.MULTILINE problems

I'm new to Python and I'm trying to port my PHP regex to Python, but I'm having trouble with this multi-line match. I have been searching the internet for the last two days and cannot figure it out, so if anyone can help it would be great. Here is the regex I wrote:

mlsTagRegex = re.compile("<td\swidth=\"13%\"\sclass=\"TopHeader\">(.*?)</td>", re.MULTILINE)
tdTags = mlsTagRegex.findall(output.getvalue())
print tdTags

      

Here is the HTML I would like to find:

<td width="13%" class="TopHeader">

   <span class="red">I WANT THIS PART</span>

</td>

      

and it just gives me an empty list. I'm pretty sure I'm missing something, probably something simple, but as I said, I'm new to Python. Can anyone help? Thanks!

P.S. The argument passed to findall is the output from pycurl, and the HTML shown above is in it.

+3




2 answers


Use re.DOTALL so that the '.' character matches any character, including a newline. re.MULTILINE only changes where '^' and '$' match; it does not affect '.'.
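To see the difference between the two flags, here is a small sketch that runs the question's pattern against the HTML snippet from the question. The variable names (html, pattern, multiline_matches, dotall_matches) are made up for this illustration, and the HTML is pasted in as a literal instead of coming from pycurl.

```python
import re

# The HTML fragment from the question, inlined for the demonstration.
html = '''<td width="13%" class="TopHeader">

   <span class="red">I WANT THIS PART</span>

</td>'''

# A raw string keeps the backslashes in \s intact.
pattern = r'<td\swidth="13%"\sclass="TopHeader">(.*?)</td>'

# re.MULTILINE only affects ^ and $, so '.' still stops at each newline
# and the pattern cannot reach </td>.
multiline_matches = re.findall(pattern, html, re.MULTILINE)

# re.DOTALL lets '.' match newlines, so the group can span several lines.
dotall_matches = re.findall(pattern, html, re.DOTALL)

print(multiline_matches)
print([m.strip() for m in dotall_matches])
```

The captured group still contains the surrounding newlines and the span tags, so it needs a strip() or a further step if only the inner text is wanted.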



+1




You need to use re.DOTALL to make . match newlines:

mlsTagRegex = re.compile(r'<td width="13%" class="TopHeader">(.*?)</td>', re.DOTALL)

      



That said, you should avoid parsing HTML with regular expressions; use a parser such as BeautifulSoup or lxml instead.
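As a rough illustration of the parser approach, here is a sketch that uses the standard library's HTMLParser as a stand-in, since BeautifulSoup and lxml are third-party packages that need to be installed separately. The class name TopHeaderText and its attributes are invented for this example, and it assumes the matching cells do not contain nested tables.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2.7


class TopHeaderText(HTMLParser):
    """Collect the text inside every <td class="TopHeader"> cell.

    Hypothetical helper for illustration; assumes no nested <td> elements.
    """

    def __init__(self):
        HTMLParser.__init__(self)
        self.in_cell = False  # True while inside a matching <td>
        self.cells = []       # one text string per matching cell

    def handle_starttag(self, tag, attrs):
        if tag == "td" and dict(attrs).get("class") == "TopHeader":
            self.in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        # Text from child tags such as <span> arrives here as well.
        if self.in_cell:
            self.cells[-1] += data


html = '''<td width="13%" class="TopHeader">

   <span class="red">I WANT THIS PART</span>

</td>'''

parser = TopHeaderText()
parser.feed(html)
parser.close()

texts = [cell.strip() for cell in parser.cells]
print(texts)
```

Unlike the regex, this does not depend on the exact attribute order or spacing in the td tag, and it returns the text without the surrounding span markup.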

+2

