Regex not working in bs4

I am trying to extract some links from a specific file on the website. In this case, I want the rapidvideo links, so I use a regex to filter the tags whose text contains rapidvideo:

import re
import urllib2
from bs4 import BeautifulSoup

def gethtml(link):
    req = urllib2.Request(link, headers={'User-Agent': "Magic Browser"})
    con = urllib2.urlopen(req)
    html = con.read()
    return html

def findLatest():
    url = ""
    head = ""

    soup = BeautifulSoup(gethtml(url), 'html.parser')
    latep = soup.find("a", title=re.compile('Latest Episode'))

    soup = BeautifulSoup(gethtml(head + latep['href']), 'html.parser')
    firstVod = soup.findAll("tr",text=re.compile('rapidvideo'))

    return firstVod



However, the above code returns an empty list. What am I doing wrong?


The problem is here:

firstVod = soup.findAll("tr",text=re.compile('rapidvideo'))


When BeautifulSoup applies your regex pattern, it uses the .string attribute of each matched tr element. Now, .string has an important caveat: when an element has multiple children, .string is None. As the documentation puts it:

"If a tag contains more than one thing, then it's not clear what .string should refer to, so .string is defined to be None."


Hence, you have no results.
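
To see the caveat in isolation, here is a minimal sketch on made-up markup. The HTML snippet and the variable names are assumptions for illustration, not the page from the question. It also uses string=, which is the newer name for the text= argument in Beautiful Soup 4.4.0 and later.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: the first row has a single child, the second has two.
html = """
<table>
  <tr><td>rapidvideo</td></tr>
  <tr><td>1</td><td><a href="#">rapidvideo</a></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
single, multi = soup.find_all("tr")

single_string = single.string  # the one string nested inside the row
multi_string = multi.string    # None, because the row has several children

# The regex is tested against .string, so only the single-child row matches.
regex_hits = soup.find_all("tr", string=re.compile("rapidvideo"))
```

The second row clearly contains the word, but because its .string is None the regex never gets a chance to match it.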

What you can do is check the actual text of the tr elements by using a search function and calling .get_text():


soup.find_all(lambda tag: tag.name == 'tr' and 'rapidvideo' in tag.get_text())
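
As a rough end-to-end sketch of the same idea, the snippet below runs the function filter on made-up markup and then pulls the href values out of the matching rows. The HTML, the example.com URLs, and the helper name is_rapidvideo_row are hypothetical; adapt them to the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the episode page.
html = """
<table>
  <tr><td>1</td><td><a href="https://example.com/rapidvideo/1">rapidvideo</a></td></tr>
  <tr><td>2</td><td><a href="https://example.com/other/2">othervideo</a></td></tr>
</table>
"""

def is_rapidvideo_row(tag):
    # get_text() joins every string inside the tag, however many children it has.
    return tag.name == "tr" and "rapidvideo" in tag.get_text()

soup = BeautifulSoup(html, "html.parser")
rows = soup.find_all(is_rapidvideo_row)

# Collect the link targets from the rows that matched.
links = [a["href"] for row in rows for a in row.find_all("a", href=True)]
```

Note that get_text() only sees visible text, not attribute values, so a row is matched by its link label here rather than by its URL.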




