BeautifulSoup cannot find tag by text
BeautifulSoup suddenly cannot find a tag by its text.
I have HTML in which this tag appears:
<span class="date">Telefon: <b>+421 902 808 344</b></span>
BS4 cannot find this tag:
telephone = soup.find('span',{'text':re.compile('.*Telefon.*')})
print telephone
>>> None
I have tried many ways like
find('span',text='Telefon: ')
or
find('span', text=re.compile('Telefon: .*'))
But nothing works. I've already tried switching the parser from html.parser to lxml.
What could be wrong?
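BeautifulSoup treats the string Telefon: as a bs4.element.NavigableString inside the span tag. When text is combined with a tag name, the pattern is compared against the tag's .string, which is None here because the span has two children (the text node and the <b> tag). Passing {'text': ...} as the second argument does not help either, since that dictionary filters on HTML attributes, not on text. So search inside the span instead: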
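import bs4
import re

# Name the parser explicitly so bs4 does not have to guess one
soup = bs4.BeautifulSoup('<span class="date">Telefon: <b>+421 902 808 344</b></span>', 'html.parser')
for span in soup.find_all('span', {'class': 'date'}):
    # Match a text node inside the span rather than the span's own .string
    if span.find(text=re.compile('Telefon:')):
        for text in span.stripped_strings:
            print(text)
# Telefon:
# +421 902 808 344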
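To make the cause visible, here is a minimal sketch, assuming bs4 4.4.0 or later (for the string argument) and the built-in html.parser. It checks that the span's .string is None and then shows one possible alternative: search for the text node itself and step up to its parent. The variable names are illustrative only.

```python
import re
import bs4

html = '<span class="date">Telefon: <b>+421 902 808 344</b></span>'
soup = bs4.BeautifulSoup(html, 'html.parser')

span = soup.find('span', {'class': 'date'})
# The span has two children (a text node and <b>), so .string is None
has_single_string = span.string is not None

# Search for the text node itself, then step up to the enclosing tag
node = soup.find(string=re.compile('Telefon'))
parent = node.parent
number = parent.b.get_text()

print(has_single_string, parent.name, number)
```

Going through the text node's parent avoids looping over every span when only the first match is needed.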
Or you can use lxml directly:
import lxml.html as LH
root = LH.fromstring('<span class="date">Telefon: <b>+421 902 808 344</b></span>')
for span in root.xpath('//span[@class="date" and contains(text(), "Telefon:")]'):
    print(span.text_content())
# Telefon: +421 902 808 344
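If only the phone number is wanted, the same XPath can be extended to select the text of the nested <b> element. This is a sketch under the assumption that the number always sits in a <b> directly inside the matching span; the name numbers is illustrative.

```python
import lxml.html as LH

root = LH.fromstring('<span class="date">Telefon: <b>+421 902 808 344</b></span>')

# Select the text of the <b> child of the span whose first text node contains "Telefon:"
numbers = root.xpath('//span[@class="date" and contains(text(), "Telefon:")]/b/text()')
print(numbers)
```

The result is a list, so an empty list signals that no matching span was found.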