BeautifulSoup: splitting a tag's text on <br/> tags
Is it possible to split a tag's text at its <br/> tags?
I have this tag content: [u'+420 777 593 531', <br/>, u'+420 776 593 531', <br/>, u'+420 775 593 531']
And I only want to get numbers. Any advice?
EDIT:
[x for x in dt.find_next_sibling('dd').contents if x!=' <br/>']
Doesn't work at all.
You need to test for tags, which are modeled as Tag instances. Tag objects have a name attribute, whereas text nodes (which are NavigableString instances) carry no tag name:
[x for x in dt.find_next_sibling('dd').contents if getattr(x, 'name', None) != 'br']
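The attempt in the question fails because each <br/> in .contents is a tag object, not the string ' <br/>', so the comparison never filters anything out. As a rough sketch of the same idea with an explicit type check (the sample HTML and the variable names here are made up for illustration, and the stdlib "html.parser" backend is an assumption):

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical sample markup modeled on the question's data.
html = (
    "<dl><dt>Phone</dt>"
    "<dd>+420 777 593 531<br/>+420 776 593 531<br/>+420 775 593 531</dd></dl>"
)
soup = BeautifulSoup(html, "html.parser")
dd = soup.dt.find_next_sibling("dd")

# Keep only the text nodes; each <br/> is a Tag and is skipped.
numbers = [
    str(node).strip()
    for node in dd.contents
    if isinstance(node, NavigableString)
]
print(numbers)
```

Checking the node type reads a little more explicitly than looking at name, but both approaches select the same nodes here.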
Since you only have text and <br /> elements in that <dd> element, you can simply get all the contained strings instead:
list(dt.find_next_sibling('dd').stripped_strings)
Demo:
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('''\
... <dt>Term</dt>
... <dd>
... +420 777 593 531<br/>
... +420 776 593 531<br/>
... +420 775 593 531<br/>
... </dd>
... ''')
>>> dt = soup.dt
>>> [x for x in dt.find_next_sibling('dd').contents if getattr(x, 'name', None) != 'br']
[u'\n +420 777 593 531', u'\n +420 776 593 531', u'\n +420 775 593 531', u'\n']
>>> list(dt.find_next_sibling('dd').stripped_strings)
[u'+420 777 593 531', u'+420 776 593 531', u'+420 775 593 531']
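For Python 3, where the u'' prefixes no longer appear, the same approach could be wrapped in a small helper. This is only a sketch: strings_after_dt is a hypothetical name, the surrounding <dl> is added for illustration, and passing "html.parser" explicitly is an assumption about which parser you want.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modeled on the demo above.
HTML = """\
<dl>
<dt>Term</dt>
<dd>
+420 777 593 531<br/>
+420 776 593 531<br/>
+420 775 593 531<br/>
</dd>
</dl>
"""


def strings_after_dt(html):
    """Return the stripped strings of the <dd> following the first <dt>."""
    soup = BeautifulSoup(html, "html.parser")
    dd = soup.dt.find_next_sibling("dd")
    # stripped_strings skips whitespace-only strings and strips the rest.
    return list(dd.stripped_strings)


numbers = strings_after_dt(HTML)
print(numbers)
```

Because stripped_strings drops whitespace-only strings, the trailing newline after the last <br/> does not show up in the result.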