Insert html string into BeautifulSoup object
I am trying to insert an HTML string into a BeautifulSoup object. If I insert the string directly, bs4 escapes the HTML. If I parse the string into its own soup and insert that, the find function stops working properly. This thread on SO suggests that inserting BeautifulSoup objects can cause problems. For now I use the solution from that post and recreate the soup every time I insert.
But of course the best way would be to insert the HTML string directly into the soup.
EDIT: I'll add some code as an example of what the problem is
from bs4 import BeautifulSoup
mainSoup = BeautifulSoup("""
<html>
<div class='first'></div>
<div class='second'></div>
</html>
""")
extraSoup = BeautifulSoup('<span class="first-content"></span>')
tag = mainSoup.find(class_='first')
tag.insert(1, extraSoup)
print(mainSoup.find(class_='second'))
# prints None
The simplest way, if you already have an HTML string, is to parse it into another BeautifulSoup object and insert that.
from bs4 import BeautifulSoup
doc = '''
<div>
test1
</div>
'''
soup = BeautifulSoup(doc, 'html.parser')
soup.div.append(BeautifulSoup('<div>insert1</div>', 'html.parser'))
print(soup.prettify())
Output:
<div>
test1
<div>
insert1
</div>
</div>
Update 1
How about this? The idea is to let BeautifulSoup parse the string and then insert the resulting tag node (the span) rather than the whole BeautifulSoup object. This seems to fix the None problem.
from bs4 import BeautifulSoup
mainSoup = BeautifulSoup("""
<html>
<div class='first'></div>
<div class='second'></div>
</html>
""", 'html.parser')
extraSoup = BeautifulSoup('<span class="first-content"></span>', 'html.parser')
tag = mainSoup.find(class_='first')
tag.insert(1, extraSoup.span)
print(mainSoup.find(class_='second'))
Output:
<div class="second"></div>
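If the HTML string holds more than one top-level element, the same idea can be wrapped in a small helper. The following is only a sketch: insert_html is a hypothetical name, not part of bs4, and it assumes the html.parser backend so that no html or body wrappers are added around the fragment.

```python
from bs4 import BeautifulSoup


def insert_html(tag, html, position=None):
    """Hypothetical helper: parse `html` and insert its top-level nodes into `tag`."""
    fragment = BeautifulSoup(html, 'html.parser')
    # Copy the child list first: inserting a node elsewhere detaches it from `fragment`.
    nodes = list(fragment.contents)
    if position is None:
        position = len(tag.contents)  # default: append at the end
    for offset, node in enumerate(nodes):
        tag.insert(position + offset, node)


main_soup = BeautifulSoup(
    "<html><div class='first'></div><div class='second'></div></html>",
    'html.parser')
first = main_soup.find(class_='first')
insert_html(first, '<span class="first-content"></span><b>more</b>')

print(first)
print(main_soup.find(class_='second'))
```

Because only tag and string nodes are moved into the tree, never the BeautifulSoup object itself, the later find call should still locate the second div.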
The best way to do this is to create a new span tag and insert it into your mainSoup. This is what the .new_tag method is for.
In [34]: from bs4 import BeautifulSoup
In [35]: mainSoup = BeautifulSoup("""
....: <html>
....: <div class='first'></div>
....: <div class='second'></div>
....: </html>
....: """)
In [36]: tag = mainSoup.new_tag('span')
In [37]: tag.attrs['class'] = 'first-content'
In [38]: mainSoup.insert(1, tag)
In [39]: print(mainSoup.find(class_='second'))
<div class="second"></div>
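Note that the session above inserts the span at the top level of the soup. If the goal is to place it inside the first div, as in the question, a variation along these lines may be closer to what is wanted. It is a sketch only; the snake_case variable names and the choice of html.parser are assumptions made here.

```python
from bs4 import BeautifulSoup

main_soup = BeautifulSoup(
    "<html><div class='first'></div><div class='second'></div></html>",
    'html.parser')

# new_tag creates a Tag associated with this soup; set attributes afterwards.
span = main_soup.new_tag('span')
span['class'] = 'first-content'

# Append the new tag inside the first div instead of at the document root.
first = main_soup.find(class_='first')
first.append(span)

print(first)
print(main_soup.find(class_='second'))
```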