Problem parsing HTML with BeautifulSoup library

Question

I am using the BeautifulSoup (BS) library to parse HTML. My task is to remove everything between the head tags. So if I have <head> A lot of Crap! </head>, then the result should be <head></head>. This is my code for it:

from bs4 import BeautifulSoup

raw_html = "entire_web_document_as_string"
soup = BeautifulSoup(raw_html, "html.parser")  # name the parser explicitly
head = soup.head
head.unwrap()
print(head)

And it works great. But I want these changes to be reflected in the string raw_html, which contains the entire HTML document. How do I apply these operations to the original string, not just to head? Can you share a code snippet for this?

python html parsing html-parsing beautifulsoup

hnvasa Dec 27 '14 at 20:39

1 answer

Jivan · Accepted Answer · 2014-12-27T20:43:36+0000

Basically, you are asking how to export an HTML string from the BS object soup.

You can do it like this:

# Python 2.7
modified_raw_html = unicode(soup)

# Python3
modified_raw_html = str(soup)
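
For completeness, here is a minimal end-to-end sketch for Python 3, under a few assumptions of mine: the sample document and the helper name empty_head are made up for illustration, and I use the built-in "html.parser" backend. One caveat: unwrap() removes the tag itself and leaves its children in the tree, so if the goal is an empty <head></head> in the exported string, clear() is probably the call you want.

```python
from bs4 import BeautifulSoup


def empty_head(raw_html: str) -> str:
    """Return the document as a string with the contents of <head> removed.

    Hypothetical helper for illustration, not part of BeautifulSoup.
    """
    soup = BeautifulSoup(raw_html, "html.parser")
    if soup.head is not None:
        # clear() drops every child of the tag but keeps the tag in the tree.
        soup.head.clear()
    # str(soup) serializes the whole (modified) tree back to markup.
    return str(soup)


# Made-up sample input standing in for the real page.
raw_html = (
    "<html><head><title>A lot of Crap!</title></head>"
    "<body><p>Hello</p></body></html>"
)
modified_raw_html = empty_head(raw_html)
print(modified_raw_html)
```

The original raw_html string is never changed (Python strings are immutable); you get the result as a new string and can reassign it to raw_html if you like.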
