Problem parsing HTML with BeautifulSoup library

I am using the BeautifulSoup (BS) library to parse HTML. My task is to remove everything between the head tags. So if I have <head> A lot of Crap! </head>, the result should be <head></head>. This is my code:

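from bs4 import BeautifulSoup

raw_html = "entire_web_document_as_string"
soup = BeautifulSoup(raw_html, "html.parser")  # name the parser explicitly
head = soup.head
head.clear()  # empty <head> in place; unwrap() would move its children into <html> instead
print(head)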

And it works great. But I want these changes to be applied to the string raw_html, which contains the entire HTML document. How do I reflect these commands in the original string, not just in the head variable? Can you share a code snippet for this?
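A note on unwrap() versus clear(): unwrap() does not delete what is inside <head>. It removes the <head> tag itself and moves its children up into the parent, so print(head) shows an empty, detached tag while the text is still in the document. clear() deletes the children and keeps the tag in place. A minimal comparison, assuming bs4 with the built-in "html.parser" (DOC and the two helper functions are illustrative names):

```python
from bs4 import BeautifulSoup

DOC = "<html><head><title>A lot of Crap!</title></head><body><p>Hi</p></body></html>"

def after_unwrap(html):
    """Unwrap <head>: the tag leaves the tree, its children stay in the document."""
    soup = BeautifulSoup(html, "html.parser")
    head = soup.head
    head.unwrap()
    return str(head), str(soup)

def after_clear(html):
    """Clear <head>: the tag stays in the tree, its children are deleted."""
    soup = BeautifulSoup(html, "html.parser")
    head = soup.head
    head.clear()
    return str(head), str(soup)

# Both print "<head></head>" for the tag, but the documents differ:
print(after_unwrap(DOC))
# ('<head></head>', '<html><title>A lot of Crap!</title><body><p>Hi</p></body></html>')
print(after_clear(DOC))
# ('<head></head>', '<html><head></head><body><p>Hi</p></body></html>')
```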

1 answer


Basically, you are asking how to export the HTML string from the BS object soup.

You can do it like this:



# Python 2.7
modified_raw_html = unicode(soup)

# Python 3
modified_raw_html = str(soup)
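Putting it together: soup is a parsed tree, not a view onto raw_html, so the original string never changes by itself. You serialize the modified tree and assign the result back to the name. A sketch in Python 3, assuming the built-in "html.parser"; the helper name strip_head_contents is hypothetical:

```python
from bs4 import BeautifulSoup

def strip_head_contents(raw_html):
    """Hypothetical helper: return raw_html with everything inside <head> removed."""
    soup = BeautifulSoup(raw_html, "html.parser")
    if soup.head is not None:  # html.parser does not invent a <head>, so check first
        soup.head.clear()      # delete the children, keep the <head></head> tag
    return str(soup)           # serialize the whole modified tree back to a string

raw_html = "<html><head><title>A lot of Crap!</title></head><body><p>Text</p></body></html>"
raw_html = strip_head_contents(raw_html)  # rebind the name to the new string
print(raw_html)  # <html><head></head><body><p>Text</p></body></html>
```

Strings in Python are immutable, so "changing raw_html" always means building a new string and rebinding the variable, as above.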
