Problem parsing HTML with BeautifulSoup library
I am using the BeautifulSoup library to parse HTML. My task is to remove everything between the <head> tags. So if I have <head> A lot of Crap! </head>, the result should be <head></head>. This is the code for it:
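from bs4 import BeautifulSoup

raw_html = "entire_web_document_as_string"  # placeholder for the full HTML document
soup = BeautifulSoup(raw_html, "html.parser")  # name the parser explicitly
head = soup.head
head.unwrap()  # detaches <head> from the tree and moves its children into the parent
print(head)  # the detached tag, which is now empty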
And it works great. But I want these changes to be applied to the raw_html string, which contains the entire HTML document. How do I reflect these operations in the original string, not just in the head variable? Can you share a code snippet for this?
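One possible approach, offered as a minimal sketch rather than a definitive answer: Python strings are immutable, so raw_html cannot be changed in place. Instead, modify the parsed tree and serialize the whole soup back with str(soup), then reassign the result to raw_html. The sketch uses clear() rather than unwrap(), because clear() empties the tag while leaving it in the tree. The helper name empty_head and the sample document are assumptions made for illustration, and the serialized output may differ slightly in formatting from the original markup.

```python
from bs4 import BeautifulSoup


def empty_head(raw_html: str) -> str:
    """Return the document with everything inside <head> removed.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    soup = BeautifulSoup(raw_html, "html.parser")
    if soup.head is not None:
        # clear() drops all children but keeps the <head> tag in the tree
        soup.head.clear()
    # Serialize the whole (modified) tree, not just the head tag
    return str(soup)


# Sample document assumed for the example
raw_html = (
    "<html><head><title>A lot of Crap!</title></head>"
    "<body><p>Hi</p></body></html>"
)
raw_html = empty_head(raw_html)  # reassign: strings cannot be edited in place
print(raw_html)
```

The key point is that changes made through head are already reflected in soup, since head is a reference into the same tree; only the conversion back to a string is missing.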