Beautiful Soup - Selecting Classes from HTML File
I have an HTML file and I want to take the text from this block shown here:
<strong class="fullname js-action-profile-name">User Name</strong>
<span>‏</span>
<span class="username js-action-profile-name"><s>@</s><b>UserName</b></span>
I want it to display as:
User Name
@UserName
How do I do this with Beautiful Soup?
3 answers
Use the "text" attribute. For example:
>>> b = BeautifulSoup.BeautifulStoneSoup(open('/tmp/x.html'), convertEntities=BeautifulSoup.BeautifulStoneSoup.HTML_ENTITIES)
>>> print b.find(attrs={"id": "container"}).text
User Nameβ@UserName
In x.html I have a div with the id "container" that contains the HTML you specified. Note that I am converting &rlm; using BeautifulStoneSoup. To insert a new line (which the browser will not insert for you), simply replace u'\u200f' with "\n".
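The replacement described above can be sketched as follows. This is only an illustration: the string is typed by hand instead of being parsed from x.html, and it assumes a single U+200F mark is the only thing separating the two names in the extracted text.

```python
# Hand-typed stand-in for the text extracted from the container element
# (assumption: one U+200F right-to-left mark sits between the two names).
extracted = u"User Name\u200f@UserName"

# Swap the invisible mark for a newline so each name lands on its own line.
display = extracted.replace(u"\u200f", u"\n")
print(display)
```

If the source file also has line breaks between the tags, the extracted text will already contain newlines, so the result may need an extra pass to drop empty lines.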
from bs4 import BeautifulSoup
html = '''<strong class="fullname js-action-profile-name">User Name</strong>
<span>‏</span>
<span class="username js-action-profile-name"><s>@</s><b>UserName</b></span>'''
soup = BeautifulSoup(html, 'html.parser')
username = soup.find(attrs={'class':'username js-action-profile-name'}).text
fullname = soup.find(attrs={'class':'fullname js-action-profile-name'}).text
print(fullname)
print(username)
Outputs:
User Name
@UserName
Two notes:
- Use bs4 if you're starting out / just learning BS.
- You will probably load your HTML from an external file, so replace html with a file object.
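As a rough sketch of the second note, the snippet below passes an open file object to BeautifulSoup instead of a string. The helper name read_names and the temporary file are made up for illustration, and the 'html.parser' choice is an assumption; any installed parser should behave the same on this markup.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Markup from the question, written to a temporary file so the sketch
# runs on its own; in practice the file would already exist on disk.
MARKUP = (
    '<strong class="fullname js-action-profile-name">User Name</strong>\n'
    '<span>\u200f</span>\n'
    '<span class="username js-action-profile-name"><s>@</s><b>UserName</b></span>'
)


def read_names(path):
    """Return (fullname, username) parsed from the HTML file at path.

    Hypothetical helper, not part of Beautiful Soup.
    """
    with open(path, encoding="utf-8") as handle:
        # BeautifulSoup accepts an open file object as well as a string.
        soup = BeautifulSoup(handle, "html.parser")
    fullname = soup.find(attrs={"class": "fullname js-action-profile-name"}).text
    username = soup.find(attrs={"class": "username js-action-profile-name"}).text
    return fullname, username


with tempfile.NamedTemporaryFile(
    "w", suffix=".html", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(MARKUP)

names = read_names(tmp.name)
os.remove(tmp.name)

print(names[0])
print(names[1])
```

Opening the file in a with block closes it as soon as parsing is done; the soup object keeps everything it needs in memory.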
This assumes index.html contains the markup from the question:
import BeautifulSoup

def displayUserInfo():
    soup = BeautifulSoup.BeautifulSoup(open("index.html"))
    fullname_ele = soup.find(attrs={"class": "fullname js-action-profile-name"})
    fullname = fullname_ele.contents[0]
    print fullname
    username_ele = soup.find(attrs={"class": "username js-action-profile-name"})
    username = ""
    for child in username_ele.findChildren():
        username += child.contents[0]
    print username

if __name__ == '__main__':
    displayUserInfo()

# prints:
# User Name
# @UserName