Python, Beautiful Soup: get all class names
Given HTML like this:
<div class="class1">
<span class="class2">some text</span>
<span class="class3">some text</span>
<span class="class4">some text</span>
</div>
How do I get all the class names, i.e. ['class1', 'class2', 'class3', 'class4']?
I tried:
soup.find_all(class_=True)
But this returns the whole tags, so I would then have to run a regex over each tag's string.
1 answer
You can treat every Tag instance that is found like a dictionary when retrieving attributes. Note that the value of the class attribute will be a list, since class is a special "multi-valued" attribute:
classes = []
for element in soup.find_all(class_=True):
    classes.extend(element["class"])
Or:
classes = [value
           for element in soup.find_all(class_=True)
           for value in element["class"]]
Demo:
In [1]: from bs4 import BeautifulSoup
In [2]: data = """
...: <div class="class1">
...: <span class="class2">some text</span>
...: <span class="class3">some text</span>
...: <span class="class4">some text</span>
...: </div>"""
In [3]: soup = BeautifulSoup(data, "html.parser")
In [4]: classes = [value
...: for element in soup.find_all(class_=True)
...: for value in element["class"]]
In [5]: print(classes)
['class1', 'class2', 'class3', 'class4']
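Because each class value is a list, a tag with several classes contributes several names, and a class used on more than one tag shows up more than once. If you only want each name once, a minimal sketch could look like the following. The sample markup and the variable names all_classes and unique_classes are made up for illustration and are not part of the question:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: one tag carries two classes, "class2" repeats,
# and one span has no class attribute at all.
html = """
<div class="class1 wrapper">
  <span class="class2">some text</span>
  <span class="class2">some text</span>
  <span>no class here</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# element["class"] is a list, so a tag with two classes contributes two values.
# find_all(class_=True) skips tags that have no class attribute.
all_classes = [value
               for element in soup.find_all(class_=True)
               for value in element["class"]]

# dict.fromkeys drops duplicates while keeping first-seen order.
unique_classes = list(dict.fromkeys(all_classes))

print(all_classes)
print(unique_classes)
```

Using set(all_classes) would also remove duplicates, but it does not preserve the order in which the classes appear in the document.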