
Python, BeautifulSoup: get all class names

Given the following HTML:

<div class="class1">
    <span class="class2">some text</span>
    <span class="class3">some text</span>
    <span class="class4">some text</span>
</div>
      



How do I get all the class names, i.e. ['class1', 'class2', 'class3', 'class4']?

I tried:

soup.find_all(class_=True)

      

But that returns the whole tags, and I would then have to run a regex over their string representation.





1 answer


You can treat every Tag instance found as a dictionary when it comes to retrieving attributes. Note that the value of the class attribute will be a list, since class is a special "multi-valued" attribute:

classes = []
for element in soup.find_all(class_=True):
    classes.extend(element["class"])

      

Or:



classes = [value 
           for element in soup.find_all(class_=True) 
           for value in element["class"]]

      

Demo:

In [1]: from bs4 import BeautifulSoup

In [2]: data = """
   ...: <div class="class1">
   ...:     <span class="class2">some text</span>
   ...:     <span class="class3">some text</span>
   ...:     <span class="class4">some text</span>
   ...: </div>"""

In [3]: soup = BeautifulSoup(data, "html.parser")

In [4]: classes = [value
   ...:            for element in soup.find_all(class_=True)
   ...:            for value in element["class"]]

In [5]: print(classes)
['class1', 'class2', 'class3', 'class4']
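The demo above has exactly one class per element. When an element carries several classes, or the same class appears on more than one element, the flattened list contains every occurrence. The sketch below uses made-up markup (the "wrapper" class and the repeated "class2" are illustrative assumptions) to show the list-valued class attribute and one way to drop duplicates with dict.fromkeys, which keeps first-seen order on Python 3.7+.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the div has two classes, and "class2" appears twice.
data = """
<div class="class1 wrapper">
    <span class="class2">some text</span>
    <span class="class2 class3">some text</span>
</div>"""

soup = BeautifulSoup(data, "html.parser")

# "class" is multi-valued, so the attribute comes back as a list of names.
print(soup.div["class"])

# Flatten every element's class list in document order.
classes = [value
           for element in soup.find_all(class_=True)
           for value in element["class"]]
print(classes)

# dict.fromkeys removes duplicates while preserving first-seen order.
unique_classes = list(dict.fromkeys(classes))
print(unique_classes)
```

With this input, classes still contains "class2" twice, while unique_classes lists each name once. If order does not matter, set(classes) works just as well.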

      









