Getting an attribute's value, not the tag text, with BS4

I've managed to pull out most of the site data I'm scraping, but now I need to extract the value of an attribute in the div tag itself.

Specifically, assuming the following:

<div class="item" data-color="red" data-itemid="abc">Red Slippers</div>

I am after the value of the data-itemid attribute, i.e. abc.

Everything I've tried returns the text inside the div (Red Slippers), which is not what I want.

I've tried the following, with no luck:

item_id = soup.find('data-itemid')

Any ideas?



1 answer


You can use find_all with attribute filters to narrow down your search, then read the attribute you want by indexing the tag like a dict.

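from bs4 import BeautifulSoup

# Sample markup from the question; in practice, text is the page HTML.
text = '<div class="item" data-color="red" data-itemid="abc">Red Slippers</div>'

soup = BeautifulSoup(text, 'html5lib')  # requires the html5lib package

# Match every <div> whose class is "item".
items = soup.find_all('div', {'class' : 'item'})
for item in items:
    # Tag attributes are accessed like dict keys.
    print(item['data-itemid'])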
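
One thing to watch: indexing with item['data-itemid'] raises a KeyError if a matched div has no such attribute. Here is a minimal sketch that uses the tag's get method instead, which returns None for a missing attribute. The second div and the names html and item_ids are made up for illustration, and the built-in html.parser is swapped in so nothing beyond bs4 is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the second div deliberately has no data-itemid.
html = """
<div class="item" data-color="red" data-itemid="abc">Red Slippers</div>
<div class="item" data-color="blue">Blue Slippers</div>
"""

soup = BeautifulSoup(html, "html.parser")

item_ids = []
for item in soup.find_all("div", {"class": "item"}):
    # .get() returns None instead of raising KeyError when the attribute is absent.
    item_ids.append(item.get("data-itemid"))

print(item_ids)  # ['abc', None]
```

If a missing attribute should count as an error in your scraper, plain indexing is the better choice since it fails loudly.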

If you want to narrow your search further, you can add more attribute filters to the dict, for example:

{'class' : 'item', 'data-color' : 'red', ...} # and so on
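
As a rough sketch of how such a combined filter behaves, the snippet below matches only divs that are both class item and data-color red. It also shows a related option: passing True as the value matches any tag that has the attribute at all, whatever its value. The sample markup and the variable names are assumptions made up for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, loosely based on the div in the question.
html = """
<div class="item" data-color="red" data-itemid="abc">Red Slippers</div>
<div class="item" data-color="blue" data-itemid="def">Blue Slippers</div>
<div class="item" data-color="red" data-itemid="ghi">Red Boots</div>
<span class="note">No item id here</span>
"""

soup = BeautifulSoup(html, "html.parser")

# Every key in the dict must match for a tag to be returned.
red_items = soup.find_all("div", {"class": "item", "data-color": "red"})
red_ids = [tag["data-itemid"] for tag in red_items]

# True as the value matches any tag that carries the attribute.
tagged = soup.find_all(attrs={"data-itemid": True})
all_ids = [tag["data-itemid"] for tag in tagged]

print(red_ids)  # ['abc', 'ghi']
print(all_ids)  # ['abc', 'def', 'ghi']
```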

      
