Getting an attribute's value, not the tag's text, with BS4
I've managed to pull out most of the various site attributes I'm scraping, but it's time to try and extract the value of anything in the div declaration itself.
Specifically, assuming the following:
<div class="item" data-color="red" data-itemid="abc">Red Slippers</div>
I am after the value of data-itemid, i.e. abc.
Everything I have tried returns the text inside the div (Red Slippers) instead, and that's not what I want.
I've tried the following, with no luck:
item_id = soup.find('data-itemid')
Any ideas?
You can use find_all with an attribute filter to narrow down your search, and then read that particular attribute from each matching tag with dict-style indexing.
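from bs4 import BeautifulSoup

# text holds the HTML you are scraping; the html5lib parser must be installed separately
soup = BeautifulSoup(text, 'html5lib')
items = soup.find_all('div', {'class' : 'item'})
for item in items:
    # a Tag exposes its attributes like a dict
    print(item['data-itemid'])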
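For a self-contained check, here is a minimal sketch that runs the same lookup against the div from the question. It assumes the built-in html.parser instead of html5lib, and the names html and data-size are only illustrative. It also suggests why soup.find('data-itemid') comes back empty: the first argument of find is treated as a tag name, not an attribute name.

```python
from bs4 import BeautifulSoup

# The markup from the question; `html` is an illustrative variable name.
html = '<div class="item" data-color="red" data-itemid="abc">Red Slippers</div>'
soup = BeautifulSoup(html, 'html.parser')

# find() treats its first argument as a tag name, so this searches for a
# <data-itemid> element and finds nothing.
missing = soup.find('data-itemid')

# Attributes live on the Tag and are read with dict-style indexing.
div = soup.find('div', {'class': 'item'})
item_id = div['data-itemid']

# .get() returns None instead of raising KeyError when the attribute is
# absent; 'data-size' is a made-up attribute used only to show this.
size = div.get('data-size')

print(missing, item_id, size)
```

Using .get() is probably the safer choice when some of the scraped divs may lack the attribute.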
If you want to narrow your search, you can simply add more predicates to your dict, for example:
{'class' : 'item', 'data-color' : 'red', ...} # and so on
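A rough sketch of that narrowing, assuming the built-in html.parser and three made-up divs (only the first comes from the question). It also shows filtering on the mere presence of an attribute by passing True as its value, which may be handy when not every item carries data-itemid.

```python
from bs4 import BeautifulSoup

# Sample markup; the second and third divs are invented for illustration.
html = '''
<div class="item" data-color="red" data-itemid="abc">Red Slippers</div>
<div class="item" data-color="blue" data-itemid="def">Blue Slippers</div>
<div class="item" data-color="red">No id here</div>
'''
soup = BeautifulSoup(html, 'html.parser')

# Every key in the dict must match, so this keeps only the red items.
red_items = soup.find_all('div', {'class': 'item', 'data-color': 'red'})
red_ids = [d.get('data-itemid') for d in red_items]

# True as the value matches any tag that has the attribute at all.
with_id = soup.find_all('div', attrs={'data-itemid': True})
all_ids = [d['data-itemid'] for d in with_id]

print(red_ids, all_ids)
```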