Extracting data from body with BeautifulSoup
I am trying to extract some data from this HTML using BeautifulSoup. I only want to return the data-* attributes: data-buyout="1 alchemy" data-ign="DanForeverr" data-league="Standard" data-name="Sulphur Wastes Map" data-seller="NoCocent" data-sellerid="None" data-tab="~price 1 alch" data-x="6" data-y="7", but I am not getting any results. I am using the code below. Any help would be appreciated.
import re
parsed = soup.find_all('tbody', class_=re.compile('^data-'))
<tbody class="item item-live-c324ceb98e25716a0fad0727e0cd64e3" data-buyout="1 alchemy" data-ign="DanForeverr" data-league="Standard" data-name="Sulphur Wastes Map" data-seller="NoCocent" data-sellerid="None" data-tab="~price 1 alch" data-x="6" data-y="7" id="item-container-0">
<tr class="first-line">
<td class="icon-td">
<div class="icon">
<img alt="Item icon" src="https://web.poecdn.com/image/Art/2DItems/Maps/AtlasMaps/SulphurWastes3.png?scale=1&w=1&h=1&v=48802019c4a2e88af038d75ec1e4b31e3"/>
<div class="sockets" style="position: absolute;">
<div class="sockets-inner" style="position: relative; width:94px;">
</div>
</div>
</div>
</td>
<td class="item-cell">
<h5>
<a class="title itemframe0" href="#" onclick="return false;" target="_blank">
Sulphur Wastes Map
</a>
<span class="found-time-ago">
2 months ago
</span>
</h5>
<ul class="requirements proplist">
<li>
<span class="sortable" data-name="ilvl">
ilvl: 80
</span>
</li>
</ul>
<span class="sockets-raw" style="display:none">
</span>
<ul class="item-mods">
</ul>
</td>
<td class="table-stats">
<table>
<tr class="calibrate">
<th>
</th>
<th>
</th>
<th>
</th>
<th>
</th>
<th>
</th>
<th>
</th>
<th>
</th>
<th>
</th>
<th>
</th>
<th>
</th>
<th>
</th>
<th>
</th>
<th>
</th>
<th>
</th>
</tr>
<tr class="cell-first">
<th class="disabled" colspan="2">
Quality
</th>
<th class="disabled" colspan="2">
Phys.
</th>
<th class="disabled" colspan="2">
Elem.
</th>
<th class="disabled" colspan="2">
APS
</th>
<th class="disabled" colspan="2">
DPS
</th>
<th class="disabled" colspan="2">
pDPS
</th>
<th class="disabled" colspan="2">
eDPS
</th>
</tr>
<tr class="cell-first">
<td class="sortable property " colspan="2" data-name="q" data-value="0">
&nbsp;
</td>
<td class="sortable property " colspan="2" data-name="pd" data-value="0.0">
</td>
<td class="sortable property " colspan="2" data-ed="" data-name="ed" data-value="0.0">
</td>
<td class="sortable property " colspan="2" data-name="aps" data-value="0">
&nbsp;
</td>
<td class="sortable property " colspan="2" data-name="dps" data-value="0.0">
&nbsp;
</td>
<td class="sortable property " colspan="2" data-name="pdps" data-value="0.0">
&nbsp;
</td>
<td class="sortable property " colspan="2" data-name="edps" data-value="0.0">
&nbsp;
</td>
</tr>
<tr class="cell-second">
<th class="cell-empty">
</th>
<th class="disabled" colspan="2">
Armour
</th>
<th class="disabled" colspan="2">
Evasion
</th>
<th class="disabled" colspan="2">
Shield
</th>
<th class="disabled" colspan="2">
Block
</th>
<th class="disabled" colspan="2">
Crit.
</th>
<th colspan="2">
Tier
</th>
</tr>
<tr class="cell-second">
<td class="cell-empty">
</td>
<td class="sortable property " colspan="2" data-name="armour" data-value="0">
&nbsp;
</td>
<td class="sortable property " colspan="2" data-name="evasion" data-value="0">
&nbsp;
</td>
<td class="sortable property " colspan="2" data-name="shield" data-value="0">
&nbsp;
</td>
<td class="sortable property " colspan="2" data-name="block" data-value="0">
&nbsp;
</td>
<td class="sortable property " colspan="2" data-name="crit" data-value="0">
&nbsp;
</td>
<td class="sortable property " colspan="2" data-name="level" data-value="13">
13
</td>
</tr>
</table>
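Well, you cannot do it that way: data-buyout, data-ign and the rest are separate attributes of the tag, not values of its class attribute. The class values here are "item" and "item-live-c324ceb98e25716a0fad0727e0cd64e3", so a regex like '^data-' applied to class will never match. What you can do is find the tag and then extract the information you need from its attributes, for example: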
Define x as the HTML you posted, something like this:
from bs4 import BeautifulSoup

x = '''<tbody class="item item-live-c324ceb98e25716a0fad0727e0cd64e3" data-buyout="1 alchemy" data-ign="DanForeverr" data-league="Standard" data-name="Sulphur Wastes Map" data-seller="NoCocent" data-sellerid="None" data-tab="~price 1 alch" data-x="6" data-y="7" id="item-container-0">'''
soup = BeautifulSoup(x, 'lxml')
this_class = soup.find_all('tbody', {'class': 'item item-live-c324ceb98e25716a0fad0727e0cd64e3'})
# This pinpoints the exact tbody (you can do it your way),
# but it is useful because you give it the exact key-value pair, so it can hardly miss.
for i in this_class:
    print(i['data-buyout'])
    print(i['data-ign'])
    print(i['data-name'])
    print(i['id'])
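The code above hard-codes the hashed class name and each attribute. A minimal sketch of a more general approach, assuming you want every data-* attribute of each matching <tbody>: it selects tags that merely have a data-buyout attribute (whatever its value) and applies the '^data-' pattern to attribute names rather than to the class. The helper name data_attributes and the use of 'html.parser' are illustrative choices, not part of the original answer.

```python
import re
from bs4 import BeautifulSoup

# Abbreviated copy of the <tbody> from the question (children omitted).
html = '''<tbody class="item item-live-c324ceb98e25716a0fad0727e0cd64e3" data-buyout="1 alchemy" data-ign="DanForeverr" data-league="Standard" data-name="Sulphur Wastes Map" data-seller="NoCocent" data-sellerid="None" data-tab="~price 1 alch" data-x="6" data-y="7" id="item-container-0"></tbody>'''

# The pattern is matched against attribute *names*, not the class value.
DATA_ATTR = re.compile(r'^data-')


def data_attributes(tag):
    """Hypothetical helper: return only the data-* attributes of a tag as a dict."""
    return {name: value for name, value in tag.attrs.items() if DATA_ATTR.match(name)}


soup = BeautifulSoup(html, 'html.parser')

# attrs={'data-buyout': True} matches any <tbody> that has a data-buyout
# attribute, so the hashed class name is not needed to find it.
for tbody in soup.find_all('tbody', attrs={'data-buyout': True}):
    print(data_attributes(tbody))
```

tag.attrs is an ordinary dict, so the class and id entries are simply filtered out and the nine data-* attributes remain.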
You can print any of these attribute values this way. Note that if you print the result of soup.find_all or soup.find itself, you will get not just that one tag but its whole subtree (all of its children) as well.
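To illustrate the difference, here is a small sketch on a trimmed-down, hypothetical version of the markup: printing the Tag serialises it together with its children, while .attrs and .get() give you only the tag's own attributes.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup: a <tbody> with one child row.
html = '''<table><tbody class="item" data-buyout="1 alchemy" data-ign="DanForeverr">
<tr class="first-line"><td>Sulphur Wastes Map</td></tr>
</tbody></table>'''

soup = BeautifulSoup(html, 'html.parser')
tbody = soup.find('tbody')

# Printing the Tag itself serialises the tag *and* all of its children.
print(tbody)

# .attrs holds only the tag's own attributes, as a dict.
print(tbody.attrs)

# .get() returns a single attribute value, or None if it is absent,
# whereas tbody['data-missing'] would raise KeyError.
print(tbody.get('data-buyout'))
print(tbody.get('data-missing'))
```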