Extracting data from body with BeautifulSoup

I am trying to extract some data from this HTML using BeautifulSoup. I only want to return these attributes of the <tbody> tag:

data-buyout="1 alchemy" data-ign="DanForeverr" data-league="Standard" data-name="Sulphur Wastes Map" data-seller="NoCocent" data-sellerid="None" data-tab="~price 1 alch" data-x="6" data-y="7"

but I am not getting any results with the code below. Any help would be appreciated.

import re

parsed = soup.find_all('tbody', class_=re.compile('^data-'))
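Two things get in the way of this attempt. `class` is a reserved word in Python, so BeautifulSoup spells that keyword argument `class_`; and even then the regex is tested against the tag's CSS class values ("item" and "item-live-c324ceb98e25716a0fad0727e0cd64e3"), never against attribute names, so "^data-" cannot match. A minimal sketch of filtering on attribute names instead, using a function filter on a trimmed, hand-closed copy of the markup (`has_data_attribute` is a hypothetical helper name, not part of BeautifulSoup):

```python
import re

from bs4 import BeautifulSoup

# Trimmed, hand-closed copy of the question's markup (most attributes and cells omitted).
html = """
<table>
<tbody class="item item-live-c324ceb98e25716a0fad0727e0cd64e3"
       data-buyout="1 alchemy" data-ign="DanForeverr" id="item-container-0">
  <tr><td><span class="sortable" data-name="ilvl">ilvl: 80</span></td></tr>
</tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# class_ compares the regex with the CSS class values, and none of them starts with "data-".
by_class = soup.find_all("tbody", class_=re.compile("^data-"))
print(by_class)  # []


def has_data_attribute(tag):
    # True when at least one attribute *name* on the tag starts with "data-".
    return any(name.startswith("data-") for name in tag.attrs)


# find_all accepts a function; it is called with each Tag in document order.
matches = soup.find_all(has_data_attribute)
print([tag.name for tag in matches])  # ['tbody', 'span']

# Check tag.name inside the filter as well to keep only the <tbody>.
tbodies = soup.find_all(lambda tag: tag.name == "tbody" and has_data_attribute(tag))
print(len(tbodies))  # 1
```

Note that the plain function filter also picks up the inner <span data-name="ilvl">, which is why the last call checks the tag name too.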


<tbody class="item item-live-c324ceb98e25716a0fad0727e0cd64e3" data-buyout="1 alchemy" data-ign="DanForeverr" data-league="Standard" data-name="Sulphur Wastes Map" data-seller="NoCocent" data-sellerid="None" data-tab="~price 1 alch" data-x="6" data-y="7" id="item-container-0">
 <tr class="first-line">
  <td class="icon-td">
   <div class="icon">
    <img alt="Item icon" src="https://web.poecdn.com/image/Art/2DItems/Maps/AtlasMaps/SulphurWastes3.png?scale=1&amp;w=1&amp;h=1&amp;v=48802019c4a2e88af038d75ec1e4b31e3"/>
    <div class="sockets" style="position: absolute;">
     <div class="sockets-inner" style="position: relative; width:94px;">
     </div>
    </div>
   </div>
  </td>
  <td class="item-cell">
   <h5>
    <a class="title itemframe0" href="#" onclick="return false;" target="_blank">
     Sulphur Wastes Map
    </a>
    <span class="found-time-ago">
     2 months ago
    </span>
   </h5>
   <ul class="requirements proplist">
    <li>
     <span class="sortable" data-name="ilvl">
      ilvl: 80
     </span>
    </li>
   </ul>
   <span class="sockets-raw" style="display:none">
   </span>
   <ul class="item-mods">
   </ul>
  </td>
  <td class="table-stats">
   <table>
    <tr class="calibrate">
     <th>
     </th>
     <th>
     </th>
     <th>
     </th>
     <th>
     </th>
     <th>
     </th>
     <th>
     </th>
     <th>
     </th>
     <th>
     </th>
     <th>
     </th>
     <th>
     </th>
     <th>
     </th>
     <th>
     </th>
     <th>
     </th>
     <th>
     </th>
    </tr>
    <tr class="cell-first">
     <th class="disabled" colspan="2">
      Quality
     </th>
     <th class="disabled" colspan="2">
      Phys.
     </th>
     <th class="disabled" colspan="2">
      Elem.
     </th>
     <th class="disabled" colspan="2">
      APS
     </th>
     <th class="disabled" colspan="2">
      DPS
     </th>
     <th class="disabled" colspan="2">
      pDPS
     </th>
     <th class="disabled" colspan="2">
      eDPS
     </th>
    </tr>
    <tr class="cell-first">
     <td class="sortable property " colspan="2" data-name="q" data-value="0">
      &nbsp;
     </td>
     <td class="sortable property " colspan="2" data-name="pd" data-value="0.0">
     </td>
     <td class="sortable property " colspan="2" data-ed="" data-name="ed" data-value="0.0">
     </td>
     <td class="sortable property " colspan="2" data-name="aps" data-value="0">
      &nbsp;
     </td>
     <td class="sortable property " colspan="2" data-name="dps" data-value="0.0">
      &nbsp;
     </td>
     <td class="sortable property " colspan="2" data-name="pdps" data-value="0.0">
      &nbsp;
     </td>
     <td class="sortable property " colspan="2" data-name="edps" data-value="0.0">
      &nbsp;
     </td>
    </tr>
    <tr class="cell-second">
     <th class="cell-empty">
     </th>
     <th class="disabled" colspan="2">
      Armour
     </th>
     <th class="disabled" colspan="2">
      Evasion
     </th>
     <th class="disabled" colspan="2">
      Shield
     </th>
     <th class="disabled" colspan="2">
      Block
     </th>
     <th class="disabled" colspan="2">
      Crit.
     </th>
     <th colspan="2">
      Tier
     </th>
    </tr>
    <tr class="cell-second">
     <td class="cell-empty">
     </td>
     <td class="sortable property " colspan="2" data-name="armour" data-value="0">
      &nbsp;
     </td>
     <td class="sortable property " colspan="2" data-name="evasion" data-value="0">
      &nbsp;
     </td>
     <td class="sortable property " colspan="2" data-name="shield" data-value="0">
      &nbsp;
     </td>
     <td class="sortable property " colspan="2" data-name="block" data-value="0">
      &nbsp;
     </td>
     <td class="sortable property " colspan="2" data-name="crit" data-value="0">
      &nbsp;
     </td>
     <td class="sortable property " colspan="2" data-name="level" data-value="13">
      13
     </td>
    </tr>
   </table>


3 answers


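You are trying to match tag attributes through the tag's class, and that will not work: the class filter only looks at the values of the class attribute.

Why not find it by ID instead? Just match the part of the id that comes before the number (item-container-).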



soup.select("tbody[id*=item-container-]")
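The *= operator in that selector means "contains", so any tbody whose id includes item-container- anywhere will match. If the prefix should only match at the start of the id, CSS also offers ^= ("starts with"). A short sketch comparing the two; the first <tbody> is trimmed from the question, while the second is a hypothetical element whose id merely contains the prefix:

```python
from bs4 import BeautifulSoup

# First <tbody>: trimmed from the question. Second <tbody>: hypothetical, for contrast.
html = """
<table>
<tbody id="item-container-0" data-name="Sulphur Wastes Map"><tr><td></td></tr></tbody>
<tbody id="old-item-container-9" data-name="Example"><tr><td></td></tr></tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

contains = soup.select("tbody[id*=item-container-]")       # prefix anywhere in the id
starts_with = soup.select('tbody[id^="item-container-"]')  # prefix at the start only

print([t["id"] for t in contains])     # ['item-container-0', 'old-item-container-9']
print([t["id"] for t in starts_with])  # ['item-container-0']
```

On the page in the question both selectors return the same elements; ^= is simply the stricter choice if other ids could contain the same text.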



Well, you cannot do it that way, but you can extract specific attributes from this tag. For example, define x as the HTML you posted, something like this:

x = '''<tbody class="item item-live-c324ceb98e25716a0fad0727e0cd64e3" data-buyout="1 alchemy" data-ign="DanForeverr" data-league="Standard" data-name="Sulphur Wastes Map" data-seller="NoCocent" data-sellerid="None" data-tab="~price 1 alch" data-x="6" data-y="7" id="item-container-0">'''



from bs4 import BeautifulSoup

soup = BeautifulSoup(x, 'lxml')

this_class = soup.findAll('tbody', {'class': 'item item-live-c324ceb98e25716a0fad0727e0cd64e3'})
# This pinpoints the exact tbody (you can do it your way),
# and it is useful because you give it the exact key-value pair, so it will rarely miss.

for i in this_class:
    print(i['data-buyout'])
    print(i['data-ign'])
    print(i['data-name'])
    print(i['id'])
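Listing each attribute by name works, but the question asked for all of the data-* attributes. Because a Tag's attrs property is an ordinary dict, a dict comprehension can keep every key that starts with "data-", and tag.get() returns a default instead of raising KeyError when an attribute is missing. A sketch under those assumptions; it uses the built-in "html.parser" rather than lxml so no extra parser is needed, and data-price is a hypothetical attribute name chosen because this tag does not have it:

```python
from bs4 import BeautifulSoup

x = '''<tbody class="item item-live-c324ceb98e25716a0fad0727e0cd64e3" data-buyout="1 alchemy" data-ign="DanForeverr" data-league="Standard" data-name="Sulphur Wastes Map" data-seller="NoCocent" data-sellerid="None" data-tab="~price 1 alch" data-x="6" data-y="7" id="item-container-0">'''

soup = BeautifulSoup(x, "html.parser")
tbody = soup.find("tbody")

# Keep only attributes whose names start with "data-" (class and id are dropped).
data_attrs = {name: value for name, value in tbody.attrs.items() if name.startswith("data-")}
for name, value in data_attrs.items():
    print(f"{name}={value}")

# data-price is hypothetical: the tag has no such attribute, so the default is returned.
print(tbody.get("data-price", "not set"))  # not set
```

Note that every value comes back as a string, so data-sellerid="None" is the text "None", not Python's None.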


You can print the value of any of these attributes this way. But if you print what soup.findAll or soup.find returns, you will NOT get just that one tag's attributes: you get the whole tree under it, including all of its children.
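To see the difference, compare printing a Tag with printing its attrs dict or a single attribute. A small sketch on a trimmed copy of the markup (the two cells are simplified from the question's HTML):

```python
from bs4 import BeautifulSoup

html = """
<table>
<tbody data-name="Sulphur Wastes Map" id="item-container-0">
  <tr><td>Sulphur Wastes Map</td><td>ilvl: 80</td></tr>
</tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
tbody = soup.find("tbody")

# Printing the Tag serializes the element and everything nested inside it.
print(tbody)

# Printing .attrs (or indexing one attribute) shows only the tag's own attributes.
print(tbody.attrs)
print(tbody["data-name"])  # Sulphur Wastes Map
```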



A combination of the answers above solved my problem:

parsed = soup.select("tbody[id*=item-container-]")
for i in parsed:
    print(i['data-buyout'])
    print(i['data-ign'])
    print(i['data-name'])
    print(i['id'])
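The same combination can be wrapped in a small function that returns one dict per listing instead of printing. This is a sketch, not part of BeautifulSoup: extract_listings is a hypothetical helper name, it uses ^= (starts with) rather than *=, and the second listing in the sample is invented only to show that the loop handles several items.

```python
from bs4 import BeautifulSoup


def extract_listings(html):
    """Return one dict of data-* attributes (prefix stripped) per matching <tbody>."""
    soup = BeautifulSoup(html, "html.parser")
    listings = []
    # ^= matches ids that start with the prefix: item-container-0, item-container-1, ...
    for tbody in soup.select('tbody[id^="item-container-"]'):
        data = {
            name[len("data-"):]: value
            for name, value in tbody.attrs.items()
            if name.startswith("data-")
        }
        data["id"] = tbody["id"]
        listings.append(data)
    return listings


# First listing trimmed from the question; the second is a hypothetical example.
sample = """
<table>
<tbody id="item-container-0" data-buyout="1 alchemy" data-ign="DanForeverr" data-name="Sulphur Wastes Map"><tr><td></td></tr></tbody>
<tbody id="item-container-1" data-buyout="2 chaos" data-ign="ExampleSeller" data-name="Example Map"><tr><td></td></tr></tbody>
</table>
"""

for listing in extract_listings(sample):
    print(listing["id"], listing["ign"], listing["buyout"], listing["name"])
```

Returning dicts rather than printing makes it easy to hand the results to csv.DictWriter or a similar consumer later.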







