Webdriver / Selenium: How do I find an element when it doesn't have a class name, id or css selector?

Question

Each entry in the search results "7-pack" here shows its address and phone number on the right side.

For each of them, I want to extract (i) the address and (ii) the phone number. The problem is how these elements are defined in HTML:

<div style="width:146px;float:left;color:#808080;line-height:18px"><span>Houston, TX</span><br><span>United States</span><br><nobr><span>(713) 766-6663</span></nobr></div>

So there is no class name, CSS selector, or id that I can pass to find_element_by_*(). I will not know the link text in advance, so I cannot use find_element_by_partial_link_text(), and as far as I know WebDriver does not provide a way to search by style. How do I get around this? I need to reliably fetch this data every time, for every search result, across different queries.

The WebDriver language binding is Python.

python html selenium selenium-webdriver webdriver

Pyderman, June 26 '15 at 14:15

1 answer

alecxe · Accepted Answer · 2015-06-26T14:42:18+0000

There are at least two key things you can rely on: the container with id="lclbox" and the elements with class="intrlu", one for each result item.

How you extract the address and phone number from each result item can vary. Here is one (definitely not beautiful) option that finds the phone number by checking the text of each span element against a regular expression:

import re

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver


driver = webdriver.Chrome()
driver.get('https://www.google.com/?gws_rd=ssl#q=plumbers%2Bhouston%2Btx')

# waiting for results to load
wait = WebDriverWait(driver, 10)
box = wait.until(EC.visibility_of_element_located((By.ID, "lclbox")))

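# match US-style phone numbers such as "(713) 766-6663"
phone_re = re.compile(r"\(\d{3}\) \d{3}-\d{4}")

# find_elements(By...) is used because the find_elements_by_* helpers
# were removed in recent Selenium 4 releases
for result in box.find_elements(By.CLASS_NAME, "intrlu"):
    for span in result.find_elements(By.TAG_NAME, "span"):
        if phone_re.search(span.text):
            # span -> nobr -> div holding the address and phone number
            parent = span.find_element(By.XPATH, "../..")
            print(parent.text)
            break
    print("-----")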

I'm sure it can be improved, but I hope it gives you a starting point. It prints:

Houston, TX
(713) 812-7070
-----
Houston, TX
(713) 472-5554
-----
6646 Satsuma Dr
Houston, TX
(713) 896-9700
-----
1420 N Durham Dr
Houston, TX
(713) 868-9907
-----
5630 Edgemoor Dr
Houston, TX
(713) 665-5890
-----
5403 Kirby Dr
Houston, TX
(713) 224-3747
-----
Houston, TX
(713) 385-0349
-----
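
The loop above prints each block as a single string, while the question asks for the address and the phone number separately. A minimal sketch of that last step, assuming the block text is newline-separated as in the printed output, could reuse the same pattern. The helper name split_address_phone is hypothetical and not part of Selenium:

```python
import re

# Same phone pattern as in the loop above, e.g. "(713) 766-6663".
PHONE_RE = re.compile(r"\(\d{3}\) \d{3}-\d{4}")


def split_address_phone(block_text):
    """Split the text of one result block into (address, phone).

    Assumes block_text is newline-separated, as in the printed output.
    phone is None when no line matches PHONE_RE.
    """
    address_lines = []
    phone = None
    for line in block_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = PHONE_RE.search(line)
        if match and phone is None:
            # the first matching line is treated as the phone number
            phone = match.group(0)
        else:
            # every other line is treated as part of the address
            address_lines.append(line)
    return ", ".join(address_lines), phone


if __name__ == "__main__":
    sample = "6646 Satsuma Dr\nHouston, TX\n(713) 896-9700"
    address, phone = split_address_phone(sample)
    print(address)
    print(phone)
```

Inside the loop, this could be called as split_address_phone(parent.text) in place of printing parent.text directly. Results that list only a city, such as "Houston, TX", simply come back with a shorter address string.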
