XPath for URL for import.io

Question

I get a list of proposed jobs on this site: http://telekom.jobs/global-careers

I'm trying to get an XPath for the "more information" link working.

Here is the full XPath for the first link:

/html/body/div[3]/div/div[2]/div[3]/table/tbody/tr[2]/td/div/a/@href

and this is what I have to enter in import.io:

tr[2]/td/div/a/@href

But it doesn't work, and I don't know why.

The "more information" links on the job listing page have these XPaths:

tr[2]/td/div/a/@href
tr[4]/td/div/a/@href
tr[6]/td/div/a/@href
tr[8]/td/div/a/@href

and so on. Could that be why it doesn't work, because the row numbers go 2, 4, 6 rather than 1, 2, 3? Or am I doing something wrong?

+3

xpath web-crawler import.io

Marcin 07 jan. 15 at 20:04

1 answer

raza · Accepted Answer · 2015-01-16T10:29:16+0000

If you build the API from URL 2.0 and reload the website with JS on but CSS off, you can see the collapsible menu.

The DOM on this website is structured so that the odd rows hold the job titles, while the additional information about each job is hidden in the even rows. The XPath position() function handles this, so you can train the rows manually with the following XPath:

/html/body/div[3]/div/div[2]/div[3]/table/tbody/tr[position() mod 2 = 0]

This highlights the "more information" fields and gives you access to the data inside them. From here, you can simply target the specific elements and attributes that hold the title and the link.

Link XPath: .//a[@class='forward jobadview']/@href

Title XPath: .//div[@class='info']//h3
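
To illustrate how the row filter and the two relative XPaths fit together, here is a minimal Python sketch. It is not what import.io runs internally. The sample markup is hypothetical and heavily simplified (the real page is HTML and may not parse as XML), and the function name extract_jobs is made up for this example. Python's standard xml.etree.ElementTree supports only a subset of XPath and has no position() or mod, so the even-row filter is emulated with a slice; a full XPath 1.0 engine such as lxml should accept tr[position() mod 2 = 0] directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: odd rows hold the job title,
# even rows hold the hidden "more information" block.
SAMPLE = """
<table>
  <tbody>
    <tr><td>Job A</td></tr>
    <tr><td><div class="info"><h3>Job A details</h3>
      <a class="forward jobadview" href="/job/a">More</a></div></td></tr>
    <tr><td>Job B</td></tr>
    <tr><td><div class="info"><h3>Job B details</h3>
      <a class="forward jobadview" href="/job/b">More</a></div></td></tr>
  </tbody>
</table>
"""


def extract_jobs(markup):
    """Return title and link for every even table row."""
    root = ET.fromstring(markup)
    rows = root.findall("./tbody/tr")
    # XPath positions are 1-based, so position() mod 2 = 0 matches
    # rows 2, 4, 6, ... which are Python indices 1, 3, 5, ...
    even_rows = rows[1::2]
    jobs = []
    for row in even_rows:
        # Same relative paths as in the answer, minus the trailing /@href,
        # which ElementTree cannot evaluate; the attribute is read with .get().
        link = row.find(".//a[@class='forward jobadview']")
        title = row.find(".//div[@class='info']//h3")
        jobs.append({
            "title": title.text if title is not None else None,
            "href": link.get("href") if link is not None else None,
        })
    return jobs


if __name__ == "__main__":
    for job in extract_jobs(SAMPLE):
        print(job["title"], job["href"])
```

The point is that the row XPath selects the repeating unit once, and the link and title XPaths are then evaluated relative to each selected row, which is why they start with a dot.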

That said, because the website relies heavily on JS, the API may fail to publish, so we created an API for the query, and you can get the same data here:

https://import.io/data/mine/?id=0626d49d-5233-469d-9429-707f73f1757a
