Finding the last line in a node using XPath
I was wondering if there is a way to always select the content of a node directly above a specific element?
I have the following HTML that I want to extract data from:
<div id="someDiv">
<h3>Name</h3>
Some content1
<br/>
<br/>
Address 12345
<br/>
09876 City, Country
<br/>
<span id="tel_number">12345</span>
</div>
Here is an XPath that finds the content of everything above the span:
//div[@id="someDiv"]/span[@id="tel_number"]/preceding-sibling::node()
Now I need an XPath that always selects the content right above the span and nothing else (one line). It should also work if (for some reason) there was no <br/> above the span.
Hope someone can help with this!
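I found that the best way to get the zip code is as follows:
# `strip` stands in for the original `cleanup`, which is not a core String method
data = page.search('//div[@id="someDiv"]/span[@id="tel_number"]/preceding-sibling::node()').map { |node| node.text.strip }
data.delete("")  # drop entries that contained only whitespace
postcode = data.last.match(/\d{5}/).to_s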
From there, it's easy to get everything before or after the match.
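As a minimal sketch of that last step, Ruby's `MatchData` exposes the text on either side of a match through `pre_match` and `post_match`. The helper name `split_postcode` and the sample string are hypothetical and only stand in for the last non-empty entry of `data`.

```ruby
# Hypothetical helper: splits a line around the first run of five digits.
# Returns nil when the line contains no such run.
def split_postcode(line)
  match = line.match(/\d{5}/)
  return nil unless match

  {
    postcode: match.to_s,               # the five digits themselves
    before:   match.pre_match.strip,    # text before the postcode
    after:    match.post_match.strip    # text after the postcode
  }
end

parts = split_postcode("09876 City, Country")
puts parts[:postcode]  # 09876
puts parts[:after]     # City, Country
```

Note that `/\d{5}/` matches the first five-digit run anywhere in the line, so a line such as "Address 12345" would also produce a match.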
I want to get "09876 City, Country" stripped of any HTML tags
I think the XPath below is what you are looking for:
//div[@id="someDiv"]/span[@id="tel_number"]/preceding-sibling::text()[1]
Using Nokogiri:
require 'nokogiri'
doc = Nokogiri::HTML::Document.parse <<-EOT
<div id="someDiv">
<h3>Name</h3>
Some content1
<br/>
<br/>
Address 12345
<br/>
09876 City, Country
<br/>
<span id="tel_number">12345</span>
</div>
EOT
doc.xpath("normalize-space(//div[@id='someDiv']/span[@id='tel_number']/preceding-sibling::text()[1])").to_s
# => "09876 City, Country"