Finding the last line of text in a node using XPath

I was wondering if there is a way to always select the content of a node that sits directly above a specific element?

I have the following HTML that I want to extract data from:

<div id="someDiv">
   <h3>Name</h3>
   Some content1
   <br/>
   <br/>
   Address 12345
   <br/>
   09876 City, Country
   <br/>
   <span id="tel_number">12345</span>
</div>

Here is an XPath that selects everything above the span:

//div[@id="someDiv"]/span[@id="tel_number"]/preceding-sibling::node()

Now I need an XPath that always selects only the content directly above the span (a single line) and nothing else. It should also work if, for some reason, there is no <br/> above the span.

Hope someone can help with this!

3 answers


I found that the best way to get the zip code is as follows:

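# page is assumed to respond to #search (e.g. a Mechanize page or Nokogiri document); strip replaces the non-core String#cleanup
data = page.search('//div[@id="someDiv"]/span[@id="tel_number"]/preceding-sibling::node()').map { |node| node.text.strip }
data.delete("")                           # drop blank entries (whitespace-only text nodes, <br/> elements)
postcode = data.last.match(/\d{5}/).to_s  # last line "09876 City, Country" -> "09876"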

From there, it's easy to get everything before or after a chosen line.
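A minimal plain-Ruby sketch of that post-processing step, assuming the XPath has already produced the `.text` of each preceding sibling in document order. The `raw` array below is hypothetical and only mirrors the HTML in the question; it is stripped, blank entries are dropped, and the list is then sliced before and after a chosen line.

```ruby
# Hypothetical input: the .text of each preceding sibling of the span, in document order.
# Whitespace-only text nodes and <br/> elements show up as blank strings.
raw = ["\n   ", "Name", "\n   Some content1\n   ", "", "\n   ", "",
       "\n   Address 12345\n   ", "", "\n   09876 City, Country\n   ", "", "\n   "]

# Strip surrounding whitespace and drop the entries that end up empty
# (a non-mutating equivalent of map + data.delete("")).
lines = raw.map(&:strip).reject(&:empty?)

# The line directly above the span is the last non-empty entry.
last_line = lines.last

# String#[] with a regexp returns the first five-digit run, or nil if there is none.
postcode = last_line[/\d{5}/]

# Everything before / after a chosen line.
index  = lines.index("Address 12345")
before = lines[0...index]
after  = lines[(index + 1)..-1]

p postcode  # => "09876"
```

Using `String#[]` instead of `match(...).to_s` means a missing postcode comes back as `nil` rather than an empty string, which is easier to detect.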

Try:

(//div[@id="someDiv"]/span[@id="tel_number"]/preceding-sibling::text())[last()]

or, if you want to strip the surrounding whitespace:

normalize-space((//div[@id="someDiv"]/span[@id="tel_number"]/preceding-sibling::text())[last()])

Since you want to get "09876 City, Country" stripped of any HTML tags, I think the following is what you are looking for:

//div[@id="someDiv"]/span[@id="tel_number"]/preceding-sibling::text()[1]

Usage with Nokogiri:

require 'nokogiri'

doc = Nokogiri::HTML::Document.parse <<-EOT
<div id="someDiv">
   <h3>Name</h3>
   Some content1
   <br/>
   <br/>
   Address 12345
   <br/>
   09876 City, Country
   <br/>
   <span id="tel_number">12345</span>
</div>
EOT

doc.xpath("normalize-space(//div[@id='someDiv']/span[@id='tel_number']/preceding-sibling::text()[1])").to_s
# => "09876 City, Country"
