How do I get the XPath of text between <br> or <br/>?
Well, you can use "//br/text()", but that only returns text nodes that are children of <br> elements, and a <br> has no content of its own. And since the markup above is not well-formed XML, I'm not sure how you are going to run XPath on it. Regular expressions are usually a bad fit for HTML, but there are HTML (not XHTML) parsers available. I hesitate to suggest one for Ruby simply because it is not "my area" and I would just be searching ...
There are several issues here:
- XPath works with XML, and what you have is HTML that is not XML (basically the tags don't match, so an XML parser throws an exception when you give it that text).
- XPath usually works by matching tags and what they contain (attributes, text). Your <br> tags don't actually contain the text, they just sit in between it, so that would be tricky too.
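To illustrate the first point, here is a minimal sketch of a strict XML parser rejecting an unclosed <br>. It uses REXML purely as a stand-in for "an XML parser" (that choice is my assumption, nobody in this thread mentioned it), and the helper name well_formed_xml? and the sample strings are made up for the example.

```ruby
require 'rexml/document'

# Hypothetical helper: true when the markup parses as well-formed XML.
def well_formed_xml?(markup)
  REXML::Document.new(markup)
  true
rescue REXML::ParseException
  # A strict XML parser raises on mismatched or unclosed tags.
  false
end

html_style  = '<div>apple<br>banana</div>'   # <br> is never closed
xhtml_style = '<div>apple<br/>banana</div>'  # <br/> is self-closing

puts well_formed_xml?(html_style)
puts well_formed_xml?(xhtml_style)
```

Only the self-closing form gets past the parser, which is why a forgiving HTML parser is the better tool for markup like yours.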
Because of this, you probably want to use XPath (or similar) to get the content of the div, then split the string on <br> occurrences.
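A rough sketch of the splitting step, assuming you have already pulled the div's inner HTML out as a plain string. The variable inner_html, the helper split_on_br, and the sample content are all hypothetical, and the regex is only meant to cover the <br>, <br/> and <br /> spellings:

```ruby
# Hypothetical inner HTML of the div, as a parser might hand it back.
inner_html = <<~HTML
  apple
  <br>
  banana
  <br/>
  watermelon
  <br>
  orange
HTML

# Split on <br>, <br/> or <br /> (any letter case), trim each piece,
# and drop pieces that are empty after trimming.
def split_on_br(html)
  html.split(%r{<br\s*/?>}i).map(&:strip).reject(&:empty?)
end

fruits = split_on_br(inner_html)
puts fruits
```

This is fine for flat content like a list of words. If the div can contain other nested tags, splitting the raw string gets fragile and a parser-based approach is safer.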
As you've tagged this question with ruby, I would suggest looking into hpricot, as it is a really good and fast HTML (and XML) parsing library that should be a lot more useful than fiddling with raw XPath.
Try the following, which gets all text nodes that are siblings of <br> tags as an array of strings, stripped of leading and trailing whitespace:
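require 'rubygems'
require 'nokogiri'

# Parse the HTML that follows __END__ (Ruby exposes it as DATA).
doc = Nokogiri::HTML(DATA)

# Collect every text node that is a sibling of a <br>, before or after it.
fruits = doc.xpath('//br/following-sibling::text() | //br/preceding-sibling::text()').map do |fruit|
  fruit.to_s.strip
end

puts fruits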
__END__
<div>
apple
<br>
banana
<br/>
watermelon
<br>
orange
</div>
Is this what you want?