How can I use Nokogiri to search for specific text/words on a webpage?

I'm new to Nokogiri, but it looks like the tool I would use to clean up a webpage. I am looking for specific words on a web page: "Valid", "Requirements Met" and "No Requirements". I am using Watir to drive through the website. I currently have:

page = Nokogiri::HTML.parse(browser.html)


to get the HTML, but I'm not sure where to go from here.

Thanks for the help!



3 answers

If you are already using Watir to drive the site, I would suggest using Watir for the text validation as well. You can get all the text on the page using:

ie.text      #Where ie is a Watir::IE


Then you can check whether these words are included, using a regex:

if ie.text =~ /Valid|Requirements Met|No Requirements/
  # Do something if any of the words are on the page
end
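Building on that regex check, here is a minimal, stdlib-only sketch that reports which of the phrases appear, matching whole words so that "Valid" does not also match inside a word such as "Validated". The constant `PHRASES` and the helper `phrases_found` are illustrative names, not part of Watir.

```ruby
# The phrases from the question (illustrative constant name).
PHRASES = ['Valid', 'Requirements Met', 'No Requirements'].freeze

# Returns the phrases that occur in +text+ as whole words.
# The \b anchors keep "Valid" from matching inside "Validated";
# Regexp.escape guards against phrases containing regex metacharacters.
def phrases_found(text, phrases = PHRASES)
  phrases.select { |phrase| text.match?(/\b#{Regexp.escape(phrase)}\b/) }
end
```

For example, `phrases_found('Form Validated. Requirements Met.')` returns `["Requirements Met"]`, and `phrases_found(ie.text)` could decide which branch to take.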


However, if you are looking for specific bits of text, you can use Watir to locate the specific elements (rather than parsing the text or HTML). If you can provide a sample of the HTML you're working with, we can help find a more robust solution.



I'm not sure why you are using both. If you just want to check the text, you can fetch the page with net/http or Mechanize. Either way, you can check the text in Watir with browser.text.match 'Valid', and the same in Nokogiri with page.text.match 'Valid'.




You could also use the .text method from Justin's answer together with Ruby's standard String#include? method, which returns true or false.

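# String#include? takes a String (a Regexp raises TypeError), so test each phrase
if ['Valid', 'Requirements Met', 'No Requirements'].any? { |phrase| browser.text.include?(phrase) }
  # code to execute if text found
else
  # code to execute if text not found
end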
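If you would rather keep a single regex built from the same list of phrases, Ruby's `Regexp.union` escapes each string and joins them with `|`. A small sketch; `PHRASE_PATTERN`, `any_phrase?` and `all_phrases?` are hypothetical names:

```ruby
PHRASES = ['Valid', 'Requirements Met', 'No Requirements'].freeze

# Regexp.union escapes each string and combines them into one alternation,
# equivalent to /Valid|Requirements Met|No Requirements/.
PHRASE_PATTERN = Regexp.union(PHRASES)

# True if at least one phrase appears in the text.
def any_phrase?(text)
  text.match?(PHRASE_PATTERN)
end

# True only if every phrase appears in the text (include? with Strings).
def all_phrases?(text)
  PHRASES.all? { |phrase| text.include?(phrase) }
end
```

With Watir, `any_phrase?(browser.text)` can serve as the `if` condition; `all_phrases?` covers the case where the page must show every phrase.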


It also makes it easy to write a one-line validation step, if that's what you are after.

If you are using RSpec/Cucumber:

expect(browser.text).to match(/Valid|Requirements Met|No Requirements/)


If you are using Test::Unit:

assert_match(/Valid|Requirements Met|No Requirements/, browser.text)



