How to use a pattern to get a value between two known strings
Let me first explain where I'm coming from. I have a string containing the HTML code of a website, which I got using jsoup. The HTML is all on one line, and I can print it to a text file. I am trying to get the songs out of this code, and every song has the same "tags".
This is a snippet from the text file I printed it to:
<div class="title" itemprop="name">
Wrath
</div> </td>
It looks like a single line in Notepad, but when you copy and paste it, it looks like the above. What I want is the Wrath in the middle, so I tried to create a pattern to find it, using help from this other Stack Overflow post: Java regex to extract text between tags
This is the relevant part of my code:
Pattern p = Pattern.compile( "<div class=\"title\" itemprop=\"name\">(.+?)</div> </td>");
Matcher m = p.matcher( html );
while( m.find()) {
quote.add( m.group( 1 ));
}
When it runs, the ArrayList quote turns out to be empty. It may not be working because of the whitespace (the line breaks) between the tags and the text. Any ideas?
You can use jsoup to parse the HTML document as well as load it:
String site = "http://example.com/";
Document doc = Jsoup.connect(site).get();
String text = doc.select("div.title").first().text();
Or just use XPath if that doesn't work. Regular expressions are great for extracting data from unstructured text. However, if you have a structured document such as HTML, you can leave the heavy lifting to a purpose-built parser. Java comes with the javax.xml.xpath package, which lets you search the node tree of your document.
Let's say your document looks like this:
<html>
<body>
<div class="title">Wrath</div>
</body>
</html>
You can do this to find the text in that div:
XPath xpath = XPathFactory.newInstance().newXPath();
String expression = "/html/body/div[@class='title']/text()";
InputSource inputSource = new InputSource("myDocument.html");
NodeList nodes = (NodeList) xpath.evaluate(expression, inputSource, XPathConstants.NODESET);
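To read the matched text back out of the NodeList, something like the following sketch could work. It assumes the input is well-formed XML (real-world HTML often is not, in which case the XML parser will reject it), and the class name XPathTitleExample and the method titles are made up for illustration. It parses from a string rather than a file and uses //div so the div may sit anywhere in the tree.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
public class XPathTitleExample {

    // Returns the trimmed text of every <div class="title"> in a well-formed document.
    static List<String> titles(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        List<String> result = new ArrayList<>();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//div[@class='title']/text()",
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            for (int i = 0; i < nodes.getLength(); i++) {
                // Each node is a text node; trim the surrounding line breaks.
                result.add(nodes.item(i).getNodeValue().trim());
            }
        } catch (XPathExpressionException e) {
            // Thrown for a bad expression or input that is not well-formed XML.
            throw new IllegalArgumentException("Could not evaluate XPath", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<html><body><div class=\"title\">\nWrath\n</div></body></html>";
        System.out.println(titles(xml));
    }
}
```

Note that @class='title' only matches when the class attribute is exactly title; a div with several classes would need a contains() test instead.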
If it parses like Perl, you might need to double the backslashes:
Pattern p = Pattern.compile("<div class=\"title\" itemprop=\"name\">(.*?)<\\/div>");
Should be
Pattern p = Pattern.compile("<div class=\"title\" itemprop=\"name\">(.*?)<\\\\/div>");
But a regex is the wrong tool for this kind of task.
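If a regex is still wanted, a sketch that tolerates the line breaks around the title might look like the one below. It assumes the markup matches the snippet in the question exactly; the class name TitleRegexExample and the method titles are made up for illustration. Two facts about Java are relevant here: / is not a special character in a Java regex, so it needs no backslash at all, and . does not match line terminators unless Pattern.DOTALL is set, which would explain an empty list when the title sits on its own line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
public class TitleRegexExample {

    // \s* absorbs the whitespace and line breaks around the title;
    // DOTALL additionally lets '.' match line terminators inside it.
    private static final Pattern TITLE = Pattern.compile(
            "<div class=\"title\" itemprop=\"name\">\\s*(.+?)\\s*</div>",
            Pattern.DOTALL);

    // Collects the text of every matching title div in the given HTML.
    static List<String> titles(String html) {
        List<String> quote = new ArrayList<>();
        Matcher m = TITLE.matcher(html);
        while (m.find()) {
            quote.add(m.group(1));
        }
        return quote;
    }

    public static void main(String[] args) {
        String html = "<div class=\"title\" itemprop=\"name\">\nWrath\n</div> </td>";
        System.out.println(titles(html));
    }
}
```

This stays brittle: any change in attribute order, quoting, or spacing inside the opening tag breaks the match, which is the usual argument for a parser over a regex.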