What's the best way to search for specific tokens in a string (in Java)?
Regular expressions should work just fine for this.
Refer to the Javadoc for:
- java.lang.String.split()
- java.util.regex package
- java.util.Scanner
Note: StringTokenizer is not what you want, because it splits on individual characters, not on strings. The delim argument is a set of characters, each of which acts as a delimiter on its own. That is only useful for very simple cases, such as an unambiguous comma-separated list.
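To make the difference concrete, here is a minimal sketch that splits the same input both ways. The class and method names (DelimiterDemo, splitOnString, splitOnChars) are made up for illustration; only String.split and StringTokenizer come from the JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class DelimiterDemo {
    // String.split treats the delimiter as one regex, so "<B>" must appear as a whole.
    static List<String> splitOnString(String input, String delimiterRegex) {
        return Arrays.asList(input.split(delimiterRegex));
    }

    // StringTokenizer treats every character of delims as a separate delimiter.
    static List<String> splitOnChars(String input, String delims) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input, delims);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "abc<B>def</B>ghi";
        System.out.println(splitOnString(input, "<B>")); // [abc, def</B>ghi]
        System.out.println(splitOnChars(input, "<B>"));  // [abc, def, /, ghi]
    }
}
```

With StringTokenizer the characters '<', 'B' and '>' are each delimiters, so the closing tag is torn apart and a stray "/" token appears.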
Given your example, I think I'd use a regex, and in particular I'd take a look at the grouping functionality that Matcher provides.
Tom
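import java.util.regex.Matcher;
import java.util.regex.Pattern;

String inputString = "abc<B>def</B>ghi<B>j</B>kl";
String stringPattern = "(<B>)([a-zA-Z]+)(</B>)";
Pattern pattern = Pattern.compile(stringPattern);
Matcher matcher = pattern.matcher(inputString);
// find() scans for the next match; matches() would require the whole input to match
while (matcher.find()) {
    String firstGroup = matcher.group(1);  // opening tag
    String secondGroup = matcher.group(2); // text between the tags
    String thirdGroup = matcher.group(3);  // closing tag
}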
StringTokenizer takes the entire string as a constructor argument, which is not a good idea for large inputs. You can also use StreamTokenizer, which reads from a stream instead.
You should also look at Scanner.
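As a rough illustration of the Scanner route, the sketch below sets a regex delimiter with useDelimiter and collects the pieces between the tags. The class name ScannerTokens and the delimiter "</?B>" are assumptions chosen to fit the example string from the other answers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerTokens {
    // Returns the pieces of input separated by the given delimiter regex.
    static List<String> tokens(String input, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(delimiterRegex);
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "</?B>" matches both the opening and the closing tag.
        System.out.println(tokens("abc<B>def</B>ghi<B>j</B>kl", "</?B>"));
        // prints [abc, def, ghi, j, kl]
    }
}
```

Scanner can also wrap a File or an InputStream, so the same loop works without loading the whole input into one String first.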
It's a bit brute force and makes some assumptions, but it works.
public class SegmentFinder
{
    public static void main(String[] args)
    {
        String string = "abc<B>def</B>ghi<B>j</B>kl";
        String startRegExp = "<B>";
        String endRegExp = "</B>";
        int segmentCounter = 0;
        int currentPos = 0; // position in the text with the tags stripped out
        String[] array = string.split(startRegExp);
        for (int i = 0; i < array.length; i++)
        {
            if (i > 0) // Ignore the first one
            {
                segmentCounter++;
                // this assumes that every start will have exactly one end
                String[] array2 = array[i].split(endRegExp);
                int elementLength = array2[0].length();
                System.out.println("segment[" + segmentCounter + "] = " + (currentPos + 1) + "," + (currentPos + elementLength));
                for (String s : array2)
                {
                    currentPos += s.length();
                }
            }
            else
            {
                currentPos += array[i].length();
            }
        }
    }
}
Does your input look like your example, and do you just need the text between specific tags? Then a simple StringUtils.substringsBetween(yourString, "<B>", "</B>") from the Apache Commons Lang package ( http://commons.apache.org/lang/ ) should do the job.
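If you would rather not add a dependency, the same idea can be sketched in plain Java with indexOf. This is not the Commons Lang implementation, only a hand-rolled approximation of what such a call is expected to return; the class name BetweenTags is hypothetical, and it makes no attempt to handle nested tags.

```java
import java.util.ArrayList;
import java.util.List;

class BetweenTags {
    // Collects every substring found between an open marker and the next close marker.
    static List<String> substringsBetween(String text, String open, String close) {
        if (open.isEmpty() || close.isEmpty()) {
            throw new IllegalArgumentException("markers must not be empty");
        }
        List<String> found = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = text.indexOf(open, pos);
            if (start < 0) {
                break; // no further opening marker
            }
            start += open.length();
            int end = text.indexOf(close, start);
            if (end < 0) {
                break; // opening marker without a matching close
            }
            found.add(text.substring(start, end));
            pos = end + close.length();
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(substringsBetween("abc<B>def</B>ghi<B>j</B>kl", "<B>", "</B>"));
        // prints [def, j]
    }
}
```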
If you need a more general solution, for different and possibly nested tags, you might want to look at a parser that takes HTML input and creates an XML document from it, such as NekoHTML, TagSoup, or JTidy. You can then use XPath on the XML document to access the content.
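The XPath step might look roughly like the sketch below. It assumes the input is already well-formed XML (the job one of the parsers above would do for real HTML) and uses only the JDK's built-in javax.xml classes; the class name XPathTagText and the wrapping root element are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathTagText {
    // Returns the text content of every node selected by the XPath expression.
    static List<String> textOf(String xml, String xpathExpression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(xpathExpression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            // Malformed XML or an invalid expression ends up here.
            throw new IllegalArgumentException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the fragment has been wrapped in a single root element.
        String xml = "<root>abc<B>def</B>ghi<B>j</B>kl</root>";
        System.out.println(textOf(xml, "//B")); // [def, j]
    }
}
```

Unlike the regex and split approaches, this copes with nested tags: for a nested B element, "//B" selects both the outer and the inner element.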