How can I find the smallest positive int efficiently?
I am reading a text and want to find the end of the first sentence. At this stage, that is the first index of either ". ", "? " or "! " in the string. Here is my Java code:
int next = -1;
int nextQ = text.indexOf("? ");
int nextE = text.indexOf("! ");
int nextDot = text.indexOf(". ");
if (nextDot > 0) {
    next = nextDot;
    if (nextQ > 0){
        if (nextQ < next) {next = nextQ;}
        if (nextE > 0) {
            if (nextE < next) {next = nextE;}
        }
    } else if (nextE > 0){
        if (nextE < next) {next = nextE;}
    }
} else if (nextQ > 0){
    next = nextQ;
    if (nextE > 0 && nextE < next){next = nextE;}
} else if (nextE > 0) { next = nextE;}
I believe the code works, but it takes a total of 10 if statements, which doesn't look very neat. I might want to add more sentence delimiters later, and I don't think this approach is very flexible. Is there a better or shorter way to achieve the same result? Or should I try another programming language for problems like this? If so, which one?
I suggest using a regular expression to find any of these delimiters at once.
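import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = <TEXT>; // placeholder for the input text
int next;
// Match "? ", "! " or ". ", whichever comes first.
Pattern p = Pattern.compile("\\? |! |\\. ");
Matcher m = p.matcher(text);
if (m.find()) {
    next = m.start(); // index of the first delimiter
} else {
    next = -1; // no delimiter found
}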
You can modify the regex to fine-tune what it matches. For example, instead of requiring exactly a space after the delimiter, I would suggest accepting any whitespace character, so that a line break or tab works as well. The pattern would then be: "\\?\\s|!\\s|\\.\\s". You could add additional delimiters in a similar manner, and with a little extra work you could find out which delimiter was matched.
Java's regex syntax is documented in the Javadoc of the Pattern class, and there are helpful tutorials on it as well.
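As a rough sketch of the "which delimiter was matched" idea mentioned above, a capturing group around the punctuation makes the matched character available through the Matcher. The class and method names here (FirstDelimiter, indexOfFirstDelimiter, firstDelimiter) are made up for illustration, and the character class [.?!] is just a compact way of writing the same alternation:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not from the answer.
final class FirstDelimiter {
    // Group 1 captures the punctuation; \s requires one whitespace character after it.
    private static final Pattern SENTENCE_END = Pattern.compile("([.?!])\\s");

    /** Returns the index of the first delimiter, or -1 if none is found. */
    static int indexOfFirstDelimiter(String text) {
        Matcher m = SENTENCE_END.matcher(text);
        return m.find() ? m.start() : -1;
    }

    /** Returns the punctuation character that matched first, or '\0' if none. */
    static char firstDelimiter(String text) {
        Matcher m = SENTENCE_END.matcher(text);
        return m.find() ? m.group(1).charAt(0) : '\0';
    }

    public static void main(String[] args) {
        String text = "Is this it? Yes. Good!";
        System.out.println(indexOfFirstDelimiter(text)); // 10
        System.out.println(firstDelimiter(text));        // ?
    }
}
```

Adding another delimiter then only means adding one character to the character class.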
Use methods to keep it DRY:
int firstDelimiterIndex(String s) {
    return minIndex(s.indexOf(". "), minIndex(s.indexOf("? "), s.indexOf("! ")));
}
// Smaller of two indexes, where -1 means "not found".
int minIndex(int a, int b) {
    if (a == -1) return b;
    if (b == -1) return a;
    return Math.min(a, b);
}
Or choose a faster algorithm:
for (int i = 0; i < s.length(); i++) {
    switch (s.charAt(i)) {
        case '.':
        case '?':
        case '!':
            // Only count the punctuation if a space follows it.
            if (i + 1 < s.length() && s.charAt(i + 1) == ' ')
                return i;
    }
}
return -1; // no delimiter found
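Since the question asks about adding more delimiters later, here is a minimal sketch of the same single-pass scan with the delimiter characters passed in as a string. The class name SentenceScanner and the two-argument signature are assumptions for illustration, not part of the answer above:

```java
// Hypothetical generalization of the single-pass scan; names are illustrative.
final class SentenceScanner {
    /**
     * Returns the index of the first character contained in `delimiters`
     * that is immediately followed by a space, or -1 if there is none.
     */
    static int firstDelimiterIndex(String s, String delimiters) {
        // Stop one short of the end: the last character has no successor.
        for (int i = 0; i + 1 < s.length(); i++) {
            if (delimiters.indexOf(s.charAt(i)) >= 0 && s.charAt(i + 1) == ' ') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstDelimiterIndex("Wait! What? Fine.", ".?!")); // 4
        System.out.println(firstDelimiterIndex("No delimiter here", ".?!")); // -1
    }
}
```

The text is traversed once regardless of how many delimiter characters are configured, whereas each indexOf call may scan the whole string.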
Use Math.min and a little modification.
First, turn -1 into a large positive integer:
int largeMinusOne(int a)
{
    // Integer.MAX_VALUE would be a safer sentinel for very long strings.
    return a == -1 ? 9999999 : a;
}
int nextQ = largeMinusOne(text.indexOf("? "));
int nextE = largeMinusOne(...);
int nextDot = largeMinusOne(...);
And now:
int next = Math.min(Math.min(nextQ, nextE), nextDot);
You can just filter out values that don't match (== -1) (Java 8):
import java.util.OptionalInt;
import java.util.stream.IntStream;

int nextQ = text.indexOf("? ");
int nextE = text.indexOf("! ");
int nextDot = text.indexOf(". ");
OptionalInt res = IntStream.of(nextQ, nextE, nextDot).filter(i -> i != -1).min();
if (res.isPresent()) {
    // ok, use res.getAsInt()
} else {
    // none of these substrings found
}
This is more of a joke than a real answer; in real life, gandaliter's answer should be used.
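For what it's worth, the same filtering idea extends to any number of delimiters if the substrings are passed as varargs. This is only a sketch: the class StreamDelimiters and the method firstIndexOfAny are hypothetical names, not an existing API.

```java
import java.util.OptionalInt;
import java.util.stream.Stream;

// Hypothetical helper; names are illustrative.
final class StreamDelimiters {
    /** Smallest index at which any of the given substrings occurs, if any occurs at all. */
    static OptionalInt firstIndexOfAny(String text, String... delimiters) {
        return Stream.of(delimiters)
                .mapToInt(d -> text.indexOf(d)) // -1 when the substring is absent
                .filter(i -> i != -1)           // drop the "not found" results
                .min();
    }

    public static void main(String[] args) {
        OptionalInt res = firstIndexOfAny("Hello. World! Ok? ", ". ", "? ", "! ");
        System.out.println(res.isPresent() ? res.getAsInt() : -1); // 5
    }
}
```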