How can I find the smallest positive int efficiently?

I am reading a text in which I want to find the end of the first sentence, which at this stage means the first index of either ". ", "? " or "! " in the string. So here is my Java code:

int next = -1;
int nextQ = text.indexOf("? ");
int nextE = text.indexOf("! ");
int nextDot = text.indexOf(". ");
if (nextDot > 0) {
    next = nextDot;
    if (nextQ > 0){
        if (nextQ < next) {next = nextQ;}
        if (nextE > 0) {
            if (nextE < next) {next = nextE;}
        }
    } else if (nextE > 0){
        if (nextE < next) {next = nextE;}
    }
} else if (nextQ > 0){
    next = nextQ;
    if (nextE > 0 && nextE < next){next = nextE;}
} else if (nextE > 0) { next = nextE;}

      

I believe the code works, but it takes a total of 10 if statements, which doesn't look very neat. I might want to add more clause delimiters, and I don't think this approach is very flexible. Is there a better, shorter way to achieve the same result? Or should I try another programming language for problems like this? Which one?

+3




5 answers


I suggest using a regular expression to find any of these delimiters at once.

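import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = <TEXT>;
int next;
// Match "? ", "! " or ". ", whichever occurs first in the text.
Pattern p = Pattern.compile("\\? |! |\\. ");
Matcher m = p.matcher(text);
if (m.find()) {
    next = m.start();
} else {
    next = -1;
}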

      



You can modify the regex to fine-tune what matches. For example, instead of requiring exactly a space after the delimiter, I would suggest accepting any whitespace character, so that a line break or tab works as well. The pattern would then be "\\?\\s|!\\s|\\.\\s". You could add further delimiters in the same way and, with a little extra work, find out which delimiter was matched.
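
As a rough illustration of the "little extra work" mentioned above, the sketch below uses a character class with a capturing group so the matched delimiter can be read back. The class name SentenceEndRegex and both method names are made up for this example, and collapsing the alternation into "([.?!])\\s" is an assumption about how one might write it, not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SentenceEndRegex {
    // One of '.', '?' or '!' (captured as group 1) followed by any whitespace character.
    private static final Pattern SENTENCE_END = Pattern.compile("([.?!])\\s");

    // Returns the index of the first delimiter, or -1 if there is none.
    static int firstDelimiterIndex(String text) {
        Matcher m = SENTENCE_END.matcher(text);
        return m.find() ? m.start() : -1;
    }

    // Returns the delimiter character that matched first, or the NUL character if there is none.
    static char firstDelimiter(String text) {
        Matcher m = SENTENCE_END.matcher(text);
        return m.find() ? m.group(1).charAt(0) : '\0';
    }

    public static void main(String[] args) {
        String text = "Is it done?\nYes. Good!";
        System.out.println(firstDelimiterIndex(text)); // 10
        System.out.println(firstDelimiter(text));      // ?
    }
}
```

Adding another delimiter then only means adding a character to the class, for example "([.?!;])\\s".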

Java's regex syntax is documented in the Javadoc for the Pattern class, and a regex tutorial is also helpful.

+8




Use methods to keep the code DRY:

int firstDelimiterIndex(String s) {
    return minIndex(s.indexOf(". "), minIndex(s.indexOf("? "), s.indexOf("! ")));
}

int minIndex(int a, int b) {
    if (a == -1) return b;
    if (b == -1) return a;
    return Math.min(a, b);
}

      



Or choose a faster algorithm:

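// Single pass: stop at the first '.', '?' or '!' that is followed by a space.
for (int i = 0; i < s.length(); i++) {
    switch (s.charAt(i)) {
    case '.':
    case '?':
    case '!':
        if (i + 1 < s.length() && s.charAt(i + 1) == ' ')
            return i;
    }
}
return -1; // no delimiter found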

      

+5




Use Math.min and a little modification.

First, turn -1 into a large positive integer:

int largeMinusOne(int a)
{
    return a==-1 ? 9999999 : a;
}

int nextQ = largeMinusOne(text.indexOf("? "));
int nextE = largeMinusOne(...);
int nextDot = largeMinusOne(...);

      

And now:

int next = Math.min(Math.min(nextQ, nextE), nextDot);
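
Put together, the idea might look like the sketch below. It makes two assumptions that go beyond the answer: it uses Integer.MAX_VALUE instead of 9999999 as the "not found" value, so that very long texts cannot collide with it, and it maps that value back to -1 at the end. The names SentinelMin, orSentinel and NOT_FOUND are invented for the example.

```java
class SentinelMin {
    // Stand-in for "not found"; indexOf never returns this for a two-character match.
    private static final int NOT_FOUND = Integer.MAX_VALUE;

    // Replaces indexOf's -1 with the large sentinel so Math.min ignores it.
    static int orSentinel(int index) {
        return index == -1 ? NOT_FOUND : index;
    }

    static int firstDelimiterIndex(String text) {
        int nextQ = orSentinel(text.indexOf("? "));
        int nextE = orSentinel(text.indexOf("! "));
        int nextDot = orSentinel(text.indexOf(". "));
        int next = Math.min(Math.min(nextQ, nextE), nextDot);
        // Translate the sentinel back so callers still see -1 for "no match".
        return next == NOT_FOUND ? -1 : next;
    }
}
```

Without the final translation step, a caller has to remember to compare the result against the sentinel instead of -1.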

      

+3




You can just filter out the values that don't match (== -1) (Java 8):

import java.util.OptionalInt;
import java.util.stream.IntStream;

int nextQ = text.indexOf("? ");
int nextE = text.indexOf("! ");
int nextDot = text.indexOf(". ");
OptionalInt res = IntStream.of(nextQ, nextE, nextDot).filter(i -> i != -1).min();
if (res.isPresent()) {
    // ok, use res.getAsInt()
} else {
    // none of these substrings found
}
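
Since the question mentions wanting to add more delimiters later, the same filter-and-min idea could be applied to an arbitrary list of delimiters. The following is only a sketch: the class StreamDelimiters and its varargs method are hypothetical names, not something from the answer.

```java
import java.util.OptionalInt;
import java.util.stream.Stream;

class StreamDelimiters {
    // Returns the smallest index at which any delimiter occurs, or an empty OptionalInt.
    static OptionalInt firstDelimiterIndex(String text, String... delimiters) {
        return Stream.of(delimiters)
                .mapToInt(d -> text.indexOf(d)) // index of each delimiter, or -1
                .filter(i -> i != -1)           // drop the delimiters that were not found
                .min();
    }

    public static void main(String[] args) {
        OptionalInt res = firstDelimiterIndex("Hm; ok. Fine", ". ", "? ", "! ", "; ");
        System.out.println(res.isPresent() ? res.getAsInt() : -1); // 2
    }
}
```

Like the original, this still scans the text once per delimiter, so it is about convenience rather than speed.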

      

This is more of a joke than a real answer; in real life, gandaliter's answer should be used.

+2




I would suggest just iterating over the string character by character and stopping when you encounter any of those characters. What you are doing now is many times less efficient.
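
A minimal sketch of that single pass, assuming (as in the question) that a delimiter only counts when it is followed by a space. The class name SinglePassScan and the DELIMITERS constant are illustrative choices, not something the answer specifies.

```java
class SinglePassScan {
    // Characters treated as sentence delimiters; extend this string to add more.
    private static final String DELIMITERS = ".?!";

    // Returns the index of the first delimiter followed by a space, or -1 if there is none.
    static int firstDelimiterIndex(String s) {
        // Stop one short of the end: the last character has no following space.
        for (int i = 0; i < s.length() - 1; i++) {
            if (DELIMITERS.indexOf(s.charAt(i)) >= 0 && s.charAt(i + 1) == ' ') {
                return i;
            }
        }
        return -1;
    }
}
```

The string is read at most once no matter how many delimiters are listed, whereas each call to text.indexOf may scan the whole text on its own.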

0

