Explicit end of input in regex using $

I have this code using a regex to split an input string into two words, where the second word is optional (I know I can use String.split() in this particular case, but the actual regex is a little more complex):

package com.example;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Dollar {

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(.*?)\\s*(?: (.*))?$");   // Works
        //Pattern pattern = Pattern.compile("(.*?)\\s*(?: (.*))?");    // Does not work

        Matcher matcher = pattern.matcher("first second");
        if (matcher.find()) {   // only read the groups if a match was found
            System.out.println("first : " + matcher.group(1));
            System.out.println("second: " + matcher.group(2));
        }
    }
}


With this code I get the expected output:

first : first
second: second


and it also works if the second word is not there.

However, if I use the same regex without the dollar sign at the end, the capture groups come out empty: group 1 is an empty string and group 2 is null.

My question is: why do I have to put a dollar sign at the end of the regex explicitly to match the "end of the input sequence" (as the Javadoc calls it)? In other words, why is the end of the regex not implicitly treated as the end of the input sequence?



Answer


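This happens because every part of your regex can match the empty string: the lazy (.*?) first tries to match nothing, \s* can match zero spaces, and the group (?: (.*))? is optional. Matcher.find() looks for the next subsequence that matches the pattern, not for a match of the whole input, so it succeeds immediately with an empty match at index 0 (group 1 is empty and group 2 is null). Calling find() repeatedly yields many more short, mostly empty matches. Adding $ forces the lazy group to keep expanding until the rest of the pattern can reach the end of the input. If you want the whole input to match without writing $, call matcher.matches() instead of find().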
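A minimal sketch of the behaviour described above. The class name EmptyMatchDemo and the helpers firstFind, wholeInput and allFindSpans are hypothetical, chosen only for illustration; the calls themselves are the standard Matcher.find(), Matcher.matches(), start() and end(). It compares the first find() result with and without $, shows that matches() behaves as if the pattern were anchored at both ends, and collects every span find() reports for the unanchored pattern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyMatchDemo {

    static final String LAZY = "(.*?)\\s*(?: (.*))?";   // the question's pattern without $
    static final String ANCHORED = LAZY + "$";          // the question's working pattern

    // Groups 1 and 2 of the FIRST match find() reports, or null if there is none.
    static String[] firstFind(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    // Groups 1 and 2 if the ENTIRE input matches (matches()), or null if not.
    static String[] wholeInput(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? new String[] { m.group(1), m.group(2) } : null;
    }

    // "start-end" for every match find() reports, to show how many
    // short, mostly empty matches the unanchored lazy pattern produces.
    static List<String> allFindSpans(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> spans = new ArrayList<>();
        while (m.find()) {
            spans.add(m.start() + "-" + m.end());
        }
        return spans;
    }

    public static void main(String[] args) {
        String input = "first second";
        System.out.println(Arrays.toString(firstFind(LAZY, input)));      // [, null]
        System.out.println(Arrays.toString(firstFind(ANCHORED, input)));  // [first, second]
        System.out.println(Arrays.toString(wholeInput(LAZY, input)));     // [first, second]
        System.out.println(allFindSpans(LAZY, input));  // many matches, starting with the empty span 0-0
    }
}
```

The exact list of spans depends on how find() advances past empty matches, so only the first one is annotated here.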

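If you use this better regex instead, it works without the dollar sign, because the greedy (\S+) must consume at least one non-space character (so it cannot produce an empty match) and the optional group then greedily takes the rest of the input:

(\S+)(?: (.*))?

It also works with the dollar sign: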
(\S+)(?: (.*))?$
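A small sketch, assuming the same find()-based usage as in the question, that checks the answer's pattern with and without $ on a two-word and a one-word input. The class name BetterRegexDemo and the helper firstFind are hypothetical, used only for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BetterRegexDemo {

    static final String PLAIN = "(\\S+)(?: (.*))?";  // the answer's pattern, no $
    static final String ANCHORED = PLAIN + "$";       // the same pattern with $

    // Groups 1 and 2 of the first match find() reports, or null if there is none.
    static String[] firstFind(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        for (String input : new String[] { "first second", "first" }) {
            // Both variants report the same groups for each input
            System.out.println(Arrays.toString(firstFind(PLAIN, input))
                    + " " + Arrays.toString(firstFind(ANCHORED, input)));
        }
        // prints:
        // [first, second] [first, second]
        // [first, null] [first, null]
    }
}
```

Note that, unlike the question's pattern, this one has no \s* between the words, so for "first  second" (two spaces) group 2 would be " second" with a leading space.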