Splitting strings in Java: lookahead and lookbehind with variable length

I want to split a String in Java using numbers as delimiters, but keep the numbers. A little research suggested that the split() method of String would be appropriate, but I couldn't figure out how to do it. To explain my question further, here is an example:

Input: 20.55|50|0.5|20|20.55

Required Output: ["20.55","|","50","|","0.5","|","20","|","20.55"]

      

Calling split as in the example below, without lookahead or lookbehind, I get the output I expected:

expression.split("([0-9]+(\\.[0-9]+)?)")

Output: ["|","|","|","|"]

      

But if I try to do it with lookahead:

expression.split("(?=([0-9]+(\\.[0-9]+)?))")

Output: ["2","0.","5","5|","5","0|","0.","5|","2","0|","2","0.","5","5"]    

      

And using lookbehind I am getting an exception:

Exception in thread "main" java.util.regex.PatternSyntaxException: Look-behind group does not have an obvious maximum length near index 22
(?<=([0-9]+(\.[0-9]+)?))

Can anyone explain this behavior to me and suggest a solution?

PS: I know I can split the string on '|', but this is just a toy example; I really need a much more complex regex ...

EDIT:

It seems impossible to do what I want because of the variable length of the delimiters. Since I was looking for a solution to a smaller problem that I could reuse for the rest of the exercise, I will rephrase the question to see whether there is a workaround like the ones in the second and third answers:

I want to split a String in Java containing an arithmetic expression and keep all its elements. For example:

Input: 20.55 * 0.5 ** cos(360) + sin 0 * cos 90 + 1 * sin (180 + 90) * 0
Output: ["20.55", "*", "0.5", "**", "cos", "(", "360", ")", "+", "sin", "0", "*", "cos", "90", "+", "1", "*", "sin", "(", "180", "+", "90", ")", "*", "0"] 

      

PPS: Please note that I have to use '**' for exponentiation.

EDIT 2: Following anubhava's answer, I found a solution that splits the arithmetic expression into all of its elements:

Pattern p = Pattern.compile( "\\*\\*|sin|cos|tan|\\d+(?:\\.\\d+)?|[-()+*/%]" );
Matcher matcher = p.matcher(expression);

while(matcher.find())
    System.out.println(matcher.group());

      



3 answers


You can use this regex to split:

String[] toks = "20.55|50|0.5|20|20.55".split( "(?=[^\\d.])|(?<=[^\\d.])" );

for (String tok: toks)
    System.out.printf("%s%n", tok);

      





Update:

You can use this regex to match your tokens:

Pattern p = Pattern.compile( "sin|cos|tan|\\d+(?:\\.\\d+)?|[-()+*/%]" );

      

Then you can call Matcher#find() in a while loop to get all the matched tokens.
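As a minimal sketch of that loop, the tokens can be collected into a list. The class name ExpressionTokenizer and the method tokenize are made up for illustration, and the "\\*\\*" alternative is an assumption taken from the asker's EDIT 2, not from the pattern above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is not from the original answer.
final class ExpressionTokenizer {
    // "\\*\\*" is listed before the character class that contains "*",
    // otherwise "**" would be reported as two separate "*" tokens.
    private static final Pattern TOKEN = Pattern.compile(
            "\\*\\*|sin|cos|tan|\\d+(?:\\.\\d+)?|[-()+*/%]");

    static List<String> tokenize(String expression) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(expression);
        // find() skips anything the pattern does not match (here: whitespace).
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("20.55 * 0.5 ** cos(360) + sin 0"));
        // prints [20.55, *, 0.5, **, cos, (, 360, ), +, sin, 0]
    }
}
```

Note that find() also silently skips characters the pattern does not know about, so an input such as "2 # 3" would yield only "2" and "3"; validating the input is left out of this sketch.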



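The problem is that you cannot define variable-length lookbehinds. The quantifiers +, * and ? all match a variable number of characters, and Java raises the exception above when it cannot work out an obvious maximum length for the lookbehind. This is a limitation of most regex engines.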

However, you can have variable-length lookaheads. But in your case that won't do the job, because a lookahead is zero-width: it does not consume the text it has just matched, so the pattern matches again before every following digit of the same number.
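The following sketch reproduces that effect with the input from the question and contrasts it with one possible workaround. The workaround pattern is an assumption of this sketch, not something proposed in the thread: it uses only single-character lookarounds, so it splits just before the first and just after the last character of each number. The class and method names are illustrative, and the behaviour described assumes Java 8 or later.

```java
import java.util.Arrays;

// Hypothetical demo class; names are illustrative only.
final class LookaheadSplitDemo {
    // Variable-length lookahead from the question: it succeeds before EVERY
    // digit, because a lookahead never consumes the number it looked at.
    static String[] splitBeforeEveryDigit(String s) {
        return s.split("(?=([0-9]+(\\.[0-9]+)?))");
    }

    // Assumed alternative: fixed-width lookarounds that only succeed at the
    // start of a number or directly after its last digit.
    static String[] splitAtNumberBoundaries(String s) {
        return s.split("(?<![\\d.])(?=\\d)|(?<=\\d)(?![\\d.])");
    }

    public static void main(String[] args) {
        String input = "20.55|50|0.5|20|20.55";
        System.out.println(Arrays.toString(splitBeforeEveryDigit(input)));
        System.out.println(Arrays.toString(splitAtNumberBoundaries(input)));
    }
}
```

The first call fragments the numbers exactly as shown in the question, while the second keeps each number in one piece.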

You want something like this:



([0-9]+(\\.[0-9]+)?)\\K

      

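What \K does is throw away everything that has been matched so far. Therefore you still split at specific positions, and the match is not repeated inside the floats. Note that \K is a Perl/PCRE feature; java.util.regex does not support it, so this pattern only works in engines that do.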
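For Java, a rough sketch of the same idea without \K is to walk over the delimiter matches with Matcher#find() and emit both the matches and the text between them. The helper below is hypothetical (its names are not from any answer here), and it assumes the delimiter pattern never matches the empty string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
final class KeepDelimiters {
    static List<String> splitKeepingDelimiters(String input, Pattern delimiter) {
        List<String> parts = new ArrayList<>();
        Matcher m = delimiter.matcher(input);
        int last = 0; // end of the previous delimiter match
        while (m.find()) {
            if (m.start() > last) {
                parts.add(input.substring(last, m.start())); // text between delimiters
            }
            parts.add(m.group()); // the delimiter itself
            last = m.end();
        }
        if (last < input.length()) {
            parts.add(input.substring(last)); // text after the last delimiter
        }
        return parts;
    }

    public static void main(String[] args) {
        Pattern number = Pattern.compile("[0-9]+(\\.[0-9]+)?");
        System.out.println(splitKeepingDelimiters("20.55|50|0.5|20|20.55", number));
        // prints [20.55, |, 50, |, 0.5, |, 20, |, 20.55]
    }
}
```

Because the matches are consumed by find() rather than inspected by a lookaround, the length of the delimiter no longer matters.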



Try:

(?<=\d)(?=\|)|(?<=\|)(?=\d)

      


In Java:

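import java.util.Arrays;

public class RegexTest{
    public static void main(String[] args){
        String string = "20.55|50|0.5|20|20.55";
        // Split between a digit and '|', in either order, without consuming anything
        System.out.println(Arrays.toString(string.split("(?<=\\d)(?=\\|)|(?<=\\|)(?=\\d)")));
    }
}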

      

with the result:

[20.55, |, 50, |, 0.5, |, 20, |, 20.55]

EDIT

To handle other tokens between the delimiters, such as "*" or "sin", you can change the regular expression to:

(?<=[0-9a-z*])(?=\|)|(?<=\|)(?=[0-9a-z*])

      


where [0-9a-z*] means a digit, a lowercase letter or "*". If you want to include other characters, just add them to the character class, for example [0-9a-z*E], etc.
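A small sketch of how the extended pattern behaves; the class name and the sample inputs are invented for illustration. It also shows that a character missing from the class, here "(", prevents the split next to it.

```java
import java.util.Arrays;

// Hypothetical demo class; the sample inputs are not from the question.
final class PipeSplitDemo {
    // Splits between '|' and any neighbour that is a digit, a lowercase
    // letter or '*', in either order.
    static String[] splitAroundPipes(String s) {
        return s.split("(?<=[0-9a-z*])(?=\\|)|(?<=\\|)(?=[0-9a-z*])");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAroundPipes("sin|20.55|*|cos")));
        // prints [sin, |, 20.55, |, *, |, cos]
        System.out.println(Arrays.toString(splitAroundPipes("sin|(")));
        // prints [sin, |(] because '(' is not in the character class
    }
}
```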
