Regular expression problem in Java

I am trying to create a regex for the replaceAll method in Java. The test string is abXYabcXYZ and the pattern is abc. I want to replace every character except the pattern with +. For example, the string abXYabcXYZ and the pattern [^(abc)] should return ++++abc+++, but in my case it returns ab++abc+++.

public static String plusOut(String str, String pattern) {
    pattern= "[^("+pattern+")]" + "".toLowerCase();
    return str.toLowerCase().replaceAll(pattern, "+");
}
public static void main(String[] args) {
    String text = "abXYabcXYZ";
    String pattern = "abc";
    System.out.println(plusOut(text, pattern));
}

      

When I replace the pattern itself with +, there is no problem: abXYabcXYZ with the pattern (abc) returns abxy+xyz. The pattern (^(abc)) returns the string without any replacement.

Is there any other way to write NOT (regex), or to treat the characters as a whole word?

6 answers


What you are trying to achieve is quite tricky with regular expressions, as there is no way to express "replace strings that do not match". You have to use a "positive" pattern that says what needs to be matched, not what does not.

Also, you want to replace each character with a replacement character, so you need to make sure your pattern matches a single character. Otherwise, you will replace whole substrings with one character, and the result will be shorter than the input.
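A small sketch of the single-character point, using only String.replaceAll (the class and method names are made up for illustration). It only demonstrates the length effect; [^abc] on its own still keeps the stray ab at the start, which is what the lookarounds below deal with.

```java
class SingleCharDemo {
    // Replaces each character outside a, b, c individually: the length is preserved
    static String perCharacter(String text) {
        return text.replaceAll("[^abc]", "+");
    }

    // Replaces each run of characters outside a, b, c with one "+": the result is shorter
    static String perRun(String text) {
        return text.replaceAll("[^abc]+", "+");
    }

    public static void main(String[] args) {
        System.out.println(perCharacter("abXYabcXYZ")); // ab++abc+++
        System.out.println(perRun("abXYabcXYZ"));       // ab+abc+
    }
}
```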

For the toy example, you can use negative lookaheads and lookbehinds to accomplish this, but this gets trickier for real-life examples with longer or more complex words, since you have to consider each character in your string separately, as well as its context.

Here is the pattern for "not abc":

[^abc]|a(?!bc)|(?<!a)b|b(?!c)|(?<!ab)c

      



It consists of five subpatterns joined with "or" (|), each of which matches a single character:

  • [^abc] matches any character except a, b or c
  • a(?!bc) matches a if it is not followed by bc
  • (?<!a)b matches b if it is not preceded by a
  • b(?!c) matches b if it is not followed by c
  • (?<!ab)c matches c if it is not preceded by ab

The idea is to match every character that is not in your target word abc, plus every character of the word that, judging by its context, is not part of an occurrence of the word. The context can be checked using negative lookaheads (?!...) and negative lookbehinds (?<!...).
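Plugged into replaceAll, the pattern above gives the expected result for the question's input. This is a minimal sketch hard-wired to the word abc; NotAbc and plusOut are hypothetical names, and the pattern would have to be rebuilt by hand for any other word.

```java
class NotAbc {
    // Five single-character alternatives: each matches one character that is
    // not part of an occurrence of "abc", judged by its neighbours
    private static final String NOT_ABC =
            "[^abc]|a(?!bc)|(?<!a)b|b(?!c)|(?<!ab)c";

    static String plusOut(String text) {
        return text.replaceAll(NOT_ABC, "+");
    }

    public static void main(String[] args) {
        System.out.println(plusOut("abXYabcXYZ")); // ++++abc+++
    }
}
```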

You can imagine that this approach breaks down if the target word contains the same character more than once, for example example. It quickly gets awkward to express something like "match e unless it is followed by x or preceded by l".

Especially for dynamic patterns, it is much easier to do a positive search for the word and then, in a second pass, replace every character that was not matched, as others have suggested.
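A minimal sketch of the positive-search idea: the output buffer starts out as all +, and every literal occurrence found by Matcher.find is copied back over it. PositiveSearch and plusOut are hypothetical names; Pattern.quote is used so the word is matched literally even if it contains regex metacharacters.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PositiveSearch {
    // Keeps every literal occurrence of word and turns every other character into '+'
    static String plusOut(String text, String word) {
        char[] out = new char[text.length()];
        Arrays.fill(out, '+');                        // default: everything is masked
        Matcher m = Pattern.compile(Pattern.quote(word)).matcher(text);
        while (m.find()) {                            // positive search for the word
            // copy the matched characters back into the same positions
            text.getChars(m.start(), m.end(), out, m.start());
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(plusOut("abXYabcXYZ", "abc")); // ++++abc+++
    }
}
```

Because the matches are copied by position, the result always has the same length as the input, which is the property the first answer points out.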


[^ ...] will match a single character that is not ...

So your pattern "[^(abc)]" says "match a single character that is not a, b, c, or a left or right parenthesis"; and indeed, this is what happens in your test.
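A quick way to see this (the class and method names are hypothetical): the class [^(abc)] behaves exactly like [^abc()], so parentheses in the input survive while every other character outside a, b and c is replaced.

```java
class CharClassDemo {
    // Inside [...] the parentheses are literal characters, not a group
    static String withParens(String text) {
        return text.replaceAll("[^(abc)]", "+");
    }

    // The same set written without the misleading group-like syntax
    static String plainClass(String text) {
        return text.replaceAll("[^abc()]", "+");
    }

    public static void main(String[] args) {
        String text = "a(b)XabcY";
        System.out.println(withParens(text));                          // a(b)+abc+
        System.out.println(withParens(text).equals(plainClass(text))); // true
    }
}
```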

It's hard to say "replace all characters that are not part of the string abc" in one simple regex. What you could do instead is something rather clunky, like

while the input string still contains "abc"
   find the next occurrence of "abc"
   append to the output a string containing as many "+"s as there are characters before the "abc"
   append "abc" to the output string
   skip, in the input string, to a position just after the "abc" found
append to the output a string containing as many "+"s as there are characters left in the input
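The steps above translate fairly directly into Java using String.indexOf. This is a sketch, not the answerer's code: LoopPlusOut and plusOut are hypothetical names, an empty word is rejected because the loop would never advance, and String.repeat needs Java 11 or later.

```java
class LoopPlusOut {
    static String plusOut(String input, String word) {
        if (word.isEmpty()) {
            throw new IllegalArgumentException("word must not be empty");
        }
        StringBuilder out = new StringBuilder();
        int pos = 0;
        int next;
        // while the rest of the input still contains the word
        while ((next = input.indexOf(word, pos)) >= 0) {
            out.append("+".repeat(next - pos)); // one '+' per character before it
            out.append(word);
            pos = next + word.length();         // skip past the occurrence
        }
        out.append("+".repeat(input.length() - pos)); // mask whatever is left
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(plusOut("abXYabcXYZ", "abc")); // ++++abc+++
    }
}
```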

      



Or, if the input alphabet is limited, you could use regular expressions to do something like

replace all occurrences of "abc" with a single character that does not occur anywhere in the existing string
replace all other characters with "+"
replace all occurrences of the target character with "abc"

      

which is more readable but may not perform as well.
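The placeholder variant can be sketched the same way. It assumes the NUL character ('\0') never occurs in the input, and the method throws if it does; PlaceholderPlusOut and plusOut are hypothetical names. String.replace does the two literal substitutions, and replaceAll handles the "everything else" step.

```java
class PlaceholderPlusOut {
    // Assumption: this character never appears in real input
    private static final char PLACEHOLDER = '\0';

    static String plusOut(String input, String word) {
        if (word.isEmpty() || input.indexOf(PLACEHOLDER) >= 0) {
            throw new IllegalArgumentException("empty word or input contains the placeholder");
        }
        String marker = String.valueOf(PLACEHOLDER);
        return input
                .replace(word, marker)        // 1. each occurrence -> one placeholder
                .replaceAll("[^\\x00]", "+")  // 2. every other character -> '+' (regex must match PLACEHOLDER)
                .replace(marker, word);       // 3. placeholder -> the word again
    }

    public static void main(String[] args) {
        System.out.println(plusOut("abXYabcXYZ", "abc")); // ++++abc+++
    }
}
```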


Negating regular expressions is usually difficult. I think you can use negative lookarounds. Perhaps something like this:

String pattern = "(?<!ab).(?!abc)";

      

I have not tested it, so it may not work for degenerate cases. And the performance can be terrible. It is probably best to use a multi-stage algorithm.

Edit: No, I don't think it works in every case. You will most likely spend more time debugging the regular expression than you would solving the problem algorithmically with a little extra code.
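Running the untested pattern on the question's input supports the edit. In this minimal check (class and method names are hypothetical), characters next to an occurrence of abc are judged wrongly: X, Y and c survive, while the a and b of abc are replaced.

```java
class LookaroundAttempt {
    // "Any character not preceded by ab and not followed by abc"
    static String attempt(String text) {
        return text.replaceAll("(?<!ab).(?!abc)", "+");
    }

    public static void main(String[] args) {
        System.out.println(attempt("abXYabcXYZ")); // ++XY++c+++
    }
}
```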


Try to solve it without regex:

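public static String plusOut(String text, String pattern) {
    if (pattern.isEmpty()) {
        throw new IllegalArgumentException("pattern must not be empty"); // would loop forever
    }
    StringBuilder out = new StringBuilder();
    int i;
    // Scan every position where the pattern could still fit
    for (i = 0; i < text.length() - pattern.length() + 1; ) {
        if (text.startsWith(pattern, i)) {
            out.append(pattern);   // keep the occurrence and jump past it
            i += pattern.length();
        } else {
            out.append('+');       // mask a single character
            i++;
        }
    }
    // The tail is too short to contain the pattern, so mask all of it
    for (; i < text.length(); i++) {
        out.append('+');
    }
    return out.toString();
}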

      


Instead of one replaceAll, you can always try something like:

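import static org.junit.Assert.assertEquals;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.junit.Test;

public class ReplaceUnwantedTest {

    @Test
    public void testString() {
        final String in = "abXYabcXYabcHIH";
        final String expected = "xxxxabcxxabcxxx";
        String result = replaceUnwanted(in);
        assertEquals(expected, result);
    }

    private String replaceUnwanted(final String in) {
        final Pattern p = Pattern.compile("(.*?)(abc)([^a]*)");
        final Matcher m = p.matcher(in);
        final StringBuilder out = new StringBuilder();
        int last = 0; // end of the previous match
        while (m.find()) {
            out.append(m.group(1).replaceAll(".", "x")); // text before "abc"
            out.append(m.group(2));                      // "abc" itself
            out.append(m.group(3).replaceAll(".", "x")); // text up to the next 'a'
            last = m.end();
        }
        // Mask anything after the last match (e.g. a trailing "aY", or the whole input if there is no "abc")
        out.append(in.substring(last).replaceAll(".", "x"));
        return out.toString();
    }
}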

      


Instead of using replaceAll(...), I would go for the Pattern/Matcher approach:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static String plusOut(String str, String pattern) {
        StringBuilder builder = new StringBuilder();
        String regex = String.format("((?:(?!%s).)++)|%s", pattern, pattern);
        Matcher m = Pattern.compile(regex).matcher(str.toLowerCase());
        while(m.find()) {
            builder.append(m.group(1) == null ? pattern : m.group().replaceAll(".", "+"));
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        String text = "abXYabcXYZ";
        String pattern = "abc";
        System.out.println(plusOut(text, pattern));
    }

}

      

Note that you need to use Pattern.quote(...) if your String pattern contains regex metacharacters.
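A sketch of the same method with Pattern.quote applied at both places where the word is spliced into the regex. Unlike the answer's version, it does not lower-case the input (an assumption; add toLowerCase() back if you need the question's behavior), and QuotedPlusOut is a hypothetical class name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedPlusOut {
    static String plusOut(String str, String pattern) {
        String q = Pattern.quote(pattern); // '.', '*', etc. in the word become literal
        String regex = String.format("((?:(?!%s).)++)|%s", q, q);
        Matcher m = Pattern.compile(regex).matcher(str);
        StringBuilder builder = new StringBuilder();
        while (m.find()) {
            // group 1 is null when the second alternative (the word itself) matched
            builder.append(m.group(1) == null ? pattern : m.group().replaceAll(".", "+"));
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        System.out.println(plusOut("xa.cyabc", "a.c")); // +a.c++++
    }
}
```

With the unquoted version, the same call returns +a.c+a.c: the unquoted . also matches the b in abc, and every match of the second alternative is written out as the literal pattern string.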

Edit: I hadn't noticed that the Pattern/Matcher approach had already been suggested by toolkit (although slightly differently)...
