Remove duplicate characters from a String using regex, keeping first occurrences

Question

I know how to remove duplicate characters from String and keep first occurrences without regex:

String method(String s){
  String result = "";
  for(char c : s.toCharArray()){
    result += result.contains(c+"")
     ? ""
     : c;
  }
  return result;
}

// Example input: "Type unique chars!"
// Output:        "Type uniqchars!"

I know how to remove duplicate characters from String and keep last occurrences with regex:

String method(String s){
  return s.replaceAll("(.)(?=.*\\1)", "");
}

// Example input: "Type unique chars!"
// Output:        "Typnique chars!"

As for my question, is it possible, with a regex, to remove duplicate characters from a String, but keep the first occurrences instead of the last?

Why am I asking: I came across this codegolf answer using the following function (based on the first example above):

String f(char[]s){String t="";for(char c:s)t+=t.contains(c+"")?"":c;return t;}

and I was wondering if it could be done shorter with a regex and String input. But even if it's longer, I'm just curious whether it is possible at all to remove duplicate characters from a String with a regex while keeping the first occurrence of each character.

java string regex regex-group regex-lookarounds

Kevin Cruijssen 23 Mar 17 at 10:37

1 answer

Wiktor Stribiżew · Accepted Answer · 2017-03-23T17:19:28+0000

This is not the shortest option, and it does not rely on a regex alone, but it is still an option: reverse the string before running your existing regex, then reverse the result back.

public static String g(StringBuilder s){
  return new StringBuilder(
   s.reverse().toString()
     .replaceAll("(?s)(.)(?=.*\\1)", ""))
     .reverse().toString();
}
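
For a String-in, String-out variant of the same reversal idea, a minimal sketch might look like the following. The names KeepFirst and keepFirst are made up for illustration. The sketch copies the input into a fresh StringBuilder because StringBuilder.reverse() works in place, so g above leaves the caller's builder reversed.

```java
public class KeepFirst {
    // Hypothetical helper (not from the answer): reverse, drop every character
    // that occurs again later, then reverse back. Dropping "all but the last"
    // in the reversed text keeps "the first" in the original text.
    static String keepFirst(String s) {
        String reversed = new StringBuilder(s).reverse().toString();
        String deduped = reversed.replaceAll("(?s)(.)(?=.*\\1)", "");
        return new StringBuilder(deduped).reverse().toString();
    }

    public static void main(String[] args) {
        // Same sample input as in the question.
        System.out.println(keepFirst("Type unique chars!")); // Type uniqchars!
    }
}
```
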

Note: I suggest adding (?s) (the inline form of the Pattern.DOTALL flag) to the regex so that . can match any character, including a newline. By default, . does not match line terminators.
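
To see what the DOTALL flag changes, here is a small sketch that applies the keep-last regex from the question to a multi-line input, once without the flag and once with it. The class and method names (DotAllDemo, withoutFlag, withFlag) are illustrative only.

```java
public class DotAllDemo {
    // Without (?s): the .* in the lookahead cannot cross a line terminator,
    // so a duplicate on a later line is never seen.
    static String withoutFlag(String s) {
        return s.replaceAll("(.)(?=.*\\1)", "");
    }

    // With (?s): . also matches line terminators, so duplicates are found
    // across lines (and a repeated newline would itself be deduplicated).
    static String withFlag(String s) {
        return s.replaceAll("(?s)(.)(?=.*\\1)", "");
    }

    public static void main(String[] args) {
        String input = "a\nb\na";
        // The first 'a' survives here because the second 'a' is on another line.
        System.out.println(withoutFlag(input).equals("a\nb\na")); // true
        // With DOTALL the first 'a' is removed, keeping only the last one.
        System.out.println(withFlag(input).equals("\nb\na"));     // true
    }
}
```
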
