Remove pattern from string in Java
Currently I am working on a tool that helps me analyze a constantly growing String, which can look like the following: String s = "AAAAAAABBCCCDDABQ". I want to be able to find a sequence of A's followed by B's, do something, and then remove that sequence from the original String.
My code looks like this:
while (someBoolean) {
    if (Pattern.matches("A+B+", s)) {
        // Do stuff
        // Remove the found pattern
    }
    if (Pattern.matches("C+D+", s)) {
        // Do other stuff
        // Remove the found pattern
    }
}
return s;
Also, how could I remove all three sequences so that s contains just "Q" at the end of the calculation, without ending up in an infinite loop?
You can use a regex replacement loop, i.e. the Matcher methods appendReplacement(StringBuffer sb, String replacement) and appendTail(StringBuffer sb).
To find any one of several patterns, join them with the | alternation operator and put each pattern in its own capturing group.
You can then use group(int group) to get the matched string for each capturing group (the first group is group 1); it returns null if that group did not match. For slightly better performance, just check whether the group matched with start(int group), which returns -1 if that group didn't match.
Example:
String s = "AAAAAAABBCCCDDABQ";
StringBuffer buf = new StringBuffer();
Pattern p = Pattern.compile("(A+B+)|(C+D+)");
Matcher m = p.matcher(s);
while (m.find()) {
    if (m.start(1) != -1) { // Group 1 found
        System.out.println("Found AB: " + m.group(1));
        m.appendReplacement(buf, ""); // Replace matched substring with ""
    } else if (m.start(2) != -1) { // Group 2 found
        System.out.println("Found CD: " + m.group(2));
        m.appendReplacement(buf, ""); // Replace matched substring with ""
    }
}
m.appendTail(buf);
String remain = buf.toString();
System.out.println("Remain: " + remain);
Output
Found AB: AAAAAAABB
Found CD: CCCDD
Found AB: AB
Remain: Q
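For reference, here is a minimal self-contained sketch of the same replacement loop wrapped in a method. The class name SequenceRemover, the method removeSequences, and the found list are hypothetical names chosen for illustration; they are not part of the original snippet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the appendReplacement/appendTail loop.
class SequenceRemover {
    // One alternation, one capturing group per sequence type.
    private static final Pattern SEQUENCES = Pattern.compile("(A+B+)|(C+D+)");

    // Removes every A+B+ and C+D+ match in a single pass over the input
    // and records each removed match, tagged by type, in `found`.
    static String removeSequences(String input, List<String> found) {
        Matcher m = SEQUENCES.matcher(input);
        StringBuffer buf = new StringBuffer();
        while (m.find()) {
            // start(1) is -1 when group 1 (A+B+) did not take part in the match.
            String tag = (m.start(1) != -1) ? "AB: " : "CD: ";
            found.add(tag + m.group());
            m.appendReplacement(buf, ""); // drop the matched text
        }
        m.appendTail(buf); // keep whatever follows the last match
        return buf.toString();
    }

    public static void main(String[] args) {
        List<String> found = new ArrayList<>();
        String remain = removeSequences("AAAAAAABBCCCDDABQ", found);
        System.out.println(found);  // [AB: AAAAAAABB, CD: CCCDD, AB: AB]
        System.out.println(remain); // Q
    }
}
```

Because this is a single pass, text that only becomes a match after an earlier removal (for example the AB left over in "ACCDDB") is not removed.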
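This solution removes the first match of each pattern again and again until neither pattern matches any more, so the loop ends even if more than one character is left over.
String s = "AAAAAAABBCCCDDABQ";
Pattern abPattern = Pattern.compile("A+B+");
Pattern cdPattern = Pattern.compile("C+D+");
boolean changed = true;
while (changed) { // stop as soon as a full pass removes nothing
    changed = false;
    Matcher abMatcher = abPattern.matcher(s);
    if (abMatcher.find()) {
        // Do stuff
        s = abMatcher.replaceFirst("");
        changed = true;
    }
    Matcher cdMatcher = cdPattern.matcher(s);
    if (cdMatcher.find()) {
        // Do other stuff
        s = cdMatcher.replaceFirst("");
        changed = true;
    }
}
System.out.println(s); // prints "Q"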
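The repeated-removal idea can also be sketched as a loop that runs until a full pass removes nothing, which is one way to rule out an infinite loop. The names RepeatedRemoval and removeUntilStable are hypothetical, and the sketch assumes every pattern matches at least one character per match; empty matches are skipped.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: keep removing first matches until none is left.
class RepeatedRemoval {
    static String removeUntilStable(String s, List<Pattern> patterns) {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Pattern p : patterns) {
                Matcher m = p.matcher(s);
                // Skip empty matches so each removal shortens the string,
                // which guarantees the loop terminates.
                if (m.find() && m.end() > m.start()) {
                    // m.group() holds the matched sequence; process it here.
                    s = s.substring(0, m.start()) + s.substring(m.end());
                    changed = true;
                }
            }
        }
        return s;
    }

    public static void main(String[] args) {
        List<Pattern> patterns =
                Arrays.asList(Pattern.compile("A+B+"), Pattern.compile("C+D+"));
        System.out.println(removeUntilStable("AAAAAAABBCCCDDABQ", patterns)); // Q
        // Removing CCDD first leaves "ABQ", which then matches A+B+.
        System.out.println(removeUntilStable("ACCDDBQ", patterns)); // Q
    }
}
```

Unlike a single find() pass, this variant also removes sequences that only appear once an earlier removal has joined their parts.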
You are probably looking for something like this:
String input = "AAAAAAABBCCCDDABQ";
String result = input;
String[] chars = {"A", "B", "C", "D"}; // chars to replace
for (String ch : chars) {
    if (result.contains(ch)) {
        String pattern = "[" + ch + "]+";
        result = result.replaceAll(pattern, ch);
    }
}
System.out.println(input);  // "AAAAAAABBCCCDDABQ"
System.out.println(result); // "ABCDABQ"
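This basically collapses each run of the same character into a single occurrence.
If you want to remove the runs completely, pass "" instead of ch as the second argument of replaceAll inside the loop body. Note that this removes every A, B, C and D in the string, not only those that form an A+B+ or C+D+ sequence.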
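Both variants can be put side by side in a short sketch. The names RunRemoval, collapseRuns and removeRuns are hypothetical, and the code assumes the entries in chars are plain letters with no special meaning in a regex or in a replacement string.

```java
// Hypothetical helpers contrasting "collapse each run" with "remove each run".
class RunRemoval {
    // Replaces every run of a listed character with a single occurrence.
    static String collapseRuns(String input, String[] chars) {
        String result = input;
        for (String ch : chars) {
            result = result.replaceAll("[" + ch + "]+", ch);
        }
        return result;
    }

    // Removes every run of a listed character, wherever it occurs.
    static String removeRuns(String input, String[] chars) {
        String result = input;
        for (String ch : chars) {
            result = result.replaceAll("[" + ch + "]+", "");
        }
        return result;
    }

    public static void main(String[] args) {
        String[] chars = {"A", "B", "C", "D"};
        System.out.println(collapseRuns("AAAAAAABBCCCDDABQ", chars)); // ABCDABQ
        System.out.println(removeRuns("AAAAAAABBCCCDDABQ", chars));   // Q
        // "BA" is not an A+B+ sequence, but it is removed here as well.
        System.out.println(removeRuns("BAQ", chars));                 // Q
    }
}
```

The last call shows the difference from the pattern-based answers: order does not matter here, so this approach only fits if every A, B, C and D should go.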