How to extract only values of a specific type from a delimited string using regex in Java
I have a line like below:
SOMETEXT(ABC, DEF, 5, 78.0, MNO)
I want to parse it with a regex to get a List<String> containing ABC, DEF and MNO, i.e. I want to skip numbers of any type and extract only the text.
All in all, I have a structure as shown below:
class Detail {
String name;
String type;
}
// Sample values of name = "test1" type = "SOMETEXT(ABC,5)"
// Sample values of name = "test2" type = "SOMETEXT(ABC,DEF,2.2)"
// Sample values of name = "test3" type = "SOMETEXT(ABC,DEF)"
From a List<Detail>, I want to get a Map<String, List<String>> where the key is name and the List<String> holds the texts extracted from type, in the Java 8 way using streams if possible.
Until now, I only needed to get the first text from a string and I did the following:
Map<String, List<String>> assignOperatorMap = details
.stream()
.collect(groupingBy(md -> md.getName(), mapping((Detail m) ->
m.getType().substring(m.getType().indexOf("(") + 1,
m.getType().indexOf(")")).split("\\,")[0] ,
Collectors.toList()
)));
The above code gives me:
{test1=[ABC], test2=[ABC], test3=[ABC]}
which is just the first value from each type.
How about this:
List<Detail> details = new ArrayList<>();
details.add(new Detail("test1", "SOMETEXT(ABC,5)"));
details.add(new Detail("test2", "SOMETEXT(ABC,DEF,2.2)"));
details.add(new Detail("test3", "SOMETEXT(ABC,DEF)"));
Map<String, List<String>> assignOperatorMap = details.stream()
.flatMap(d -> Arrays.stream(d.getType()
.replaceAll("\\w+\\((.*)\\)", "$1")
.split(","))
.filter(s -> s.matches("[A-Za-z_]+"))
.map(s -> new SimpleEntry<>(d.getName(), s)))
.collect(groupingBy(Entry::getKey, mapping(Entry::getValue, toList())));
System.out.println(assignOperatorMap); // {test2=[ABC, DEF], test3=[ABC, DEF], test1=[ABC]}
The idea is to first capture the string between the parentheses with .replaceAll("\\w+\\((.*)\\)", "$1"), then split it on "," and filter out whatever doesn't match [A-Za-z_]+.
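The three steps can be tried in isolation on a single string. The following is only a sketch: the class and method names are made up for illustration, and it assumes the input may contain spaces after the commas (as in the question's first sample), so it splits on "\\s*,\\s*" rather than on a bare ",".

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ExtractStepsDemo {
    // Hypothetical helper, not part of the original answer.
    static List<String> extractTexts(String type) {
        // Step 1: keep only what sits between the outer parentheses.
        String inner = type.replaceAll("\\w+\\((.*)\\)", "$1");
        // Step 2: split on commas, swallowing whitespace around them
        // (assumption: spaces may follow the commas).
        return Arrays.stream(inner.split("\\s*,\\s*"))
                // Step 3: keep only tokens made of letters and underscores.
                .filter(s -> s.matches("[A-Za-z_]+"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [ABC, DEF, MNO]
        System.out.println(extractTexts("SOMETEXT(ABC, DEF, 5, 78.0, MNO)"));
    }
}
```

With a plain split(",") the tokens " DEF" and " MNO" would keep their leading space and be rejected by the [A-Za-z_]+ filter, which is why the whitespace is consumed by the split pattern here.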
There is also a trick of bundling each (name, text) pair into an Entry<String, String> to avoid having to stream twice: since each Detail can now yield several texts, we must somehow flatten them into a List<String> (instead of a List<String[]>). Preferably this would be done with the Java 9 flatMapping collector, but it's not here yet.
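For readers on Java 9 or later, the same grouping might look roughly like this with Collectors.flatMapping. This is a sketch, not the answerer's code: the nested Detail class is a stand-in with assumed getters and constructor, and the class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class FlatMappingDemo {
    // Minimal stand-in for the question's Detail class (getters assumed).
    static class Detail {
        private final String name;
        private final String type;
        Detail(String name, String type) { this.name = name; this.type = type; }
        String getName() { return name; }
        String getType() { return type; }
    }

    // Requires Java 9+ because of Collectors.flatMapping.
    static Map<String, List<String>> group(List<Detail> details) {
        return details.stream().collect(Collectors.groupingBy(
                Detail::getName,
                Collectors.flatMapping(
                        // Each Detail expands to a stream of its text tokens.
                        (Detail d) -> Arrays.stream(d.getType()
                                        .replaceAll("\\w+\\((.*)\\)", "$1")
                                        .split(","))
                                .filter(s -> s.matches("[A-Za-z_]+")),
                        Collectors.toList())));
    }

    public static void main(String[] args) {
        System.out.println(group(Arrays.asList(
                new Detail("test1", "SOMETEXT(ABC,5)"),
                new Detail("test2", "SOMETEXT(ABC,DEF,2.2)"),
                new Detail("test3", "SOMETEXT(ABC,DEF)"))));
    }
}
```

One behavioural difference worth knowing: with flatMapping, a Detail whose type contains no text at all still gets a key with an empty list, whereas the flatMap-to-entries version drops that name from the map entirely.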
How can I extend this regex to ignore some texts like HOURS, MINUTES?
You can create a Set<String> with the words you want to ignore and filter on it in a second filter call:
Set<String> ignore = new HashSet<>();
ignore.add("HOURS");
ignore.add("MINUTES");
...
.filter(s -> s.matches("[A-Za-z_]+"))
.filter(s -> !ignore.contains(s)) // <-- extra filter call
.map(s -> new SimpleEntry<>(d.getName(), s)))
...
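If the exclusion really has to live in the regex itself, a negative lookahead is one possible way to do it. The sketch below is an alternative to the Set-based filter, not a replacement recommended by the answer; the class, constant and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class IgnoreWordsRegexDemo {
    // Letters and underscores only, but reject a token that is exactly
    // HOURS or MINUTES. The lookahead checks the whole token because of "$".
    static final Pattern TEXT = Pattern.compile("(?!(?:HOURS|MINUTES)$)[A-Za-z_]+");

    // Hypothetical helper for illustration.
    static List<String> extractTexts(String type) {
        String inner = type.replaceAll("\\w+\\((.*)\\)", "$1");
        return Arrays.stream(inner.split(","))
                .filter(s -> TEXT.matcher(s).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [ABC, HOURS_LEFT]
        System.out.println(extractTexts("SOMETEXT(ABC,HOURS,5,MINUTES,HOURS_LEFT)"));
    }
}
```

The Set approach is probably easier to maintain when the list of ignored words grows or comes from configuration; the lookahead keeps everything in one pattern at the cost of readability.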
You can try something like this if the order doesn't matter:
final List<Detail> details = Arrays.asList(
new Detail("test1", "SOMETEXT(ABC, DFD)"),
new Detail("test2", "SOMETEXT(ABC,DEF,2.2)"),
new Detail("test3", "SOMETEXT(ABC,DEF,GHF)")
);
final Map<String, List<String>> map = details
.stream()
.collect(Collectors.groupingBy(
Detail::getName,
Collectors.mapping(
detail -> {
final String[] values = detail.getType().split("[,(). 0-9]+");
return Arrays.copyOfRange(values, 1, values.length);
},
Collector.of(ArrayList::new,
(list, array) -> list.addAll(Arrays.asList(array)),
(source, target) -> {
source.addAll(target);
return source;
}
)
)
));
System.out.println(map);
// Output: {test2=[ABC, DEF], test3=[ABC, DEF, GHF], test1=[ABC, DFD]}
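To see what the split in the snippet above does on its own, here is a small sketch that applies it to single strings (the class and method names are made up for the example). It also shows a limitation that follows from treating digits as delimiters: a text token that contains a digit gets cut in two.

```java
import java.util.Arrays;

class SplitCharClassDemo {
    // Hypothetical helper wrapping the split used in the answer above.
    static String[] texts(String type) {
        // Commas, parentheses, dots, spaces and digits all act as delimiters.
        String[] values = type.split("[,(). 0-9]+");
        // Index 0 is the prefix before "(" (e.g. SOMETEXT), so skip it.
        return Arrays.copyOfRange(values, 1, values.length);
    }

    public static void main(String[] args) {
        // prints [ABC, DEF]
        System.out.println(Arrays.toString(texts("SOMETEXT(ABC,DEF,2.2)")));
        // prints [AB, C] because the digit inside AB1C is a delimiter
        System.out.println(Arrays.toString(texts("SOMETEXT(AB1C)")));
    }
}
```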