How to extract only values of a specific type from a delimited string using regex in Java

I have a line like below:

SOMETEXT(ABC, DEF, 5, 78.0, MNO)

      

I want to parse it with regex to get a List<String> containing ABC, DEF and MNO, i.e. I want to skip numbers of any type and extract only the text.

All in all, I have a structure as shown below:

class Detail {
    String name;
    String type;
}

// Sample values of name = "test1" type = "SOMETEXT(ABC,5)"
// Sample values of name = "test2" type = "SOMETEXT(ABC,DEF,2.2)"
// Sample values of name = "test3" type = "SOMETEXT(ABC,DEF)"

      

From a List<Detail>, I want to get a Map<String, List<String>> where the key is name and the List<String> holds the texts retrieved from type, the Java 8 way using streams if possible.

Until now, I only needed to get the first text from a string and I did the following:

Map<String, List<String>> assignOperatorMap = details
    .stream()
    .collect(groupingBy(md -> md.getName(), mapping((Detail m) ->
        // take the text between the parentheses and keep only the first item
        m.getType().substring(m.getType().indexOf("(") + 1,
        m.getType().indexOf(")")).split(",")[0],
        Collectors.toList()
    )));

      

The above code gives me {test1=[ABC], test2=[ABC], test3=[ABC]}, which is just the first value of each.


2 answers


How about this:

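// Assumes: import java.util.*; import java.util.AbstractMap.SimpleEntry; import java.util.Map.Entry;
// and: import static java.util.stream.Collectors.*;
List<Detail> details = new ArrayList<>();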
details.add(new Detail("test1", "SOMETEXT(ABC,5)"));
details.add(new Detail("test2", "SOMETEXT(ABC,DEF,2.2)"));
details.add(new Detail("test3", "SOMETEXT(ABC,DEF)"));

Map<String, List<String>> assignOperatorMap = details.stream()
    .flatMap(d -> Arrays.stream(d.getType()
            .replaceAll("\\w+\\((.*)\\)", "$1")
            // allow optional spaces around commas, as in "SOMETEXT(ABC, DEF, 5, 78.0, MNO)"
            .split("\\s*,\\s*"))
            .filter(s -> s.matches("[A-Za-z_]+"))
            .map(s -> new SimpleEntry<>(d.getName(), s)))
    .collect(groupingBy(Entry::getKey, mapping(Entry::getValue, toList())));

System.out.println(assignOperatorMap); // {test2=[ABC, DEF], test3=[ABC, DEF], test1=[ABC]}

      

The idea is to first capture the string between the parentheses with .replaceAll("\\w+\\((.*)\\)", "$1"), then split it on commas and filter out whatever doesn't match [A-Za-z_]+.
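To make the three steps concrete, here is a minimal sketch that applies them to a single type string. The class and method names (ExtractTextValues, textValues) are made up for illustration, and the split here also tolerates spaces after commas so that the first sample line from the question works; that is an adjustment for this sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ExtractTextValues {

    // Hypothetical helper: applies the three steps to one type string.
    public static List<String> textValues(String type) {
        // Step 1: keep only what sits between the outer parentheses.
        String inner = type.replaceAll("\\w+\\((.*)\\)", "$1");
        // Step 2: split on commas, tolerating optional surrounding whitespace.
        String[] parts = inner.split("\\s*,\\s*");
        // Step 3: keep tokens made only of letters and underscores.
        return Arrays.stream(parts)
                .filter(s -> s.matches("[A-Za-z_]+"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [ABC, DEF, MNO]
        System.out.println(textValues("SOMETEXT(ABC, DEF, 5, 78.0, MNO)"));
    }
}
```

Note that a value such as ABC1 would be dropped by step 3, because the pattern allows only letters and underscores.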

There is also a trick of creating an Entry<String, String> (name, text) pair for each value, which avoids the need to stream twice: since each Detail can now yield multiple text values, we must somehow flatten them into a List<String> (instead of a List<String[]>). Ideally this would be done with the Java 9 flatMapping collector, but that is not available in Java 8.
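If you are on Java 9 or later, the flatMapping idea could look roughly like the sketch below. The Detail class here is a minimal stand-in with the constructor and getters the answer's code assumes, and FlatMappingSketch and textsByName are names invented for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FlatMappingSketch {

    // Minimal stand-in for the Detail class from the question (assumed shape).
    public static class Detail {
        private final String name;
        private final String type;
        public Detail(String name, String type) { this.name = name; this.type = type; }
        public String getName() { return name; }
        public String getType() { return type; }
    }

    public static Map<String, List<String>> textsByName(List<Detail> details) {
        return details.stream().collect(Collectors.groupingBy(
                Detail::getName,
                // flatMapping (Java 9+) flattens each Detail's tokens into its group's list,
                // so no intermediate Entry objects are needed.
                Collectors.flatMapping(
                        d -> Arrays.stream(d.getType()
                                        .replaceAll("\\w+\\((.*)\\)", "$1")
                                        .split("\\s*,\\s*"))
                                .filter(s -> s.matches("[A-Za-z_]+")),
                        Collectors.toList())));
    }

    public static void main(String[] args) {
        Map<String, List<String>> result = textsByName(Arrays.asList(
                new Detail("test1", "SOMETEXT(ABC,5)"),
                new Detail("test2", "SOMETEXT(ABC,DEF,2.2)")));
        System.out.println(result.get("test2")); // prints [ABC, DEF]
    }
}
```

One behavioral difference worth knowing: with flatMapping, a Detail whose type contains no text values still produces a key with an empty list, whereas the flatMap-then-group version drops that key entirely.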




How can I extend this regex to ignore some texts like HOURS, MINUTES?

You can create a Set<String> with the words you want to ignore and filter on it in a second filter call:

Set<String> ignore = new HashSet<>();
ignore.add("HOURS");
ignore.add("MINUTES");

...
.filter(s -> s.matches("[A-Za-z_]+"))
.filter(s -> !ignore.contains(s)) // <-- extra filter call
.map(s -> new SimpleEntry<>(d.getName(), s)))
...
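Putting the fragment above into a complete pipeline might look like the following sketch. The Detail class is again a minimal stand-in, the sample data with HOURS and MINUTES is invented for the demo, and IgnoreWordsSketch and textsByName are hypothetical names.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;
import java.util.stream.Collectors;

public class IgnoreWordsSketch {

    // Minimal stand-in for the Detail class from the question (assumed shape).
    public static class Detail {
        private final String name;
        private final String type;
        public Detail(String name, String type) { this.name = name; this.type = type; }
        public String getName() { return name; }
        public String getType() { return type; }
    }

    public static Map<String, List<String>> textsByName(List<Detail> details, Set<String> ignore) {
        return details.stream()
                .flatMap(d -> Arrays.stream(d.getType()
                                .replaceAll("\\w+\\((.*)\\)", "$1")
                                .split("\\s*,\\s*"))
                        .filter(s -> s.matches("[A-Za-z_]+"))   // text-only tokens
                        .filter(s -> !ignore.contains(s))       // drop ignored words
                        .map(s -> new SimpleEntry<>(d.getName(), s)))
                .collect(Collectors.groupingBy(Entry::getKey,
                        Collectors.mapping(Entry::getValue, Collectors.toList())));
    }

    public static void main(String[] args) {
        Set<String> ignore = new HashSet<>(Arrays.asList("HOURS", "MINUTES"));
        // Invented sample data for the demo.
        Map<String, List<String>> result = textsByName(Arrays.asList(
                new Detail("test1", "SOMETEXT(ABC,HOURS,5)"),
                new Detail("test2", "SOMETEXT(MINUTES,DEF,2.2)")), ignore);
        System.out.println(result.get("test1")); // prints [ABC]
        System.out.println(result.get("test2")); // prints [DEF]
    }
}
```

The set lookup is case-sensitive, so "hours" would not be filtered out unless you add it to the set or normalize the case first.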

      


You can try something like this if the order doesn't matter:



final List<Detail> details = Arrays.asList(
    new Detail("test1", "SOMETEXT(ABC, DFD)"),
    new Detail("test2", "SOMETEXT(ABC,DEF,2.2)"),
    new Detail("test3", "SOMETEXT(ABC,DEF,GHF)")
);

final Map<String, List<String>> map = details
    .stream()
    .collect(Collectors.groupingBy(
        Detail::getName,
        Collectors.mapping(
            detail -> {
                final String[] values = detail.getType().split("[,(). 0-9]+");
                return Arrays.copyOfRange(values, 1, values.length);
            },
            Collector.of(
                ArrayList::new,
                (list, array) -> list.addAll(Arrays.asList(array)),
                (source, target) -> {
                    source.addAll(target);
                    return source;
                }
            )
        )
    ));

System.out.println(map);
// Output: {test2=[ABC, DEF], test3=[ABC, DEF, GHF], test1=[ABC, DFD]}
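To see what the split in this answer does on its own, here is a small sketch that isolates it. SplitTokensSketch and textTokens are names invented for this example, and the ABC1 input is an invented edge case, not one from the question.

```java
import java.util.Arrays;

public class SplitTokensSketch {

    // Hypothetical helper isolating the split step from the answer above.
    public static String[] textTokens(String type) {
        // Commas, parentheses, dots, spaces and digits all act as delimiters.
        String[] values = type.split("[,(). 0-9]+");
        // values[0] is the prefix before "(" (SOMETEXT here), so skip it.
        return Arrays.copyOfRange(values, 1, values.length);
    }

    public static void main(String[] args) {
        // prints [ABC, DEF]
        System.out.println(Arrays.toString(textTokens("SOMETEXT(ABC,DEF,2.2)")));
        // prints [ABC, DEF]: the digit is treated as a delimiter and stripped from ABC1
        System.out.println(Arrays.toString(textTokens("SOMETEXT(ABC1,DEF)")));
    }
}
```

Because digits are delimiters here, this approach keeps the letters of a mixed token such as ABC1, while the filter-based answer would drop that token completely. Which behavior you want depends on your data.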

      
