Efficient algorithm for merging objects into ArrayList

I have an ArrayList of Custom Objects (DTO), DTO structure:

private String id;
private String text;
private String query;
private String locatorId;
private Collection<String> categories;
private Collection<String> triggers;

      

I have two tasks:

  • Remove duplicates in an array (seems OK, I should be using a HashSet)
  • Find objects in ArrayList with the same id field and combine them into one object (I have to combine field categories and triggers) and create a final list with combined objects.

What is the most efficient approach for such a task? Also I am interested in using Lambda expression in my algorithm.

+3


source to share


5 answers


It's too easy to combine objects by a specified key using the Stream API. First, define a method merge

in your class Entity

like this:

public Entity merge(Entity other) {
    this.categories.addAll(other.categories);
    this.triggers.addAll(other.triggers);
    return this;
}

      

Then you can create a custom collection collector:



import static java.util.stream.Collectors.*;

public static Collection<Entity> mergeAll(Collection<Entity> input) {
    return input.stream()
                .collect(groupingBy(Entity::getId,
                    collectingAndThen(reducing(Entity::merge), Optional::get)))
                .values();
}

      

Here we are grouping the items Entity

as a result of the method getId

and the downstream collector just calls Entity.merge()

when the same thing is encountered id

(we need to expand on Optional

further). There Entity

is no special implementation for in this solution hashCode()

or equals()

.

Note that this solution modifies existing unbound objects Entity

. If this is not desired, create a new one Entity

in the method merge()

and return it instead (as in @ Marco13's answer).

+4


source


Create Map<Integer, DTO>

and place your id as a key and object as a DTO. And before overlaying on the map, just check if it already contains this key, and if it contains this key, then pull out the DTO object for that key and change the categories and triggers with the old object.



+2


source


One possible solution, as suggested in Naman Gala's answer , is to use Map

from IDs to entities and manually combine objects when they have the same ID.

This is implemented here in a method mergeById

, with some kind of dummy / example input, where

  • need to merge two objects (because of the same ID)
  • two objects are equal (they will also be "merged", which will give the same result as one of the inputs)

...

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;


public class MergeById
{
    public static void main(String[] args)
    {
        List<Entity> entities = new ArrayList<Entity>();
        entities.add(new Entity("0", "A", "X", "-1", 
            Arrays.asList("C0", "C1"), Arrays.asList("T0", "T1")));
        entities.add(new Entity("0", "A", "X", "-1", 
            Arrays.asList("C2", "C3"), Arrays.asList("T2")));
        entities.add(new Entity("1", "B", "Y", "-2", 
            Arrays.asList("C0"), Arrays.asList("T0", "T1")));
        entities.add(new Entity("1", "B", "Y", "-2", 
            Arrays.asList("C0"), Arrays.asList("T0", "T1")));
        entities.add(new Entity("2", "C", "Z", "-3", 
            Arrays.asList("C0", "C1"), Arrays.asList("T1")));

        System.out.println("Before merge:");
        for (Entity entity : entities)
        {
            System.out.println(entity);
        }

        List<Entity> merged = mergeById(entities);

        System.out.println("After  merge:");
        for (Entity entity : merged)
        {
            System.out.println(entity);
        }
    }

    private static List<Entity> mergeById(Iterable<? extends Entity> entities)
    {
        Map<String, Entity> merged = new HashMap<String, Entity>();
        for (Entity entity : entities)
        {
            String id = entity.getId();
            Entity present = merged.get(id);
            if (present == null)
            {
                merged.put(id, entity);
            }
            else
            {
                merged.put(id, Entity.merge(present, entity));
            }
        }
        return new ArrayList<Entity>(merged.values());
    }

}


class Entity
{
    private String id;
    private String text;
    private String query;
    private String locatorId;
    private Collection<String> categories;
    private Collection<String> triggers;

    Entity()
    {
        categories = new LinkedHashSet<String>();
        triggers = new LinkedHashSet<String>();
    }

    Entity(String id, String text, String query, String locatorId,
        Collection<String> categories, Collection<String> triggers)
    {
        this.id = id;
        this.text = text;
        this.query = query;
        this.locatorId = locatorId;
        this.categories = categories;
        this.triggers = triggers;
    }

    String getId()
    {
        return id;
    }

    static Entity merge(Entity e0, Entity e1)
    {
        if (!Objects.equals(e0.id, e1.id))
        {
            throw new IllegalArgumentException("Different id");
        }
        if (!Objects.equals(e0.text, e1.text))
        {
            throw new IllegalArgumentException("Different text");
        }
        if (!Objects.equals(e0.query, e1.query))
        {
            throw new IllegalArgumentException("Different query");
        }
        if (!Objects.equals(e0.locatorId, e1.locatorId))
        {
            throw new IllegalArgumentException("Different id");
        }
        Entity e = new Entity(e0.id, e0.text, e0.query, e0.locatorId, 
            new LinkedHashSet<String>(), new LinkedHashSet<String>());
        e.categories.addAll(e0.categories);
        e.categories.addAll(e1.categories);
        e.triggers.addAll(e0.triggers);
        e.triggers.addAll(e1.triggers);
        return e;
    }

    @Override
    public String toString()
    {
        return "Entity [id=" + id + ", text=" + text + ", query=" + query +
            ", locatorId=" + locatorId + ", categories=" + categories +
            ", triggers=" + triggers + "]";
    }

}

      

Output signal

Before merge:
Entity [id=0, text=A, query=X, locatorId=-1, categories=[C0, C1], triggers=[T0, T1]]
Entity [id=0, text=A, query=X, locatorId=-1, categories=[C2, C3], triggers=[T2]]
Entity [id=1, text=B, query=Y, locatorId=-2, categories=[C0], triggers=[T0, T1]]
Entity [id=1, text=B, query=Y, locatorId=-2, categories=[C0], triggers=[T0, T1]]
Entity [id=2, text=C, query=Z, locatorId=-3, categories=[C0, C1], triggers=[T1]]
After  merge:
Entity [id=0, text=A, query=X, locatorId=-1, categories=[C0, C1, C2, C3], triggers=[T0, T1, T2]]
Entity [id=1, text=B, query=Y, locatorId=-2, categories=[C0], triggers=[T0, T1]]
Entity [id=2, text=C, query=Z, locatorId=-3, categories=[C0, C1], triggers=[T1]]

      

Regarding the request to do this with lambdas: perhaps some complex application could be written entities.stream().collect(...)

. But since that wasn't the main point of the question, I'll leave that part of the answer to someone else (but I won't skip this little hint: just because you can doesn't mean you need to. Sometimes the loop is just fine).

Also note that this can be summarized easily, perhaps by providing some dictionary databases from databases. But I think the main question needs to be answered.

+2


source


Implement equals

and hashCode

field-based id

in the DTO and store the DTO in Set

. This should fix both problems; given how the equality of your DTOs is now defined, no duplicates with the same id

can exist in Set

.

EDIT:

Since your requirement is to combine the categories and triggers of the existing DTO based on the values ​​from the new one, then a more appropriate data structure to store DTO

would be Map<DTO, DTO>

(since it's cumbersome to retrieve elements from a Set

after you've put them in). Also, I think that categories and triggers in yours DTO

should be defined as Set

s, disallowing duplicates; this will simplify the merge operation:

private Set<String> categories;
private Set<String> triggers;

      

Assuming it DTO

provides accessors ( getCategories

/ getTriggers

) for the above fields (and that the fields never do null

), the concatenation can now be implemented like this:

public static void mergeOrPut(Map<DTO,DTO> dtos, DTO dto) {
    if (dtos.containsKey(dto)) {
        DTO existing = dtos.get(dto);
        existing.getCategories().addAll(dto.getCategories());
        existing.getTriggers().addAll(dto.getTriggers());
    } else {
        dtos.put(dto, dto);
    }
}

      

The above code can also be easily modified to work with Map<Integer, DTO>

, and in this case, you do not need to redefine equals

and hashCode

in the classroom DTO

.

+1


source


If you insist on using a lambda expression, you can do the following:

Set<X> x = new TreeSet<>((o1, o2) -> 
        ((X)o1).getId().equals(((X)o2).getId()) ? 0 : 1);

List<X> list = new ArrayList<>(set.addAll(x));

      

This will create a collection with unique objects according to their IDs. Then, for each object in, list

find the corresponding one from the source list and merge the inner collections.

+1


source







All Articles