In Java, what's the fastest way to check if a list contains elements from another list, both of the same type?

Let's say I have a class called MyClass as shown below:

public class MyClass
{
     //Identifier is alpha-numeric. If the identifier starts with 'ZZ',
     //it is a special identifier.
     private String identifier = null;
     //Date string format YYYY-MM-DD
     private String dateString = null;
     //Just a flag (not important for this scenario)
     private boolean isCoolCat = false;
     //Default Constructor and getters/setters implemented
     //Overrides the standard Java equals() method.
     //This way, when ArrayList calls contains() for MyClass objects
     //it will only check the Date (for ZZ identifier) 
     //and identifier values against each other instead of
     //also comparing the isCoolCat indicator value.
     @Override
     public boolean equals(Object obj)
     {
          if(this == obj)
          {
               return true;
          }
          if(obj == null)
          {
               return false;
          }
          if(getClass() != obj.getClass())
          {
               return false;
          }
          MyClass other = (MyClass) obj;
          if(this.identifier == null)
          {
               if(other.identifier != null)
               {
                    return false;
               }
          } else if(!this.identifier.equals(other.identifier)) {
               return false;
          }
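          //Null checks avoid a NullPointerException when both identifiers
          //(or this.dateString) are null.
          if(other.identifier != null && other.identifier.startsWith("ZZ"))
          {
               if(this.dateString == null)
               {
                    if(other.dateString != null)
                    {
                         return false;
                    }
               } else if(!this.dateString.equals(other.dateString)) {
                    return false;
               }
          }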
          return true;
     }
}

      

In another class, I have two lists of MyClass objects, each containing 100,000 objects. I need to check whether the items in one list are in the other list, and I am currently doing the following:

List<MyClass> inList = new ArrayList<MyClass>();
List<MyClass> outList = new ArrayList<MyClass>();
inList = someMethodForIn();
outList = someMethodForOut();
//The for loop iterates through inList and checks whether outList contains
//each MyClass object from inList; if it doesn't, the object is added.
for(MyClass inObj : inList)
{
     if(!outList.contains(inObj))
     {
          outList.add(inObj); 
     }
}

      

My question is: is this the fastest way to achieve this? If not, can you show me a better implementation that will give me a performance boost? It currently takes about 2 minutes on my platform when each list holds 100,000 objects. The list size will not always be 100,000; let's say it can range from 1 to 1,000,000.

3 answers


For this you want to use a Set. A hash-based Set such as HashSet has a contains method that can determine whether an object is in the set in O(1) expected time.

Several things to watch out for when converting from List<MyClass> to Set<MyClass>:

  • You will lose the ordering of items
  • You will lose duplicate items
  • Your MyClass must implement both hashCode() and equals(), and the two must be consistent: objects that are equal must return the same hash code.
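The question's MyClass overrides equals() but shows no hashCode(), so a HashSet would not find "equal" objects reliably. Here is a minimal sketch of a matching pair, assuming the equality rule from the question (identifier always compared, dateString compared only for 'ZZ' identifiers). MyClassSketch and isSpecial() are hypothetical names used for illustration, and the isCoolCat flag is left out because equals() ignores it.

```java
import java.util.Objects;

// Simplified, hypothetical stand-in for the question's MyClass.
final class MyClassSketch {
    private final String identifier;
    private final String dateString;

    MyClassSketch(String identifier, String dateString) {
        this.identifier = identifier;
        this.dateString = dateString;
    }

    // The date only takes part in equality for "ZZ" identifiers.
    private boolean isSpecial() {
        return identifier != null && identifier.startsWith("ZZ");
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        MyClassSketch other = (MyClassSketch) obj;
        if (!Objects.equals(identifier, other.identifier)) return false;
        // Identifiers match here, so both objects are special or neither is.
        return !isSpecial() || Objects.equals(dateString, other.dateString);
    }

    @Override
    public int hashCode() {
        // Hash exactly the fields equals() compares, so equal objects
        // always produce the same hash code.
        return Objects.hash(identifier, isSpecial() ? dateString : null);
    }
}
```

With this in place, two non-'ZZ' objects that share an identifier but differ in date are equal and hash identically, so HashSet.contains() treats them as the same element.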

To convert a List to a Set, you can simply use:



Set<MyClass> s1 = new HashSet<>(inList);
Set<MyClass> s2 = new HashSet<>(outList);

      


The Java documentation on sets explains how to find the union, intersection, and difference of two sets. In particular, it seems that you are interested in the union:

// transforms s2 into the union of s1 and s2. (The union of two sets 
// is the set containing all of the elements contained in either set.)
s2.addAll(s1);
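If losing the ordering is a problem, one option is LinkedHashSet, which keeps insertion order. The following is only a sketch under that assumption; OrderedUnion and union() are hypothetical names, and the element type must still have consistent equals() and hashCode(). Note that duplicates already present in the first list are dropped as well.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class OrderedUnion {
    // Returns the distinct elements of out, followed by the elements of in
    // that were not already present, in first-seen order.
    static <T> List<T> union(List<T> out, List<T> in) {
        Set<T> merged = new LinkedHashSet<>(out); // keeps insertion order
        merged.addAll(in);                        // skips elements already present
        return new ArrayList<>(merged);
    }
}
```

For example, calling OrderedUnion.union(outList, inList) would give a new list and leave both inputs untouched.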

      


Hashing! Hashing is always the answer!

The current complexity of this code is O(nm), where n is the size of inList and m is the size of outList.

You can use a HashSet to reduce the complexity to O(n + m), since building the set takes O(m) and each contains call now takes O(1) on average.

This can be done as follows:

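   //Requires MyClass.hashCode() to be consistent with equals().
   HashSet<MyClass> outSet = new HashSet<>(outList);
   for(MyClass inObj : inList)
   {
        //add() returns false if the set already holds an equal object,
        //so repeated objects in inList are only appended once.
        if(outSet.add(inObj))
        {
             outList.add(inObj);
        }
   }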
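For reuse, the same idea can be wrapped in a small generic helper. This is a sketch, not a drop-in: ListMerge and appendMissing() are hypothetical names, and it assumes the element type has consistent equals() and hashCode(). It tracks newly added elements in the set as it goes, so it matches the behaviour of the original loop, where contains() also sees elements appended earlier in the same pass.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ListMerge {
    // Appends to outList every element of inList that outList does not
    // already contain, and returns how many elements were appended.
    static <T> int appendMissing(List<T> inList, List<T> outList) {
        Set<T> seen = new HashSet<>(outList); // O(m) to build
        int added = 0;
        for (T item : inList) {               // O(n) expected in total
            // add() returns true only if the element was not present yet.
            if (seen.add(item)) {
                outList.add(item);
                added++;
            }
        }
        return added;
    }
}
```

In the question's code this would replace the for loop with ListMerge.appendMissing(inList, outList). Duplicates that were already in outList are left as they are.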

      




Credits and sources:

  • Returning difference between two lists in Java
  • Time complexity of contains(Object o) in an ArrayList of objects
  • HashSet.contains performance


With 2 minutes spent comparing two very large lists, you are probably not going to gain a lot of time, so depending on your application, you could set a flag so that anything depending on the result can't run until it completes, push the comparison onto its own thread, and let the user do something else (while showing them that it is ongoing). Or at least show a progress bar. Letting the user know that the application is busy, and telling them roughly how long it will take, is fine for a complex calculation that only takes a few minutes, and is probably better than just shaving off a few seconds. Users are quite tolerant of delays if they know how long they will be and you tell them there is time to go for a coffee.
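As a rough sketch of the "own thread plus progress" idea, the comparison could be run through CompletableFuture, with a counter that a progress bar can poll. BackgroundMerge, mergeAsync() and processed are hypothetical names, the HashSet lookup is swapped in for the original contains() loop, and wiring the counter to an actual UI is left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

final class BackgroundMerge {
    // Number of inList elements examined so far; a UI thread could poll
    // this value to drive a progress bar.
    final AtomicInteger processed = new AtomicInteger();

    // Runs the merge on a background thread and returns a future holding
    // a new list; the input lists are not modified.
    <T> CompletableFuture<List<T>> mergeAsync(List<T> inList, List<T> outList) {
        return CompletableFuture.supplyAsync(() -> {
            List<T> result = new ArrayList<>(outList);
            Set<T> seen = new HashSet<>(outList);
            for (T item : inList) {
                if (seen.add(item)) {
                    result.add(item);
                }
                processed.incrementAndGet();
            }
            return result;
        });
    }
}
```

The caller can attach a callback with thenAccept() to pick up the result when it is ready, which plays the role of the "can't run until completion" flag.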
