Compare two lists for both exact matches and similar values (using Java)

I have two very large lists to compare. I compared them using the retainAll() method and got a list of common elements. But I also want to get similar matches.

ArrayList<String> list1 = new ArrayList<String>(Arrays.asList("John","Mary"," Mr. John Marsh","Mrs. Mary Dsouza","abc","xyz"));
ArrayList<String> list2 = new ArrayList<String>(Arrays.asList("John","Mary","Tim","Sam"));
list1.retainAll( list2 );
System.out.println( list1 );


It gives me the output [John, Mary]

I also want the similar matches, i.e. [John, Mary, Mr. John Marsh, Mrs. Mary Dsouza]

How should I proceed? Just an idea will suffice.



3 answers


OK, I'm a little hesitant to post this answer because I think it is quite a crude hack, but I'll go ahead and post it anyway. Fingers crossed :).

retainAll uses equals internally, and since String is a final class we cannot override its equals. We can, however, create a wrapper around it and provide a custom equals implementation. The downside is extra memory (space complexity), since every string needs its own wrapper object.

Here's what I did (I used contains in the equals method):



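import java.util.ArrayList;
import java.util.Arrays;

public class FindAlike {

    public static void main(String[] args) {
        ArrayList<StringWrapper> list1 = new ArrayList<StringWrapper>(Arrays.asList(new StringWrapper("John"), new StringWrapper("Mary"),
                new StringWrapper(" Mr. John Marsh"), new StringWrapper("Mrs. Mary Dsouza"), new StringWrapper("abc"), new StringWrapper("xyz")));
        ArrayList<StringWrapper> list2 = new ArrayList<StringWrapper>(Arrays.asList(new StringWrapper("John"), new StringWrapper("Mary"),
                new StringWrapper("Tim"), new StringWrapper("Sam")));
        // retainAll keeps each element e of list1 for which list2.contains(e) is true,
        // and ArrayList.contains calls e.equals(x) for the elements x of list2.
        list1.retainAll(list2);
        System.out.println(list1);
    }

    private static class StringWrapper {

        private final String value;

        public StringWrapper(String value) {
            this.value = value;
        }

        public String getValue() {
            return this.value;
        }

        // "Equal" here means "this value contains the other value". This is deliberately
        // not symmetric and there is no matching hashCode, so it only suits lookups like
        // retainAll; don't put these wrappers in a HashSet or HashMap.
        @Override
        public boolean equals(Object obj) {
            if (!(obj instanceof StringWrapper)) {
                return false;
            }
            return this.value.contains(((StringWrapper) obj).getValue());
        }

        @Override
        public String toString() {
            return this.value;
        }
    }
}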


For the given data I got the following result: [John, Mary, Mr. John Marsh, Mrs. Mary Dsouza]
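Another option, assuming Java 8 or later, is to skip the wrapper entirely and pass the matching rule to `removeIf` as a predicate. This leaves `String`'s own `equals` untouched (the wrapper's `equals` above is not symmetric and has no matching `hashCode`) and avoids allocating a wrapper per element. The class `FindAlikeWithPredicate` and method `similarMatches` are illustrative names, not part of the answer above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FindAlikeWithPredicate {

    // Keep every element of list1 that contains at least one element of list2.
    // Exact matches are included, since every string contains itself.
    public static List<String> similarMatches(List<String> list1, List<String> list2) {
        List<String> result = new ArrayList<>(list1); // copy so the input is not modified
        result.removeIf(s1 -> list2.stream().noneMatch(s1::contains));
        return result;
    }

    public static void main(String[] args) {
        List<String> list1 = Arrays.asList(
                "John", "Mary", " Mr. John Marsh", "Mrs. Mary Dsouza", "abc", "xyz");
        List<String> list2 = Arrays.asList("John", "Mary", "Tim", "Sam");
        System.out.println(similarMatches(list1, list2));
    }
}
```

Like the wrapper, this only checks whether the list1 string contains the list2 string. To match in both directions, replace the method reference with a lambda such as `s2 -> s1.contains(s2) || s2.contains(s1)`.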



Try this:

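List<String> list3 = new ArrayList<String>();
for (String s1 : list1)
{
    for (String s2 : list2)
    {
        // s1.equals(s2) is already implied by contains, but it keeps the intent explicit
        if (s1.equals(s2) || s1.contains(s2) || s2.contains(s1))
        {
            list3.add(s1);
            break; // s1 matched; stop so it is not added more than once
        }
    }
}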




list3 then contains the items you need.



I assume you don't want to do any elaborate parsing of these strings. If it's just a string comparison, check out this post and look at the similarity algorithms it describes.

I list those algorithms below (in case that post goes dead):

  • Cosine similarity
  • Jaccard similarity
  • Dice coefficient
  • Matching similarity
  • Overlap similarity
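As a rough illustration, three of the listed measures (Jaccard, Dice, overlap) can be computed on the sets of words in each string. Lowercasing and splitting on whitespace are assumptions made here; the post mentioned above may define these measures on characters or n-grams instead. With these definitions, " Mr. John Marsh" vs "John" scores Jaccard 1/3, Dice 0.5 and overlap 1.0, which suggests the overlap coefficient fits the "short name inside a longer name" case from the question. `SetSimilarity` and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class SetSimilarity {

    // Lowercase and split on whitespace: " Mr. John Marsh" -> {mr., john, marsh}.
    // split always returns at least one element, so the sets are never empty.
    static Set<String> tokens(String s) {
        return new HashSet<>(Arrays.asList(s.trim().toLowerCase(Locale.ROOT).split("\\s+")));
    }

    static int intersectionSize(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }

    // Jaccard: |A intersect B| / |A union B|
    public static double jaccard(String x, String y) {
        Set<String> a = tokens(x), b = tokens(y);
        int inter = intersectionSize(a, b);
        return (double) inter / (a.size() + b.size() - inter);
    }

    // Dice: 2 * |A intersect B| / (|A| + |B|)
    public static double dice(String x, String y) {
        Set<String> a = tokens(x), b = tokens(y);
        return 2.0 * intersectionSize(a, b) / (a.size() + b.size());
    }

    // Overlap: |A intersect B| / min(|A|, |B|)
    public static double overlap(String x, String y) {
        Set<String> a = tokens(x), b = tokens(y);
        return (double) intersectionSize(a, b) / Math.min(a.size(), b.size());
    }

    public static void main(String[] args) {
        String longName = " Mr. John Marsh";
        String shortName = "John";
        System.out.println("jaccard = " + jaccard(longName, shortName));
        System.out.println("dice    = " + dice(longName, shortName));
        System.out.println("overlap = " + overlap(longName, shortName));
    }
}
```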

I don't think you can reduce the number of iterations, since it will always be list1.size() * list2.size() comparisons. The only area you can optimize is the similarity test itself. I would also point out that regex and contains operations are relatively expensive when they run that many times, so see if you can use one of the above algorithms at this point.
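A sketch of plugging one of these measures into the comparison loop, assuming word-level tokens and the overlap coefficient with a threshold of 1.0 (every word of the shorter string must appear in the longer one). The class `SimilarityFilter`, the method `filter` and its `threshold` parameter are hypothetical. Each list2 string is tokenized once up front, so the list1.size() * list2.size() inner comparisons become hash-set lookups rather than repeated substring scans. Unlike plain `contains`, this does not let "Tim" match "Timothy".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class SimilarityFilter {

    // Lowercase and split on whitespace (an assumption about what counts as a "word").
    static Set<String> tokens(String s) {
        return new HashSet<>(Arrays.asList(s.trim().toLowerCase(Locale.ROOT).split("\\s+")));
    }

    // Overlap coefficient |A intersect B| / min(|A|, |B|), counted without building a new set.
    static double overlap(Set<String> a, Set<String> b) {
        Set<String> small = a.size() <= b.size() ? a : b;
        Set<String> large = (small == a) ? b : a;
        int inter = 0;
        for (String t : small) {
            if (large.contains(t)) {
                inter++;
            }
        }
        return (double) inter / small.size();
    }

    // Keeps each element of list1 that scores at least `threshold` against some element of list2.
    public static List<String> filter(List<String> list1, List<String> list2, double threshold) {
        // Tokenize list2 once instead of once per (s1, s2) pair.
        List<Set<String>> tokens2 = new ArrayList<>();
        for (String s2 : list2) {
            tokens2.add(tokens(s2));
        }

        List<String> result = new ArrayList<>();
        for (String s1 : list1) {
            Set<String> t1 = tokens(s1);
            for (Set<String> t2 : tokens2) {
                if (overlap(t1, t2) >= threshold) {
                    result.add(s1);
                    break; // one match is enough; also avoids duplicates in result
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "Timothy" is added here only to show that substrings no longer count as matches.
        List<String> list1 = Arrays.asList(
                "John", "Mary", " Mr. John Marsh", "Mrs. Mary Dsouza", "abc", "xyz", "Timothy");
        List<String> list2 = Arrays.asList("John", "Mary", "Tim", "Sam");
        System.out.println(filter(list1, list2, 1.0));
    }
}
```

Lowering the threshold (for example to 0.5) makes the match looser; which value works best depends on the data.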

Please let us know if you come up with a better solution. Cheers!
