Unexpected results with HashSet.Contains and custom IEqualityComparer
I must have some kind of misunderstanding regarding the use of a custom comparer with HashSet. I am collecting many different types of data, which I store as JSON. To work with it I use Json.NET, in particular JObject, JArray and JToken.
Typically I embed some metadata in this data at collection time, and it is prefixed with "tbp_". I need to know whether a specific bit of data, represented as a JObject, has been collected before. For this I have a custom IEqualityComparer that extends the implementation provided by Json.NET. It removes the metadata before checking for value equality with the provided implementation:
public class EntryComparer : JTokenEqualityComparer
{
    private static string _excludedPrefix = "tbp_";

    public JObject CloneForComparison(JObject obj)
    {
        var clone = obj.DeepClone() as JObject;
        var propertiesToRemove = clone
            .Properties()
            .Where(p => p.Name.StartsWith(_excludedPrefix));
        foreach (var property in propertiesToRemove)
        {
            property.Remove();
        }
        return clone;
    }

    public bool Equals(JObject obj1, JObject obj2)
    {
        return base.Equals(CloneForComparison(obj1), CloneForComparison(obj2));
    }

    public int GetHashCode(JObject obj)
    {
        return base.GetHashCode(CloneForComparison(obj));
    }
}
I am using a HashSet to keep track of the data that I am working on, since I just need to know whether it exists or not. I am initializing the HashSet with an instance of EntryComparer. My tests:
public class EntryComparerTests
{
    EntryComparer comparer;
    JObject j1;
    JObject j2;

    public EntryComparerTests()
    {
        comparer = new EntryComparer();
        j1 = JObject.Parse(@"
        {
            'tbp_entry_date': '2017-03-25T21:25:53.127993-04:00',
            'from_date': '1/6/2017',
            'to_date': '2/7/2017',
            'use': '324320',
            'reading': 'act',
            'kvars': '0.00',
            'demand': '699.10',
            'bill_amt': '$28,750.75'
        }");
        j2 = JObject.Parse(@"
        {
            'tbp_entry_date': '2017-03-10T18:59:00.537745-05:00',
            'from_date': '1/6/2017',
            'to_date': '2/7/2017',
            'use': '324320',
            'reading': 'act',
            'kvars': '0.00',
            'demand': '699.10',
            'bill_amt': '$28,750.75'
        }");
    }

    [Fact]
    public void Test_Equality_Comparer_GetHashCode()
    {
        Assert.Equal(comparer.GetHashCode(j1), comparer.GetHashCode(j2));
        Assert.Equal(true, comparer.Equals(j1, j2));
    }

    [Fact]
    public void Test_Equality_Comparer_Hashset_Contains()
    {
        var hs = new HashSet<JObject>(comparer);
        hs.Add(j1);
        Assert.Equal(true, hs.Contains(j2));
    }
}
Test_Equality_Comparer_GetHashCode() passes but Test_Equality_Comparer_Hashset_Contains() fails. j1 and j2 should be treated as equal, which matches the results of the first test, so what am I missing here?
Change the class signature:
public class EntryComparer : JTokenEqualityComparer, IEqualityComparer<JObject>
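Otherwise the GetHashCode() and Equals() of the base class are used, because they have a different signature. The base class implements IEqualityComparer<JToken>, and since IEqualityComparer<in T> is contravariant, HashSet<JObject> accepts it and calls Equals(JToken, JToken) and GetHashCode(JToken). Your JObject methods are only overloads, so they are never called by HashSet<>. The first test passes because it calls the comparer directly, where overload resolution picks the JObject versions.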
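The difference between an overload and an interface implementation can be reproduced without Json.NET. The following is a minimal sketch, not the library's code: Token, Obj, TokenComparer, OverloadOnlyComparer and FixedComparer are hypothetical stand-ins for JToken, JObject, JTokenEqualityComparer and the two versions of EntryComparer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for JToken and JObject.
public class Token { public string Value = ""; public string Meta = ""; }
public class Obj : Token { }

// Stand-in for JTokenEqualityComparer: compares everything, including Meta.
public class TokenComparer : IEqualityComparer<Token>
{
    public bool Equals(Token a, Token b) => a.Value == b.Value && a.Meta == b.Meta;
    public int GetHashCode(Token t) => t.Value.GetHashCode() ^ t.Meta.GetHashCode();
}

// Like the question's class: the Obj methods are plain overloads and are
// not part of any interface, so HashSet<Obj> never sees them.
public class OverloadOnlyComparer : TokenComparer
{
    public bool Equals(Obj a, Obj b) => a.Value == b.Value;
    public int GetHashCode(Obj o) => o.Value.GetHashCode();
}

// Like the corrected class: the same methods now implement IEqualityComparer<Obj>.
public class FixedComparer : TokenComparer, IEqualityComparer<Obj>
{
    public bool Equals(Obj a, Obj b) => a.Value == b.Value;
    public int GetHashCode(Obj o) => o.Value.GetHashCode();
}

public static class Demo
{
    // Adds one object, then looks up another that differs only in Meta.
    public static bool SetContains(IEqualityComparer<Obj> comparer)
    {
        var a = new Obj { Value = "x", Meta = "1" };
        var b = new Obj { Value = "x", Meta = "2" };
        var set = new HashSet<Obj>(comparer) { a };
        return set.Contains(b);
    }
}
```

Demo.SetContains(new OverloadOnlyComparer()) returns false, because the set goes through the IEqualityComparer<Token> methods of the base class, while Demo.SetContains(new FixedComparer()) returns true.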
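Then a small error shows up when removing properties, because the lazy Where query is still enumerating the clone's properties while they are being removed. Materialize the query with ToArray() first: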
var propertiesToRemove = clone
.Properties()
.Where(p => p.Name.StartsWith(_excludedPrefix))
.ToArray();
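The same effect can be seen with a plain List<string>. This is only an analogy under that assumption: the answer does not say which exception Json.NET raises, and the class and method names below are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RemoveWhileEnumerating
{
    private static List<string> SampleNames() =>
        new List<string> { "tbp_entry_date", "from_date", "tbp_source" };

    // Removes items while the lazy Where query is still walking the list.
    // Returns true if the list enumerator rejects the modification.
    public static bool LazyRemovalThrows()
    {
        var names = SampleNames();
        var toRemove = names.Where(n => n.StartsWith("tbp_"));
        try
        {
            foreach (var n in toRemove) names.Remove(n);
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // Takes a snapshot with ToArray() first, so the removal loop iterates
    // over the array and no longer over the list being modified.
    public static List<string> SnapshotRemoval()
    {
        var names = SampleNames();
        var toRemove = names.Where(n => n.StartsWith("tbp_")).ToArray();
        foreach (var n in toRemove) names.Remove(n);
        return names;
    }
}
```

LazyRemovalThrows() returns true, and SnapshotRemoval() returns a list containing only "from_date".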
It would be even better to "hide" the JTokenEqualityComparer by making it a private field instead of a base class, for example:
public class EntryComparer : IEqualityComparer<JObject>
{
    private static readonly JTokenEqualityComparer _comparer = new JTokenEqualityComparer();
    private static readonly string _excludedPrefix = "tbp_";

    public static JObject CloneForComparison(JObject obj)
    {
        var clone = obj.DeepClone() as JObject;
        var propertiesToRemove = clone
            .Properties()
            .Where(p => p.Name.StartsWith(_excludedPrefix))
            .ToArray();
        foreach (var property in propertiesToRemove)
        {
            property.Remove();
        }
        return clone;
    }

    public bool Equals(JObject obj1, JObject obj2)
    {
        return _comparer.Equals(CloneForComparison(obj1), CloneForComparison(obj2));
    }

    public int GetHashCode(JObject obj)
    {
        return _comparer.GetHashCode(CloneForComparison(obj));
    }
}