Finding duplicates in NSMutableArray
I have a class (colorClass) that contains 2 NSStrings (idNumber and favoriteColor). There is NSMutableArray (arrayColor) which contains over 50,000 colorClass objects. What's the fastest way to find all duplicate idNumbers from all colorClass objects and return them to an array? Right now I am using loop 1 for a loop that copies arrayColor and then filters the copied array using NSPredicate. It takes over 5 minutes to sort the array. How can this be done more efficiently?
source to share
The first question is, does order really matter? If not, use NSMutableSet
or NSMutableDictionary
(whichever makes sense for your application)
The easiest way to eliminate duplicates is to prevent them from appearing in the first place. Before adding anything to yours NSMutableArray
, you can check if this value exists. For example:
- (void)addColor:(NSString *)color withID:(NSString *)id {
NSArray *duplicates = [myArray filteredArrayUsingPredicate:[NSPredicate predicateWithFormat:@"id == %@", id]];
if ([duplicates count] > 0) {
// Optionally report an error/throw an exception
return;
}
}
Otherwise, your best bet is probably to get a list of ids with valueForKeyPath:
, then sort that array, and then run it once to look for duplicates. It would be like this:
- (NSSet *)checkForDuplicateIDs {
NSArray *allIDs = [myArray valueForKeyPath:@"id"];
NSArray *sortedIDs = [allIDs sortedArrayUsingSelector:@selector(compare:)];
NSString *previousID = nil;
NSMutableSet *duplicateIDs = [NSMutableSet set];
for (NSString *anID in sortedIDs) {
if ([previousID isEqualToString:anID]) {
[duplicateIDs addObject:anID];
}
previousID = anID;
}
return [[duplicateIDs copy] autorelease];
}
Remember that sorting the list is still at its best, perhaps in O(n log(n))
. If you can at least keep your objects on your list, you can avoid the cost of sorting them. Avoiding duplicates is best, so sorting the list is better, and the algorithm I gave above is probably the worst.
source to share
So, to tell a little about my previous comments: I am not clear from the context in which this data is actually used. In particular, is it necessary to store all of these objects in a large long array. If it doesn't, then a dictionary might be a better choice for a data structure instead of an array.
Since dictionaries are inherently important data structures for keywords, it could possibly be completely eliminated ColorClass
, but I'll assume here that there is another reason to keep it besides what we know from the question.
If duplicates are not to be allowed at all, then the dictionary can store individual elements, and the code might look something like this:
// colors is an NSMutableDictionary
- (ColorClass*)addColorIfPossible:(ColorClass*)color {
ColorClass *existingColor = [[colors objectForKey:[color idNumber]] retain];
if( existingColor == nil ) {
[colors setObject:color forKey:[color idNumber]];
}
return [existingColor autorelease];
}
And if duplicates are allowed, but there is a need to quickly get all objects with a common ID, then a dictionary of arrays or sets might work:
// colors is an NSMutableDictionary
- (void)addColor:(ColorClass*)color {
NSMutableSet *colorSet = [colors objectForKey:[color idNumber]];
if( !colorSet ) {
// kInitialSetCapacity is a constant with some reasonable value you choose
colorSet = [NSMutableSet setWithCapacity:kInitialSetCapacity];
[colors setObject:colorSet forKey:[color idNumber]];
}
[colorSet addObject:color];
}
- (NSSet*)findDuplicatesForID:(NSString*)idNumber {
// returns nil if no colors with that id, but could
// return an empty set instead with little effort
return [[[colors objectForKey:idNumber] copy] autorelease];
}
If your application needs both to have a giant list of colors in a common order, and to quickly find duplicates, then the classic trade-off between space and time comes: use just an array, or maintain both arrays and a dictionary like this.
source to share