Finding duplicates in NSMutableArray

I have a class (colorClass) that contains 2 NSStrings (idNumber and favoriteColor). There is NSMutableArray (arrayColor) which contains over 50,000 colorClass objects. What's the fastest way to find all duplicate idNumbers from all colorClass objects and return them to an array? Right now I am using loop 1 for a loop that copies arrayColor and then filters the copied array using NSPredicate. It takes over 5 minutes to sort the array. How can this be done more efficiently?

+2


source to share


6 answers


The first question is, does order really matter? If not, use NSMutableSet

or NSMutableDictionary

(whichever makes sense for your application)

The easiest way to eliminate duplicates is to prevent them from appearing in the first place. Before adding anything to yours NSMutableArray

, you can check if this value exists. For example:

- (void)addColor:(NSString *)color withID:(NSString *)id {
    NSArray *duplicates = [myArray filteredArrayUsingPredicate:[NSPredicate predicateWithFormat:@"id == %@", id]];
    if ([duplicates count] > 0) {
        // Optionally report an error/throw an exception
        return;
    }
}

      



Otherwise, your best bet is probably to get a list of ids with valueForKeyPath:

, then sort that array, and then run it once to look for duplicates. It would be like this:

- (NSSet *)checkForDuplicateIDs {
    NSArray *allIDs = [myArray valueForKeyPath:@"id"];
    NSArray *sortedIDs = [allIDs sortedArrayUsingSelector:@selector(compare:)];

    NSString *previousID = nil;
    NSMutableSet *duplicateIDs = [NSMutableSet set];
    for (NSString *anID in sortedIDs) {
        if ([previousID isEqualToString:anID]) {
            [duplicateIDs addObject:anID];
        }
        previousID = anID;
    }

    return [[duplicateIDs copy] autorelease];
}

      

Remember that sorting the list is still at its best, perhaps in O(n log(n))

. If you can at least keep your objects on your list, you can avoid the cost of sorting them. Avoiding duplicates is best, so sorting the list is better, and the algorithm I gave above is probably the worst.

+6


source


The "fastest" will require profiling, but I would tend to make an NSCountedSet from an array, iterate over it, and return an array of elements from the counted set that is countForObject:

greater than 1.



+5


source


Have you thought about using it NSMutableSet

instead? Sets don't allow duplication in the first place, so your problem won't exist. However, a set will not work if the order of the colors matters (since sets have no concept of ordering). I'm not sure about your specific case.

+1


source


This could be faster:

if ([theArray containsObject:theNumber]) {
// remove object
}

      

+1


source


So, to tell a little about my previous comments: I am not clear from the context in which this data is actually used. In particular, is it necessary to store all of these objects in a large long array. If it doesn't, then a dictionary might be a better choice for a data structure instead of an array.

Since dictionaries are inherently important data structures for keywords, it could possibly be completely eliminated ColorClass

, but I'll assume here that there is another reason to keep it besides what we know from the question.

If duplicates are not to be allowed at all, then the dictionary can store individual elements, and the code might look something like this:

// colors is an NSMutableDictionary
- (ColorClass*)addColorIfPossible:(ColorClass*)color {
  ColorClass *existingColor = [[colors objectForKey:[color idNumber]] retain];
  if( existingColor == nil ) {
    [colors setObject:color forKey:[color idNumber]];
  }
  return [existingColor autorelease];
}

      

And if duplicates are allowed, but there is a need to quickly get all objects with a common ID, then a dictionary of arrays or sets might work:

// colors is an NSMutableDictionary
- (void)addColor:(ColorClass*)color {
  NSMutableSet *colorSet = [colors objectForKey:[color idNumber]];
  if( !colorSet ) {
    // kInitialSetCapacity is a constant with some reasonable value you choose
    colorSet = [NSMutableSet setWithCapacity:kInitialSetCapacity];
    [colors setObject:colorSet forKey:[color idNumber]];
  }
  [colorSet addObject:color];
}

- (NSSet*)findDuplicatesForID:(NSString*)idNumber {
  // returns nil if no colors with that id, but could
  // return an empty set instead with little effort
  return [[[colors objectForKey:idNumber] copy] autorelease];
}

      

If your application needs both to have a giant list of colors in a common order, and to quickly find duplicates, then the classic trade-off between space and time comes: use just an array, or maintain both arrays and a dictionary like this.

0


source


    NSMutableSet *uniqueSet = [NSMutableSet setWithArray:arrayOfDuplicates];
    arrayOfDuplicates = [uniqueSet allObjects];

      

0


source







All Articles