Determine if more than half of the keys in an array of size n are the same key in O(n) time?

You have an array or list of keys of known size n. It is not known how many unique keys the list contains: anywhere from 1 (all elements equal) up to and including n (all distinct). The keys are in no particular order, and they really cannot be, since these keys have no concept of more or less, only equality and inequality. Now, before you say "hashmap", here is another condition that I think throws a wrench into that idea: the value of each key is private. The only information you can get about a key is whether it matches another key. So basically:

class key{
    private:
        T data;
        ...
    public:
        ...
        bool operator==(const key &k) const { return data == k.data; }
        bool operator!=(const key &k) const { return data != k.data; }
};

key array[n];


Now, is there an algorithm that can determine whether more than half of the keys in the array are the same key in linear time? If not, how about O(n * log(n))? For example, suppose an array has only 3 unique keys: 60% of the entries have key.data == foo, 30% have key.data == bar, and 10% have key.data == derp. The algorithm only needs to determine that more than 50% of the keys are the same key (key.data == foo) and return one of those keys.

According to my professor, it can be done in O(n) time, but he says we only need to find an algorithm that does it in O(n * log(n)) time.

+3




3 answers


If you can retrieve and hold any key for further comparisons, then your friend is the Boyer–Moore majority vote algorithm.



+3




If you don't want to use the Boyer–Moore algorithm, you can use the following two algorithms, which are based on the same idea.

Algorithm a. Maintain a set S of M (element, count) pairs, where M is small relative to N (for example, 10). Traverse the array, and for each element E:

1.1. If E is in the set, increment the count of its pair: (E, count) -> (E, count+1).
1.2. If not, evict the pair with the minimal count and insert the new pair (E, 1).




If an element has frequency F > 0.5, it will be in the set at the end of this procedure with probability roughly 1 - (1-F)^M (in fact much higher). In a second pass, compute the actual frequencies of the elements in the set S.

Algorithm b. Take N samples of M randomly selected elements of the array, select the most frequent element of each sample by any method, and compute its frequency, averaging over the samples. The maximum error of the frequency estimate is about F(realFrequency) / sqrt(N). So if you get F_e * (1 - 1.0 / sqrt(N)) > 0.5, you have found the most frequent item; if you get F_e * (1 + 1.0 / sqrt(N)) < 0.5, no such item exists. F_e is the estimated frequency.

0




One solution that comes to my mind is this: select the first element of the array, go through the list, and put all matching elements into a separate list. Then select the second item from the original list and compare it with the first; if they are equal, skip it and select the next one, otherwise repeat the counting for it. This might be a possible solution.

-2








