Sum of the maximum element from all possible subsets of size 'k' of the array
I have a very large list with about 10,000 items and each item has an integer of 5 billion. I would like to find the sum of the maximum elements from all possible subsets of size "k" (user-specified) of an array whose maximum size is 10000 elements. The only solution that comes to my mind is to generate each of the subsets (using itertools) and find its maximum element. But that would take an insane amount of time! What would be a pythonic way to solve this problem?
source to share
Don't use python, use math first. This is a combinatorial problem: if you have an array of S
n numbers (n is large) and generates all possible subsets of size k, you want to compute the sum of the maximum elements of the subsets.
Assuming the numbers are different (although they also work if they aren't), you can calculate exactly how often each will appear in the subset, and continue from there without even creating the subset. You had to pass it on math.stackexchange.com
, they would take you apart in an instant. Here it is, but without the pretty math notation:
Sort the array in ascending order and let it S_1
be the smallest (first) number, the
S_2
next smallest, etc. (Note: indexing from 1).
-
S_n
, the largest element is obviously the maximal element of any subset it is a part of and there are exactly(n-1 choose k-1)
such subsets. -
Of the subsets that do not contain S_n, there are
(n-2 choose k-1)
subsets containingS_{n-1}
, in which it is the largest element. -
Continue this until you reach
S_k
the smallest numberk-th
(counting from the smallest), which is the maximum exactly one subset:(k-1 choose k-1) = 1
. Smaller numbers (fromS_1
toS_{k-1}
) can never be maximum: each set of elementsk
will contain something larger. -
Sum up the above
(n-k+1 terms)
and there your answer is:S_n*(n-1 choose k-1) + S_{n-1}*(n-2 choose k-1) + ... + S_k*(k-1 choose k-1)
Writing terms from smallest to largest is just the sum
Sum(i=k..n) S_i * (i-1 choose k-1)
If we were on math.stackexchange, you would get it in correct mathematical notation, but you get the idea.
source to share