Cartesian product of lists, skipping tuples whose sum exceeds 4

I currently have mylist = list(itertools.product(*a))

The problem is that it creates too many tuples. I don't want it to produce a tuple if the sum of its elements is > 4. E.g.:

[(0, 0, 0, 0),
 (0, 0, 0, 1),
 (0, 0, 0, 2),
 (0, 0, 1, 0),
 (0, 0, 1, 1),
 (0, 0, 1, 2),
 (0, 1, 0, 0),
 (0, 1, 0, 1),
 (0, 1, 0, 2),
 (0, 1, 1, 0),
 (0, 1, 1, 1),
 (0, 1, 1, 2),
 (1, 0, 0, 0),
 (1, 0, 0, 1),
 (1, 0, 0, 2),
 (1, 0, 1, 0),
 (1, 0, 1, 1),
 (1, 0, 1, 2),
 (1, 1, 0, 0),
 (1, 1, 0, 1),
 (1, 1, 0, 2),
 (1, 1, 1, 0),
 (1, 1, 1, 1),
 (1, 1, 1, 2)]

      

It shouldn't produce (1, 1, 1, 2), as it sums to 5. In this example there is only one such tuple, but in other cases there are many more.



1 answer


If your dataset is large, you can probably use numpy here. numpy.indices provides an equivalent of itertools.product whose result can also be filtered efficiently:

import numpy as np

arr = np.indices((4, 4, 4, 4)).reshape(4,-1).T
mask = arr.sum(axis=1) < 5
res = arr[mask]
print(res)

#[[0 0 0 0]
# [0 0 0 1]
# [0 0 0 2]
# [0 0 0 3]
# [0 0 1 0]
#  ... 
# [3 0 0 1]
# [3 0 1 0]
# [3 1 0 0]]

      

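Otherwise, for small datasets, as pointed out in the comments, itertools.ifilter (the built-in filter in Python 3) is pretty fast:

from itertools import product
gen = product((0, 1, 2, 3), repeat=4)
# Python 3's built-in filter is lazy; on Python 2, use itertools.ifilter instead
res = filter(lambda x: sum(x) < 5, gen)  # keep tuples whose sum is at most 4
res = list(res) # converting to list only at the end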
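Applied to the input from the question, the same filtering idea can be written as a list comprehension over the product. This is only a sketch: the question does not show `a`, so the ranges below are an assumption inferred from the output it lists, and `limit` is an illustrative name.

```python
from itertools import product

# Assumption: `a` is not shown in the question; these ranges are inferred
# from the 24 tuples it lists (three axes of 0-1, one axis of 0-2).
a = [range(2), range(2), range(2), range(3)]
limit = 4

# product() is lazy, so rejected tuples are never stored in the result list.
mylist = [t for t in product(*a) if sum(t) <= limit]
```

With these assumed ranges, only (1, 1, 1, 2) is dropped, since it is the single tuple whose sum is 5.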

      

In this particular case, both approaches give comparable performance.

If you need even better performance for this particular case, you can always write your optimized procedure in C or Cython.
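Before reaching for C or Cython, it may be worth noting that both approaches above still build every tuple and only then discard the ones over the limit. When most of the product would be rejected, pruning during generation could help more. Below is a minimal sketch, assuming all values are non-negative (so a partial sum that is already over the limit can never recover); `bounded_product` is a hypothetical helper, not part of itertools.

```python
def bounded_product(iterables, limit):
    """Yield tuples of the Cartesian product whose sum is <= limit.

    Assumes every value is non-negative, so a prefix that already
    exceeds the limit can be abandoned without missing valid tuples.
    """
    pools = [tuple(it) for it in iterables]

    def extend(prefix, total, depth):
        if depth == len(pools):
            yield tuple(prefix)
            return
        for value in pools[depth]:
            if total + value > limit:
                continue  # prune: no completion of this prefix can fit
            prefix.append(value)
            yield from extend(prefix, total + value, depth + 1)
            prefix.pop()

    yield from extend([], 0, 0)


# Same input as the answer's examples: values 0-3 on four axes, sum at most 4.
res = list(bounded_product([range(4)] * 4, 4))
```

The tuples come out in the same order as itertools.product would produce them, but whole branches are skipped as soon as the running total passes the limit.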
