Taking common elements from three sets
I have three numpy arrays.
[40 9 0 12 49 1 3 4 18 13 34 47]
[40 0 28 39 29 27 50 9 42 41]
[40 0 9 48 46 1 38 45 15 27 31 36 3 12 16 41 30 33 22 37 28 4 2 6 50
29 32 49 35 7 11 23 44 42 14 13]
Now I want to get all the elements that are common to at least two of the sets. For example, 40, 9, and 0 are common to all three sets, so they are kept. Then you can see that 12 is common to the first and third sets, so it must be kept even though it is not in the second. Likewise 50 is common to the second and third sets, so it must be kept even though it is not in the first.
In other words, any element shared by at least one pair of sets should be preserved.
I tried something like this, but clearly it only returns the elements common to all three sets:
set(list(shortlistvar_rf)) & set(list(shortlistvar_f)) & set(list(shortlistvar_rl))
NumPy has a number of set operations for 1D arrays that you can use. Before writing any code, note that the general formula you are after:
(a & b) | (b & c) | (c & a)
Can be reduced using Boolean algebra to:
(b & (a | c)) | (a & c)
which requires 4 instead of 5 operations.
With that in mind, you can simply do:
>>> np.union1d(np.intersect1d(b, np.union1d(a, c)), np.intersect1d(a, c))
array([ 0, 1, 3, 4, 9, 12, 13, 27, 28, 29, 40, 41, 42, 49, 50])
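Putting this together with the example arrays from the question (the variable names `a`, `b`, `c` are just for illustration), a complete runnable version looks like this:

```python
import numpy as np

a = np.array([40, 9, 0, 12, 49, 1, 3, 4, 18, 13, 34, 47])
b = np.array([40, 0, 28, 39, 29, 27, 50, 9, 42, 41])
c = np.array([40, 0, 9, 48, 46, 1, 38, 45, 15, 27, 31, 36, 3, 12,
              16, 41, 30, 33, 22, 37, 28, 4, 2, 6, 50, 29, 32, 49,
              35, 7, 11, 23, 44, 42, 14, 13])

# (b & (a | c)) | (a & c) -- four set operations instead of five
result = np.union1d(np.intersect1d(b, np.union1d(a, c)),
                    np.intersect1d(a, c))
print(result)
```

`np.union1d` returns a sorted array of unique values, so the output is automatically sorted and de-duplicated.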
>>> a = [40, 9, 0, 12 ,49 ,1 ,3 ,4 ,18 ,13 ,34 ,47]
>>> b = [40 ,0 ,28 ,39 ,29 ,27 ,50 ,9 ,42 ,41]
>>> c = [40 ,0 ,9 ,48 ,46 ,1 ,38 ,45 ,15 ,27 ,31 ,36 ,3 ,12 ,16 ,41 ,30 ,33 ,22 ,37 ,28 ,4 ,2 ,6 ,50,29 ,32 ,49 ,35 ,7 ,11 ,23 ,44 ,42 ,14 ,13]
>>> (set(a) & set(b)) | (set(a) & set(c)) | (set(b) & set(c))
{0, 1, 3, 4, 40, 9, 42, 41, 12, 13, 49, 50, 27, 28, 29}
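The same "in at least two sets" idea generalizes to any number of inputs by counting how many sets each element appears in; a sketch using `collections.Counter` (the function name `in_at_least_two` is mine):

```python
from collections import Counter

def in_at_least_two(*arrays):
    """Return the sorted elements that appear in two or more of the inputs."""
    counts = Counter()
    for arr in arrays:
        counts.update(set(arr))  # de-duplicate within each input first
    return sorted(e for e, n in counts.items() if n >= 2)

a = [40, 9, 0, 12, 49, 1, 3, 4, 18, 13, 34, 47]
b = [40, 0, 28, 39, 29, 27, 50, 9, 42, 41]
c = [40, 0, 9, 48, 46, 1, 38, 45, 15, 27, 31, 36, 3, 12, 16, 41, 30,
     33, 22, 37, 28, 4, 2, 6, 50, 29, 32, 49, 35, 7, 11, 23, 44, 42, 14, 13]
print(in_at_least_two(a, b, c))
```

Converting each input to a `set` before counting ensures duplicates within a single array are not counted twice.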
You can combine the unique elements of the three input arrays into one array, then sort it and find the lengths of runs of identical elements. Elements whose run length is greater than 1
are the ones that appear in at least two of the original input arrays.
Here's the implementation -
import numpy as np
# Get the unique elements of each input array
unqA = np.unique(A)
unqB = np.unique(B)
unqC = np.unique(C)
# Combine them into one single array and then sort it
comb_sorted = np.sort(np.hstack((unqA,unqB,unqC)))
# Find the indices where the value changes; a group is a run of identical elements.
# Runs longer than one element correspond to values shared between inputs.
idx = np.where(np.diff(comb_sorted))[0]
grp_change = np.hstack([ [-1],idx,[comb_sorted.size-1] ])+1
# Finally, get the runlengths of each group, detect those runlength > 1 and,
# get the corresponding elements from the combined array
common_ele = comb_sorted[grp_change[np.diff(grp_change)>1]]
Benchmarking
This section lists some runtime tests comparing the suggested approach against the union1d/intersect1d approach from @Jaime's solution.
Case #1: Input arrays that already contain only unique elements -
Setting up input arrays:
A = np.random.randint(0,1000,[1,1000000])
B = np.random.randint(0,1000,[1,1000000])
C = np.random.randint(0,1000,[1,1000000])
A = A.ravel()
B = B.ravel()
C = C.ravel()
_, idx1 = np.unique(A, return_index=True)
A = A[np.sort(idx1)]
_, idx2 = np.unique(B, return_index=True)
B = B[np.sort(idx2)]
_, idx3 = np.unique(C, return_index=True)
C = C[np.sort(idx3)]
Runtimes:
In [6]: %timeit concat(A,B,C)
10000 loops, best of 3: 136 µs per loop
In [7]: %timeit union_intersect(A,B,C)
1000 loops, best of 3: 315 µs per loop
Case #2: Generic input arrays that may contain duplicates -
Setting up input arrays:
A = np.random.randint(0,1000,[1,1000000])
B = np.random.randint(0,1000,[1,1000000])
C = np.random.randint(0,1000,[1,1000000])
A = A.ravel()
B = B.ravel()
C = C.ravel()
Runtimes:
In [24]: %timeit concat(A,B,C)
10 loops, best of 3: 102 ms per loop
In [25]: %timeit union_intersect(A,B,C)
10 loops, best of 3: 172 ms per loop