Rearranging a numpy array
I need to parse the positions of two objects over time, and I receive the data as a numpy array:
import numpy as np
data = np.array([[0, 1, 2],
                 [1, 4, 3],
                 [2, 2, 1]])
so the first column is the position, the second is the time at which point A was at that position, and the last column is the time at which point B was at that position. The data is guaranteed to be consistent: if any two rows have the same time, they have the same position. In pseudocode:
data[row1,1] == data[row2,1] <=> data[row1,0] == data[row2,0]
data[row1,2] == data[row2,2] <=> data[row1,0] == data[row2,0]
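A minimal sketch of how the "same time implies same position" half of this guarantee could be checked. The helper name `is_consistent` is hypothetical and not part of the original question; it only tests that each time value maps to a single position.

```python
import numpy as np

def is_consistent(data, time_col):
    """Return True if every time value in time_col maps to one position (column 0)."""
    # Unique (position, time) pairs; repeated identical pairs collapse to one row
    pairs = np.unique(data[:, [0, time_col]], axis=0)
    # Consistent if no time value appears in more than one distinct pair
    return np.unique(pairs[:, 1]).size == pairs.shape[0]

data = np.array([[0, 1, 2],
                 [1, 4, 3],
                 [2, 2, 1]])
print(is_consistent(data, 1), is_consistent(data, 2))  # True True
```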
I would really like to somehow rewrite this array so that it lists all available times and corresponding positions, for example:
parsed = [[1, 0, 2],
[2, 2, 0],
[3, np.nan, 1],
[4, 1, np.nan]]
Here the first column is the time, the second is the position of point A, and the third is the position of point B. np.nan should be assigned when I have no information about the position of the point. What I am currently doing is to split the data array into two separate arrays:
moments = set(data[:, 1:3].flatten())
for each in moments:
    a = data[:, [1, 0]][data[:, 1] == each]
    b = data[:, [2, 0]][data[:, 2] == each]
and then I put it back together, as done here by John Galt. This works, but I hope there is a better solution. Can anyone point me in the right direction?
Here's one approach using NumPy array initialization and assignment -
# Gather a and b indices. Get their union, which represents all possible indices
a_idx = data[:,1]
b_idx = data[:,2]
all_idx = np.union1d(a_idx, b_idx)
# Setup o/p array
out = np.full((all_idx.size,3),np.nan)
# Assign all indices to first col
out[:,0] = all_idx
# Determine the positions of a indices in all indices and assign first col data
out[np.searchsorted(all_idx, a_idx),1] = data[:,0]
# Similarly for b
out[np.searchsorted(all_idx, b_idx),2] = data[:,0]
np.searchsorted acts like a godsend here, as it gives us the places where we need to put a and b from data in the already sorted array all_idx, and it is known to be quite efficient.
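As a small illustration of that lookup, here is a sketch using the sample values. It gives exact row positions only because every value in a_idx is known to be present in all_idx; for a value that is missing, np.searchsorted would return an insertion point instead.

```python
import numpy as np

all_idx = np.array([1, 2, 3, 4])   # sorted union of all times
a_idx = np.array([1, 4, 2])        # times of point A, in row order

# For each time in a_idx, find its index in the sorted array all_idx
rows = np.searchsorted(all_idx, a_idx)
print(rows)  # [0 3 1]
```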
The output for sample data is
In [104]: out
Out[104]:
array([[ 1., 0., 2.],
[ 2., 2., 0.],
[ 3., nan, 1.],
[ 4., 1., nan]])
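To check the result against the array asked for in the question, a plain == comparison is not enough, because NaN never compares equal to NaN. The sketch below repeats the steps above on the sample data and uses np.testing.assert_array_equal, which treats NaNs in matching positions as equal. The name `expected` is introduced here for illustration.

```python
import numpy as np

data = np.array([[0, 1, 2],
                 [1, 4, 3],
                 [2, 2, 1]])

# Same steps as the approach above
all_idx = np.union1d(data[:, 1], data[:, 2])
out = np.full((all_idx.size, 3), np.nan)
out[:, 0] = all_idx
out[np.searchsorted(all_idx, data[:, 1]), 1] = data[:, 0]
out[np.searchsorted(all_idx, data[:, 2]), 2] = data[:, 0]

expected = np.array([[1, 0, 2],
                     [2, 2, 0],
                     [3, np.nan, 1],
                     [4, 1, np.nan]])

# Raises AssertionError on mismatch; NaNs in the same positions count as equal
np.testing.assert_array_equal(out, expected)
```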
In the absence of better ideas, let me offer a pandas one-liner. Disclaimer: it runs 100 times slower than Divakar's pure NumPy solution:
import pandas as pd
df = pd.DataFrame(data)
pd.concat([df.set_index(ix)[0] for ix in [1,2]], axis=1).reset_index().values
#array([[ 1., 0., 2.],
# [ 2., 2., 0.],
# [ 3., nan, 1.],
# [ 4., 1., nan]])