Change 1D NumPy array from (implicit) row-major to column-major order

I have a 1D array in NumPy that implicitly represents some 2D data in row-major order. Here's a trivial example:

import numpy as np
# My data looks like [[1,2,3,4], [5,6,7,8]]
a = np.array([1,2,3,4,5,6,7,8])

I want to get a 1D array in column-major order (i.e. b = [1,5,2,6,3,7,4,8] in the example above). Usually I just do the following:

mat = np.reshape(a, (-1,4))
b = mat.flatten('F')

Unfortunately, the length of my input array is not an exact multiple of the row length I want (e.g. a = [1,2,3,4,5,6,7]), so I cannot call reshape. However, I want to keep this extra data, which can be quite large since my rows are quite long. Is there an easy way to do this in NumPy?
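A minimal sketch of the failure described above, assuming a row length of 4: NumPy cannot infer the -1 dimension when the array length is not divisible by the row length, and len(a) % 4 gives the number of elements in the incomplete last row. The names reshape_failed and leftover are only illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7])  # 7 elements, not a multiple of 4
try:
    np.reshape(a, (-1, 4))
    reshape_failed = False
except ValueError as err:
    # -1 cannot be inferred because 7 is not divisible by 4
    reshape_failed = True
    print("reshape failed:", err)

leftover = len(a) % 4  # elements in the incomplete last row
print(leftover)  # 3
```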

2 answers


Pad the array with some placeholder value so that its length becomes a multiple of the row length you want to split it into. If casting to float is acceptable, you can use NaN for the added padding elements. Then reshape to 2D, transpose, and reshape back to 1D. Finally, remove the NaN padding.

import numpy as np
a = np.array([1,2,3,4,5,6,7])  # input
b = np.concatenate((a, [np.nan]))  # add a NaN to make the length 8 = 2x4
c = b.reshape(2,4).transpose().reshape(8,)  # reshape to 2x4, transpose to 4x2, flatten back to length 8
d = c[~np.isnan(c)]  # remove the NaN padding
print(d)

[1. 5. 2. 6. 3. 7. 4.]
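A variant of the same padding idea, sketched under the assumption that you want to keep the original integer dtype, or that your data may itself contain NaN (which the NaN version would silently drop): pad with zeros instead, and track the real elements with a boolean mask that goes through the same reshape and transpose. The helper name column_major_masked is hypothetical.

```python
import numpy as np

def column_major_masked(a, cols):
    """Return `a` (implicitly row-major with `cols` columns) in column-major order.

    Padding is marked with a boolean mask instead of NaN, so the dtype is kept.
    """
    a = np.asarray(a)
    pad = (-len(a)) % cols  # elements needed to complete the last row
    padded = np.concatenate((a, np.zeros(pad, dtype=a.dtype)))
    valid = np.arange(len(padded)) < len(a)  # False only for the padding

    def to_column_major(v):
        # reshape to rows of `cols`, transpose, flatten back to 1D
        return v.reshape(-1, cols).T.ravel()

    return to_column_major(padded)[to_column_major(valid)]


print(column_major_masked([1, 2, 3, 4, 5, 6, 7], 4))  # [1 5 2 6 3 7 4]
```

Because the mask is permuted exactly like the data, the padding is removed regardless of which values the real data contains.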

The easiest way I can think of is not to try reshape with methods like ravel('F') at all, but simply to concatenate strided slices (views) of your array.

For example:

>>> cols = 4
>>> a = np.array([1,2,3,4,5,6,7])
>>> np.concatenate([a[i::cols] for i in range(cols)])
array([1, 5, 2, 6, 3, 7, 4])

This works for any array length and any number of columns:

>>> cols = 5
>>> b = np.arange(17)
>>> np.concatenate([b[i::cols] for i in range(cols)])
array([ 0,  5, 10, 15,  1,  6, 11, 16,  2,  7, 12,  3,  8, 13,  4,  9, 14])
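The slicing above relies on the fact that element i of the array sits in column i % cols of the implicit 2D layout. The same reordering can therefore be sketched as a single fancy index: a stable sort of those column indices groups the elements by column while keeping their original row order inside each column. This is a swapped-in alternative, not part of the answer above, and the helper name column_major_argsort is hypothetical.

```python
import numpy as np

def column_major_argsort(a, cols):
    """Column-major order of `a`, read as rows of `cols` elements (last row may be short)."""
    a = np.asarray(a)
    col = np.arange(len(a)) % cols  # column index of each element in the implicit 2D layout
    # A stable sort keeps elements of the same column in their original (row) order
    return a[np.argsort(col, kind="stable")]


print(column_major_argsort(np.array([1, 2, 3, 4, 5, 6, 7]), 4))  # [1 5 2 6 3 7 4]
```

The sort costs O(n log n) rather than the linear cost of the slicing version, but the permutation it produces can be computed once and reused to reorder several arrays of the same length.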

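Alternatively, use as_strided to do the reshaping. The fact that the array a is too small to fit the shape (2, 4) doesn't stop this from working: you just end up with garbage (i.e. whatever happens to be in memory) in the last position (note that this reads past the end of a's buffer, which the as_strided documentation warns can corrupt results or even crash the program):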

>>> np.lib.stride_tricks.as_strided(a, shape=(2, 4))
array([[        1,         2,         3,         4],
       [        5,         6,         7, 168430121]])

>>> _.flatten('F')[:7]
array([1, 5, 2, 6, 3, 7, 4])

In general, given the array b and the required number of columns cols, you can do this:

>>> x = np.lib.stride_tricks.as_strided(b, shape=(len(b)//cols + 1, cols)) # view b as a 2D array with enough rows to hold it; the tail of the last row is garbage
>>> np.concatenate((x[:,:len(b)%cols].ravel('F'), x[:-1, len(b)%cols:].ravel('F')))

This unravels the "good" part of the array (the columns that do not contain garbage values) and the "bad" part (excluding the garbage values, which are all in the bottom row), then concatenates the two unraveled arrays. For example:

>>> cols = 5
>>> b = np.arange(17)
>>> x = np.lib.stride_tricks.as_strided(b, shape=(len(b)//cols + 1, cols))
>>> np.concatenate((x[:,:len(b)%cols].ravel('F'), x[:-1, len(b)%cols:].ravel('F')))
array([ 0,  5, 10, 15,  1,  6, 11, 16,  2,  7, 12,  3,  8, 13,  4,  9, 14])
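The same "good columns / bad columns" split can be sketched without as_strided, which avoids reading memory past the end of b. This assumes an extra zero-padded copy of b is acceptable; the helper name column_major_split is hypothetical. The row count uses ceiling division, so an exact multiple of cols needs no extra garbage row.

```python
import numpy as np

def column_major_split(b, cols):
    """Column-major order of `b`, read as rows of `cols` elements (last row may be short)."""
    b = np.asarray(b)
    rows = -(-len(b) // cols)  # ceiling division: rows needed to hold b
    r = len(b) % cols  # number of full-height ("good") columns
    # Zero-padded copy instead of as_strided, so no memory past b is read
    x = np.zeros(rows * cols, dtype=b.dtype)
    x[:len(b)] = b
    x = x.reshape(rows, cols)
    if r == 0:  # exact multiple: every column is complete
        return x.ravel("F")
    # good columns keep every row; bad columns drop the padded bottom row
    return np.concatenate((x[:, :r].ravel("F"), x[:-1, r:].ravel("F")))


print(column_major_split(np.array([1, 2, 3, 4, 5, 6, 7]), 4))  # [1 5 2 6 3 7 4]
```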
