Is there a way to reshape an array that does not preserve the original size (or a convenient workaround)?
As a simplified example, let's say I have a dataset of 40 sorted values. The values in this example are integers, although this is not necessary for the actual dataset.
import numpy as np
data = np.linspace(1,40,40)
I am trying to find the maximum value within a dataset for certain window sizes. The formula for calculating the window sizes gives a pattern that is best handled with arrays (in my opinion). For simplicity, say the indices denoting the window sizes are the list [1,2,3,4,5]; these correspond to window sizes [2,4,8,16,32] (the pattern is 2**index).
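A quick sketch of that index-to-window-size mapping, assuming the index list is [1, 2, 3, 4, 5]; the names j_list and window_sizes_arr are illustrative only:

```python
import numpy as np

j_list = [1, 2, 3, 4, 5]                # indices denoting the window sizes
window_sizes = [2**j for j in j_list]   # window size = 2**index
print(window_sizes)                     # [2, 4, 8, 16, 32]

# The same mapping computed with an array instead of a list comprehension
window_sizes_arr = 2 ** np.array(j_list)
print(window_sizes_arr)                 # [ 2  4  8 16 32]
```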
## this code looks long because I've provided docstrings
## just in case the explanation was unclear
def shapeshifter(num_col, my_array=data):
    """
    This function reshapes an array to have 'num_col' columns, where
    'num_col' corresponds to index.
    """
    return my_array.reshape(-1, num_col)

def looper(num_col, my_array=data):
    """
    This function calls 'shapeshifter' and returns a list of the
    MAXimum values of each row in 'my_array' for 'num_col' columns.
    The length of each row (or the number of columns per row if you
    prefer) denotes the size of each window.
    EX:
        num_col = 2
        ==> window_size = 2
        ==> check max( data[0], data[1] ),
                  max( data[2], data[3] ),
                  max( data[4], data[5] ),
                  .
                  .
                  .
                  max( data[38], data[39] )
        for k rows, where k = len(my_array)//num_col
    """
    my_array = shapeshifter(num_col=num_col, my_array=my_array)
    rows = [my_array[index] for index in range(len(my_array))]
    res = []
    for index in range(len(rows)):
        res.append( max(rows[index]) )
    return res
So far, the code is fine. I tested it with the following:
check1 = looper(2)
check2 = looper(4)
print(check1)
>> [2.0, 4.0, ..., 38.0, 40.0]
print(len(check1))
>> 20
print(check2)
>> [4.0, 8.0, ..., 36.0, 40.0]
print(len(check2))
>> 10
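For reference, when the window size divides the data length evenly, looper can be expressed as a single reshape followed by a row-wise maximum. This is a minimal sketch; looper_vectorized is a hypothetical name, not part of the original code:

```python
import numpy as np

data = np.linspace(1, 40, 40)

def looper_vectorized(num_col, my_array=data):
    # Reshape into rows of length num_col, then take the max of each row.
    # Like the original looper, this raises ValueError when
    # len(my_array) % num_col != 0.
    return my_array.reshape(-1, num_col).max(axis=1)

check1 = looper_vectorized(2)
print(check1[:2], check1[-2:], len(check1))   # [2. 4.] [38. 40.] 20
```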
So far so good. Now here's my problem.
def metalooper(col_ls, my_array=data):
    """
    This function calls 'looper' - which calls
    'shapeshifter' - for every 'col' in 'col_ls'.
    EX:
        j_list = [1,2,3,4,5]
        ==> col_ls = [2,4,8,16,32]
        ==> looper(2), looper(4),
            looper(8), ..., looper(32)
        ==> shapeshifter(2), shapeshifter(4),
            shapeshifter(8), ..., shapeshifter(32)
        such that looper(2^j) ==> shapeshifter(2^j)
        for j in j_list
    """
    res = []
    for col in col_ls:
        res.append(looper(num_col=col, my_array=my_array))
    return res
j_list = [2,4,8,16,32]
check3 = metalooper(j_list)
Running the above code provides this error:
ValueError: total size of new array must be unchanged
An array of 40 data points can be reshaped into 2 columns with 20 rows, 4 columns with 10 rows, or 8 columns with 5 rows, BUT it cannot be reshaped into 16 columns without trimming the data, since 40/16 is not an integer. I believe this is a problem with my code, but I don't know how to fix it.
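The arithmetic behind the error can be checked directly: reshape(-1, n) only works when n divides the number of elements, and len(data) % n is the number of leftover values. A small sketch (the dictionary leftovers is just for inspection):

```python
import numpy as np

data = np.linspace(1, 40, 40)
leftovers = {}
for n in [2, 4, 8, 16, 32]:
    # divmod gives (number of complete windows, values left over)
    full_rows, leftover = divmod(data.size, n)
    leftovers[n] = leftover
    print(f"window {n:2d}: {full_rows} full rows, {leftover} leftover values")
```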
I hope there is a way to cut off the trailing values that don't fill a complete window. If this is not possible, I hope I can append zeros so that the padded array can be reshaped, and then remove the zeros afterward. Or maybe even some complicated if - try - break block. What are some ways to solve this problem?
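One straightforward way to drop the values that don't fill a complete window is to slice the array down to the largest multiple of the window size before reshaping. A minimal sketch; truncated_window_max is a hypothetical helper name:

```python
import numpy as np

data = np.linspace(1, 40, 40)

def truncated_window_max(arr, window):
    # Keep only as many elements as fit into complete windows,
    # then reshape and take the max of each row.
    usable = arr.size // window * window
    return arr[:usable].reshape(-1, window).max(axis=1)

print(truncated_window_max(data, 16))   # [16. 32.]  (values 33..40 are discarded)
```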
I think this will give you what you want in one step:
def windowFunc(a, window, f = np.max):
    return np.array([f(i) for i in np.split(a, range(window, a.size, window))])
With the default f = np.max, this gives you the maximum of each window.
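Here is windowFunc applied to the 40-point example with a window of 16, the case where reshape failed. The last window is simply shorter (it holds the values 33 to 40). The function is repeated so the snippet runs on its own:

```python
import numpy as np

def windowFunc(a, window, f=np.max):
    # Split at indices window, 2*window, ...; the last piece may be shorter.
    return np.array([f(i) for i in np.split(a, range(window, a.size, window))])

data = np.linspace(1, 40, 40)
print(windowFunc(data, 16))          # [16. 32. 40.]
print(windowFunc(data, 16, np.min))  # any reducing function works, e.g. np.min
```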
More generally, np.split combined with range lets you split the data into a list of arrays, the last of which may be shorter (ragged):
def shapeshifter(num_col, my_array=data):
    return np.split(my_array, range(num_col, my_array.size, num_col))
You need a list of arrays because a 2D array cannot be ragged (every row needs the same number of columns).
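To see why a list is needed, look at the piece lengths np.split produces for a window of 16: two full pieces and one short one, which no rectangular 2D array can hold.

```python
import numpy as np

data = np.linspace(1, 40, 40)
pieces = np.split(data, range(16, data.size, 16))
print([len(p) for p in pieces])    # [16, 16, 8]
print([p.max() for p in pieces])   # per-window maxima from the ragged pieces
```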
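If you really want to pad with zeros, you can use np.pad (older NumPy versions also exposed it as np.lib.pad):
def shapeshifter(num_col, my_array=data):
    # zeros needed to reach the next multiple of num_col (0 if it already divides evenly)
    pad = -my_array.size % num_col
    return np.pad(my_array, (0, pad), 'constant', constant_values=0).reshape(-1, num_col)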
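Zero padding is harmless for a maximum only if the data are non-negative; otherwise a padded zero could win. A variation, swapped in here rather than taken from the answer, is to pad with -np.inf so the padding can never be selected. The helper name padded_window_max and its fill parameter are hypothetical:

```python
import numpy as np

def padded_window_max(arr, window, fill=-np.inf):
    # Number of fill values needed to reach the next multiple of `window`
    # (0 when the length already divides evenly).
    pad = -arr.size % window
    padded = np.pad(arr.astype(float), (0, pad), 'constant', constant_values=fill)
    return padded.reshape(-1, window).max(axis=1)

neg = -np.linspace(1, 40, 40)        # all values negative: -1 ... -40
print(padded_window_max(neg, 16))    # last window ignores the -inf padding
print(padded_window_max(neg, 16, fill=0))  # a zero pad wins the last window
```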
Caution:
Technically, it is also possible to use, for example, a.resize(32,2), which will create a zero-padded ndarray (as you asked for). But there are some big caveats:
- You will need to calculate the size of the other axis yourself, because the -1 trick doesn't work with resize.
- If the original array a references anything else, or is referenced by anything else, a.resize will fail: ValueError: cannot resize an array that references or is referenced by another array in this way. Use the resize function
- The function resize (i.e. np.resize(a, new_shape)) is not equivalent to a.resize: instead of padding with zeros, it wraps around and repeats the data from the beginning.
Since you seem to want to work with a for several different window sizes, the in-place a.resize is not very helpful. But it's a rabbit hole that's easy to fall into.
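A small demonstration of the difference between the two resize variants, on a 5-element array turned into 2 rows of 4. The number of rows is computed by hand (here with ceiling division), since -1 is not accepted. refcheck=False is used only so the demo cannot trip over the reference check; it is safe here because b is a fresh copy nothing else refers to, but it is not safe in general:

```python
import numpy as np

a = np.arange(1, 6)              # [1 2 3 4 5]
window = 4
rows = -(-a.size // window)      # ceiling division: 2 rows

# np.resize returns a new array and repeats the data to fill the new shape.
repeated = np.resize(a, (rows, window))
print(repeated)   # [[1 2 3 4]
                  #  [5 1 2 3]]

# ndarray.resize works in place and fills the new slots with zeros.
b = a.copy()
b.resize((rows, window), refcheck=False)
print(b)          # [[1 2 3 4]
                  #  [5 0 0 0]]
```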
EDIT:
Looping through lists is slow. If your input is long and the windows are small, windowFunc will get bogged down in the for loops. This should be more efficient:
def windowFunc2(a, window, f = np.max):
    tail = - (a.size % window)
    if tail == 0:
        return f(a.reshape(-1, window), axis = -1)
    else:
        body = a[:tail].reshape(-1, window)
        return np.r_[f(body, axis = -1), f(a[tail:])]
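A quick sanity check that windowFunc2 agrees with the list-based windowFunc on the example data, including window sizes that do not divide 40 evenly. Both functions are re-declared so the snippet runs on its own:

```python
import numpy as np

def windowFunc(a, window, f=np.max):
    return np.array([f(i) for i in np.split(a, range(window, a.size, window))])

def windowFunc2(a, window, f=np.max):
    tail = -(a.size % window)
    if tail == 0:
        # Length divides evenly: one reshape, reduce along each row.
        return f(a.reshape(-1, window), axis=-1)
    else:
        # Reshape everything except the short tail, then append the tail's result.
        body = a[:tail].reshape(-1, window)
        return np.r_[f(body, axis=-1), f(a[tail:])]

data = np.linspace(1, 40, 40)
for w in [2, 4, 8, 16, 32]:
    # both versions should give identical per-window maxima
    assert np.array_equal(windowFunc(data, w), windowFunc2(data, w))
print(windowFunc2(data, 32))   # [32. 40.]
```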
Here's a generalized way to reshape with truncation:
def reshape_and_truncate(arr, shape):
    desired_size_factor = np.prod([n for n in shape if n != -1])
    if -1 in shape:  # implicit array size
        desired_size = arr.size // desired_size_factor * desired_size_factor
    else:
        desired_size = desired_size_factor
    return arr.flat[:desired_size].reshape(shape)
which your shapeshifter can use instead of reshape.
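As an illustration, here is reshape_and_truncate dropped into shapeshifter, followed by a loop over all window sizes from the question. Values that do not fill a complete window (33 to 40 for windows 16 and 32) are discarded. The function is repeated so the snippet is self-contained, and the dictionary results is just for display:

```python
import numpy as np

def reshape_and_truncate(arr, shape):
    desired_size_factor = np.prod([n for n in shape if n != -1])
    if -1 in shape:  # implicit array size: round down to a whole number of rows
        desired_size = arr.size // desired_size_factor * desired_size_factor
    else:
        desired_size = desired_size_factor
    return arr.flat[:desired_size].reshape(shape)

data = np.linspace(1, 40, 40)

def shapeshifter(num_col, my_array=data):
    return reshape_and_truncate(my_array, (-1, num_col))

# per-window maxima for every window size, with incomplete windows dropped
results = {w: shapeshifter(w).max(axis=1).tolist() for w in [2, 4, 8, 16, 32]}
print(results[16], results[32])   # [16.0, 32.0] [32.0]
```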