Extract numpy subarrays whose values exceed a threshold
I have an audio signal imported as a numpy array and I want to cut it into chunks of numpy arrays. However, I want the chunks to contain only items above the threshold. For example:
threshold = 3
signal = [1,2,6,7,8,1,1,2,5,6,7]
should output two arrays
vec1 = [6,7,8]
vec2 = [5,6,7]
OK, the above are lists rather than arrays, but you get my point.
Here's what I've tried so far, but it just kills my RAM:
def slice_raw_audio(audio_signal, threshold=5000):
    signal_slice, chunks = [], []
    for idx in range(0, audio_signal.shape[0], 1000):
        while audio_signal[idx] > threshold:
            signal_slice.append(audio_signal[idx])
        chunks.append(signal_slice)
    return chunks
Here's one approach -
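import numpy as np

def split_above_threshold(signal, threshold):
    # Pad with False so runs touching either end still get a start and a stop
    mask = np.concatenate(([False], signal > threshold, [False]))
    # Positions where the mask flips; they alternate start, stop, start, stop, ...
    idx = np.flatnonzero(mask[1:] != mask[:-1])
    return [signal[idx[i]:idx[i+1]] for i in range(0, len(idx), 2)]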
Example run -
In [48]: threshold = 3
...: signal = np.array([1,1,7,1,2,6,7,8,1,1,2,5,6,7,2,8,7,2])
...:
In [49]: split_above_threshold(signal, threshold)
Out[49]: [array([7]), array([6, 7, 8]), array([5, 6, 7]), array([8, 7])]
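To make the mechanics of this approach easier to follow, here is a minimal step-by-step sketch on the signal from the question. The variable names `edges` and `chunks` are mine and only serve the illustration.

```python
import numpy as np

threshold = 3
signal = np.array([1, 2, 6, 7, 8, 1, 1, 2, 5, 6, 7])

# Pad the boolean mask with False on both sides so that a run touching
# either end of the signal still produces a start edge and a stop edge.
mask = np.concatenate(([False], signal > threshold, [False]))

# Positions where the padded mask flips. They come in (start, stop) pairs
# that index directly into the original, unpadded signal.
edges = np.flatnonzero(mask[1:] != mask[:-1])

# Pair up the edges and slice the signal; each slice is a view, not a copy.
chunks = [signal[start:stop] for start, stop in zip(edges[::2], edges[1::2])]

print(edges.tolist())               # [2, 5, 8, 11]
print([c.tolist() for c in chunks]) # [[6, 7, 8], [5, 6, 7]]
```

Because the padding guarantees an even number of edges, the start/stop pairing never runs off the end of `edges`.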
Runtime test
Other approaches -
# @Psidom soln
def arange_diff(signal, threshold):
    above_th = signal > threshold
    index, values = np.arange(signal.size)[above_th], signal[above_th]
    return np.split(values, np.where(np.diff(index) > 1)[0]+1)
# @Kasramvd soln (the [1::2] step assumes the signal does not start above the threshold)
def split_diff_step(signal, threshold):
    return np.split(signal, np.where(np.diff(signal > threshold))[0] + 1)[1::2]
Timing -
In [67]: signal = np.random.randint(0,9,(100000))
In [68]: threshold = 3
# @Kasramvd soln
In [69]: %timeit split_diff_step(signal, threshold)
10 loops, best of 3: 39.8 ms per loop
# @Psidom soln
In [70]: %timeit arange_diff(signal, threshold)
10 loops, best of 3: 20.5 ms per loop
In [71]: %timeit split_above_threshold(signal, threshold)
100 loops, best of 3: 8.22 ms per loop
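Before trusting a timing comparison, it can help to confirm that the three functions return the same chunks. The following is a rough sanity check, not part of the original benchmark: the seed, the array size and the forced first element are my assumptions. The first element is set to 0 because `split_diff_step` takes `[1::2]`, which only picks the above-threshold chunks when the signal does not start above the threshold.

```python
import numpy as np

def split_above_threshold(signal, threshold):
    mask = np.concatenate(([False], signal > threshold, [False]))
    idx = np.flatnonzero(mask[1:] != mask[:-1])
    return [signal[idx[i]:idx[i + 1]] for i in range(0, len(idx), 2)]

def arange_diff(signal, threshold):
    above_th = signal > threshold
    index, values = np.arange(signal.size)[above_th], signal[above_th]
    return np.split(values, np.where(np.diff(index) > 1)[0] + 1)

def split_diff_step(signal, threshold):
    return np.split(signal, np.where(np.diff(signal > threshold))[0] + 1)[1::2]

# Assumed test data: fixed seed for repeatability, first sample forced
# below the threshold so that [1::2] selects the above-threshold chunks.
rng = np.random.RandomState(0)
signal = rng.randint(0, 9, 1000)
signal[0] = 0
threshold = 3

results = [f(signal, threshold)
           for f in (split_above_threshold, arange_diff, split_diff_step)]

# Compare every result against the first one, chunk by chunk.
same = all(
    len(r) == len(results[0])
    and all(np.array_equal(a, b) for a, b in zip(r, results[0]))
    for r in results[1:]
)
print(same)  # True
```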
Here's a Numpythonic way to do it:
In [115]: np.split(signal, np.where(np.diff(signal > threshold))[0] + 1)
Out[115]: [array([1, 2]), array([6, 7, 8]), array([1, 1, 2]), array([5, 6, 7])]
Note that this gives you both the below-threshold and the above-threshold chunks. Because the split is based on diff of the boolean mask (that is, on runs of consecutive items),
the two kinds always alternate, which means you can separate them by indexing:
In [121]: signal = np.array([1,2,6,7,8,1,1,2,5,6,7])
In [122]: np.split(signal, np.where(np.diff(signal > threshold))[0] + 1)[::2]
Out[122]: [array([1, 2]), array([1, 1, 2])]
In [123]: np.split(signal, np.where(np.diff(signal > threshold))[0] + 1)[1::2]
Out[123]: [array([6, 7, 8]), array([5, 6, 7])]
You can compare the first element of your array with the threshold
to see which of the two slices above holds the above-threshold chunks.
In general, you can use the following snippet to get the above-threshold chunks (the start offset is 1 when the first element is not above the threshold, and 0 otherwise):
np.split(signal, np.where(np.diff(signal > threshold))[0] + 1)[int(signal[0] <= threshold)::2]
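The choice of start offset can be wrapped in a small function. This is only a sketch of the idea described above; the name `chunks_above` and the empty-input guard are my additions, not part of the original answer.

```python
import numpy as np

def chunks_above(signal, threshold):
    """Return the runs of `signal` that lie strictly above `threshold`.

    Hypothetical helper: splits at every flip of the boolean mask and
    then keeps every second piece, starting from the right one.
    """
    signal = np.asarray(signal)
    above = signal > threshold
    if above.size == 0:
        return []
    # np.diff on a boolean array marks the positions where the mask flips.
    pieces = np.split(signal, np.flatnonzero(np.diff(above)) + 1)
    # Pieces alternate between above and below; pick the offset from the
    # first element so the result is right wherever the signal starts.
    start = 0 if above[0] else 1
    return pieces[start::2]

print([c.tolist() for c in chunks_above([1, 2, 6, 7, 8, 1, 1, 2, 5, 6, 7], 3)])
# [[6, 7, 8], [5, 6, 7]]
print([c.tolist() for c in chunks_above([7, 1, 2, 6], 3)])
# [[7], [6]]
```

Using a plain Python integer for the slice start avoids relying on a NumPy boolean being accepted as an index.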
Here's one of the options:
above_th = signal > threshold
index, values = np.arange(signal.size)[above_th], signal[above_th]
np.split(values, np.where(np.diff(index) > 1)[0]+1)
# [array([6, 7, 8]), array([5, 6, 7])]
Function wrapper:
def above_thresholds(signal, threshold):
    above_th = signal > threshold
    index, values = np.arange(signal.size)[above_th], signal[above_th]
    return np.split(values, np.where(np.diff(index) > 1)[0]+1)
above_thresholds(signal, threshold)
# [array([6, 7, 8]), array([5, 6, 7])]
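One edge case to be aware of with this wrapper: when nothing exceeds the threshold, `np.split` is handed an empty array and no split points, so it returns a list holding one empty array rather than an empty list. A minimal sketch of filtering that out follows; the wrapper name `above_thresholds_nonempty` is hypothetical.

```python
import numpy as np

def above_thresholds(signal, threshold):
    above_th = signal > threshold
    index, values = np.arange(signal.size)[above_th], signal[above_th]
    return np.split(values, np.where(np.diff(index) > 1)[0] + 1)

def above_thresholds_nonempty(signal, threshold):
    # Hypothetical wrapper: drop the empty placeholder chunk that appears
    # when no element is above the threshold.
    return [chunk for chunk in above_thresholds(signal, threshold) if chunk.size]

quiet = np.array([1, 2, 1])
print(len(above_thresholds(quiet, 3)))        # 1 (a single empty array)
print(above_thresholds_nonempty(quiet, 3))    # []
```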