Get numpy sub-arrays whose values exceed a threshold

I have an audio signal imported as a numpy array and I want to cut it into chunks of numpy arrays. However, I want the chunks to contain only items above the threshold. For example:

threshold = 3
signal = [1,2,6,7,8,1,1,2,5,6,7]

      

should output two arrays

vec1 = [6,7,8]
vec2 = [5,6,7]

      

Ok, the above are lists rather than arrays, but you get my point.

Here's what I've tried so far, but it just kills my RAM:

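def slice_raw_audio(audio_signal, threshold=5000):

    signal_slice, chunks = [], []

    # Step through the signal 1000 samples at a time
    for idx in range(0, audio_signal.shape[0], 1000):
        # idx is never advanced inside this loop, so once audio_signal[idx]
        # exceeds the threshold the loop never ends and signal_slice keeps growing
        while audio_signal[idx] > threshold:
            signal_slice.append(audio_signal[idx])
        chunks.append(signal_slice)
    return chunks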
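For comparison, here is a minimal pure-Python sketch of what the loop above seems to be aiming for. It is not from the original question, and the name slice_runs is hypothetical. itertools.groupby splits the samples into maximal runs that are either all above or all at-or-below the threshold, and only the above-threshold runs are kept. It still loops over every sample in Python, so the vectorized answers are likely to be much faster on long signals.

```python
from itertools import groupby

import numpy as np


def slice_runs(audio_signal, threshold):
    """Hypothetical helper: one NumPy array per run of consecutive samples > threshold."""
    chunks = []
    # groupby yields (key, iterator) for each maximal run of equal keys;
    # the key here is "is this sample above the threshold?"
    for is_above, run in groupby(audio_signal, key=lambda x: x > threshold):
        if is_above:
            chunks.append(np.array(list(run)))
    return chunks


print(slice_runs(np.array([1, 2, 6, 7, 8, 1, 1, 2, 5, 6, 7]), 3))
# [array([6, 7, 8]), array([5, 6, 7])]
```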

      



3 answers


Here's one approach -

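import numpy as np

def split_above_threshold(signal, threshold):
    # Pad the boolean mask with False at both ends so every run has a start and an end
    mask = np.concatenate(([False], signal > threshold, [False]))
    # Indices where the mask flips: run starts and run ends alternate
    idx = np.flatnonzero(mask[1:] != mask[:-1])
    return [signal[idx[i]:idx[i+1]] for i in range(0, len(idx), 2)]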
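To see why the slicing works, here is a step-by-step trace of split_above_threshold on the question's example signal. It prints the intermediate idx array, whose even entries are run starts and odd entries are (exclusive) run ends.

```python
import numpy as np

signal = np.array([1, 2, 6, 7, 8, 1, 1, 2, 5, 6, 7])
threshold = 3

# Pad the boolean mask with False on both ends so every run has a start and an end.
mask = np.concatenate(([False], signal > threshold, [False]))
# Positions where the mask flips between neighbours.
idx = np.flatnonzero(mask[1:] != mask[:-1])
print(idx)  # [ 2  5  8 11]

# Pair up (start, end) positions and slice the signal with them.
chunks = [signal[idx[i]:idx[i + 1]] for i in range(0, len(idx), 2)]
print(chunks)  # [array([6, 7, 8]), array([5, 6, 7])]
```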

      

Example run -

In [48]: threshold = 3
    ...: signal = np.array([1,1,7,1,2,6,7,8,1,1,2,5,6,7,2,8,7,2])
    ...: 

In [49]: split_above_threshold(signal, threshold)
Out[49]: [array([7]), array([6, 7, 8]), array([5, 6, 7]), array([8, 7])]

      

Runtime test



Other approaches -

# @Psidom soln
def arange_diff(signal, threshold):
    above_th = signal > threshold
    index, values = np.arange(signal.size)[above_th], signal[above_th]
    return np.split(values, np.where(np.diff(index) > 1)[0]+1)

# @Kasramvd soln   
def split_diff_step(signal, threshold):   
    return np.split(signal, np.where(np.diff(signal > threshold))[0] + 1)[1::2]
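Here is a minimal consistency check of the three functions, assuming they are meant to return the same chunks. They agree on random data whose first sample is below the threshold. However, because split_diff_step always slices with [1::2], it returns the below-threshold run instead when the signal starts above the threshold.

```python
import numpy as np


def split_above_threshold(signal, threshold):
    mask = np.concatenate(([False], signal > threshold, [False]))
    idx = np.flatnonzero(mask[1:] != mask[:-1])
    return [signal[idx[i]:idx[i + 1]] for i in range(0, len(idx), 2)]


def arange_diff(signal, threshold):
    above_th = signal > threshold
    index, values = np.arange(signal.size)[above_th], signal[above_th]
    return np.split(values, np.where(np.diff(index) > 1)[0] + 1)


def split_diff_step(signal, threshold):
    return np.split(signal, np.where(np.diff(signal > threshold))[0] + 1)[1::2]


def as_lists(chunks):
    # Convert a list of arrays to nested Python lists for easy comparison.
    return [c.tolist() for c in chunks]


rng = np.random.default_rng(0)
signal = rng.integers(0, 9, 1000)
signal[0] = 0  # start below the threshold so split_diff_step's [1::2] is correct
threshold = 3

a = as_lists(split_above_threshold(signal, threshold))
assert a == as_lists(arange_diff(signal, threshold))
assert a == as_lists(split_diff_step(signal, threshold))

# Edge case: the signal starts above the threshold.
high_start = np.array([5, 6, 1, 7])
print(as_lists(split_above_threshold(high_start, 3)))  # [[5, 6], [7]]
print(as_lists(arange_diff(high_start, 3)))            # [[5, 6], [7]]
print(as_lists(split_diff_step(high_start, 3)))        # [[1]]
```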

      

Timing -

In [67]: signal = np.random.randint(0,9,(100000))

In [68]: threshold = 3

# @Kasramvd soln 
In [69]: %timeit split_diff_step(signal, threshold)
10 loops, best of 3: 39.8 ms per loop

# @Psidom soln
In [70]: %timeit arange_diff(signal, threshold)
10 loops, best of 3: 20.5 ms per loop

In [71]: %timeit split_above_threshold(signal, threshold)
100 loops, best of 3: 8.22 ms per loop
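%timeit is an IPython magic. Outside IPython, a comparable measurement can be sketched with the standard-library timeit module. Absolute numbers depend on the machine and the NumPy version, so no expected output is given here.

```python
import timeit

import numpy as np


def split_above_threshold(signal, threshold):
    mask = np.concatenate(([False], signal > threshold, [False]))
    idx = np.flatnonzero(mask[1:] != mask[:-1])
    return [signal[idx[i]:idx[i + 1]] for i in range(0, len(idx), 2)]


def arange_diff(signal, threshold):
    above_th = signal > threshold
    index, values = np.arange(signal.size)[above_th], signal[above_th]
    return np.split(values, np.where(np.diff(index) > 1)[0] + 1)


def split_diff_step(signal, threshold):
    return np.split(signal, np.where(np.diff(signal > threshold))[0] + 1)[1::2]


signal = np.random.randint(0, 9, 100000)
threshold = 3

results = {}
for func in (split_diff_step, arange_diff, split_above_threshold):
    # Best of 3 repeats of 10 calls each, reported as time per call.
    best = min(timeit.repeat(lambda: func(signal, threshold), number=10, repeat=3))
    results[func.__name__] = best / 10
    print(f"{func.__name__}: {best / 10 * 1e3:.2f} ms per call")
```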

      



Here's a Numpythonic way to do it:

In [115]: np.split(signal, np.where(np.diff(signal > threshold))[0] + 1)
Out[115]: [array([1, 2]), array([6, 7, 8]), array([1, 1, 2]), array([5, 6, 7])]
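Breaking the one-liner into steps on the question's example shows where the split points come from. np.diff on a boolean array compares neighbours with not_equal, so True marks a position where the mask flips.

```python
import numpy as np

signal = np.array([1, 2, 6, 7, 8, 1, 1, 2, 5, 6, 7])
threshold = 3

above = signal > threshold               # boolean mask of samples above the threshold
changes = np.diff(above)                 # True wherever the mask flips between neighbours
split_points = np.where(changes)[0] + 1  # +1: split right after the last element of a run
print(split_points)  # [2 5 8]

pieces = np.split(signal, split_points)
print(pieces)  # [array([1, 2]), array([6, 7, 8]), array([1, 1, 2]), array([5, 6, 7])]
```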

      

Note that this gives you all the runs of lower and upper elements, because the splitting logic is based on diff between consecutive items. Since the runs always alternate, you can separate them just by indexing:

In [121]: signal = np.array([1,2,6,7,8,1,1,2,5,6,7])

In [122]: np.split(signal, np.where(np.diff(signal > threshold))[0] + 1)[::2]
Out[122]: [array([1, 2]), array([1, 1, 2])]

In [123]: np.split(signal, np.where(np.diff(signal > threshold))[0] + 1)[1::2]
Out[123]: [array([6, 7, 8]), array([5, 6, 7])]

      



To see which of the above slices gives you the upper elements, compare the first element of your signal with the threshold.

In general, you can use the following snippet to get the upper elements:

np.split(signal, np.where(np.diff(signal > threshold))[0] + 1)[int(signal[0] <= threshold)::2]
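Wrapped as a function, a sketch of the general version could look like this. The name upper_chunks is hypothetical, and the function assumes a non-empty signal. It uses <= so that a first sample exactly equal to the threshold counts as "below", matching the > test used for the split. It also converts the flag with int() so the slice start is a plain integer rather than a NumPy boolean scalar, which recent NumPy versions may not accept as an index.

```python
import numpy as np


def upper_chunks(signal, threshold):
    """Hypothetical helper: return the runs of consecutive samples above threshold.

    Assumes signal is a non-empty 1-D array.
    """
    pieces = np.split(signal, np.where(np.diff(signal > threshold))[0] + 1)
    # Pieces alternate between below- and above-threshold runs. If the first
    # sample is not above the threshold, the first piece is a "below" run, so
    # the "above" runs start at index 1; otherwise they start at index 0.
    start = int(signal[0] <= threshold)
    return pieces[start::2]


print(upper_chunks(np.array([1, 2, 6, 7, 8, 1, 1, 2, 5, 6, 7]), 3))
# [array([6, 7, 8]), array([5, 6, 7])]
print(upper_chunks(np.array([6, 7, 1, 5]), 3))
# [array([6, 7]), array([5])]
```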

      



Here's one of the options:

above_th = signal > threshold
index, values = np.arange(signal.size)[above_th], signal[above_th]
np.split(values, np.where(np.diff(index) > 1)[0]+1)
# [array([6, 7, 8]), array([5, 6, 7])]
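Printing the intermediate arrays for the question's example shows how the gaps are detected. Consecutive above-threshold positions differ by 1, so a difference greater than 1 marks where a below-threshold gap was removed and a new chunk begins.

```python
import numpy as np

signal = np.array([1, 2, 6, 7, 8, 1, 1, 2, 5, 6, 7])
threshold = 3

above_th = signal > threshold
index = np.arange(signal.size)[above_th]  # positions of the above-threshold samples
values = signal[above_th]                 # the samples themselves, gaps removed
print(index)           # [ 2  3  4  8  9 10]
print(np.diff(index))  # [1 1 4 1 1]

# A jump larger than 1 in the positions means a gap was removed there.
cut = np.where(np.diff(index) > 1)[0] + 1
print(cut)                    # [3]
print(np.split(values, cut))  # [array([6, 7, 8]), array([5, 6, 7])]
```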

      

Function wrapper:

def above_thresholds(signal, threshold):
    above_th = signal > threshold
    index, values = np.arange(signal.size)[above_th], signal[above_th]
    return np.split(values, np.where(np.diff(index) > 1)[0]+1)

above_thresholds(signal, threshold)
# [array([6, 7, 8]), array([5, 6, 7])]
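There is one edge case to be aware of. When no sample exceeds the threshold, np.split on an empty array still returns a list holding one empty array, so above_thresholds returns [array([])] rather than an empty list. Here is a minimal guard, assuming an empty list is the desired result in that case:

```python
import numpy as np


def above_thresholds(signal, threshold):
    above_th = signal > threshold
    if not above_th.any():
        # Without this guard, np.split would return a list holding one empty array.
        return []
    index, values = np.arange(signal.size)[above_th], signal[above_th]
    return np.split(values, np.where(np.diff(index) > 1)[0] + 1)


print(above_thresholds(np.array([1, 2, 3]), 3))  # []
print(above_thresholds(np.array([1, 2, 6, 7, 8, 1, 1, 2, 5, 6, 7]), 3))
# [array([6, 7, 8]), array([5, 6, 7])]
```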

      
