Slice 1D Array in Numpy without loop

I have an array x, shown below:

x=np.array(["83838374747412E61E4C202C004D004D004D020202C3CF",
            "8383835F6260127314A0127C078E07090705023846C59F",
            "83838384817E14231D700FAC09BC096808881E1C1BC68F",
            "8484835C535212600F860A1612B90FCF0FCF012A2AC6BF",
            "848484787A7A1A961BAC1E731086005D005D025408C6CF",
            "8484845050620C300D500A9313E613E613012A2A5CC4BF",
            "838383757C7CF18F02192653070D03180318080101BE6F",
            "8584845557570F090E830F4309E5080108012A2A2AC6DF",
            "85858453536B07D608B3124C102A102A1026010101C61F",
            "83838384848411A926791C162048204820484D4444C3BF"], dtype=object)

      

These are concatenated hex values that I need to slice, convert to integers, and then scale by conversion factors. I need an array such as:

[83,83,83,84,84,84,83,85,85,83]

      

That would be the equivalent of x[:,0:2], but I cannot slice this array that way because its shape is (10,). I am trying to do something similar to what a character array does in MATLAB. I will be doing this on millions of lines, so I am trying to avoid a loop.



2 answers


If you only need the first two characters of each hex value, one option is to cast your array to dtype '|S2':

>>> x.astype('|S2')
array(['83', '83', '83', '84', '84', '84', '83', '85', '85', '83'], 
  dtype='|S2')

      

This idea generalizes to returning the first n characters of each string.



Arbitrary slicing of string arrays is much harder in NumPy. The answers on this page explain why NumPy is not the best tool for string handling, but show what might be possible.
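As a rough sketch of both ideas (truncating to the first n characters, and slicing arbitrary columns), the following assumes every string has the same length and contains only uppercase hex digits. It uses the unicode dtype 'U' rather than '|S' so that it yields str values on Python 3; the variable names and the choice of columns are illustrative only.

```python
import numpy as np

x = np.array(["83838374747412E61E4C202C004D004D004D020202C3CF",
              "8383835F6260127314A0127C078E07090705023846C59F"], dtype=object)

# Casting to a fixed-width unicode dtype truncates each string to n characters.
n = 4
first_n = x.astype(f"U{n}")

# Assumption: all strings share one length. Viewing the fixed-width array one
# character at a time gives a 2D character matrix, similar to a MATLAB char array.
width = len(x[0])
chars = x.astype(f"U{width}").view("U1").reshape(len(x), width)
field = chars[:, 6:12]  # any column range can now be sliced without a loop

# Decode hex digits arithmetically from their code points
# (assumption: only '0'-'9' and uppercase 'A'-'F' occur).
codes = chars.view(np.uint32).astype(np.int64)
digits = np.where(codes >= ord("A"), codes - ord("A") + 10, codes - ord("0"))

# The first two hex characters of each row, as one integer per row.
first_byte = digits[:, 0] * 16 + digits[:, 1]
```

Note that the hex pair "83" decodes to the integer 131; if the literal digits 83 are wanted instead, the slice in first_n can be passed to astype(int) with n = 2.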

Alternatively, the Pandas library (which is built on top of NumPy) provides vectorized string operations. It has a number of very useful string methods that make slicing much easier than in plain NumPy:

>>> import pandas as pd
>>> s = pd.Series(x)
>>> s.str.slice(2, 9)
0    8383747
1    83835F6
2    8383848
3    84835C5
4    8484787
5    8484505
6    8383757
7    8484555
8    8584535
9    8383848
dtype: object

      



Here is a Pythonic way to do it.

Consider part of your data as a single string:

x = "83838374747412E61E4C202C004D004D004D020202C3CF8383835F626012"

      

You can combine map, join, zip and iter to split it into two-character chunks:

from numpy import array, vectorize
xArray = array(list(map(''.join, zip(*[iter(x)]*2))))

      



Then you can convert the hex values to integers using a vectorized form of int:

intHex   = vectorize(int)
xIntForm = intHex(xArray,16)

      

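I'm not sure about the performance of vectorize, even though it is part of NumPy. The NumPy documentation notes that vectorize is provided mainly for convenience rather than speed, and that it is essentially a for loop.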
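If the Python-level loop inside vectorize turns out to be too slow, one possible alternative is to let bytes.fromhex decode everything in a single call and read the result with np.frombuffer. This is only a sketch: it assumes every string has the same even length, and the 16-bit field at bytes 6 and 7 is a hypothetical layout chosen for illustration, not something stated in the question.

```python
import numpy as np

x = np.array(["83838374747412E61E4C202C004D004D004D020202C3CF",
              "8383835F6260127314A0127C078E07090705023846C59F"], dtype=object)

# Decode all hex digits at once; each pair of characters becomes one byte.
raw = bytes.fromhex("".join(x))

# One row per original string, one column per decoded byte (46 chars -> 23 bytes).
byte_values = np.frombuffer(raw, dtype=np.uint8).reshape(len(x), -1)

# The first byte of every row, equivalent to int(s[0:2], 16) for each string s.
first_bytes = byte_values[:, 0]

# Hypothetical example: combine bytes 6 and 7 into a big-endian 16-bit value.
field16 = (byte_values[:, 6].astype(np.uint16) << 8) | byte_values[:, 7]
```

The resulting byte_values array can be sliced by column like any other 2D NumPy array, so the conversion factors can then be applied per column.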

Greetings
