Convert a 32-bit integer to an array of four 8-bit integers in Python
How can I efficiently convert a 32-bit integer to an array of four 8-bit integers in Python?
I currently have the following code, which is very slow:
import numpy as np

def convert(int32_val):
    bin = np.binary_repr(int32_val, width = 32)
    int8_arr = [int(bin[0:8],2), int(bin[8:16],2),
                int(bin[16:24],2), int(bin[24:32],2)]
    return int8_arr
For example:
print convert(1)
>>> [0, 0, 0, 1]
print convert(-1)
>>> [255, 255, 255, 255]
print convert(-1306918380)
>>> [178, 26, 2, 20]
I also need the same behavior for unsigned 32-bit integers.
Additionally, can I vectorize it for a large NumPy array of 32-bit integers?
Use a dtype, as described in:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.dtype.html
Subdivide int16 into 2 int8's, called x and y. 0 and 1 are the offsets in bytes:
np.dtype((np.int16, {'x':(np.int8,0), 'y':(np.int8,1)}))
dtype(('<i2', [('x', '|i1'), ('y', '|i1')]))
Or adapted to your case:
In [30]: x=np.arange(12,dtype=np.int32)*1000
In [39]: dt=np.dtype((np.int32, {'f0':(np.uint8,0),'f1':(np.uint8,1),'f2':(np.uint8,2), 'f3':(np.uint8,3)}))
In [40]: x1=x.view(dtype=dt)
In [41]: x1['f0']
Out[41]: array([ 0, 232, 208, 184, 160, 136, 112, 88, 64, 40, 16, 248], dtype=uint8)
In [42]: x1['f1']
Out[42]: array([ 0, 3, 7, 11, 15, 19, 23, 27, 31, 35, 39, 42], dtype=uint8)
Compare with the low byte computed arithmetically:
In [38]: x%256
Out[38]: array([ 0, 232, 208, 184, 160, 136, 112, 88, 64, 40, 16, 248])
Additional documentation at http://docs.scipy.org/doc/numpy/user/basics.rec.html
2) Tuple argument: The only relevant tuple case that applies to record structures is when a structure is mapped to an existing data type. This is done by pairing, in a tuple, the existing data type with a matching dtype definition (using any of the variants described here). As an example (using a definition using a list, so see 3) for further details):
>>> x = np.zeros(3, dtype=('i4', [('r', 'u1'), ('g', 'u1'), ('b', 'u1'), ('a', 'u1')]))
>>> x
array([0, 0, 0])
>>> x['r']
array([0, 0, 0], dtype=uint8)
In this case, an array is produced that looks and acts like a simple int32 array, but also has definitions for fields that use only one byte of the int32 (a bit like Fortran equivalencing).
One way to get a 2d array of 4 bytes:
In [46]: np.array([x1['f0'],x1['f1'],x1['f2'],x1['f3']])
Out[46]:
array([[ 0, 232, 208, 184, 160, 136, 112, 88, 64, 40, 16, 248],
[ 0, 3, 7, 11, 15, 19, 23, 27, 31, 35, 39, 42],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=uint8)
Same idea, but more compact:
In [50]: dt1=np.dtype(('i4', [('bytes','u1',4)]))
In [53]: x2=x.view(dtype=dt1)
In [54]: x2.dtype
Out[54]: dtype([('bytes', 'u1', (4,))])
In [55]: x2['bytes']
Out[55]:
array([[ 0, 0, 0, 0],
[232, 3, 0, 0],
[208, 7, 0, 0],
[184, 11, 0, 0],
[160, 15, 0, 0],
[136, 19, 0, 0],
[112, 23, 0, 0],
[ 88, 27, 0, 0],
[ 64, 31, 0, 0],
[ 40, 35, 0, 0],
[ 16, 39, 0, 0],
[248, 42, 0, 0]], dtype=uint8)
In [56]: x2
Out[56]:
array([ 0, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000,
9000, 10000, 11000])
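Note that the views above expose the bytes in the machine's native order (least significant byte first on the little-endian machine used here), while the question's convert() lists the most significant byte first. A minimal sketch of a vectorized version that matches the question's order, assuming the input fits in int32; the helper name int32_to_bytes_be is made up for this example:

```python
import numpy as np

def int32_to_bytes_be(values):
    """Return an (n, 4) uint8 array, most significant byte first."""
    arr = np.asarray(values, dtype=np.int32)
    # astype('>i4') returns a copy stored big-endian, whatever the host's
    # native byte order is, so the byte view below is platform independent.
    return arr.astype('>i4').view(np.uint8).reshape(-1, 4)

result = int32_to_bytes_be([1, -1, -1306918380])
print(result)
```

For unsigned input the same idea should work with np.uint32 and '>u4' in place of np.int32 and '>i4'.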
In Python 3.2 and up, there is a new int method, to_bytes, that can also be used:
>>> convert = lambda n : [int(i) for i in n.to_bytes(4, byteorder='big', signed=True)]
>>>
>>> convert(1)
[0, 0, 0, 1]
>>>
>>> convert(-1)
[255, 255, 255, 255]
>>>
>>> convert(-1306918380)
[178, 26, 2, 20]
>>>
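With signed=True, to_bytes raises OverflowError for unsigned values of 2**31 and above. One possible way to cover both the signed and the unsigned case the question asks about is to mask to 32 bits first. This is only a sketch, and the name to_bytes32 is hypothetical:

```python
def to_bytes32(n):
    """Split a signed or unsigned 32-bit integer into four bytes, big-endian."""
    if not -2**31 <= n < 2**32:
        raise ValueError("value does not fit in 32 bits")
    # Masking maps a negative value onto its two's-complement bit pattern,
    # so the default unsigned form of to_bytes handles both cases.
    return list((n & 0xFFFFFFFF).to_bytes(4, byteorder='big'))

print(to_bytes32(-1306918380))
print(to_bytes32(4294967295))
```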
You can use bitwise operations:
def int32_to_int8(n):
    mask = (1 << 8) - 1
    return [(n >> k) & mask for k in range(0, 32, 8)]
>>> int32_to_int8(32768)
[0, 128, 0, 0]
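This returns the least significant byte first, which is the reverse of the order in the question's examples. A small variant that walks the shifts from high to low should reproduce the question's output; the name int32_to_int8_be is made up for this sketch:

```python
def int32_to_int8_be(n):
    """Return the four bytes of n, most significant byte first."""
    # Python's >> is an arithmetic shift, so masking with 0xFF also yields
    # the two's-complement bytes of negative values.
    return [(n >> shift) & 0xFF for shift in (24, 16, 8, 0)]

print(int32_to_int8_be(32768))
print(int32_to_int8_be(-1306918380))
```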
Or, alternatively, you can use the struct package in Python:
>>> import struct
>>> int32 = struct.pack("I", 32768)
>>> struct.unpack("B" * 4, int32)
(0, 128, 0, 0)
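The "I" format packs in the machine's native byte order, so the result above is what a little-endian machine produces. If a fixed order is needed, struct accepts an explicit byte-order prefix. A minimal sketch, with split_int32 as a hypothetical helper name:

```python
import struct

def split_int32(n, signed=True):
    """Return the four bytes of n as a tuple, most significant byte first."""
    # ">" forces big-endian packing; "i" is signed and "I" unsigned 32-bit.
    fmt = ">i" if signed else ">I"
    return struct.unpack("4B", struct.pack(fmt, n))

print(split_int32(-1306918380))
print(split_int32(32768, signed=False))
```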
One nice thing about the struct package is that it can do this int32-to-int8 conversion efficiently for a whole array at once:
import numpy.random
# Generate some random int32 numbers
x = numpy.random.randint(0, (1 << 31) - 1, 1000)
# Then you can convert all of them to int8 with just one command
x_int8 = struct.unpack('B' * (4*len(x)), buffer(x))
# To verify that the results are valid:
x[0]
Out[29]: 1219620060
int32_to_int8(x[0])
Out[30]: [220, 236, 177, 72]
x_int8[:4]
Out[31]: (220, 236, 177, 72)
# And it's FAST!
%timeit struct.unpack('B' * (4*len(x)), buffer(x))
10000 loops, best of 3: 32 µs per loop
%timeit [int32_to_int8(i) for i in x]
100 loops, best of 3: 6.01 ms per loop
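The buffer() built-in used above exists only in Python 2. On Python 3, a roughly equivalent sketch passes the array's raw bytes to struct.unpack. The dtype is fixed to little-endian int32 here as an assumption, so that the byte count and order do not depend on the platform, and the sample values are arbitrary apart from the first, which is taken from the output above:

```python
import struct
import numpy as np

# Explicit little-endian 32-bit integers: 4 bytes per element on any platform.
x = np.array([1219620060, 1, 32768], dtype='<i4')

# tobytes() copies the raw buffer into a bytes object for struct to read.
x_int8 = struct.unpack('%dB' % x.nbytes, x.tobytes())

print(x_int8[:4])
```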
UPDATE: Comparing struct.unpack with ndarray.view:
import numpy as np
# this is fast because it only creates the view, without involving any creation
# of objects in Python
%timeit x.view(np.int8)
1000000 loops, best of 3: 570 ns per loop
If you were to do some actual calculation:
uint8_type = "B" * len(x) * 4
%timeit sum(struct.unpack(uint8_type, buffer(x)))
10000 loops, best of 3: 52.6 µs per loop
# slow because the built-in sum() iterates over the view element by element,
# creating a Python object for each one.
%timeit sum(x.view(np.int8))
1000 loops, best of 3: 768 µs per loop
# use the numpy.sum() function - without creating Python objects
%timeit np.sum(x.view(np.int8))
100000 loops, best of 3: 8.55 µs per loop # <- FAST!
Take-home message: use ndarray.view()!