Translate every element in a numpy array according to a key

Question

I am trying to translate every element of a numpy.array according to a given key:

For example:

a = np.array([[1,2,3],
              [3,2,4]])

my_dict = {1:23, 2:34, 3:36, 4:45}

I want to receive:

array([[ 23.,  34.,  36.],
       [ 36.,  34.,  45.]])

I can see how to do this with a loop:

def loop_translate(a, my_dict):
    new_a = np.empty(a.shape)
    for i,row in enumerate(a):
        # list() keeps this working on Python 3, where map returns an iterator
        new_a[i,:] = list(map(my_dict.get, row))
    return new_a

Is there a more efficient and / or cleaner numpy way?

Edit:

I did some timings, and the np.vectorize method suggested by DSM is much faster for large arrays:

In [13]: def loop_translate(a, my_dict):
   ....:     new_a = np.empty(a.shape)
   ....:     for i,row in enumerate(a):
   ....:         new_a[i,:] = map(my_dict.get, row)
   ....:     return new_a
   ....: 

In [14]: def vec_translate(a, my_dict):    
   ....:     return np.vectorize(my_dict.__getitem__)(a)
   ....: 

In [15]: a = np.random.randint(1,5, (4,5))

In [16]: a
Out[16]: 
array([[2, 4, 3, 1, 1],
       [2, 4, 3, 2, 4],
       [4, 2, 1, 3, 1],
       [2, 4, 3, 4, 1]])

In [17]: %timeit loop_translate(a, my_dict)
10000 loops, best of 3: 77.9 us per loop

In [18]: %timeit vec_translate(a, my_dict)
10000 loops, best of 3: 70.5 us per loop

In [19]: a = np.random.randint(1, 5, (500,500))

In [20]: %timeit loop_translate(a, my_dict)
1 loops, best of 3: 298 ms per loop

In [21]: %timeit vec_translate(a, my_dict)
10 loops, best of 3: 37.6 ms per loop

+35

python numpy

Akavall 07 June 13 at 20:49

6 answers

Here's a different approach, using numpy.unique:

>>> a = np.array([[1,2,3],[3,2,1]])
>>> a
array([[1, 2, 3],
       [3, 2, 1]])
>>> d = {1 : 11, 2 : 22, 3 : 33}
>>> u,inv = np.unique(a,return_inverse = True)
>>> np.array([d[x] for x in u])[inv].reshape(a.shape)
array([[11, 22, 33],
       [33, 22, 11]])
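
To see why this works, it may help to look at the intermediate arrays. The sketch below reuses a and d from the example above; the names lookup and flat_inv are introduced here for illustration only.

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 2, 1]])
d = {1: 11, 2: 22, 3: 33}

# u holds the sorted distinct values of a; inv holds, for each element
# of a, the position of that element within u.
u, inv = np.unique(a, return_inverse=True)

# Only len(u) dictionary lookups happen in Python, one per distinct value.
lookup = np.array([d[x] for x in u])

# Flatten inv explicitly so the code does not depend on the shape that
# np.unique gives the inverse (this has differed between NumPy versions).
flat_inv = inv.reshape(-1)
result = lookup[flat_inv].reshape(a.shape)
print(result)
```

The Python-level work is proportional to the number of distinct values, not to the size of a, which is why this can pay off when the array is large but contains few distinct values.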

+6

John Vinyard 07 June 13 at 21:38

I think it would be better to iterate over the dictionary and set values in all rows and columns "at once":

>>> a = np.array([[1,2,3],[3,2,1]])
>>> a
array([[1, 2, 3],
       [3, 2, 1]])
>>> d = {1 : 11, 2 : 22, 3 : 33}
>>> for k,v in d.items():
...     a[a == k] = v
... 
>>> a
array([[11, 22, 33],
       [33, 22, 11]])
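
One caveat with replacing in place: if some of the new values are also keys, an element can be translated more than once. The sketch below uses a hypothetical mapping (chain, not taken from the question) to show the effect, and writes into a separate output array to avoid it. The function name translate_copy is an assumption for illustration.

```python
import numpy as np

def translate_copy(a, d):
    """Translate a by d, writing into a new array so that freshly
    written values are never matched against later keys."""
    out = a.copy()
    for k, v in d.items():
        out[a == k] = v   # the mask is computed on the untouched input
    return out

chain = {1: 2, 2: 3}            # hypothetical mapping: the value 2 is also a key
a = np.array([1, 2, 1])

in_place = a.copy()
for k, v in chain.items():
    in_place[in_place == k] = v  # 1 -> 2 first, then every 2 -> 3

print(in_place)                  # elements that started as 1 end up as 3
print(translate_copy(a, chain))  # each element is translated exactly once
```

In this sketch, elements of a that are not keys of d are left unchanged in the output, because out starts as a copy of a.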

Edit:

While it may not be as sexy as DSM's (really good) answer using numpy.vectorize, my tests of all the proposed methods show that this approach (using @jamylak's suggestion) is actually slightly faster:

from __future__ import division
import numpy as np
a = np.random.randint(1, 5, (500,500))
d = {1 : 11, 2 : 22, 3 : 33, 4 : 44}

def unique_translate(a,d):
    u,inv = np.unique(a,return_inverse = True)
    return np.array([d[x] for x in u])[inv].reshape(a.shape)

def vec_translate(a, d):    
    return np.vectorize(d.__getitem__)(a)

def loop_translate(a,d):
    n = np.ndarray(a.shape)
    for k in d:
        n[a == k] = d[k]
    return n

def orig_translate(a, d):
    new_a = np.empty(a.shape)
    for i,row in enumerate(a):
        new_a[i,:] = list(map(d.get, row))  # list() so this also runs on Python 3
    return new_a


if __name__ == '__main__':
    import timeit
    n_exec = 100
    print('orig')
    print(timeit.timeit("orig_translate(a,d)",
                        setup="from __main__ import np,a,d,orig_translate",
                        number = n_exec) / n_exec)
    print('unique')
    print(timeit.timeit("unique_translate(a,d)",
                        setup="from __main__ import np,a,d,unique_translate",
                        number = n_exec) / n_exec)
    print('vec')
    print(timeit.timeit("vec_translate(a,d)",
                        setup="from __main__ import np,a,d,vec_translate",
                        number = n_exec) / n_exec)
    print('loop')
    print(timeit.timeit("loop_translate(a,d)",
                        setup="from __main__ import np,a,d,loop_translate",
                        number = n_exec) / n_exec)

Outputs:

orig
0.222067718506
unique
0.0472617006302
vec
0.0357889199257
loop
0.0285375618935

+5

John Vinyard 07 June '13 at 21:00

The numpy_indexed package (disclaimer: I am the author) provides an elegant and efficient vectorized solution to this problem:

import numpy_indexed as npi
remapped_a = npi.remap(a, list(my_dict.keys()), list(my_dict.values()))

The method implemented is similar to the approach mentioned by John Vinyard, but even more general. For instance, the items of the array do not need to be ints; they can be of any type, even nd-subarrays themselves.

If you set the optional "missing" kwarg to "raise" (the default is "ignore"), performance will be slightly better, and you will get a KeyError if not all "a" elements are present in the keys.

+4

Eelco Hoogendoorn Jul 26 '16 at 18:27

If you don't really need to use a dictionary as the lookup table, a simple solution would be (for your example):

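import numpy

a = numpy.array([[1, 2, 3],
                 [3, 2, 4]])                    # your array
my_dict = numpy.array([0, 23, 34, 36, 45])     # your dictionary as array (index 0 is an unused placeholder)

def Sub(myarr, table):
    # Fancy indexing: each element of myarr selects an entry of table
    return table[myarr]

values = Sub(a, my_dict)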

This will work, of course, only if the indices of the lookup array (my_dict here) cover all possible values in a; in other words, only for a with non-negative integers.

+1

Mikhail V 15 Mar 15 at 12:34 am

Assuming your dict keys are positive integers without huge gaps (similar to a range from 0 to N), you would be better off converting your translation dict to an array such that my_array[i] = my_dict[i], and using numpy indexing to do the translation.

Code using this approach:

def direct_translate(a, d):
    # Assumes non-negative integer keys, none larger than a.max()
    src, values = list(d.keys()), list(d.values())
    d_array = np.arange(a.max() + 1)
    d_array[src] = values
    return d_array[a]
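
Note that the lookup array above is sized from a.max(), so it implicitly assumes that no dictionary key is larger than the largest value in a; otherwise the assignment into d_array raises an IndexError. A slightly more defensive sketch, assuming non-negative integer keys and integer values, sizes the array from the keys as well. The name direct_translate_safe is an assumption for illustration.

```python
import numpy as np

def direct_translate_safe(a, d):
    # Collect keys and values as integer arrays (assumes int keys/values).
    src = np.fromiter(d.keys(), dtype=np.intp, count=len(d))
    values = np.fromiter(d.values(), dtype=np.int64, count=len(d))
    # Size the table so that every key and every element of a is a valid index.
    size = max(int(a.max()), int(src.max())) + 1
    d_array = np.arange(size, dtype=np.int64)  # unmapped entries map to themselves
    d_array[src] = values
    return d_array[a]

a = np.array([[1, 2, 3],
              [3, 2, 1]])
my_dict = {1: 23, 2: 34, 3: 36, 4: 45}   # key 4 does not occur in a
print(direct_translate_safe(a, my_dict))
```

As in the original function, values of a that have no dictionary entry are passed through unchanged, since the table is initialised with np.arange.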

Testing with random arrays:

N = 10000
shape = (5000, 5000)
a = np.random.randint(N, size=shape)
my_dict = dict(zip(np.arange(N), np.random.randint(N, size=N)))

For these sizes, I get around 140 ms for this approach. Vectorizing the dict's .get with np.vectorize takes about 5.8 s, and unique_translate around 8 s.

Possible generalizations:

If you have negative values to translate, you can shift the values in a and in the dictionary keys by a constant to map them back to non-negative integers:

def direct_translate(a, d): # handles negative source keys
    min_a = a.min()
    src, values = np.array(list(d.keys())) - min_a, list(d.values())
    d_array = np.arange(a.max() - min_a + 1)
    d_array[src] = values
    return d_array[a - min_a]

If the source keys have huge gaps, the initial array creation would waste memory. I would resort to cython to speed up that function.
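
For sparse keys, one alternative that stays within NumPy (a different technique from the dense lookup array above, and not the one this answer proposes) is to sort the keys and locate each element with np.searchsorted, so that memory scales with the number of keys rather than with their range. A minimal sketch, assuming integer keys and values; the name searchsorted_translate is hypothetical.

```python
import numpy as np

def searchsorted_translate(a, d):
    keys = np.array(sorted(d))                # sorted keys
    values = np.array([d[k] for k in keys])   # values in the same order
    # Position of each element of a within the sorted keys.
    idx = np.searchsorted(keys, a)
    # Elements larger than every key get index len(keys); clip so that the
    # membership check below can index safely.
    idx = np.clip(idx, 0, len(keys) - 1)
    if not np.array_equal(keys[idx], a):
        raise KeyError("array contains values that are not keys of d")
    return values[idx]

a = np.array([[-5, 10**9], [7, -5]])
d = {-5: 1, 7: 2, 10**9: 3}                   # widely spaced, partly negative keys
print(searchsorted_translate(a, d))
```

Each lookup is a binary search, so this trades the constant-time indexing of the dense table for a logarithmic cost per element in exchange for the smaller memory footprint.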

+1

Maxim 15 jan. '18 at 13:00

DSM · Accepted Answer · 2013-06-07T20:53:57+0000

I don't know about efficiency, but you can use np.vectorize on the dictionary's .get method:

>>> a = np.array([[1,2,3],
              [3,2,4]])
>>> my_dict = {1:23, 2:34, 3:36, 4:45}
>>> np.vectorize(my_dict.get)(a)
array([[23, 34, 36],
       [36, 34, 45]])
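
Two details are worth keeping in mind here. The NumPy documentation describes np.vectorize as a convenience that is essentially a Python-level loop, so it should not be expected to match true array-level operations. Also, .get returns None for a missing key, whereas indexing with __getitem__ raises KeyError. The sketch below shows both behaviours, using a hypothetical array b that contains an unknown key and an assumed default of -1.

```python
import numpy as np

my_dict = {1: 23, 2: 34, 3: 36, 4: 45}
b = np.array([[1, 2, 9]])   # hypothetical input: 9 is not a key

# Strict lookup: a missing key raises KeyError from inside the call.
strict = np.vectorize(my_dict.__getitem__)
try:
    strict(b)
except KeyError as exc:
    print("missing key:", exc)

# Lenient lookup: substitute a default, with the output dtype fixed up front
# via otypes so it does not depend on the first element's result.
lenient = np.vectorize(lambda k: my_dict.get(k, -1), otypes=[np.int64])
print(lenient(b))
```

Which variant is preferable depends on whether an unknown value in the array should be treated as an error or mapped to a sentinel.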
