How to hstack arrays of numpy records?

[An earlier version of this post had the inaccurate title "How do I add a single column to a numpy record array?" The question asked in that earlier title has already been partially answered, but that answer is not quite what the body of the earlier version of this post asked for. I have reworded the title and edited the post to make the distinction clearer. I also explain why the answer I mentioned earlier is not quite what I want.]


Suppose I have two numpy arrays x and y, each consisting of r "record" (aka "structured") arrays. Let the shape of x be (r, c_x), and the shape of y be (r, c_y). Let's also assume that there is no overlap between x.dtype.names and y.dtype.names.

For example, for r = 2, c_x = 2 and c_y = 1:

import numpy as np
# list(...) is needed on Python 3, where zip returns an iterator
x = np.array(list(zip((1, 2), (3., 4.))), dtype=[('i', 'i4'), ('f', 'f4')])
y = np.array(list(zip(('a', 'b'))), dtype=[('s', 'S10')])

      

I would like to "horizontally" concatenate x and y to create a new array of records z, with shape (r, c_x + c_y). This operation should not modify x or y at all.

In general, z = np.hstack((x, y)) will not work, because the dtypes of x and y do not necessarily match. For example, continuing with the example above:

z = np.hstack((x, y))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-def477e6c8bf> in <module>()
----> 1 z = np.hstack((x, y))
TypeError: invalid type promotion
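The failure above can be reproduced with a short sketch like the one below. It assumes Python 3 and a reasonably recent NumPy, where the exact error message may differ from the traceback shown. It also prints the shapes: each structured array here is one-dimensional, and the "columns" live in the dtype rather than in a second axis.

```python
import numpy as np

x = np.array([(1, 3.0), (2, 4.0)], dtype=[("i", "i4"), ("f", "f4")])
y = np.array([(b"a",), (b"b",)], dtype=[("s", "S10")])

print(x.shape, y.shape)    # one record per row, fields are not an axis
print(x.dtype == y.dtype)  # the two record layouts differ

try:
    np.hstack((x, y))
    hstack_failed = False
except TypeError as exc:
    # hstack needs a single common dtype for every element of the result,
    # and no such dtype exists for records with different field names.
    hstack_failed = True
    print("hstack failed:", exc)
```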

      


Now, there is a function numpy.lib.recfunctions.append_fields that looks like it might do something close to what I'm looking for, but I have been unable to get anything out of it: everything I've tried with it either fails or produces something other than what I am trying to get.

Can someone please show me explicitly the code (using n.l.r.append_fields or otherwise [1]) that would generate, from the x and y defined in the example above, a new array of records z that is equivalent to the horizontal concatenation of x and y, and do so without modifying either x or y?

I am guessing it only takes one or two lines of code to do this. Of course, I'm looking for code that doesn't require building z record by record, iterating over x and y. In addition, the code can assume that x and y have the same number of records and that there is no overlap between x.dtype.names and y.dtype.names. Other than that, the code I'm looking for doesn't need to know anything about x and y. Ideally, it should also be agnostic about the number of arrays being joined. IOW, except for error checking, the code I'm looking for could be the body of a function hstack_rec, so that the new array z would be the result of hstack_rec((x, y)).


[1] ...although I have to admit that, after my record of perfect failure with numpy.lib.recfunctions.append_fields, I have become a little curious about how this function could be used at all, regardless of its relevance to this post.



2 answers


I never use recarrays, so someone else will probably come up with something slicker, but maybe merge_arrays will work?



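>>> import numpy as np
>>> import numpy.lib.recfunctions as nlr
>>> # list(...) is needed on Python 3; the reprs below are from Python 2 and differ slightly on newer versions
>>> x = np.array(list(zip((1, 2), (3., 4.))), dtype=[('i', 'i4'), ('f', 'f4')])
>>> y = np.array(list(zip(('a', 'b'))), dtype=[('s', 'S10')])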
>>> x
array([(1, 3.0), (2, 4.0)], 
      dtype=[('i', '<i4'), ('f', '<f4')])
>>> y
array([('a',), ('b',)], 
      dtype=[('s', '|S10')])
>>> z = nlr.merge_arrays([x, y], flatten=True)
>>> z
array([(1, 3.0, 'a'), (2, 4.0, 'b')], 
      dtype=[('i', '<i4'), ('f', '<f4'), ('s', '|S10')])
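Wrapped up as the hstack_rec function the question asks for, this could look roughly like the sketch below. Both helper names are hypothetical (neither is a NumPy API), and the checks simply encode the question's assumptions of equal record counts and non-overlapping field names. The second helper is an alternative that avoids recfunctions by allocating the output and copying one field at a time.

```python
import numpy as np
from numpy.lib import recfunctions as rfn


def hstack_rec(arrays):
    """Join structured arrays field-wise (hypothetical helper, not a NumPy API)."""
    arrays = list(arrays)
    # Assumptions taken from the question: same shape, disjoint field names.
    if len({a.shape for a in arrays}) != 1:
        raise ValueError("all arrays must have the same shape")
    names = [n for a in arrays for n in a.dtype.names]
    if len(names) != len(set(names)):
        raise ValueError("field names must not overlap")
    # flatten=True yields one flat record per row instead of nested records.
    return rfn.merge_arrays(arrays, flatten=True, usemask=False)


def hstack_rec_manual(arrays):
    """Alternative without recfunctions: allocate, then copy field by field."""
    arrays = list(arrays)
    fields = [(n, a.dtype[n]) for a in arrays for n in a.dtype.names]
    out = np.empty(arrays[0].shape, dtype=fields)
    for a in arrays:
        for n in a.dtype.names:
            out[n] = a[n]  # vectorised copy of one whole column
    return out


x = np.array([(1, 3.0), (2, 4.0)], dtype=[("i", "i4"), ("f", "f4")])
y = np.array([(b"a",), (b"b",)], dtype=[("s", "S10")])
z = hstack_rec((x, y))
z_manual = hstack_rec_manual((x, y))
print(z.dtype.names, z_manual.dtype.names)
```

Both versions build a new array, so x and y are not modified, and both accept any number of input arrays.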

      



This is a very late answer, but maybe it will be useful to someone else. I used this solution when I had the same question with most of the same criteria.

It doesn't generate a new numpy array, but with zip and itertools.chain it is much faster. In my case, I needed to access each value of each row in sequential order. Here is a benchmark that mimics this use case:

import numpy
from numpy.lib.recfunctions import merge_arrays
from itertools import chain

a = numpy.empty(3, [("col1", int), ("col2", float)])
b = numpy.empty(3, [("col3", int), ("col4", "U1")])

      



Results:

%timeit [i for i in (row for row in merge_arrays([a,b], flatten=True))]
52.9 µs ± 2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit [i for i in (row for row in (chain(i,k) for i,k in zip(a,b)))]
3.47 µs ± 52 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
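To make concrete what the zip and chain approach produces, here is a small sketch, assuming Python 3 and using small filled arrays instead of numpy.empty so the values are predictable. Note that the timed expression above builds chain iterators without consuming them; turning each one into a tuple, as below, does the full per-value work, so timings for that form may be less favourable.

```python
from itertools import chain

import numpy as np

a = np.array([(1, 3.0), (2, 4.0)], dtype=[("i", "i4"), ("f", "f4")])
b = np.array([(b"a",), (b"b",)], dtype=[("s", "S10")])

# zip pairs up matching records; chain walks the fields of each in turn.
# tuple(...) consumes the chain so every value is actually visited.
rows = [tuple(chain(ra, rb)) for ra, rb in zip(a, b)]

for row in rows:
    print(row)
```

The result is a plain Python list of tuples, not a structured array, which is fine when the rows are only read in order.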

      
