Unexplained behavior when using vlen with h5py

I am using h5py to create a dataset. Since I want to store arrays of differing sizes, I am using h5py's special_dtype with vlen. However, I am seeing behavior that I cannot explain; maybe you can help me understand what is going on:

>>> import h5py
>>> import numpy as np
>>> fp = h5py.File(datasource_fname, mode='w')
>>> dt = h5py.special_dtype(vlen=np.dtype('float32'))
>>> train_targets = fp.create_dataset('target_sequence', shape=(9549, 5,), dtype=dt)
>>> test
Out[130]: 
array([[ 0.,  1.,  1.,  1.,  0.,  1.,  1.,  0.,  1.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  1.,  0.,  0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.]])
>>> train_targets[0] = test
>>> train_targets[0]
Out[138]: 
array([ array([ 0.,  1.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.], dtype=float32),
        array([ 1.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.], dtype=float32),
        array([ 0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.], dtype=float32),
        array([ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.], dtype=float32),
        array([ 0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.], dtype=float32)], dtype=object)

      

I expect train_targets[0] to have this shape, but I cannot recognize the rows of my array in it. The values look completely jumbled, yet consistently so: every time I run the code above, train_targets[0] looks the same.

To clarify: the first element of my train_targets, in this case test, has shape (5, 11), but the second element may have shape (5, 38), which is why I use vlen.

thanks for the help

Mat

1 answer


I think train_targets[0] = test has stored your (5, 11) array, in F order, in a row of train_targets. According to the shape (9549, 5), that is a row of 5 elements. And since the dtype is vlen, each element is a 1D array of length 11.

What you get back from train_targets[0] is an array of 5 arrays, each of shape (11,), with values taken from test (in F order).

So I think there are two issues: what the 2D shape means, and what vlen allows.


My version of h5py is pre v2.3, so I only get string vlen. But I suspect your problem may be that vlen only works with 1D arrays, as an extension of byte strings, so to speak.

Does the 5 in shape=(9549, 5,) have anything to do with the 5 in test.shape? I don't think it does, at least not as numpy and h5py see it.

When I make a file following the string vlen example:

>>> import h5py
>>> f = h5py.File('foo.hdf5', 'w')  # explicit write mode; newer h5py versions default to read-only
>>> dt = h5py.special_dtype(vlen=str)
>>> ds = f.create_dataset('VLDS', (100,100), dtype=dt)

      



and then do:

ds[0]='this one string'

and look at ds[0], I get an object array with 100 elements, each of which is this string. That is, I have set a whole row of ds.

ds[0,0]='another'

is the correct way to set just one element.

vlen means "variable length", not "variable shape". Although the documentation at https://www.hdfgroup.org/HDF5/doc/TechNotes/VLTypes.html is not entirely clear on this, I think you can store 1D arrays with shapes (11,) and (38,) with vlen, but not 2D ones.
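If that reading is right, one possible workaround is to flatten each 2D sample to 1D before storing it and reshape it after reading. The following is only a sketch under assumptions that are mine, not from the question: every sample has the same known number of rows (5), the dataset is 1D with one vlen element per sample, and the names vlen_demo.h5, samples and restored are made up for illustration. It uses an in-memory file so nothing is written to disk.

```python
import h5py
import numpy as np

dt = h5py.special_dtype(vlen=np.dtype('float32'))

# Hypothetical samples: same number of rows (5), different widths
samples = [
    np.arange(5 * 11, dtype=np.float32).reshape(5, 11),
    np.arange(5 * 38, dtype=np.float32).reshape(5, 38),
]

# 'core' driver with backing_store=False keeps the file purely in memory
with h5py.File('vlen_demo.h5', 'w', driver='core', backing_store=False) as f:
    # One variable-length 1D element per sample
    ds = f.create_dataset('target_sequence', shape=(len(samples),), dtype=dt)
    for i, sample in enumerate(samples):
        ds[i] = sample.ravel()  # flatten in C (row-major) order

    # Reading an element gives a 1D float32 array; restore the 5 rows
    restored = [ds[i].reshape(5, -1) for i in range(len(samples))]

print([r.shape for r in restored])
```

The row count has to be known when reading; if it varied as well, it would need to be stored separately, for example in a second dataset.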


In fact, the train_targets output can be reproduced with:

In [54]: test1=np.empty((5,),dtype=object)
In [55]: for i in range(5):
    test1[i]=test.T.flatten()[i:i+11]

      

These are 11 values taken from the transpose (F order), but shifted by one position for each sub-array.
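This can be checked with NumPy alone, without h5py. The sketch below retypes test and the train_targets[0] output from the question's printouts (the names expected, flat_f and windows are mine), builds the same shifted windows over the F-ordered values, and compares the two.

```python
import numpy as np

# `test` as printed in the question, shape (5, 11)
test = np.array([
    [0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
], dtype=np.float32)

# Values the question reports reading back from train_targets[0]
expected = np.array([
    [0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0],
], dtype=np.float32)

# Column-major (F order) flattening; equivalent to test.T.flatten()
flat_f = test.flatten(order='F')

# Five windows of length 11, each starting one element later than the last
windows = np.stack([flat_f[i:i + 11] for i in range(5)])

print(np.array_equal(windows, expected))  # prints True
```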
