Unexplained behavior when using vlen with h5py
I am using h5py to create a dataset. Since I want to store arrays with different row sizes, I am using the h5py special dtype vlen. However, I am seeing behavior that I cannot explain; maybe you can help me understand what is going on:
>>> import h5py
>>> import numpy as np
>>> fp = h5py.File(datasource_fname, mode='w')
>>> dt = h5py.special_dtype(vlen=np.dtype('float32'))
>>> train_targets = fp.create_dataset('target_sequence', shape=(9549, 5,), dtype=dt)
>>> test
Out[130]:
array([[ 0., 1., 1., 1., 0., 1., 1., 0., 1., 0., 0.],
       [ 1., 0., 0., 0., 1., 0., 0., 1., 0., 0., 0.],
       [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1.]])
>>> train_targets[0] = test
>>> train_targets[0]
Out[138]:
array([ array([ 0., 1., 0., 0., 0., 1., 0., 0., 0., 0., 1.], dtype=float32),
       array([ 1., 0., 0., 0., 1., 0., 0., 0., 0., 1., 0.], dtype=float32),
       array([ 0., 0., 0., 1., 0., 0., 0., 0., 1., 0., 0.], dtype=float32),
       array([ 0., 0., 1., 0., 0., 0., 0., 1., 0., 0., 0.], dtype=float32),
       array([ 0., 1., 0., 0., 0., 0., 1., 0., 0., 0., 0.], dtype=float32)], dtype=object)
I expect train_targets[0] to have this shape, but I cannot recognize the rows of my array in it. They look completely jumbled, yet consistently so: every time I run the code above, train_targets[0] looks the same.
To clarify: the first element of my train_targets, in this case test, has shape (5, 11), but the second element may have shape (5, 38), which is why I use vlen.
thanks for the help
Mat
I think train_targets[0] = test stored your (5, 11) array, read in F order, in one row of train_targets. According to the shape (9549, 5), that is a row of 5 elements. And since the dtype is vlen, each element is a 1d array of length 11.
What you get back from train_targets[0] is an array of 5 arrays, each of shape (11,), with values taken from test (in F order).
So I think there are 2 questions: what does a 2d shape mean here, and what does vlen allow?
My version of h5py is pre v2.3, so I only get vlen strings. But I suspect your problem may be that vlen only works with 1d arrays, as an extension of byte strings.
Does the 5 in shape=(9549, 5,) have anything to do with the 5 in test.shape? I don't think it does, at least not in the way numpy and h5py handle it.
When I make a file following the vlen string example:
>>> f = h5py.File('foo.hdf5', 'w')  # explicit mode; newer h5py releases default to read-only
>>> dt = h5py.special_dtype(vlen=str)
>>> ds = f.create_dataset('VLDS', (100,100), dtype=dt)
and then do:
ds[0]='this one string'
and look at ds[0], I get an object array with 100 elements, each of which is this string. That is, I have set a whole row of ds.
ds[0,0]='another' is the correct way to set just one element.
vlen is "variable length", not "variable shape". Although the documentation at https://www.hdfgroup.org/HDF5/doc/TechNotes/VLTypes.html is not entirely clear, I think you can store 1d arrays with shapes (11,) and (38,) with vlen, but not 2d arrays.
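If that is right, one way around the problem is to write each 1d row into its own element instead of assigning a 2d array to a whole dataset row. The following is only a sketch under that assumption: the file name, the in-memory "core" driver, the dataset size and the arange data are all made up for illustration and are not from the question.

```python
import h5py
import numpy as np

# In-memory file (core driver, no backing store) so the sketch writes nothing to disk.
with h5py.File("vlen_demo.h5", "w", driver="core", backing_store=False) as fp:
    dt = h5py.special_dtype(vlen=np.dtype("float32"))
    ds = fp.create_dataset("target_sequence", shape=(2, 5), dtype=dt)

    # Two illustrative blocks with the shapes mentioned in the question.
    first = np.arange(5 * 11, dtype="float32").reshape(5, 11)
    second = np.arange(5 * 38, dtype="float32").reshape(5, 38)

    # Assign one 1d row per vlen element rather than a 2d array per dataset row.
    for row, block in enumerate((first, second)):
        for i in range(5):
            ds[row, i] = block[i]

    # Read the elements back while the file is still open.
    first_back = [ds[0, i] for i in range(5)]
    second_back = [ds[1, i] for i in range(5)]
```

Each ds[row, i] then holds a 1d float32 array whose length can differ from one dataset row to the next (11 in the first, 38 in the second).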
In fact, the train_targets output is reproduced with:
In [54]: test1=np.empty((5,),dtype=object)
In [55]: for i in range(5):
    test1[i]=test.T.flatten()[i:i+11]
These are 11 values taken from the transpose (F order), but shifted by one for each sub-array.
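This reconstruction can be checked against the two arrays posted in the question using numpy alone. The sketch below copies both arrays by hand; the names reported, rebuilt and matches are mine, not from the original post.

```python
import numpy as np

# The (5, 11) array the question assigned to train_targets[0].
test = np.array([
    [0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
], dtype="float32")

# What the question reported reading back from train_targets[0].
reported = np.array([
    [0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0],
], dtype="float32")

# Flatten the transpose (column-major order of test), then take
# 5 windows of 11 values, each starting one position later.
flat = test.T.flatten()
rebuilt = np.stack([flat[i:i + 11] for i in range(5)])

matches = np.array_equal(rebuilt, reported)
```

Here matches evaluates to True: every sub-array in the reported output is an 11-value window into the column-major flattening of test, offset by its index.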