H5PY - How to store many 2D arrays of different dimensions


I would like to organize data collected from computer simulations into an HDF5 file using Python. I recorded the positions and velocities [x,y,z,vx,vy,vz] of all atoms within a certain region of space over many time steps. The number of atoms, of course, varies from time step to time step.

A minimal example could look as follows:

[ [x1,y1,z1,vx1,vy1,vz1], [x2,y2,z2,vx2,vy2,vz2] ],
[ [x1,y1,z1,vx1,vy1,vz1], [x2,y2,z2,vx2,vy2,vz2], [x3,y3,z3,vx3,vy3,vz3] ] 

(2 time steps, first time step: 2 atoms, second time step: 3 atoms)

My idea was to create an HDF5 dataset within Python which stores all the information. At each time step it should store a 2D array of the positions/velocities of all atoms, i.e.

dataset[0] = [ [x1,y1,z1,vx1,vy1,vz1], [x2,y2,z2,vx2,vy2,vz2] ]
dataset[1] = [ [x1,y1,z1,vx1,vy1,vz1], [x2,y2,z2,vx2,vy2,vz2], [x3,y3,z3,vx3,vy3,vz3] ].

The idea is clear, I think. However, I struggle with the definition of the correct data type of the data set with varying array length.

My code looks like this:

import numpy as np
import h5py

file = h5py.File ('file.h5','w')

columnNo = 6    
rowtype = np.dtype("%sfloat32" % columnNo)
dt = h5py.special_dtype( vlen=np.dtype(rowtype) )

dataset = file.create_dataset("dset", (2,), dtype=dt)

print dataset.value

testarray = np.array([[1.,2.,3.,2.,3.,4.],[1.,2.,3.,2.,3.,4.]])
print testarray

dataset[0] = testarray
print dataset[0]

This, however, does not work. When I run the script I get the error message "AttributeError: 'float' object has no attribute 'dtype'." It seems that my defined dtype is wrong.

Does anybody see how it should be defined correctly?

Thanks very much, Sven


There are 2 best solutions below


Thanks for the quick answer. It helped a lot.

If I now simply change the data type of the data set to

dtype = dt,

I get what I would like to have.

Here is the Python code (for completeness):

import numpy as np
import h5py

file = h5py.File ('file.h5','w')

columnNo = 6

rowtype = np.dtype([('f0', '<f4',(6,))])
dt = h5py.special_dtype( vlen=np.dtype(rowtype) )

dataset = file.create_dataset("dset", (2,), dtype=dt)

# print('value')
# print(dataset.value[0])

arr = np.ones((3,),dtype=rowtype)
# print(repr(arr))
dataset[0] = arr
# print(dataset.value)

testarray = np.array([([1.,2.,3.,2.,3.,4.],),([2.,3.,4.,1.,2.,3.],)], dtype=rowtype)
# print(repr(testarray))

dataset[1] = testarray
for i in range(2):
    print(dataset[i])

And the corresponding output reads

('rowtype', dtype([('f0', '<f4', (6,))]))
('dt', dtype('O'))
[ array([([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],),
       ([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],), ([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],)], 
      dtype=[('f0', '<f4', (6,))])
 array([([1.0, 2.0, 3.0, 2.0, 3.0, 4.0],), ([2.0, 3.0, 4.0, 1.0, 2.0, 3.0],)], 
      dtype=[('f0', '<f4', (6,))])]
[([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],) ([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],)
 ([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],)]
[([1.0, 2.0, 3.0, 2.0, 3.0, 4.0],) ([2.0, 3.0, 4.0, 1.0, 2.0, 3.0],)]

Just to get it right: The problem in my original code was a bad definition of my rowtype data structure, right?
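
A NumPy-only check seems consistent with that reading. The sketch below is not part of my script; it only compares the two row types. It assumes that the tuple form `(np.float32, (6,))` describes the same subarray dtype as the "6float32" string I used originally. With the subarray dtype, NumPy folds the six columns into the array shape and leaves a plain float32 array, which may be why h5py ended up looking at individual floats. With the structured dtype, each row stays a single record.

```python
import numpy as np

# Subarray dtype (tuple form, assumed equivalent to the "6float32" string).
sub = np.dtype((np.float32, (6,)))
a = np.ones((3,), dtype=sub)
print(a.shape, a.dtype)    # the (6,) part is absorbed into the array shape

# Structured dtype with one field 'f0' that holds six float32 values.
rec = np.dtype([('f0', '<f4', (6,))])
b = np.ones((3,), dtype=rec)
print(b.shape, b.dtype)    # stays a 1D array of records
print(b['f0'].shape)       # the field itself is a plain 2D float array
```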

Best, Sven


The error in your case is buried, though it is clear it occurs when trying to assign the testarray to the dataset:

Traceback (most recent call last):
  File "stack41465480.py", line 26, in <module>
    dataset[0] = testarray
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/build/h5py-GhwtGD/h5py-2.6.0/h5py/_objects.c:2577)
  File "h5py/_conv.pyx", line 712, in h5py._conv.ndarray2vlen (/build/h5py-GhwtGD/h5py-2.6.0/h5py/_conv.c:6171)
AttributeError: 'float' object has no attribute 'dtype'

I'm not skilled with special_dtype and vlen, but I was able to write a numpy structured array to h5py.

import numpy as np
import h5py

file = h5py.File ('file.h5','w')

columnNo = 6    
# rowtype = np.dtype("%sfloat32" % columnNo)
rowtype = np.dtype([('f0', '<f4',(6,))])
dt = h5py.special_dtype( vlen=np.dtype(rowtype) )

dataset = file.create_dataset("dset", (2,), dtype=rowtype)


arr = np.ones((2,),dtype=rowtype)
dataset[0] = arr[0]

testarray = np.array([([1.,2.,3.,2.,3.,4.],),([2.,3.,4.,1.,2.,3.],)], dtype=rowtype)

dataset[1] = testarray[1]


1316:~/mypy$ python3 stack41465480.py 
rowtype [('f0', '<f4', (6,))]
dt object
([0.0, 0.0, 0.0, 0.0, 0.0, 0.0],)
array([([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],), ([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],)], 
      dtype=[('f0', '<f4', (6,))])
[([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],) ([0.0, 0.0, 0.0, 0.0, 0.0, 0.0],)]
array([([1.0, 2.0, 3.0, 2.0, 3.0, 4.0],), ([2.0, 3.0, 4.0, 1.0, 2.0, 3.0],)], 
      dtype=[('f0', '<f4', (6,))])
[([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],) ([2.0, 3.0, 4.0, 1.0, 2.0, 3.0],)]
[[ 1.  1.  1.  1.  1.  1.]
 [ 2.  3.  4.  1.  2.  3.]]
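
If the structured dtype is not needed, a possible alternative is to flatten each time step into a 1D float32 array, store it in a variable-length float32 dataset, and reshape it when reading. This is only a sketch under my own assumptions, not something I have run against your data: the file name, the variable names and the sample values are made up, and the in-memory `core` driver is used only so that nothing is written to disk.

```python
import numpy as np
import h5py

n_cols = 6  # x, y, z, vx, vy, vz

# Hypothetical sample data: 2 atoms in the first step, 3 in the second.
steps = [
    np.arange(12, dtype=np.float32).reshape(2, n_cols),
    np.arange(18, dtype=np.float32).reshape(3, n_cols),
]

# Variable-length dtype whose elements are 1D float32 arrays.
dt = h5py.special_dtype(vlen=np.dtype('float32'))

# driver='core' with backing_store=False keeps the file in memory only.
with h5py.File('sketch_vlen.h5', 'w', driver='core', backing_store=False) as f:
    dset = f.create_dataset('dset', (len(steps),), dtype=dt)
    for i, step in enumerate(steps):
        dset[i] = step.ravel()  # one flattened time step per element
    # Restore the (n_atoms, 6) layout while the file is still open.
    restored = [dset[i].reshape(-1, n_cols) for i in range(len(steps))]

for arr in restored:
    print(arr.shape)
```

The number of columns is not stored in the file this way, so it has to be known when reading (or saved separately, for example as an attribute).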