I would like to organize my collected data (from computer simulations) into a hdf5 file using Python. I measured positions and velocities [x,y,z,vx,vy,vz] of all atoms within a certain space region over many time steps. The number of atoms, of course, varies from time step to time step.
A minimal example could look as follows:
[
[ [x1,y1,z1,vx1,vy1,vz1], [x2,y2,z2,vx2,vy2,vz2] ],
[ [x1,y1,z1,vx1,vy1,vz1], [x2,y2,z2,vx2,vy2,vz2], [x3,y3,z3,vx3,vy3,vz3] ]
]
(2 time steps, first time step: 2 atoms, second time step: 3 atoms)
My idea was to create a hdf5 dataset within Python which stores all the information. At each time step it should store a 2d array of alls positions/velocities of all atoms, i.e.
dataset[0] = [ [x1,y1,z1,vx1,vy1,vz1], [x2,y2,z2,vx2,vy2,vz2] ]
dataset[1] = [ [x1,y1,z1,vx1,vy1,vz1], [x2,y2,z2,vx2,vy2,vz2], [x3,y3,z3,vx3,vy3,vz3] ].
The idea is clear, I think. However, I struggle with the definition of the correct data type of the data set with varying array length.
My code looks like this:
import numpy as np
import h5py
file = h5py.File ('file.h5','w')
columnNo = 6
rowtype = np.dtype("%sfloat32" % columnNo)
dt = h5py.special_dtype( vlen=np.dtype(rowtype) )
dataset = file.create_dataset("dset", (2,), dtype=dt)
print dataset.value
testarray = np.array([[1.,2.,3.,2.,3.,4.],[1.,2.,3.,2.,3.,4.]])
print testarray
dataset[0] = testarray
print dataset[0]
This, however, does not work. When I run the script I get the error message "AttributeError: 'float' object has no attribute 'dtype'." It seems that my defined dtype is wrong.
Does anybody see how it should be defined correctly?
Thanks very much, Sven
Thanks for the quick answer. It helped a lot.
If I now simply change the data type of the data set to
I get what I would like to have.
Here, the Python code (for completeness):
And to corresponding output reads
Just to get it right: The problem in my original code was a bad definition of my rowtype data structure, right?
Best, Sven