Resize HDF5 dataset in Julia

504 Views Asked by At

Is there a way to resize a chunked dataset in HDF5 using Julia's HDF5.jl? I didn't see anything in the documentation. Looking through the source, all I found was set_dims!(), but that cannot extend a dataset (only shrink it). Does HDF5.jl have the ability to enlarge an existing (chunked) dataset? This is a very important feature for me, and I would rather not have to call into another language.

2

There are 2 best solutions below

0
On BEST ANSWER

The docs have a brief mention of extendible dimensions in hdf5.md excerpted below.

You can use extendible dimensions,

d = d_create(parent, name, dtype, (dims, max_dims), "chunk", (chunk_dims), [lcpl, dcpl, dapl])
set_dims!(d, new_dims)

where dims is a tuple of integers. For example

b = d_create(fid, "b", Int, ((1000,),(-1,)), "chunk", (100,)) #-1 is equivalent to typemax(Hsize)
set_dims!(b, (10000,))
b[1:10000] = [1:10000] 
0
On

I believe I've got it figured out. The issue is that I forgot to give the dataspace a large enough max_dims. Doing that required digging into the lower-level API. The solution I found was:

dspace = HDF5.dataspace((6,20)::Dims, max_dims=(6,typemax(Int64)))
dtype = HDF5.datatype(Float64)
dset = HDF5.d_create(prt, "trajectory", dtype, dspace, "chunk", (6,10))

Once I created a dataset that can be resized appropriately, the set_dims! function resizes the dataset correctly.

I think I located a few minor issues with the API, which I had to work around or change in my local version. I will get in touch with the HDF5.jl owner regarding those. For those interested:

  • The constant H5S_UNLIMITED is of type Uint64, but the dataspace function will only accept tuples of Int64, hence why I used typemax(Int64) for my max_dims to imitate how H5S_UNLIMITED is derived.
  • The form of d_create which I used calls h5d_create incorrectly; it passes parent instead of checkvalid(parent).id (can be seen by comparison with other forms of d_create).