How do I use Rust to read an HDF5 string attribute of a dataset using the hdf5-rust crate?


I've got an HDF5 file with the following structure, as shown by h5dump:

❯ h5dump -n GMTCO_npp_d20181005_t2022358_e2024003_b35959_c20181008035331888329_cspp_dev.h5
HDF5 "GMTCO_npp_d20181005_t2022358_e2024003_b35959_c20181008035331888329_cspp_dev.h5" {
FILE_CONTENTS {
 group      /
 group      /All_Data
 group      /All_Data/VIIRS-MOD-GEO-TC_All
 dataset    /All_Data/VIIRS-MOD-GEO-TC_All/Height
 dataset    /All_Data/VIIRS-MOD-GEO-TC_All/Latitude
 dataset    /All_Data/VIIRS-MOD-GEO-TC_All/Longitude
 ...
 group      /Data_Products
 group      /Data_Products/VIIRS-MOD-GEO-TC
 dataset    /Data_Products/VIIRS-MOD-GEO-TC/VIIRS-MOD-GEO-TC_Aggr
 dataset    /Data_Products/VIIRS-MOD-GEO-TC/VIIRS-MOD-GEO-TC_Gran_0
 }
}

I am interested in using Rust (via the hdf5-rust crate) to read a string attribute of the dataset /Data_Products/VIIRS-MOD-GEO-TC/VIIRS-MOD-GEO-TC_Gran_0, which has the signature

ATTRIBUTE "N_Granule_ID" {
   DATATYPE  H5T_STRING {
      STRSIZE 16;
      STRPAD H5T_STR_NULLTERM;
      CSET H5T_CSET_ASCII;
      CTYPE H5T_C_S1;
   }
   DATASPACE  SIMPLE { ( 1, 1 ) / ( 1, 1 ) }
   DATA {
   (0,0): "NPP002194429582"
   }
}

I tried the following...

use anyhow::{Ok, Result};
use hdf5::File;
use ndarray::Array2;

fn main() -> Result<()> {

    let filename = "GMTCO_npp_d20181005_t2022358_e2024003_b35959_c20181008035331888329_cspp_dev.h5".to_string();
    let file = File::open(filename)?;
    let dataset = file.dataset("Data_Products/VIIRS-MOD-GEO-TC/VIIRS-MOD-GEO-TC_Gran_0")?;
    let attribute = dataset.attr("N_Granule_ID")?;

    // Don't know what to use here...
    let v: Array2<String> = attribute.read_2d::<String>()?;

    Ok(())
}

which seems to work up to the point where I need to read the contents of the attribute object (attribute.read_2d() etc.) into a Rust datatype. From the DATASPACE SIMPLE { ( 1, 1 ) / ( 1, 1 ) } entry in the attribute metadata, I think the attribute is supposed to be read into a 2D array with a single entry (i.e. shape 1x1), but I'm not sure which read method and datatype to use.

The only example provided with the hdf5-rust package reads a compound enum-based attribute using

let attribute = attr.read_1d::<Color>()?;

where Color is a user-defined enum which is registered as an HDF5 datatype by deriving H5Type

#[derive(H5Type, Clone, PartialEq, Debug)] // register with HDF5
#[repr(u8)]
pub enum Color {
    R = 1,
    G = 2,
    B = 3,
}

How would one do this for a non-compound datatype (f32, i32, String)?

Answer by geoff.cureton

I got a tip from one of the hdf5-rust contributors that I should be using FixedAscii<size>. For an attribute attached to the root group

let root_attr = file.attr("Mission_Name")?;

I did

let v_reader = root_attr.as_reader();
let v = v_reader.read::<FixedAscii<4>, ndarray::Dim<[usize; 2]>>()?;
println!("\tv = {:?}", v);

or alternatively

let v = root_attr.read_2d::<FixedAscii<4>>()?;
println!("\tv = {:?}", v);

and they both gave the result

v = [["NPP"]], shape=[1, 1], strides=[1, 1], layout=CFcf (0xf), const ndim=2

and I got to the attribute payload with

if let Some(x) = v.first() {
    print!("\tx = {:?}", x.to_string());
}

which is what I was after. For the dataset attribute referenced in the original question I used

let v = attribute.read_2d::<FixedAscii<16>>()?;
println!("\tv = {:?}", v);

giving

v = [["NPP002194429582"]], shape=[1, 1], strides=[1, 1], layout=CFcf (0xf), const ndim=2

Luckily the attributes I am interested in have fixed sizes which I know ahead of time.

I was also able to read in a "vector" string attribute (something like a list of filenames), with the signature

ATTRIBUTE "N_Anc_Filename" {
   DATATYPE  H5T_STRING {
      STRSIZE 104;
      STRPAD H5T_STR_NULLTERM;
      CSET H5T_CSET_ASCII;
      CTYPE H5T_C_S1;
   }
   DATASPACE  SIMPLE { ( 15, 1 ) / ( 15, 1 ) }
   DATA {
   (0,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0744_1.O.0.0",
   (1,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0745_1.O.0.0",
   (2,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0746_1.O.0.0",
   (3,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0776_1.O.0.0",
   (4,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0777_1.O.0.0",
   (5,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0778_1.O.0.0",
   (6,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0779_1.O.0.0",
   (7,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0780_1.O.0.0",
   (8,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0781_1.O.0.0",
   (9,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0810_1.O.0.0",
   (10,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0811_1.O.0.0",
   (11,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0812_1.O.0.0",
   (12,0): "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0813_1.O.0.0",
   (13,0): "off_Planet-Eph-ANC_Static_JPL_000f_20151008_200001010000Z_20000101000000Z_ee00000000000000Z_np",
   (14,0): "off_USNO-PolarWander-UT1-ANC_Ser7_USNO_000f_20181005_201810050000Z_20181005000106Z_ee20181012120000Z_np"
   }
}

where STRSIZE=104 matches the length of the longest string (103 characters) plus its null terminator. The filenames are of differing lengths, but as long as the argument to FixedAscii<> is equal to or greater than the length of the longest filename, it works...

println!("\n\nReading dataset (15, 1) attribute...\n");

let dset_attr = dataset.attr("N_Anc_Filename")?;

let v = dset_attr.read_2d::<FixedAscii<104>>()?;

println!("\tv.shape() = {:?}", v.shape());
println!("\tv.strides() = {:?}", v.strides());
println!("\tv.ndim() = {:?}", v.ndim());

let arr = v.iter().collect::<Vec<_>>();

for (idx, val) in arr.iter().enumerate() {
    println!("\tarr[{:?}] = {:?} ({:?})", idx, val.to_string(), val.len());
}

giving

Reading dataset (15, 1) attribute...

v.shape() = [15, 1]
v.strides() = [1, 1]
v.ndim() = 2

arr[0] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0744_1.O.0.0" (74)
arr[1] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0745_1.O.0.0" (74)
arr[2] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0746_1.O.0.0" (74)
arr[3] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0776_1.O.0.0" (74)
arr[4] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0777_1.O.0.0" (74)
arr[5] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0778_1.O.0.0" (74)
arr[6] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0779_1.O.0.0" (74)
arr[7] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0780_1.O.0.0" (74)
arr[8] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0781_1.O.0.0" (74)
arr[9] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0810_1.O.0.0" (74)
arr[10] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0811_1.O.0.0" (74)
arr[11] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0812_1.O.0.0" (74)
arr[12] = "Terrain-Eco-ANC-Tile_20030125000000Z_ee00000000000000Z_NA_NA_N0813_1.O.0.0" (74)
arr[13] = "off_Planet-Eph-ANC_Static_JPL_000f_20151008_200001010000Z_20000101000000Z_ee00000000000000Z_np" (94)
arr[14] = "off_USNO-PolarWander-UT1-ANC_Ser7_USNO_000f_20181005_201810050000Z_20181005000106Z_ee20181012120000Z_np" (103)

This basically covers the most complicated use case for the files I am reading.
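
To make the STRSIZE / H5T_STR_NULLTERM layout discussed above more concrete, here is a minimal standard-library sketch of how fixed-width, null-terminated ASCII slots can be packed and decoded. It does not use hdf5-rust at all; the helper names encode_fixed_ascii and decode_fixed_ascii are hypothetical and only model the byte layout, on the assumption that each string occupies one STRSIZE-wide slot and its text ends at the first NUL byte.

```rust
/// Hypothetical helper: pack strings into fixed-width slots, padding with NUL.
/// `strsize` plays the role of STRSIZE (slot width in bytes, terminator included).
/// Returns None if a string leaves no room for its terminator.
fn encode_fixed_ascii(items: &[&str], strsize: usize) -> Option<Vec<u8>> {
    let mut buf = Vec::with_capacity(items.len() * strsize);
    for s in items {
        if s.len() >= strsize {
            return None;
        }
        buf.extend_from_slice(s.as_bytes());
        // Pad the remainder of the slot with NUL bytes.
        buf.resize(buf.len() + (strsize - s.len()), 0);
    }
    Some(buf)
}

/// Hypothetical helper: decode a flat buffer of fixed-width slots.
/// The text of each slot ends at the first NUL byte (or at the slot end).
fn decode_fixed_ascii(buf: &[u8], strsize: usize) -> Vec<String> {
    buf.chunks(strsize)
        .map(|slot| {
            let end = slot.iter().position(|&b| b == 0).unwrap_or(slot.len());
            String::from_utf8_lossy(&slot[..end]).into_owned()
        })
        .collect()
}

fn main() {
    let names = ["NPP", "N0744_1.O.0.0"];
    // Slot width = longest string + 1 byte for the terminator,
    // the same relationship as 103 characters and STRSIZE 104 above.
    let strsize = names.iter().map(|s| s.len()).max().unwrap() + 1;
    let buf = encode_fixed_ascii(&names, strsize).expect("strings fit in slot");
    let decoded = decode_fixed_ascii(&buf, strsize);
    println!("strsize = {}, decoded = {:?}", strsize, decoded);
}
```

Shorter strings simply carry more padding, which is why a single slot width can hold filenames of 74, 94 and 103 characters in the same attribute.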