Casting a char* type on a Branch using mktree

I'm trying to recreate a ROOT file that contains a char* type branch (which uproot interprets as AsStrings()).

When using mktree, uproot doesn't recognize np.dtype('string'), and when I try np.dtype('S') I get:

TypeError: cannot write NumPy dtype |S0 in TTree

Is it possible to do this, or is it simply not implemented in the package?

There is 1 solution below

"Strings" is not one of the data types that WritableTTree supports. See the blue box under https://uproot.readthedocs.io/en/latest/basic.html#writing-ttrees-to-a-file for a full list.

However, it's possible to write some string-like data. Awkward Arrays of strings are just lists of uint8 values with special metadata (the __array__: "string" parameter) indicating that the lists should be interpreted as strings. There are actually two such types, "string" and "bytestring": the former is assumed to be UTF-8 encoded, and the latter is not assumed to have any particular encoding.

These data can be written to ROOT files by removing the parameters from the array, so that it looks like a plain array of integers:

>>> import awkward as ak
>>> array = ak.Array(["one", "two", "three", "four", "five"])
>>> ak.without_parameters(array)
<Array [[111, 110, 101], ..., [102, 105, 118, 101]] type='5 * var * uint8'>
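
To see why the plain integer array carries the same information as the strings, here is a minimal sketch using only NumPy. The helper names string_to_uint8 and uint8_to_string are made up for this illustration and are not part of Awkward Array or Uproot; the sketch assumes UTF-8 encoding, as for the "string" type.

```python
import numpy as np


def string_to_uint8(text):
    """Encode text as UTF-8 and view the resulting bytes as uint8 values."""
    return np.frombuffer(text.encode("utf-8"), dtype=np.uint8)


def uint8_to_string(values):
    """Inverse operation: treat uint8 values as UTF-8 bytes and decode them."""
    return np.asarray(values, dtype=np.uint8).tobytes().decode("utf-8")


words = ["one", "two", "three", "four", "five"]

# One variable-length uint8 array per string, like the rows of the jagged array.
encoded = [string_to_uint8(word) for word in words]
print(encoded[0].tolist())  # [111, 110, 101]

# Decoding the integers recovers the original strings.
decoded = [uint8_to_string(row) for row in encoded]
print(decoded == words)  # True
```

Note that the number of uint8 values is the number of UTF-8 bytes, which can be larger than the number of characters for non-ASCII text.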

Here's a way to write these data into a ROOT file:

>>> import uproot
>>> file = uproot.recreate("/tmp/some.root")
>>> file["tree"] = {"branch": ak.without_parameters(array)}
>>> file["tree"].show()
name                 | typename                 | interpretation                
---------------------+--------------------------+-------------------------------
nbranch              | int32_t                  | AsDtype('>i4')
branch               | uint8_t[]                | AsJagged(AsDtype('uint8'))

When you read them back in C or C++, the uint8_t* array can be cast to a char* array, but watch out! The strings are not null-terminated: they do not end with a \x00 byte, which many string-interpreting functions in C and C++ expect. Some functions, like strncpy and std::string's two-argument constructor (pointer and length), can be given the string length explicitly, so that they don't look for a null terminator. The string length is the value of the counter branch, nbranch in the above.
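
The role of the length information can be illustrated in Python with NumPy alone. This is a simplified picture, not a description of ROOT's on-disk layout: the flat buffer and the counts array below are typed in by hand to stand for the branch contents and the nbranch counter.

```python
import numpy as np

# The string bytes stored back to back, with no \x00 separators between them.
flat = np.frombuffer(b"onetwothreefourfive", dtype=np.uint8)

# Per-entry lengths, playing the role of the counter branch (nbranch).
counts = np.array([3, 3, 5, 4, 4], dtype=np.int32)

# There is no terminator byte to search for, so boundaries must come from counts.
has_terminator = bool((flat == 0).any())
print(has_terminator)  # False

# Offsets mark where each string starts and stops in the flat buffer.
offsets = np.concatenate(([0], np.cumsum(counts)))

strings = [
    flat[start:stop].tobytes().decode("utf-8")
    for start, stop in zip(offsets[:-1], offsets[1:])
]
print(strings)  # ['one', 'two', 'three', 'four', 'five']
```

This is the same idea as passing a pointer and a length to std::string: the boundaries come from the counts, not from a terminator byte.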

I recognize that that's unpleasant. I just opened a feature request on Uproot for writing string data in a natural way, using ROOT's TLeafC, rather than this hack.