DFDL decoding of enumerated binary data

I'm currently working on a DFDL schema for a legacy (custom) binary file format so that its data can be translated to either XML or JSON.

Some of the binary data consists of enumerated values, each stored as a single byte. The corresponding C data type looks like this:

typedef enum _SomeEnum
{
  ENUM_1 = 0x00,
  ENUM_2 = 0x01,
  ENUM_3 = 0x02
} SomeEnum;

I can decode the enumeration to a numerical value just fine with this DFDL schema code (including checks):

<xs:element name="SomeEnum" type="xs:unsignedByte">
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:assert><![CDATA[{ . lt 3 }]]></dfdl:assert>
    </xs:appinfo>
  </xs:annotation>
</xs:element>

which translates to this XML with the enum field equal to 1 in this instance:

<SomeEnum>1</SomeEnum>

What I would like is to have the ability to translate the decoded enumeration value to a string so that the XML result looks like this:

<SomeEnum>ENUM_2</SomeEnum>

but I am not sure how this can be done with DFDL.

I am using Daffodil as my DFDL parser/processor (although I suspect that the IBM Integration Bus DFDL parser will also be able to do this).

Answer by brandon.sloane

Disclaimer: I am the Daffodil developer who implemented option 2 below.

I do not believe IBM DFDL has a good solution to this problem.

Daffodil offers 2 solutions:

1) Using inputValueCalc/outputValueCalc. The idea is to first parse the enum as an integer (possibly in a hidden group), then use a DFDL expression to compute the friendly string in a big if-else chain:

<xs:group name="enum">
  <xs:sequence>
    <xs:element name="enum_int" type="xs:int" dfdl:length="1" dfdl:outputValueCalc="{if (../SomeEnum eq 'ENUM_1') then 0 else if (../SomeEnum eq 'ENUM_2') then 1 else if (../SomeEnum eq 'ENUM_3') then 2 else fn:error()}"/>
  </xs:sequence>
</xs:group>

<xs:sequence>
  <xs:sequence dfdl:hiddenGroupRef="tns:enum"/>
  <xs:element name="SomeEnum" type="xs:string" dfdl:inputValueCalc="{ if (../enum_int eq 0) then 'ENUM_1' else if (../enum_int eq 1) then 'ENUM_2' else if (../enum_int eq 2) then 'ENUM_3' else fn:error() }"/>
</xs:sequence>

The benefit of this approach is that it is fully DFDL compliant. The drawback is that it quickly becomes unwieldy for large enumerations, both to maintain and to run, because the mapping has to be written out twice (once per direction) and is evaluated as a chain of comparisons. Also, as far as I am aware, Daffodil is the only DFDL processor that currently supports inputValueCalc and outputValueCalc, so being spec-compliant is not worth particularly much here.
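
To make the maintenance cost concrete, here is a minimal Python sketch of what the two expressions compute. It only illustrates the logic and is not how Daffodil evaluates DFDL expressions; the function names input_value_calc and output_value_calc are made up for this example.

```python
def input_value_calc(enum_int: int) -> str:
    """Parse direction: mirrors the inputValueCalc if-else chain."""
    if enum_int == 0:
        return "ENUM_1"
    elif enum_int == 1:
        return "ENUM_2"
    elif enum_int == 2:
        return "ENUM_3"
    else:
        # Stands in for fn:error() in the DFDL expression.
        raise ValueError(f"unknown enum value: {enum_int}")


def output_value_calc(some_enum: str) -> int:
    """Unparse direction: mirrors the outputValueCalc if-else chain."""
    if some_enum == "ENUM_1":
        return 0
    elif some_enum == "ENUM_2":
        return 1
    elif some_enum == "ENUM_3":
        return 2
    else:
        # Stands in for fn:error() in the DFDL expression.
        raise ValueError(f"unknown enum name: {some_enum}")
```

Every new enumeration value means one more branch in each function, and the two chains have to be kept in sync by hand.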

2) The newest release of Daffodil (2.4.0) includes a DFDL extension designed specifically for this problem. Some documentation is available on the Daffodil wiki.

The idea is to define a simple type as an XSD enumeration that restricts xs:string, then supply the corresponding binary values as DFDL annotations:

<xs:simpleType name="uint8" dfdl:length="1">
  <xs:restriction base="xs:unsignedInt"/>
</xs:simpleType>

<xs:simpleType name="SomeEnumType" dfdlx:repType="tns:uint8">
  <xs:restriction base="xs:string">
    <xs:enumeration value="ENUM_1" dfdlx:repValues="0" />
    <xs:enumeration value="ENUM_2" dfdlx:repValues="1" />
    <xs:enumeration value="ENUM_3" dfdlx:repValues="2" />
  </xs:restriction>
</xs:simpleType>

<xs:element name="SomeEnum" type="tns:SomeEnumType" />

The benefit here is that the schema is much more maintainable, and Daffodil will perform the lookup using a direct hash-table lookup instead of needing to walk through an if-else tree.
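
The table-driven idea can be sketched in Python as follows. This is only a rough illustration of a hash-table lookup built from a single mapping, not Daffodil's actual implementation; REP_VALUES, NAMES_BY_REP, parse_enum, and unparse_enum are hypothetical names.

```python
# Single source of truth, analogous to the xs:enumeration / dfdlx:repValues pairs.
REP_VALUES = {"ENUM_1": 0, "ENUM_2": 1, "ENUM_3": 2}

# Reverse table derived once, so the two directions cannot drift apart.
NAMES_BY_REP = {rep: name for name, rep in REP_VALUES.items()}


def parse_enum(raw: bytes) -> str:
    """Map a one-byte representation to its enumeration string."""
    if len(raw) != 1:
        raise ValueError("expected exactly one byte")
    try:
        return NAMES_BY_REP[raw[0]]  # one hash lookup, no chain of comparisons
    except KeyError:
        raise ValueError(f"no enumeration for byte value {raw[0]}") from None


def unparse_enum(name: str) -> bytes:
    """Map an enumeration string back to its one-byte representation."""
    try:
        return bytes([REP_VALUES[name]])
    except KeyError:
        raise ValueError(f"unknown enumeration name: {name}") from None
```

Adding a value to the enumeration then means adding one entry to the table, and both the parse and unparse directions pick it up.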