I am attempting to use MessageToJson and MessageToDict with the including_default_value_fields argument to generate a Pandas DataFrame that then gets written to Parquet. I have a deeply nested data structure that I am pivoting into a "flattened" FACT table that then gets imported into Databricks, S3 for AWS Athena, and potentially other query engines. Parquet is much nicer to work with if you can define all of the columns you want up front, and it's a real pain to add columns on the fly.

So, for each message, I am looking to generate default values for every field that isn't assigned.

To this end, including_default_value_fields seems like it should be the right tool for the job, except that it's not currently assigning defaults for the fields in my schema that use google.protobuf.BoolValue.

Let's use my Address schema as an example.

address.proto

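syntax = "proto3";

// Provides google.protobuf.BoolValue.
import "google/protobuf/wrappers.proto";

// ValidationMessage is defined elsewhere in my schema.
message Address {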
    string city = 1;
    string company = 2;
    string countryIso2 = 3;
    string email = 4;
    string name = 5;
    string phone = 6;
    string state = 7;
    string street1 = 8;
    string street2 = 9;
    string zip = 10;
    int64 countryId = 11;
    string street3 = 12;
    string streetNumber = 13;
    bool isResidential = 14 [deprecated = true];
    string objectPurpose = 15;
    string objectState = 16;
    string objectSource = 17;
    bool validate = 18;
    string metadata = 19;
    string objectId = 20;
    google.protobuf.BoolValue residential = 21;
    google.protobuf.BoolValue isTest = 22;
    repeated ValidationMessage messages = 23;
}

PDB Session Showing Missing Fields

There are 23 fields in the Address message type. If I use including_default_value_fields=False, I get 12 keys in the nested address field for the test data that I'm using. If it's set to True, I get 21 keys.

(Pdb) len(MessageToDict(shipment, including_default_value_fields=False)["addressTo"])
12
(Pdb) len(MessageToDict(shipment, including_default_value_fields=True)["addressTo"])
21
(Pdb) MessageToDict(shipment, including_default_value_fields=False)["addressTo"].keys()
dict_keys(['city', 'countryIso2', 'email', 'name', 'state', 'street1', 'zip', 'countryId', 'objectPurpose', 'objectState', 'objectSource', 'objectId'])
(Pdb) MessageToDict(shipment, including_default_value_fields=True)["addressTo"].keys()
dict_keys(['city', 'countryIso2', 'email', 'name', 'state', 'street1', 'zip', 'countryId', 'objectPurpose', 'objectState', 'objectSource', 'objectId', 'company', 'phone', 'street2', 'street3', 'streetNumber', 'isResidential', 'validate', 'metadata', 'messages'])

The two missing keys are isTest and residential, which is consistent with all of the other nested message types I have in my larger structure.

Why aren't my google.protobuf.BoolValue fields getting assigned a default, and is there any way for me to fix this?
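For context, the json_format docstring in protobuf 4.x says that singular message fields and oneof fields are not affected by including_default_value_fields, and google.protobuf.BoolValue is a message type, so that would explain the two missing keys. A minimal sketch of one possible workaround is to walk the descriptor and add a key for every declared field that MessageToDict left out. The helper name fill_missing_keys is hypothetical, and google.protobuf.Type is only a stand-in for Address because it ships with protobuf and has a singular message field (source_context). The sketch assumes no map fields and takes scalar defaults raw from the descriptor, so they may differ in type from what MessageToDict renders (for example, int64 values are rendered as strings).

```python
from google.protobuf import type_pb2
from google.protobuf.descriptor import FieldDescriptor
from google.protobuf.json_format import MessageToDict


def fill_missing_keys(message, as_dict):
    """Hypothetical helper: add a key for every declared field that is absent."""
    for field in message.DESCRIPTOR.fields:
        # MessageToDict uses the lowerCamelCase JSON name by default.
        key = field.json_name
        if field.label == FieldDescriptor.LABEL_REPEATED:
            # Assumption: no map fields; repeated fields default to an empty list.
            as_dict.setdefault(key, [])
        elif field.cpp_type == FieldDescriptor.CPPTYPE_MESSAGE:
            value = as_dict.get(key)
            if isinstance(value, dict):
                # The submessage is set, so fill in its own missing keys.
                fill_missing_keys(getattr(message, field.name), value)
            else:
                # Unset message fields (including wrappers) become None.
                as_dict.setdefault(key, None)
        elif field.enum_type is not None:
            # MessageToDict renders enums by name, so use the default's name.
            default_name = field.enum_type.values_by_number[field.default_value].name
            as_dict.setdefault(key, default_name)
        else:
            as_dict.setdefault(key, field.default_value)
    return as_dict


# Stand-in message: only "name" is set, "source_context" is an unset message field.
example = type_pb2.Type(name="Address")
row = fill_missing_keys(example, MessageToDict(example))
print(sorted(row))
```

Using None for unset message fields keeps "not set" distinguishable from an explicit false in the resulting column, which is presumably the reason for using BoolValue in the first place. The recursion would also descend into well-known types that serialize as JSON objects (Struct, Any), which this sketch does not special-case.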

Further Issues Trying To Manually Set Missing Fields

(Pdb) hasattr(shipment, "customsDeclaration")
True
(Pdb) MessageToDict(shipment)["customsDeclaration"]
*** KeyError: 'customsDeclaration'
(Pdb) MessageToDict(shipment, including_default_value_fields=True)["customsDeclaration"]
*** KeyError: 'customsDeclaration'
(Pdb) type(shipment.customsDeclaration)
<class 'rating.customs_pb2.CustomsDeclaration'>
(Pdb) shipment.customsDeclaration.IsInitialized()
True
(Pdb) shipment.HasField("customsDeclaration")
False
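The session above is consistent with how presence works for message fields: the attribute always exists and reading it returns a default instance, but the field only counts as set once something is assigned inside it or SetInParent() is called. A minimal sketch of that behavior follows, again using google.protobuf.Type as a stand-in (an assumption for illustration, since my own generated classes aren't shown here).

```python
from google.protobuf import type_pb2
from google.protobuf.json_format import MessageToDict

msg = type_pb2.Type(name="Shipment")

# Reading the attribute returns a default instance but does not mark it as set.
_ = msg.source_context
before = msg.HasField("source_context")

# SetInParent() marks the submessage as present without assigning its fields.
msg.source_context.SetInParent()
after = msg.HasField("source_context")

as_dict = MessageToDict(msg)
print(before, after, as_dict)
```

Once the field is present, MessageToDict includes its key. Note that this mutates the message, so it may be safer to apply it to a copy (for example one made with CopyFrom). For a BoolValue field, forcing presence this way would serialize as false, which loses the distinction between unset and false.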

Software Versions

Python == 3.9.17

python-protobuf == 4.24.0
