In the following example:
try (ParquetWriter<Example> writer =
new ProtoParquetWriter<>(
new Path("file:/tmp/foo.parquet"),
Example.class,
SNAPPY,
DEFAULT_BLOCK_SIZE,
DEFAULT_PAGE_SIZE)) {
writer.write(
Example.newBuilder()
.setTs(System.currentTimeMillis())
.setTenantId("tenant")
.setSomeFlag(false)
.setSomeInt(1)
.setOtherInt(0)
.build());
}
}
And example .proto
file:
syntax = "proto3";
package com.example;
message Example {
uint64 ts = 1;
string tenantId = 2;
bool someFlag = 3;
int32 someInt = 4;
int32 otherInt = 2;
}
The resulting parquet file won't have the fields someFlag
and otherInt
because they are false
and 0
respectively.
Is there a way to make it write it anyway or should I handle this on the reader side?
In proto3, presence tracking was not enabled historically, and the only presence rule was around zero defaults. Fortunately this changed recently in new versions of protoc. The
optional
keyword can now be used in from of fields in proto3 to enable this. So: addoptional
, and any compliant implementation should do what you want. The defaults are still zero/false/etc, but if they are explicitly set: they are serialized.Also, the second 2 should be a 5