I am trying to read a Parquet file via Hazelcast. I have written the code below, which works fine, but does Hazelcast provide any built-in source for reading Parquet files?
BatchSource<Object> csvData = SourceBuilder
        // Open the reader once; let failures propagate instead of returning null,
        // so a missing or unreadable file fails the job visibly.
        .batch("parquet-source", ctx ->
                AvroParquetReader.<GenericData.Record>builder(new Path("D:/test/1651070287920.parquet")).build())
        .<Object>fillBufferFn((reader, buf) -> {
            GenericRecord record = reader.read();
            if (record == null) {
                // End of file: tell Jet that the source is exhausted.
                buf.close();
                return;
            }
            // Convert the record to a header -> value map; nulls become empty strings.
            Map<String, String> map = new HashMap<>();
            for (int i = 0; i < headers[0].length; i++) {
                Object field = record.get(i);
                map.put(headers[0][i], field == null ? "" : field.toString());
            }
            rowcount = rowcount + 1;
            buf.add(map);
        })
        // Release the file handle when the source completes.
        .destroyFn(ParquetReader::close)
        .build();
Please let me know if Hazelcast Jet already has a source for this.
Parquet files using Avro for serialization can be read using the Unified File Connector. See also the code sample.
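For illustration, here is a minimal sketch of what reading the file from the question with the Unified File Connector might look like. It assumes Hazelcast 5.x with the Hadoop-based file connector module and its Parquet/Avro dependencies on the classpath, and it assumes the records can be read as Avro `GenericRecord` (the class name `ParquetFileSourceSketch` is made up for the example). As far as I know the Parquet format is only supported through the Hadoop-based implementation, hence the `useHadoopForLocalFiles(true)` call for a local path; check the connector documentation for your version.

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.jet.pipeline.BatchSource;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.file.FileFormat;
import com.hazelcast.jet.pipeline.file.FileSources;
import org.apache.avro.generic.GenericRecord;

// Hypothetical class name, used only for this sketch.
public class ParquetFileSourceSketch {
    public static void main(String[] args) {
        // Directory and file name are taken from the question; the path must be absolute.
        BatchSource<GenericRecord> source = FileSources.files("D:/test")
                .glob("1651070287920.parquet")
                // Assumption: records are read as generic Avro records.
                .format(FileFormat.<GenericRecord>parquet())
                // Assumption: Parquet needs the Hadoop-based reader even for local files.
                .useHadoopForLocalFiles(true)
                .build();

        Pipeline p = Pipeline.create();
        p.readFrom(source)
         // Convert to String right away so downstream stages do not need
         // to serialize Avro records.
         .map(GenericRecord::toString)
         .writeTo(Sinks.logger());

        HazelcastInstance hz = Hazelcast.bootstrappedInstance();
        hz.getJet().newJob(p).join();
    }
}
```

Compared with the custom `SourceBuilder` version, the connector takes care of opening and closing the reader and of signalling the end of input, so the per-record mapping (for example, building the header-to-value map) can move into an ordinary `map` stage.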