I wrote C++ code that reads a Feather file into an arrow::Table, but it segfaults whenever the file contains a column of datatype arrow::large_utf8. The segfault occurs only for this datatype; utf8, int, and float columns read without errors.
I suspect something is wrong with the Feather read/write implementation, because the same round trip through a Parquet file works fine.
I found an issue that seems closely related, although it is in Python: https://github.com/pandas-dev/pandas/issues/24767
Does anyone have any idea why this happens?
arrow::Status write_to_feather(std::string path, std::shared_ptr<arrow::Table> table) {
  arrow::fs::LocalFileSystem file_system;
  auto output = file_system.OpenOutputStream(path).ValueOrDie();
  ABORT_ON_FAILURE(arrow::ipc::feather::WriteTable(*table, output.get()));
  return arrow::Status::OK();
}
void read_feather_to_table(std::string path, std::shared_ptr<arrow::Table>* feather_table) {
  arrow::fs::LocalFileSystem file_system;
  std::shared_ptr<arrow::io::RandomAccessFile> input_file =
      file_system.OpenInputFile(path).ValueOrDie();
  std::shared_ptr<arrow::ipc::feather::Reader> feather_reader =
      arrow::ipc::feather::Reader::Open(input_file).ValueOrDie();
  arrow::Status temp_status = feather_reader->Read(feather_table);
  if (temp_status.ok()) {
    std::cout << "Read feather file successfully." << std::endl;
    std::cout << (*feather_table)->ToString() << std::endl;  // this line gives a segfault
  } else {
    std::cout << "Feather file reading process failed." << std::endl;
  }
}
When I use write_to_feather to write a table containing a large_utf8 column, read_feather_to_table segfaults; in all other cases both functions work fine. The segfault happens when I try to print the table contents, at the line marked in the code above.