Segmentation fault while reading feather file containing columns with datatype arrow::large_utf8()


I wrote C++ code to read a Feather file into an arrow::Table, but it segfaults if the file contains any column of type arrow::large_utf8. The segfault occurs for this datatype only; columns of type utf8, int, or float cause no errors.

I suspect something is wrong with the Feather read/write implementation, because the same round trip through a Parquet file works fine.

I found an issue that looks closely related, although it concerns Python: https://github.com/pandas-dev/pandas/issues/24767

Does anyone know why this happens?

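#include <iostream>
#include <arrow/api.h>
#include <arrow/filesystem/localfs.h>
#include <arrow/ipc/feather.h>

arrow::Status write_to_feather(const std::string& path, const std::shared_ptr<arrow::Table>& table) {
  arrow::fs::LocalFileSystem file_system;
  // Propagate failures to the caller as a Status instead of aborting.
  ARROW_ASSIGN_OR_RAISE(auto output, file_system.OpenOutputStream(path));
  ARROW_RETURN_NOT_OK(arrow::ipc::feather::WriteTable(*table, output.get()));
  return output->Close();  // close explicitly so flush errors are reported
}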

void read_feather_to_table(const std::string& path, std::shared_ptr<arrow::Table>* feather_table) {
  arrow::fs::LocalFileSystem file_system;
  // ValueOrDie() aborts the process if the file cannot be opened.
  std::shared_ptr<arrow::io::RandomAccessFile> input_file =
      file_system.OpenInputFile(path).ValueOrDie();
  std::shared_ptr<arrow::ipc::feather::Reader> feather_reader =
      arrow::ipc::feather::Reader::Open(input_file).ValueOrDie();
  arrow::Status temp_status = feather_reader->Read(feather_table);
  if (temp_status.ok()) {
    std::cout << "Read feather file successfully." << std::endl;
    std::cout << (*feather_table)->ToString() << std::endl;  // this line gives segfault
  } else {
    std::cout << "Feather file reading process failed: " << temp_status.ToString() << std::endl;
  }
}

When I use write_to_feather to write a table containing any column of type large_utf8, read_feather_to_table segfaults; in all other cases both functions work fine. The segfault occurs when I try to print the table contents, as marked in the code above.
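For context on the two string types: as far as the Arrow column format goes, utf8 and large_utf8 differ only in the width of their offsets (32-bit versus 64-bit). The sketch below illustrates that layout difference using only the standard library. StringColumn, Utf8Column, LargeUtf8Column, and OffsetsBytes are hypothetical names made up for this illustration, not Arrow classes, and the sketch is not a diagnosis of the crash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a variable-length string column:
// one contiguous data buffer plus an offsets buffer with n + 1 entries.
template <typename Offset>
struct StringColumn {
  std::vector<Offset> offsets{0};  // offsets[i]..offsets[i + 1] delimit value i
  std::string data;                // all values concatenated

  void Append(const std::string& value) {
    data += value;
    offsets.push_back(static_cast<Offset>(data.size()));
  }

  std::string Get(std::size_t i) const {
    assert(i + 1 < offsets.size());  // guard against out-of-range reads
    const auto begin = static_cast<std::size_t>(offsets[i]);
    const auto end = static_cast<std::size_t>(offsets[i + 1]);
    return data.substr(begin, end - begin);
  }
};

// Assumption for illustration: 32-bit offsets model arrow::utf8(),
// 64-bit offsets model arrow::large_utf8().
using Utf8Column = StringColumn<std::int32_t>;
using LargeUtf8Column = StringColumn<std::int64_t>;

// Size in bytes of the offsets buffer alone.
template <typename Offset>
std::size_t OffsetsBytes(const StringColumn<Offset>& column) {
  return column.offsets.size() * sizeof(Offset);
}
```

Appending the same three values to both column types yields identical data buffers, but the offsets buffer takes 16 bytes in the 32-bit case and 32 bytes in the 64-bit case, so a reader has to know which of the two types it is decoding.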
