I am trying to read Parquet files from S3. This is what I have so far:
use std::fs::File;
use std::io::Write; // needed for write_all()
use std::path::Path;
use futures::{Future, Stream}; // futures 0.1: needed for concat2() and wait()
use parquet::file::reader::{FileReader, SerializedFileReader};
use parquet::record::RowAccessor;

pub fn read_file() {
    let response = s3_client.get_object(); // Excluded connection properties here, but you get the point.
    let stream = response.body.unwrap();
    // Buffer the whole object body in memory, then write it to a local file.
    let content = stream.concat2().wait().unwrap();
    let mut file = File::create("./mappings.pq").expect("create failed");
    file.write_all(&content).expect("failed to write body");
}

pub fn process() {
    // Re-open the downloaded file and hand it to the parquet reader.
    let file = File::open(Path::new("./mappings.pq")).unwrap();
    let reader = SerializedFileReader::new(file).unwrap();
    // use reader to get metadata
    // use reader to process records
}
Is there a better way than downloading the file and storing it on the filesystem? Ideally, I would like to use a stream iterator to read the file.
According to the docs for the parquet crate, SerializedFileReader is the entry point for processing Parquet files. It seems to work with file objects only. Is there an alternative to this, such as an implementation that supports reading from streams?
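For context, my understanding is that a Parquet file ends with a 4-byte little-endian metadata length followed by the magic bytes PAR1, so a reader has to look at the end of the data before it can decode anything. That may be why a purely forward stream is not accepted. The sketch below is not the parquet crate's API. It uses only the standard library, and the function names footer_metadata_len and footer_len_from_bytes are made up for illustration. It shows that the bytes already downloaded into memory can be read with seeking via std::io::Cursor, without going through the filesystem.

```rust
use std::io::{Cursor, Error, ErrorKind, Read, Seek, SeekFrom};

/// Magic bytes that open and close every Parquet file.
const PARQUET_MAGIC: &[u8; 4] = b"PAR1";

/// Hypothetical helper: returns the length of the footer metadata,
/// read from the last 8 bytes of any seekable source.
fn footer_metadata_len<R: Read + Seek>(source: &mut R) -> std::io::Result<u32> {
    // Trailer layout: 4-byte little-endian metadata length, then "PAR1".
    source.seek(SeekFrom::End(-8))?;
    let mut tail = [0u8; 8];
    source.read_exact(&mut tail)?;

    if tail[4..] != PARQUET_MAGIC[..] {
        return Err(Error::new(ErrorKind::InvalidData, "missing PAR1 magic"));
    }
    Ok(u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]))
}

/// Hypothetical helper: the same check on an in-memory body, e.g. the
/// bytes collected from the S3 response, with no temporary file involved.
fn footer_len_from_bytes(body: Vec<u8>) -> std::io::Result<u32> {
    // Cursor implements Read + Seek over an owned buffer.
    let mut cursor = Cursor::new(body);
    footer_metadata_len(&mut cursor)
}
```

In my code above, the content value from read_file() could be copied into a Vec<u8> and passed to footer_len_from_bytes. What I am missing is the equivalent entry point in the parquet crate that accepts such an in-memory or seekable source instead of a File.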