AWS S3 in Rust: Get and store a file - "Invalid file header" when opening

What I want to do: Download a file (a PDF) from S3 in a Lambda function and extract its text, using Rust.

The Error:

ERROR PDF error: Invalid file header

I checked the PDF file in the bucket and downloaded it from the console, and everything looks correct, so something must be breaking in the way I store the file.

How I am doing it:

    let config = aws_config::load_from_env().await;
    let client = s3::Client::new(&config);

    // Get the key of the object uploaded to the raw bucket (the event JSON is deserialized with serde)
    let key = event.records.get(0).unwrap().s3.object.key.clone();
    let key = key.replace('+', " ");
    let key = percent_encoding::percent_decode_str(&key).decode_utf8().unwrap().to_string();
    let content = client
        .get_object()
        .bucket(raw_bucket_name)
        .key(&key)
        // .response_content_type("application/pdf") // this did not make any difference
        .send()
        .await?;
    let mut bytes = content.body.into_async_read();
    let file = tempfile::NamedTempFile::new()?;
    let path = file.into_temp_path();
    let mut file = tokio::fs::File::create(&path).await?;
    tokio::io::copy(&mut bytes, &mut file).await?;

    let content = pdf_extract::extract_text(path)?; // this line breaks

Versions:

tokio = { version = "1", features = ["macros"] }
aws-sdk-s3 = "0.21.0"
aws-config = "0.51.0"
pdf-extract = "0.6.4"

I suspect I have misunderstood how to store the byte stream, but e.g. https://stackoverflow.com/a/62003659/4986655 does it the same way, as far as I can see.

Any help or pointers on what the issue might be or how to debug this are very welcome.
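
One way to narrow this down is to look at what actually ended up on disk before the file is handed to `pdf_extract`. The following is a minimal sketch using only the standard library, under the assumption that "Invalid file header" means the file does not begin with the usual `%PDF-` marker. The helper names (`starts_with_pdf_magic`, `read_prefix`, `describe_file`) are hypothetical and not part of any of the crates above.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// The marker a PDF file is expected to start with.
const PDF_MAGIC: &[u8] = b"%PDF-";

/// Returns true if `bytes` begins with the PDF marker.
fn starts_with_pdf_magic(bytes: &[u8]) -> bool {
    bytes.starts_with(PDF_MAGIC)
}

/// Reads at most `n` bytes from the start of the file at `path`.
fn read_prefix(path: &Path, n: usize) -> io::Result<Vec<u8>> {
    let mut prefix = Vec::with_capacity(n);
    File::open(path)?.take(n as u64).read_to_end(&mut prefix)?;
    Ok(prefix)
}

/// Summarizes the stored file: its size, its first 16 bytes (lossily
/// decoded for logging), and whether those bytes start with the PDF marker.
fn describe_file(path: &Path) -> io::Result<String> {
    let len = std::fs::metadata(path)?.len();
    let prefix = read_prefix(path, 16)?;
    Ok(format!(
        "len={} prefix={:?} pdf_magic={}",
        len,
        String::from_utf8_lossy(&prefix),
        starts_with_pdf_magic(&prefix)
    ))
}
```

Logging `describe_file(&path)?` right before the `extract_text` call would show whether the temp file is empty (which would point at the write step), or whether it contains something other than a PDF (which would point at the key or the object itself). Comparing `len` with the object size shown in the S3 console is a further quick check.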
