AWS S3 - Fetch PDF as octet-stream and upload to S3 bucket


I'm fetching a PDF from a third-party API. The response Content-Type is application/octet-stream. I then upload it to S3, but when I download the newly written file from S3, the content is not visible: the pages are blank in both Chromium and Adobe Acrobat. The file is not zero bytes and has the correct number of pages.

Using the binary encoding gives me a file size closest to the actual file size, but it is still not exact; it is slightly smaller.

The API request (using the request-promise module) and the S3 upload:

import { S3 } from 'aws-sdk';
import { get } from 'request-promise';

const payload = await get('someUrl').catch(handleError);

const buffer = Buffer.from(payload, 'binary');
const result = await new S3().upload({
  Body: buffer,
  Bucket: 'somebucket',
  ContentType: 'application/pdf',
  ContentEncoding: 'binary',
  Key: 'somefile.pdf'
}).promise();

Downloading the file from Postman also results in a file with blank pages. Does anybody know where I am going wrong here?

Best answer

As @Micheal - sqlbot mentioned in the comments, the download was the issue. I wasn't getting the entire byte stream from the API.
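
One plausible mechanism can be sketched as follows, assuming the body had already been decoded into a UTF-8 string before Buffer.from(payload, 'binary') ran (request treats an unset encoding option as UTF-8). Bytes that are not valid UTF-8 get replaced during decoding, so re-encoding the string cannot restore them. The sample bytes are illustrative and are not taken from the actual PDF.

```javascript
// Illustrative bytes: "%PDF", then four bytes that are not valid UTF-8 on their own,
// then one valid two-byte UTF-8 sequence (0xC3 0xA9).
const original = Buffer.from([0x25, 0x50, 0x44, 0x46, 0xe2, 0xe3, 0xcf, 0xd3, 0xc3, 0xa9]);

// Assumption: this mimics an HTTP client handing the body back as a UTF-8 string.
const asText = original.toString('utf8');

// Re-encoding with 'binary' (an alias for latin1) writes one byte per character.
const roundTripped = Buffer.from(asText, 'binary');

console.log('original bytes:     ', original.length);
console.log('round-tripped bytes:', roundTripped.length);
console.log('identical:          ', original.equals(roundTripped));
```

In this sketch the round-tripped buffer is shorter than the original and no longer matches it, which is consistent with the slightly smaller file size described in the question.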

I fixed it by changing const payload = await get('someUrl').catch(handleError);

to

import * as request from 'request'; // note: the base request library, not request-promise

const bufferArray = [];

request.get('someUrl')
  .on('response', (res) => {
    // No encoding is set on the response, so each chunk arrives as raw bytes.
    res.on('data', (chunk) => {
      bufferArray.push(Buffer.from(chunk)); // save the chunks in a temporary array for now
    });

    res.on('end', () => {
      const dataBuffer = Buffer.concat(bufferArray); // this now contains all of the data
      // send to S3
    });
  });
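
The same chunk-collecting idea can also be sketched with async iteration, which works for any async-iterable source of chunks, such as a Node.js Readable stream. This is a minimal sketch, not the code used above: collectBody and fakeResponse are hypothetical names, and the async generator only stands in for a real HTTP response.

```javascript
// Hypothetical helper: gathers every chunk of an async-iterable source into one Buffer.
async function collectBody(source) {
  const chunks = [];
  for await (const chunk of source) {
    // Keep each chunk as a Buffer; never convert it to a string along the way.
    chunks.push(Buffer.from(chunk));
  }
  return Buffer.concat(chunks);
}

// Stand-in for an HTTP response: an async generator yielding binary chunks.
async function* fakeResponse() {
  yield Buffer.from([0x25, 0x50, 0x44, 0x46]);
  yield Buffer.from([0xe2, 0xe3, 0xcf, 0xd3]);
}

const bodyPromise = collectBody(fakeResponse());
bodyPromise.then((body) => console.log('collected bytes:', body.length));
```

The resulting Buffer can then be passed as the Body of the S3 upload, in the same way as dataBuffer.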

Note: the request-promise documentation recommends against streaming responses with that library, so I used the base request library instead.

https://github.com/request/request-promise#api-in-detail