Flask / postgres - display pdf with PDFJS


I have a very simple application. A user uploads a pdf file to a postgres database via the web front end. That pdf should then be rendered in the browser via pdfjs.

I'm fairly certain my issue is an encoding one, but I don't think I understand encoding well enough to answer this on my own.

My model:

class Lesson(Base):
    __tablename__ = 'lessons'

    lesson_order = db.Column(db.Enum(LessonIndexes), nullable=False)
    # Name of the lesson
    name = db.Column(db.String(128), nullable=False)
    summary = db.Column(db.String(500))
    lesson_plan_id = db.Column(db.Integer(), ForeignKey('lesson_plans.id'), nullable=False)
    pdf = db.Column(db.LargeBinary())

My Controller:

@mod_lp.route('/<lesson_plan_id>/create_lesson', methods=["POST"])
def create_lesson(lesson_plan_id):
    form = LessonForm()

    if form.validate_on_submit():
        file = request.files['pdf']  # type: FileStorage
        lesson = Lesson(form.lesson_order.data, form.name.data, form.summary.data, lesson_plan_id,
                        pdf=file.read() # this line here
                        )
        db.session.add(lesson)
        db.session.commit()
    return redirect(url_for('lesson_plan.show', lesson_plan_id=lesson_plan_id))

This stores the data to look something like:

%PDF-1.4
%����
1 0 obj
<</Creator (Mozilla/5.0 \(Macintosh; Intel Mac OS X 10_12_6\) AppleWebKit/537.36 \(KHTML, like Gecko\) Chrome/60.0.3112.113 Safari/537.36)
/Producer (Skia/PDF m60)
/CreationDate (D:20170916222407+00'00')
/ModDate (D:20170916222407+00'00')>>
endobj
2 0 obj
<</Filter /FlateDecode
/Length 1370>> stream
x���ݎ�4��<�������   qq$8�@%`aB�H�_�����T�E���ړ�c'�t�Z��[������}�{�I���@���

(etc...)

My JavaScript (taken from the PDFJS hello world example):

var pdfString = "{{ pdf_data}}";
var pdfData = atob(pdfString);
if (pdfData) {
    var loadingTask = PDFJS.getDocument({data: pdfData});
    loadingTask.promise.then(function (pdf) {
        console.log('PDF loaded');

        // Fetch the first page
        var pageNumber = 1;
        pdf.getPage(pageNumber).then(function (page) {
            console.log('Page loaded');

            var scale = 1.5;
            var viewport = page.getViewport(scale);

            // Prepare canvas using PDF page dimensions
            var canvas = document.getElementById('pdf-canvas');
            var context = canvas.getContext('2d');
            canvas.height = viewport.height;
            canvas.width = viewport.width;

            // Render PDF page into canvas context
            var renderContext = {
                canvasContext: context,
                viewport: viewport
            };
            var renderTask = page.render(renderContext);
            renderTask.then(function () {
                console.log('Page rendered');
            });
        });
    }, function (reason) {
        // PDF loading error
        console.error(reason);
    });
}

The current error I have is:

6:108 Uncaught DOMException: Failed to execute 'atob' on 'Window': The string to be decoded is not correctly encoded.

Things I've tried:

file.stream.getvalue()

file.stream.getvalue().decode("latin-1") # for whatever reason, this was the only 'decode' that didn't throw an error

file.stream.getvalue().decode("latin-1").encode()

base64.b64encode(file.stream.getvalue().decode("latin-1").encode())

But these all failed in various ways.

UPDATE:

If I send the binary data in the database to my template:

pdf_data = lesson.pdf

and forget about calling atob on it:

var pdfData = pdfString;
if (pdfData) {
...

I get this error:

Error: Invalid XRef stream header
    at error (http://0.0.0.0:8080/static/js/pdfjs/build/pdf.worker.js:340:17)
    at XRef_readXRef [as readXRef] (http://0.0.0.0:8080/static/js/pdfjs/build/pdf.worker.js:20943:13)
    at XRef_parse [as parse] (http://0.0.0.0:8080/static/js/pdfjs/build/pdf.worker.js:20613:28)
    at PDFDocument_setup [as setup] (http://0.0.0.0:8080/static/js/pdfjs/build/pdf.worker.js:26445:17)
    at PDFDocument_parse [as parse] (http://0.0.0.0:8080/static/js/pdfjs/build/pdf.worker.js:26336:12)
    at http://0.0.0.0:8080/static/js/pdfjs/build/pdf.worker.js:36120:28
    at Promise (<anonymous>)
    at LocalPdfManager_ensure [as ensure] (http://0.0.0.0:8080/static/js/pdfjs/build/pdf.worker.js:36115:14)
    at LocalPdfManager.BasePdfManager_ensureDoc [as ensureDoc] (http://0.0.0.0:8080/static/js/pdfjs/build/pdf.worker.js:36067:19)

Best answer:

atob expects a base64-encoded string, so whatever you pass to the template has to be base64 text rather than the raw PDF bytes. I'm pretty sure this is the issue you are seeing. I put together a basic example that at least gets a successful call to atob. The 'source.pdf' is just a sample PDF I had on disk; you can swap in the data from your postgres table instead. You could probably also save the base64-encoded content in that postgres table so that you don't need to encode it on every request.

flask_app.py

from flask import Flask, render_template
import base64

app = Flask(__name__)


@app.route("/testing", methods=["GET"])
def get_test_file():
    with open("source.pdf", "rb") as data_file:
        data = data_file.read()
    encoded_data = base64.b64encode(data).decode('utf-8')
    return render_template("test.html", encoded_data=encoded_data)
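
To swap in the database value, the same encoding step can be pulled into a small helper. This is only a minimal sketch: the name encode_pdf_for_template and the sample bytes are made up for illustration, and in the question's code the input would be the bytes read from lesson.pdf.

```python
import base64


def encode_pdf_for_template(pdf_bytes: bytes) -> str:
    """Return raw PDF bytes as base64 text that atob can decode."""
    # b64encode returns bytes made up of ASCII characters only,
    # so decoding them as ASCII cannot fail.
    return base64.b64encode(pdf_bytes).decode("ascii")


# Hypothetical stand-in for the value stored in the LargeBinary column.
sample_pdf = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n1 0 obj\n"
encoded = encode_pdf_for_template(sample_pdf)

# Decoding the text gives back exactly the original bytes.
decoded = base64.b64decode(encoded)
```

The important part is that the bytes are never decoded to text before they are base64-encoded; only the base64 output is turned into a string.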

test.html

<html>
<head>
</head>
<body>
  <script>
    var encoded_data = '{{ encoded_data }}';
    var pdf_data = atob(encoded_data);
  </script>
</body>
</html>
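
As for the attempts listed in the question, my best guess at why the latin-1 variants did not help is that calling .encode() with no arguments re-encodes the text as UTF-8, which changes every byte above 0x7F. A small sketch with made-up sample bytes (not your actual file):

```python
import base64

# Hypothetical sample: a PDF header followed by four bytes above 0x7F.
raw = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"

# latin-1 maps every byte to exactly one character, so decoding never fails.
text = raw.decode("latin-1")

# str.encode() defaults to UTF-8, which writes each character above 0x7F
# as two bytes, so the result is no longer the original file content.
reencoded = text.encode()
corrupted = reencoded != raw
grew_by = len(reencoded) - len(raw)

# Base64-encoding the untouched bytes round-trips exactly.
round_trip_ok = base64.b64decode(base64.b64encode(raw)) == raw
```

So base64-encoding the output of .decode("latin-1").encode() produces valid base64, but of a file that has already been altered.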