Postgresql plperlu and encodings

1.1k Views Asked by At

I want to generate a PDF in plperlu, store it in the database and then add it to an email as an attachment.

I am using PDF::Report to generate the PDF. The code looks like this:-

CREATE OR REPLACE FUNCTION workflow.make_pdf(report_template json, report_json json)
  RETURNS bytea AS
$BODY$

use strict;
use PDF::Report;
my $pdf = new PDF::ReportNG(PageSize => 'A4', PageOrientation => "Landscape");
...
 lots of tricky stuff to make PDF
...
return $pdf->Finish();
$BODY$ LANGUAGE plperlu;

This errors with invalid input syntax for type bytea which I assume is something to do with the encoding of the PDF document created.

The document itself is fine as $pdf->saveAs('/tmp/test.pdf'); creates a document that is perfectly readable.

I tried base64 encoding the result before returning it as the attachment to email will need to be in base64.

return MIME::base64::encode($pdf->Finish());

This removed the error, I can then store it in a table with:-

INSERT INTO weekly_report_pdfs(report) 
VALUES (make_pdf(report_template,report_json));

Which also works fine, and can be attached to an email, but the corruption problem persists after the base64 decode.

Exporting the file directly from the database and running base64 -d test.b64 gives an invalid input error after only one line.

This appears to be something to do with the way postgres chucks bytea, as the file looks like this:-

MDA0OTY1MSAwMDAwMCBuIAowMDAwMDQ5ODU0IDAwMDAwIG4gCjAwMDAwNTAxNjcgMDAwMDAgbiAK\\012dHJhaWxlcgo8PCAvUm9vdCAxIDAgUiAvU2l6ZSA0MCAvSW5mbyA0IDAgUiA+PgpzdGFydHhyZWYK\\012

With lots of \012 separators.

Any ideas, I've been at this for hours and am completely stumped.

1

There are 1 best solutions below

3
On

Returning binary from plperl

When declared as returning bytea, a pl/perl function must actually return a postgresql text representation for bytea.

Consider this excerpt from PL/Perl Functions and Arguments in the doc (and more specifically the last sentence):

Anything in a function argument that is not a reference is a string, which is in the standard PostgreSQL external text representation for the relevant data type. In the case of ordinary numeric or text types, Perl will just do the right thing and the programmer will normally not have to worry about it. However, in other cases the argument will need to be converted into a form that is more usable in Perl. For example, the decode_bytea function can be used to convert an argument of type bytea into unescaped binary.

Similarly, values passed back to PostgreSQL must be in the external text representation format. For example, the encode_bytea function can be used to escape binary data for a return value of type bytea.

According to this, you should do:

return encode_bytea($pdf->Finish());

and then the invalid input syntax for type bytea error would go away.

Returning base64

If returning base64, the function should normally be declared as RETURNS text and the report column should be text too. That would solve the problem of having these \012 sequences that look like a line feed (ASCII code=12 in octal) expressed in a bytea string literal with postgres escape format. The line feeds are typically added by base64 encoders every 76 characters to avoid long lines in a MIME body (RFC-4648).

If the report column stays in bytea and the function produces base64 with RETURNS text, an implicit cast should happen on INSERT and it would probably be fine. Otherwise the conversion could be made explicitly with convert_to(bytea_plperl_func(), 'US-ASCII') But storing base64 in a bytea column doesn't make much sense.