Unknown characters while reading PDF file from Azure Blob Storage


I want to read a PDF from Azure Blob Storage and convert it to Base64. The following code works fine for a .txt file, but for a .pdf file the logged output contains unknown characters, which I understand is because PDF is a binary format. How can I fix this code to get a valid PDF and encode it as Base64?

    BlobAsyncClient blobAsyncClient = containerClient.getBlobAsyncClient("abc/abc.pdf");
    ByteArrayOutputStream downloaded = new ByteArrayOutputStream();
    return blobAsyncClient.downloadStream()
        .doOnNext(piece -> {
            try {
                downloaded.write(piece.array());
            } catch (IOException ex) {
                throw new UncheckedIOException(ex);
            }
        })
        .doOnComplete(() -> {
            log.info("File downloading completed...");
            log.info(String.valueOf(downloaded)); // getting garbage value even before encoding the bytes
        })
        .then(Mono.defer(() -> Mono.just(Base64.getEncoder().encodeToString(downloaded.toByteArray()))));

The value logged from downloaded is:

%����
1 0 obj
<</Type/Catalog/Pages 2 0 R/Lang(en) /StructTreeRoot 14 0 R/MarkInfo<</Marked true>>/Metadata 28 0 R/ViewerPreferences 29 0 R>>
endobj
2 0 obj
<</Type/Pages/Count 1/Kids[ 3 0 R] >>
endobj
3 0 obj
<</Type/Page/Parent 2 0 R/Resources<</Font<</F1 5 0 R/F2 12 0 R>>/ExtGState<</GS10 10 0 R/GS11 11 0 R>>/ProcSet[/PDF/Text/ImageB/ImageC/ImageI] >>/MediaBox[ 0 0 612 792] /Contents 4 0 R/Group<</Type/Group/S/Transparency/CS/DeviceRGB>>/Tabs/S/StructParents 0>>
endobj
4 0 obj
<</Filter/FlateDecode/Length 257>>
stream
x��PMK�0���+4}��M%�ۏEqҊ'���Z���A���#�ޣ�>LTU龾l��aiu���~#m��^���|[r�`�����MZm�Ҏ������䭁u�L./O�����S��?۝V��eL\�M�%�,e.���2N�9,�7Q�@&���D �]|��J�V��h�[��Μ _|�^Ȯh}���}Mtj)���E�
'$���n�1H6s����\    �������tzb
endstream
endobj
5 0 obj
<</Type/Font/Subtype/Type0/BaseFont/BCDEEE+Calibri/Encoding/Identity-H/DescendantFonts 6 0 R/ToUnicode 24 0 R>>
endobj

The actual content of this PDF file is just:

Download
Test 1
0
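
A minimal sketch of one way to approach this, not a confirmed fix. Two things seem worth separating. First, `String.valueOf(downloaded)` calls `ByteArrayOutputStream.toString()`, which decodes the raw bytes as text in the platform default charset, so a binary PDF will always look like garbage in the log even if the bytes are intact. Second, `ByteBuffer.array()` returns the whole backing array rather than only the readable bytes, and it is not available for direct or read-only buffers, so copying `remaining()` bytes is the safer way to append a chunk. The sketch below uses only the JDK so it can run on its own; the class name `PdfBase64Sketch`, the helpers `append` and `encodeChunks`, and the sample bytes are all hypothetical stand-ins, and the Azure download is replaced by a hand-built list of buffers.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

public class PdfBase64Sketch {

    // Appends only the readable bytes (position..limit) of one chunk.
    static void append(ByteArrayOutputStream out, ByteBuffer piece) {
        byte[] chunk = new byte[piece.remaining()];
        piece.duplicate().get(chunk);      // duplicate() leaves the caller's position untouched
        out.write(chunk, 0, chunk.length); // this overload declares no IOException
    }

    // Collects all chunks in order and Base64-encodes the raw bytes.
    static String encodeChunks(List<ByteBuffer> pieces) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (ByteBuffer piece : pieces) {
            append(out, piece);
        }
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a downloaded PDF: a text header plus non-text bytes.
        byte[] pdfLike = {'%', 'P', 'D', 'F', '-', '1', '.', '7', '\n',
                          '%', (byte) 0xB5, (byte) 0xB5, (byte) 0xB5, (byte) 0xB5, '\n'};

        // Two chunks that share one backing array, so array() would return
        // all 15 bytes for each chunk instead of 9 and 6.
        List<ByteBuffer> pieces = List.of(
                ByteBuffer.wrap(pdfLike, 0, 9),
                ByteBuffer.wrap(pdfLike, 9, 6));

        String base64 = encodeChunks(pieces);
        byte[] decoded = Base64.getDecoder().decode(base64);
        System.out.println("Base64 round trip intact: " + Arrays.equals(decoded, pdfLike)); // true

        // Decoding binary data as text is lossy, which is what the log line shows.
        byte[] viaText = new String(pdfLike, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
        System.out.println("Text round trip intact: " + Arrays.equals(viaText, pdfLike)); // false
    }
}
```

If this holds for the reactive pipeline above, the body of `doOnNext` could copy `piece.remaining()` bytes in the same way instead of calling `piece.array()`, and the log line could print `downloaded.size()` rather than the stream itself. Decoding the resulting Base64 string and writing the bytes to a file is a quick way to check that the PDF opens.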
