I am trying to download a Word document saved in OneDrive as a PDF using the python-o365 library, but the downloaded file cannot be opened with Adobe. I get the error Adobe Acrobat could not open 'Output.pdf' because it is either not a supported format...
etc. Some of my code is shown below:
my_drive = storage.get_default_drive()
attachments_folder = my_drive.get_special_folder('attachments')
items = attachments_folder.get_items()
target_file = "Example.docx"
file = list(filter(lambda x: target_file == x.name, items))[0]
file.download(to_path=r"C:\Users\UserX\OneDrive WordToPdf", name="Output.pdf", convert_to_pdf=True)
The downloaded file seems to just have a .pdf extension but is actually still a Word file, as it opens in Word.
When I remove the extension from name, changing the call to
file.download(to_path=r"C:\Users\UserX\OneDrive WordToPdf", name="Output", convert_to_pdf=True)
the resulting file has a .docx extension, but it opens in Adobe and not in Word.
How can I get this working properly? I am currently working around this by changing the extension after the file is downloaded.
I was able to reproduce the issue, so I looked a little deeper into the source code at the link below.
https://github.com/O365/python-o365/blob/master/O365/drive.py
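Let's focus on the part of that file that is responsible for converting and downloading the file as a PDF.
As far as I understand it, the code takes the extension of the destination file name (falling back to the source file's extension when none is given) and checks whether that extension is in the list allowed_pdf_extensions and whether convert_to_pdf is set to true.
Only then does it go and download the file in PDF format.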
What is happening?
In our case, when you give a destination file name such as "ABC.pdf", the code picks up the destination file's extension (pdf). Since pdf is not in the list allowed_pdf_extensions, the line that requests the PDF format is never executed and the file is downloaded as a normal docx, just saved under a name ending in .pdf.
That is also why, if you don't give an extension, it takes the source file's extension (docx) for the destination file. docx is in the list allowed_pdf_extensions and convert_to_pdf is set to true, so the file is downloaded in PDF format, but it is named with the docx extension.
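To make the behaviour described above concrete, here is a minimal sketch that models it. It is not the library's actual code: the function name plan_download is hypothetical, and the extension set is only an illustrative subset of whatever allowed_pdf_extensions really contains in drive.py.

```python
from pathlib import Path

# Illustrative subset only (assumption); the real list lives in drive.py.
ALLOWED_PDF_EXTENSIONS = {".docx", ".doc", ".pptx", ".xlsx"}


def plan_download(source_name, name=None, convert_to_pdf=False):
    """Hypothetical model: return (saved_file_name, pdf_requested)."""
    # No extension in the requested name: borrow the source file's extension.
    if name and not Path(name).suffix:
        name = name + Path(source_name).suffix
    name = name or source_name
    # The check runs against the destination name, not the source file.
    pdf_requested = convert_to_pdf and Path(name).suffix in ALLOWED_PDF_EXTENSIONS
    return name, pdf_requested


# Named "Output.pdf": ".pdf" is not in the set, so no conversion is requested.
print(plan_download("Example.docx", "Output.pdf", convert_to_pdf=True))
# Named "Output": the name becomes "Output.docx" and conversion is requested.
print(plan_download("Example.docx", "Output", convert_to_pdf=True))
```

Under these assumptions the first call yields a Word file saved under a .pdf name and the second a PDF saved under a .docx name, which matches both symptoms in the question.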
Possible workarounds:
I was able to temporarily bypass the behavior by adding ".pdf" to the list in my local copy of drive.py.
For now, you could write a small piece of code that renames the file after the download so that its extension matches its content.
Alternatively, you could report the issue to the library's author.
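The rename workaround could look roughly like the sketch below. The helper name rename_to_pdf is hypothetical. It assumes the file was downloaded with name="Output" and convert_to_pdf=True, so that it landed on disk as Output.docx, and it checks for the standard "%PDF-" file header before renaming so that a real Word file is never relabelled by mistake.

```python
from pathlib import Path


def rename_to_pdf(downloaded):
    """Rename a downloaded file to .pdf if its content starts with the PDF header."""
    downloaded = Path(downloaded)
    with downloaded.open("rb") as fh:
        header = fh.read(5)
    if header != b"%PDF-":
        raise ValueError(f"{downloaded} does not look like a PDF")
    target = downloaded.with_suffix(".pdf")
    # replace() overwrites an existing target file on every platform.
    downloaded.replace(target)
    return target


# Usage with the call from the question (needs an authenticated session):
# folder = r"C:\Users\UserX\OneDrive WordToPdf"
# file.download(to_path=folder, name="Output", convert_to_pdf=True)
# rename_to_pdf(Path(folder) / "Output.docx")
```

Unlike editing drive.py locally, this keeps the installed library untouched, so it survives a package upgrade.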