I am trying to convert a .docx file to an HTML file using the pypandoc package in Python. Here's my code (file paths removed):
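import os
import pypandoc

filename = <filepath>
# pypandoc.convert() is deprecated in newer pypandoc releases;
# convert_file() is its replacement for file input.
output = pypandoc.convert_file(filename, to='html', extra_args=['--extract-media=<foldername>'])
# Build the output path: same name as the input, with an .html extension.
filename = os.path.splitext(filename)[0]
filename = "{0}.html".format(filename)
with open(filename, 'w') as f:
    # On Python 2 the result is unicode and must be encoded before writing.
    if type(output) is not str:
        output = output.encode("utf-8")
    f.write(output)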
The resulting HTML doesn't include the images from the .docx file, and the text colours are all lost, so everything comes out in black and white. What should I do to get all the images into the HTML file and keep the text formatting intact?
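One possible way to get the images into the HTML file itself is to post-process the generated HTML and replace each local image path with a base64 data URI. The sketch below is only an illustration using the standard library. It assumes the HTML produced with `--extract-media` references the extracted images through `<img src="...">` attributes with relative paths, and the helper name `inline_images` and its `base_dir` parameter are made up for this example (they are not part of pypandoc). It does not handle URL-encoded paths or single-quoted attributes.

```python
import base64
import mimetypes
import os
import re

# Matches the src attribute of an <img> tag written with double quotes.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)


def inline_images(html, base_dir="."):
    """Return html with local <img> sources replaced by base64 data URIs.

    Hypothetical helper: base_dir is the directory that the relative
    image paths in the HTML are resolved against.
    """
    def replace(match):
        src = match.group(2)
        # Leave remote images and already-inlined images untouched.
        if src.startswith(("data:", "http:", "https:")):
            return match.group(0)
        path = os.path.join(base_dir, src)
        # Leave the tag unchanged if the file cannot be found.
        if not os.path.isfile(path):
            return match.group(0)
        mime = mimetypes.guess_type(path)[0] or "application/octet-stream"
        with open(path, "rb") as image:
            payload = base64.b64encode(image.read()).decode("ascii")
        return "{0}data:{1};base64,{2}{3}".format(
            match.group(1), mime, payload, match.group(3))

    return IMG_SRC.sub(replace, html)
```

With the code from the question, this would be called as `output = inline_images(output, base_dir)` before writing the file, where `base_dir` is whatever directory the image paths in the HTML are relative to. This only addresses the images; it does nothing about the lost text colours.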
Maybe you could try docx2html. Code as follows: