iam new to python and currently trying to write a program that reads multiple .xml files from the file explorer and displays them in .docx file. (Formatted and colour highlighted)
I tried using python-docx and lxml to first read the xml as a formatted string and afterwards write it into the .docx file.
Unfortunately i cant figure out a way to colour highlight the text in the .docx file like for example when using notepad++.
Maybe you guys have some ideas?
My approach:
from docx import Document
from lxml import etree
store_location = "xyz.docx"
template_path = "abc.docx"
doc = Document(template_path)
xml_et = etree.parse("example.xml")
xml_string = etree.tostring(xml_et,pretty_print=True,encoding=str)
doc.add_paragraph(xml_string)
doc.save(store_location)
Basically, you have to put colour on each word or character which have a special meaning in XML. That's huge work. So firstly you'll need a syntax highlighter. In python, the game breaker one is Pygments. You'll use the XML
Lexer.Once you have your code highlighted, you have to insert it in your document. You have then 2 solutions :
Formatteryou prefer and iterate over token then add it to your document throughpython-docxinterface.Formatterfor OpenDocument, so you'll have to create a bridge with one existingFormatteror even better develop the new output forPygments.I think the best solution is the second one. Firstly, you should take a quick look at ISO/IEC 26300 OpenDocument, then open a question or an issue on the Pygments' git.