I am writing a C# program that converts .docx files to plain HTML. I noticed that in some cases the text displayed in a .docx differs from the plain text stored in the file. However, stripping the formatting in a text editor, or pasting with Ctrl + Shift + V, produces the text 'as it appears'.
I am looking for a way to get the text 'as it appears', because extracting the plain text from the .xml ignores the styling.
Appearance in .docx: - PARAGRAPH
Plain text behind it: paRAgrapH
Ctrl + Shift + V: - PARAGRAPH
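To make the mismatch concrete, here is a minimal sketch of what I think is happening. It assumes the uppercase appearance comes from a direct `w:caps` run property, which I have not confirmed; it could equally come from a paragraph or character style. The XML fragment and the names `CapsDemo`, `RawText`, and `VisibleText` are made up for illustration, and only `System.Xml.Linq` is used.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class CapsDemo
{
    // WordprocessingML main namespace.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical fragment of word/document.xml, not taken from a real file.
    public const string Sample =
        @"<w:p xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>
            <w:r><w:rPr><w:caps/></w:rPr><w:t>paRAgrapH</w:t></w:r>
          </w:p>";

    // An on/off property is enabled when present, unless w:val turns it off.
    static bool IsOn(XElement toggle)
    {
        if (toggle == null) return false;
        var val = (string)toggle.Attribute(W + "val");
        return val == null || (val != "0" && val != "false" && val != "off");
    }

    // What a naive extraction returns: the stored characters, styling ignored.
    public static string RawText(XElement paragraph) =>
        string.Concat(paragraph.Descendants(W + "t").Select(t => t.Value));

    // Applies direct w:caps formatting run by run (styles are NOT resolved).
    public static string VisibleText(XElement paragraph) =>
        string.Concat(paragraph.Elements(W + "r").Select(run =>
        {
            var text = string.Concat(run.Elements(W + "t").Select(t => t.Value));
            var caps = run.Element(W + "rPr")?.Element(W + "caps");
            return IsOn(caps) ? text.ToUpperInvariant() : text;
        }));

    public static void Main()
    {
        var paragraph = XElement.Parse(Sample);
        Console.WriteLine(RawText(paragraph));     // paRAgrapH
        Console.WriteLine(VisibleText(paragraph)); // PARAGRAPH
    }
}
```

This only covers the capitalization. The leading "- " is not handled; my guess is that it is generated from the paragraph's list numbering rather than stored as text, so it would have to be resolved separately.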
How is the plain text part of the copied information generated?