I have a Word .docx file with some coloured characters. I am trying to export its contents into a data frame and want to retain the font colour information as well. The colours represent important information, so I would like the output to state the colour of each character being read. Are there any R packages that would help me read this?
I have tried converting it into XML, but have had no luck retrieving the text based on its font colour. I have also tried the officer package, but unfortunately it doesn't read font colours.
Sample input would be a docx with characters like this:
Sample output could look something like:
Character  Underline  Bold  Color
O          No         Yes   Red
%          Yes        Yes   Black
8          Yes        Yes   Green
OR
Red character positions: 1
Green character positions: 3
Underlined character positions: 2, 3
Bold character positions: 1, 2, 3
Note: my test document is about pigs, hence the variable names.
This gives you a list of all sections of the document containing text. Then iterate over them to extract the relevant text and formatting values, e.g.:
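A minimal sketch of that iteration with the xml2 package, assuming the standard .docx layout (the body lives in word/document.xml inside the zip archive) and that the colours are applied as direct run formatting (a `<w:color w:val="FF0000"/>` inside a run's `<w:rPr>`) rather than through a character style. The helper names `read_docx_body()`, `is_on()` and `char_formatting()` are hypothetical, and the embedded XML is a hand-written stand-in mimicking the question's sample, not a real document. Colours come back as hex codes, or "auto" for Word's default (usually rendered black); mapping hex codes to names such as Red or Green is left to you.

```r
library(xml2)

# WordprocessingML namespace bound to the w: prefix in word/document.xml
W_NS <- c(w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main")

# Hypothetical helper: a .docx is a zip archive; extract and parse the body part
read_docx_body <- function(path) {
  xml_file <- unzip(path, files = "word/document.xml", exdir = tempfile())
  read_xml(xml_file)
}

# Hypothetical helper: interpret an on/off property such as <w:b/> or <w:b w:val="0"/>
is_on <- function(node) {
  if (inherits(node, "xml_missing")) return(FALSE)
  val <- xml_attr(node, "w:val", ns = W_NS)
  is.na(val) || !(val %in% c("0", "false", "off"))
}

# Hypothetical helper: one row per character, using only the direct formatting
# of the run (w:r) it belongs to. Formatting inherited from styles is NOT resolved.
char_formatting <- function(doc) {
  runs <- xml_find_all(doc, "//w:body//w:r", ns = W_NS)
  rows <- lapply(runs, function(run) {
    txt <- paste(xml_text(xml_find_all(run, "./w:t", ns = W_NS)), collapse = "")
    if (!nzchar(txt)) return(NULL)  # skip runs without text (breaks, tabs, ...)
    u     <- xml_find_first(run, "./w:rPr/w:u", ns = W_NS)
    u_val <- xml_attr(u, "w:val", ns = W_NS)
    color <- xml_attr(xml_find_first(run, "./w:rPr/w:color", ns = W_NS),
                      "w:val", ns = W_NS)
    data.frame(
      character = strsplit(txt, "")[[1]],  # every character inherits the run's format
      underline = !inherits(u, "xml_missing") && !identical(u_val, "none"),
      bold      = is_on(xml_find_first(run, "./w:rPr/w:b", ns = W_NS)),
      color     = if (is.na(color)) "auto" else color,
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, rows)
  if (is.null(out)) return(data.frame())
  out$position <- seq_len(nrow(out))
  out
}

# With a real file (placeholder path):
# fmt <- char_formatting(read_docx_body("your_file.docx"))

# Self-contained demo on a hand-written fragment shaped like the question's sample
sample_xml <- '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<w:body><w:p>
<w:r><w:rPr><w:b/><w:color w:val="FF0000"/></w:rPr><w:t>O</w:t></w:r>
<w:r><w:rPr><w:b/><w:u w:val="single"/></w:rPr><w:t>%</w:t></w:r>
<w:r><w:rPr><w:b/><w:color w:val="00B050"/><w:u w:val="single"/></w:rPr><w:t>8</w:t></w:r>
</w:p></w:body></w:document>'
fmt <- char_formatting(read_xml(sample_xml))
print(fmt)
```

Splitting each run into single characters gives the one-row-per-character shape asked for; if whole runs are enough, drop the `strsplit()` and keep `txt` as is.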
gives
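If you would rather have the position-list format from the question than a per-character table, a base R sketch follows. It assumes a data frame with one row per character, logical `underline`/`bold` columns and a colour-name column; here that table is typed in by hand from the question's sample output, and the column names are assumptions.

```r
# Per-character table in the shape of the question's sample output
fmt <- data.frame(
  character = c("O", "%", "8"),
  underline = c(FALSE, TRUE, TRUE),
  bold      = c(TRUE, TRUE, TRUE),
  color     = c("Red", "Black", "Green"),
  stringsAsFactors = FALSE
)
fmt$position <- seq_len(nrow(fmt))

# Group positions by colour; pick out underlined and bold positions
color_positions     <- split(fmt$position, fmt$color)
underline_positions <- fmt$position[fmt$underline]
bold_positions      <- fmt$position[fmt$bold]

# Print in the question's format; black is skipped because the sample
# output treats it as the default colour
for (col in setdiff(names(color_positions), "Black")) {
  cat(sprintf("%s character positions: %s\n", col,
              paste(color_positions[[col]], collapse = ",")))
}
cat(sprintf("Underline character positions: %s\n",
            paste(underline_positions, collapse = ",")))
cat(sprintf("Bold character positions: %s\n",
            paste(bold_positions, collapse = ",")))
```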