Unable to read the elements of a Word document with a 3-column layout in the order in which they appear in the Word file

Asked by Shikar Kumar on 21 March 2024 at 11:01

I am trying to read the contents of a Word document that uses a 3-column layout. It contains text, images and tables. However, when I read the file with the following code, the table in the second column and the table in the third column come out in the wrong order.

from docx import Document
from docx.oxml.table import CT_Tbl
from docx.oxml.text.paragraph import CT_P
from docx.table import Table
from docx.text.paragraph import Paragraph


def extract_text_and_images_from_word(filepath, output_file):
    document = Document(filepath)
    with open(output_file, 'w', encoding='utf-8') as f:
        # Walk the direct children of <w:body> in the order they are stored in the XML
        for element in document.element.body:
            if isinstance(element, CT_Tbl):  # Check if the element is a table
                table = Table(element, document)
                for row in table.rows:
                    for cell in row.cells:
                        for paragraph in cell.paragraphs:
                            f.write(paragraph.text.strip() + '\n')
            elif isinstance(element, CT_P):  # Check if the element is a paragraph
                # Wrap the element directly. body.index(element) also counts tables
                # and other body children, so it does not line up with document.paragraphs.
                paragraph = Paragraph(element, document)
                f.write(paragraph.text.strip() + '\n')
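One way to confirm what the loop above is walking, independently of python-docx, is to read word/document.xml straight out of the .docx archive. The sketch below uses only the standard library (zipfile and xml.etree.ElementTree); the function name list_body_order is hypothetical. It returns the local tag names of the direct children of <w:body> in stored order, plus the w:num value of every <w:cols> element, which is where a section declares its column count. If the tbl entries already appear in the order you observed, the reordering is in the file itself rather than introduced by python-docx.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_body_order(docx_path):
    """Return (order, columns) for a .docx file (hypothetical helper).

    order:   local tag names of the direct children of <w:body>, in stored order
             ('p' = paragraph, 'tbl' = table, 'sectPr' = section properties)
    columns: the w:num value of every <w:cols> element (a missing w:num means 1)
    """
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    body = root.find(f"{{{W_NS}}}body")
    # Strip the "{namespace}" prefix so only the local name remains
    order = [child.tag.split("}")[1] for child in body]
    # <w:cols> lives inside <w:sectPr>, which can be at body level or inside a paragraph
    columns = [int(cols.get(f"{{{W_NS}}}num", "1")) for cols in body.iter(f"{{{W_NS}}}cols")]
    return order, columns
```

For the document described here, columns would be expected to contain 3, and order is exactly the sequence of paragraphs and tables that the python-docx loop visits.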

I also inspected document.xml by renaming the .docx file to .zip and extracting it, and even there the two tables appear out of order. My understanding is that Word saves the body as a single-column flow and applies the 3-column layout only when rendering, which is what causes the problem. Any help with this would be greatly appreciated.
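If Word does store the body as one linear flow, the column each element is drawn in is not recorded directly; <w:cols> only tells the renderer how many columns to pour the flow into. Two stored features could still explain a mismatch between XML order and on-page order: explicit column breaks (<w:br w:type="column"/>) and floating tables, whose <w:tblPr> contains a <w:tblpPr> positioning element. The sketch below (hypothetical function assign_columns, standard library only) assumes a single section in which the author moved between columns with explicit column breaks. It gives each top-level paragraph and table a running column index and flags floating tables. Columns that simply fill up and overflow, and column breaks inside table cells, are not detected.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def w(name):
    """Return the Clark-notation name {namespace}name for a w: element or attribute."""
    return f"{{{W_NS}}}{name}"


def assign_columns(docx_path):
    """Return (column_index, kind, is_floating) for each top-level paragraph
    or table in <w:body>, in stored order.

    Assumptions (hypothetical helper, not a python-docx API): one section, and
    columns are separated only by explicit column breaks. The index keeps counting
    across pages (with 3 columns, index 3 is the first column of the next page).
    """
    with zipfile.ZipFile(docx_path) as zf:
        body = ET.fromstring(zf.read("word/document.xml")).find(w("body"))
    column = 0
    result = []
    for child in body:
        kind = child.tag.split("}")[1]
        if kind not in ("p", "tbl"):
            continue  # skip sectPr, bookmarks and other non-content elements
        # A table whose properties contain <w:tblpPr> is floating: Word places it
        # by its positioning properties, so its on-page spot need not follow XML order.
        floating = kind == "tbl" and child.find(f"{w('tblPr')}/{w('tblpPr')}") is not None
        result.append((column, kind, floating))
        if kind == "p":
            # Each column break moves the content after it to the next column;
            # the paragraph itself is credited to the column it starts in.
            column += sum(1 for br in child.iter(w("br")) if br.get(w("type")) == "column")
    return result
```

Grouping the output by column_index and then processing each group in turn is one way to approximate reading order, but only under the stated assumptions; a floating table may still need to be placed by hand.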
