python-pptx not extracting all the text


I am trying to use python-pptx to extract all the text in a given slide.

However, the text from some text boxes is missing from the output.

Code:

from pptx import Presentation

def getPptContent(path):
    prs = Presentation(path)
    text_runs = []
    for slide in prs.slides:
        for shape in slide.shapes:
            if not shape.has_text_frame:
                continue
            for paragraph in shape.text_frame.paragraphs:
                for run in paragraph.runs:
                    text_runs.append(run.text)
    return text_runs

Some text is missing from the returned list, yet when I print the contents of the slide from slide._element I can see the missing text in the XML.
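
Since the text is visible in the XML, one way to list everything that is really in the file is to read the slide XML directly and collect every <a:t> element, which is where DrawingML stores text. The following is only a minimal sketch using the standard library, not python-pptx's own API; the names xml_text_by_slide and _slide_number are hypothetical helpers made up for this illustration. It assumes a standard .pptx package layout (ppt/slides/slideN.xml), and note that the N in the part name is not guaranteed to match the on-screen slide order.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace; text strings in a slide live in <a:t> elements.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"


def _slide_number(name):
    """Numeric part of 'ppt/slides/slide12.xml' (used only for sorting)."""
    return int(re.search(r"slide(\d+)\.xml$", name).group(1))


def xml_text_by_slide(path):
    """Return {slide part name: [text of every <a:t> element in that part]}.

    `path` may be a filename or a binary file-like object.
    """
    result = {}
    with zipfile.ZipFile(path) as pptx_zip:
        slide_parts = [
            name for name in pptx_zip.namelist()
            if re.fullmatch(r"ppt/slides/slide\d+\.xml", name)
        ]
        for name in sorted(slide_parts, key=_slide_number):
            root = ET.fromstring(pptx_zip.read(name))
            # iter() walks the whole tree, so text nested in groups,
            # tables and fields is included as well.
            result[name] = [t.text or "" for t in root.iter(f"{{{A_NS}}}t")]
    return result
```

Comparing this output with the list returned by getPptContent should show exactly which strings the shape loop skips, which helps narrow down what kind of shape or element holds them.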

Any suggestions would be helpful.
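
A possible explanation, offered as a guess since the file itself is not shown: in python-pptx, group shapes and table graphic frames report has_text_frame as False, so a loop that skips those shapes never reaches the text inside them, and paragraph.runs only covers ordinary runs, not field elements such as slide numbers or dates. The sketch below descends into groups and tables and reads paragraph.text instead of individual runs (which, as far as I know, includes field text in recent releases). The names iter_shape_text and get_ppt_content are hypothetical, and the attribute checks are written defensively with getattr/hasattr because not every shape class exposes every attribute.

```python
def iter_shape_text(shapes):
    """Yield paragraph text from shapes, descending into groups and tables."""
    for shape in shapes:
        if hasattr(shape, "shapes"):
            # Group shape: it has no text frame of its own,
            # the text sits on its child shapes.
            yield from iter_shape_text(shape.shapes)
        elif getattr(shape, "has_table", False):
            # Table: each cell carries its own text frame.
            for row in shape.table.rows:
                for cell in row.cells:
                    for paragraph in cell.text_frame.paragraphs:
                        yield paragraph.text
        elif getattr(shape, "has_text_frame", False):
            for paragraph in shape.text_frame.paragraphs:
                yield paragraph.text


def get_ppt_content(path):
    """Collect paragraph text from every slide of the presentation at `path`."""
    # Imported here so iter_shape_text can be used without python-pptx installed.
    from pptx import Presentation  # third-party: pip install python-pptx

    prs = Presentation(path)
    return [
        text
        for slide in prs.slides
        for text in iter_shape_text(slide.shapes)
    ]
```

This returns one string per paragraph rather than one per run, so the list will be shaped differently from the original text_runs. Text held in charts, SmartArt, or speaker notes is still not covered by this sketch.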
