I am trying to use python-pptx to extract all the text from each slide of a presentation,
but the result is missing the text from some text boxes.
Code:
from pptx import Presentation


def getPptContent(path):
    # Collect the text of every run in every text frame of every slide.
    prs = Presentation(path)
    text_runs = []
    for slide in prs.slides:
        for shape in slide.shapes:
            # Skip shapes that have no text frame.
            if not shape.has_text_frame:
                continue
            for paragraph in shape.text_frame.paragraphs:
                for run in paragraph.runs:
                    text_runs.append(run.text)
    return text_runs
However, when I print the contents of the slide from slide._element,
I can see the missing text in the XML.
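
A minimal sketch of that kind of XML check, for comparison with the output of getPptContent. It assumes the visible text is stored in DrawingML a:t elements; xml_text is a hypothetical helper name, not part of python-pptx, and the sample XML is made up for illustration. The helper only relies on the element's iter() method, so it should work on slide._element as well as on a plain ElementTree element.

```python
import xml.etree.ElementTree as ET

# DrawingML namespace; text is assumed to live in <a:t> elements.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"


def xml_text(element):
    """Return the text of every <a:t> descendant of element, in document order."""
    return [t.text for t in element.iter("{%s}t" % A_NS) if t.text]


# Hypothetical usage with python-pptx (prs is an open Presentation):
#     for slide in prs.slides:
#         print(xml_text(slide._element))

if __name__ == "__main__":
    # Made-up fragment: one ordinary run (a:r) and one field (a:fld).
    sample = ET.fromstring(
        '<p:sp xmlns:p="%s" xmlns:a="%s">'
        "<a:p><a:r><a:t>Hello</a:t></a:r><a:fld><a:t>3</a:t></a:fld></a:p>"
        "</p:sp>" % (P_NS, A_NS)
    )
    print(xml_text(sample))
```

Any string that appears in this list but not in the list returned by getPptContent is text that the loop over shapes, paragraphs and runs did not reach, for example because it sits in an element other than a run or inside a shape without a text frame.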
Any suggestions would be helpful.