C#: Text in PowerPoint shape not being detected


I'm trying to extract all the text from each slide of a PowerPoint file, but I'm only getting some of it. I loop through all shapes in each slide and check for both text frames and tables, yet some slides that contain text print nothing.

Here's a screenshot of a slide where only the title was printed and none of the other text.

Code

foreach (PowerPoint.Slide _slide in pptPresentation.Slides) {
    foreach(PowerPoint.Shape _shape in _slide.Shapes) {
        //check for textframes
        if (_shape.HasTextFrame == MsoTriState.msoTrue) {
            var textFrame = _shape.TextFrame;

            if (textFrame.HasText == MsoTriState.msoTrue) {
                var textRange = textFrame.TextRange;
                PrintAllParagraphs(textRange);
            } 
        }

        //check for tables
        if(_shape.HasTable == MsoTriState.msoTrue) {
            var slideTable = _shape.Table;
            int rowCount = slideTable.Rows.Count;
            int colCount = slideTable.Columns.Count;

            for(int y = 1; y <= rowCount; y++) {
                for(int x = 1; x <= colCount; x++) {
                    var tRange = slideTable.Cell(y, x).Shape.TextFrame.TextRange;
                    PrintAllParagraphs(tRange);
                }
            }
        }
    } //loop shapes
} //loop slides

Print function

public void PrintAllParagraphs(PowerPoint.TextRange textRange) {
    for (int i = 1; i <= textRange.Paragraphs().Count; i++) {
        PowerPoint.BulletFormat bulletFormat = textRange.Paragraphs(i).ParagraphFormat.Bullet;
        // Prefix bulleted paragraphs with "* "; Text is already a string, so no ToString() is needed
        string prefix = (bulletFormat.Type == PowerPoint.PpBulletType.ppBulletNone) ? "" : "* ";
        Console.WriteLine(prefix + textRange.Paragraphs(i).Text);
    }
}

Are there other things I should be checking within each shape of a slide? Any help would be appreciated. Thanks.

Best answer

Okay, it turns out the content is SmartArt, which is why checking for text frames and tables did not detect it.

All I had to do was loop over the nodes within the SmartArt and grab the text from each node's TextRange. I noticed the text is separated by "\r", so splitting on it gave me the correct output.

//check for SmartArt
if(_shape.HasSmartArt == MsoTriState.msoTrue) {
    foreach( SmartArtNode node in _shape.SmartArt.AllNodes) {
        var txtRange = node.TextFrame2.TextRange;
        var txt = txtRange.Paragraphs.Text.Split(new string[] { "\r" }, StringSplitOptions.None);

        foreach(string line in txt) 
            Console.WriteLine(line);
    }
}
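
One thing to watch with the split above: if a node's text ends with "\r", or a node has no text at all, splitting with StringSplitOptions.None will print blank lines. The following is a minimal sketch of a standalone helper that skips those. The names ParagraphSplitter and SplitParagraphs are hypothetical and not part of the PowerPoint interop API. It works on a plain string, so it can be tried without Office installed.

```csharp
using System;
using System.Collections.Generic;

public static class ParagraphSplitter
{
    // Hypothetical helper: splits text on carriage returns ("\r")
    // and drops entries that are empty or only whitespace.
    public static List<string> SplitParagraphs(string text)
    {
        var result = new List<string>();
        if (string.IsNullOrEmpty(text))
        {
            return result; // nothing to split
        }

        string[] parts = text.Split(new[] { '\r' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string part in parts)
        {
            string trimmed = part.Trim();
            if (trimmed.Length > 0)
            {
                result.Add(trimmed);
            }
        }
        return result;
    }

    public static void Main()
    {
        // Sample string standing in for a SmartArt node's text.
        foreach (string line in SplitParagraphs("First point\rSecond point\r"))
        {
            Console.WriteLine(line);
        }
    }
}
```

Inside the SmartArt loop, this could presumably be called as SplitParagraphs(node.TextFrame2.TextRange.Text) in place of the inline Split call.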