Word Add-in: Get the full document text WITH list indicators (bullets/numbering)?


There is already a question on this topic: "Word Add-in Get full Document text?" However, the method given there does not extract the list indicators (bullets and numbering). Is there a way to do this? I expect the text to be exactly the same as what I get when I manually select all and copy a Word document.

The reason behind this: I'm building a question bank from a Microsoft Word document. Several tools offer text extraction, but they usually drop the bullet points.
I use markers like A., B., C., D., etc. to detect the choices. However, if the author writes the choices as a bulleted or lettered list, those markers are not part of the extracted text and this method fails.


There are 2 best solutions below

Answer 1

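You can convert the numbered and bulleted lists (list paragraphs) to plain text with a short VBA macro, for example one that calls ListFormat.ConvertNumbersToText on the document's range. After the conversion the bullets and numbers are ordinary characters, so any text-extraction tool will keep them. This modifies the document, so run it on a copy.

See here: convert lists to text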

Answer 2

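For each paragraph in the document, you can check whether it is a list item by loading its isListItem property.

If it is, its listItem property returns the corresponding Word.ListItem. Accessing listItem on a paragraph that is not part of a list throws an ItemNotFound error, so check isListItem first.

The listString property of the Word.ListItem class returns the list item's bullet, number, or picture as a string.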
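The per-paragraph decision described above can be pulled out into a pure function so it can be tested without Word. This is a minimal sketch, not part of Office.js: the name toDisplayLine and the snapshot shape { text, isListItem, listString } are hypothetical, and the caller is assumed to copy those values from the loaded Word.Paragraph and Word.ListItem objects.

```javascript
// Hypothetical helper (not an Office.js API): turns a plain paragraph
// snapshot into the line a reader sees, i.e. "<bullet or number> <text>".
// The snapshot shape { text, isListItem, listString } is an assumption;
// the caller copies these values from loaded Word.Paragraph/Word.ListItem.
function toDisplayLine(snapshot) {
  // Only list items with a non-empty listString get a prefix.
  if (snapshot.isListItem && snapshot.listString) {
    return snapshot.listString + " " + snapshot.text;
  }
  return snapshot.text;
}

// Example snapshots, as they might look after loading them from Word.
const snapshots = [
  { text: "Which city is the capital of France?", isListItem: false },
  { text: "Paris", isListItem: true, listString: "A." },
  { text: "Lyon", isListItem: true, listString: "B." },
];

const lines = snapshots.map(toDisplayLine);
console.log(lines.join("\n"));
```

Keeping the formatting step separate from the Office.js calls means only the loading code needs a running Word host.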

Here is an example of how to extract the bullets together with the text of the document:

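Word.run(async (context) => {
  // Load the text and list status of every paragraph in one round trip.
  const paragraphs = context.document.body.paragraphs;
  paragraphs.load("text,isListItem");
  await context.sync();

  // Queue listString loads only for list items; reading listItem on a
  // paragraph that is not in a list throws an ItemNotFound error.
  const listItems = paragraphs.items.map((paragraph) => {
    if (!paragraph.isListItem) {
      return null;
    }
    paragraph.listItem.load("listString");
    return paragraph.listItem;
  });
  await context.sync();

  // Print each paragraph, prefixing list items with their bullet or number.
  paragraphs.items.forEach((paragraph, i) => {
    const item = listItems[i];
    console.log(item ? item.listString + " " + paragraph.text : paragraph.text);
  });
}).catch((error) => console.error(error));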

The document is printed to the console paragraph by paragraph, with each list item's bullet or number placed before its text.
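For the question-bank use case in the question, the reconstructed lines can then be split into a question stem and its choices. The sketch below is hypothetical: parseQuestion and CHOICE_PATTERN are illustrative names, and it assumes one question per block of lines, a stem on the first line(s), and choices marked "A." to "D." or "A)" to "D)" (whether typed by the author or prepended from listString). A real document will likely need a more tolerant pattern.

```javascript
// Hypothetical choice marker: a letter A-D (any case) followed by "." or ")".
const CHOICE_PATTERN = /^([A-Da-d])[.)]\s*(.*)$/;

// Hypothetical parser for one question block. Lines before the first choice
// form the stem; a line starting with a choice marker starts a new choice;
// any other line after that is treated as a continuation of the last choice.
function parseQuestion(lines) {
  const question = { stem: "", choices: [] };
  for (const raw of lines) {
    const line = raw.trim();
    if (line === "") continue; // skip empty paragraphs
    const match = CHOICE_PATTERN.exec(line);
    if (match) {
      question.choices.push({ label: match[1].toUpperCase(), text: match[2] });
    } else if (question.choices.length === 0) {
      question.stem = question.stem ? question.stem + " " + line : line;
    } else {
      const last = question.choices[question.choices.length - 1];
      last.text = last.text + " " + line;
    }
  }
  return question;
}

const q = parseQuestion([
  "Which city is the capital of France?",
  "A. Paris",
  "B. Lyon",
  "C. Marseille",
]);
console.log(JSON.stringify(q));
```

A numbered stem such as "1. Which city..." is not mistaken for a choice, because the pattern only accepts the letters A to D.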