Is there any way to identify character styles with Apache POI xwpf documents?

Here we see that Apache POI for "HWPF" (MS Word 97-2003 .doc) files has a method CharacterRun.getStyleIndex(), by which you can, it appears, identify the character style(s) (not paragraph styles) that apply to this run.

But with the XWPF stuff (MS Word 2007+ .docx files), I can't find any way to identify the character style(s) in an XWPFRun object.

Best answer

The following code should walk all runs[1] within the XWPFDocument, print the style ID of every run that references a style, and print the style's XML if it is a character style:

import java.io.FileInputStream;
import java.util.List;

import org.apache.poi.xwpf.usermodel.*;

import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTRPr;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.STStyleType;

public class WordGetRunStyles {

 public static void main(String[] args) throws Exception {

  // try-with-resources closes the stream and the document when done
  try (FileInputStream fis = new FileInputStream("This is a Test.docx");
       XWPFDocument xdoc = new XWPFDocument(fis)) {

   List<XWPFParagraph> paragraphs = xdoc.getParagraphs();
   for (XWPFParagraph paragraph : paragraphs) {
    List<XWPFRun> runs = paragraph.getRuns();
    for (XWPFRun run : runs) {
     // Run properties (w:rPr); null if the run has none.
     // Note: getRStyle() matches the schema jars this answer was written
     // against; other POI releases may expose the accessor differently.
     CTRPr cTRPr = run.getCTR().getRPr();
     if (cTRPr == null || cTRPr.getRStyle() == null) {
      continue; // no style reference (w:rStyle) on this run
     }
     String styleID = cTRPr.getRStyle().getVal();
     System.out.println("Style ID=====================================================");
     System.out.println(styleID);
     System.out.println("=============================================================");
     // Look the ID up in the style sheet; it may be missing in a damaged file
     XWPFStyle xStyle = xdoc.getStyles().getStyle(styleID);
     if (xStyle != null && xStyle.getType() == STStyleType.CHARACTER) {
      System.out.println(xStyle.getCTStyle());
     }
    }
   }
  }
 }
}

[1] Please don't try it on a document with much content, since it prints output for every styled run ;-).
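
For orientation, what getRStyle() reads is the w:rStyle element inside a run's w:rPr in word/document.xml. The following is only a minimal sketch of that lookup using the JDK's own XML parser instead of POI. The class name RunStyleXmlSketch, the method runStyleIds and the sample XML are made up for illustration, and the sketch assumes the XML has already been extracted from the .docx as a string:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of Apache POI.
final class RunStyleXmlSketch {

    // Main WordprocessingML namespace used in word/document.xml
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /**
     * Returns one entry per w:r element: the w:val of its w:rStyle,
     * or null if the run references no character style.
     * Simplification: nested runs (e.g. inside text boxes) are not
     * treated specially.
     */
    static List<String> runStyleIds(String documentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for the *NS lookups below
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(documentXml)));

            List<String> ids = new ArrayList<>();
            NodeList runs = doc.getElementsByTagNameNS(W_NS, "r");
            for (int i = 0; i < runs.getLength(); i++) {
                Element run = (Element) runs.item(i);
                NodeList styles = run.getElementsByTagNameNS(W_NS, "rStyle");
                ids.add(styles.getLength() == 0
                        ? null
                        : ((Element) styles.item(0)).getAttributeNS(W_NS, "val"));
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample: one run with a character style, one without
        String xml = "<w:document xmlns:w=\"" + W_NS + "\"><w:body><w:p>"
                + "<w:r><w:rPr><w:rStyle w:val=\"Emphasis\"/></w:rPr><w:t>styled</w:t></w:r>"
                + "<w:r><w:t>plain</w:t></w:r>"
                + "</w:p></w:body></w:document>";
        System.out.println(runStyleIds(xml)); // prints [Emphasis, null]
    }
}
```

The returned IDs are style IDs, not display names. To learn whether an ID belongs to a character style you would still have to look it up in the style sheet, as the POI code above does with xdoc.getStyles().getStyle(styleID).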

As mentioned in the comment from @mike rodent, if you get java.lang.NoClassDefFoundError: org/openxmlformats/schemas/*something*, then you must use the full ooxml-schemas-1.3.jar instead of the smaller schemas jar that ships with POI, as mentioned in https://poi.apache.org/faq.html#faq-N10025.

For me this code runs without the full jar, since I don't use Phonetic Guide Properties (https://msdn.microsoft.com/en-us/library/office/documentformat.openxml.wordprocessing.rubyproperties.aspx). I use Office 2007.