How can I get the start line number of an element in a Word document using the Aspose.Words library in Java?


I am using NodeCollection and LayoutCollector to get the text and start page number of each node:

Document doc = new Document("input.docx");
LayoutCollector layoutCollector = new LayoutCollector(doc);
NodeCollection nodes = doc.getChildNodes(NodeType.ANY, true);
for (Node node : (Iterable<Node>) nodes)
{
    System.out.println("Start PageNumber : " + layoutCollector.getStartPageIndex(node));
    switch (node.getNodeType())
    {
        case NodeType.PARAGRAPH:
            System.out.println(node.getText());
            break;
    }
}

Here I want to get the start line number of each node along with its page number. How can I achieve this?

Answer by Alexey Noskov

As you know, MS Word documents have no concept of pages or lines because of their flow nature. Consumer applications build the document layout on the fly, and Aspose.Words does the same using its own layout engine. The LayoutCollector and LayoutEnumerator classes provide limited access to document layout information. Unfortunately, there is no direct way to get the line index of a node using these classes. However, you can get the layout entity of type LayoutEntityType.LINE for a particular node. For example, the following code demonstrates the basic technique of splitting document content into lines:

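// Requires: import com.aspose.words.*; import java.util.ArrayList;
Document doc = new Document("C:\\Temp\\in.docx");

// Split every Run node so that it contains at most one word;
// this lets each Run be attributed to a single layout line.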
Iterable<Run> runs = doc.getChildNodes(NodeType.RUN, true);
for (Run r : runs)
{
    Run current = r;
    while (current.getText().indexOf(' ') >= 0)
        current = SplitRun(current, current.getText().indexOf(' ') + 1);
}

// Wrap every run in a temporary bookmark so that LayoutCollector and LayoutEnumerator can locate it.
runs = doc.getChildNodes(NodeType.RUN, true);

ArrayList<String> tmpBookmarks = new ArrayList<String>();
int bkIndex = 0;
for (Run r : runs)
{
    // LayoutCollector and LayoutEnumerator do not work with nodes in headers/footers or in text boxes.
    if (r.getAncestor(NodeType.HEADER_FOOTER) != null || r.getAncestor(NodeType.SHAPE) != null)
        continue;

    BookmarkStart start = new BookmarkStart(doc, "r" + bkIndex);
    BookmarkEnd end = new BookmarkEnd(doc, start.getName());

    r.getParentNode().insertBefore(start, r);
    r.getParentNode().insertAfter(end, r);

    tmpBookmarks.add(start.getName());
    bkIndex++;
}

// Now we can use the collector and enumerator to get the runs on each line of the document.
LayoutCollector collector = new LayoutCollector(doc);
LayoutEnumerator enumerator = new LayoutEnumerator(doc);

Object currentLine = null;
for (String bkName : tmpBookmarks)
{
    Bookmark bk = doc.getRange().getBookmarks().get(bkName);

    // Move from the bookmark's layout entity up to the LINE entity that contains it.
    enumerator.setCurrent(collector.getEntity(bk.getBookmarkStart()));
    while (enumerator.getType() != LayoutEntityType.LINE)
        enumerator.moveParent();

    // A different LINE entity than the previous one means a new line has started.
    if (currentLine != enumerator.getCurrent())
    {
        currentLine = enumerator.getCurrent();

        System.out.println();
        System.out.println("-------=========Start Of Line=========-------");
    }

    // Print the text of the run wrapped by this bookmark.
    if (bk.getBookmarkStart().getNextSibling().getNodeType() == NodeType.RUN)
        System.out.print(((Run)bk.getBookmarkStart().getNextSibling()).getText());
}
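
// Helper method (declare it at class level): splits a Run at the given character
// position and returns the new Run that holds the text after that position.
private static Run SplitRun(Run run, int position)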
{
    Run afterRun = (Run)run.deepClone(true);
    run.getParentNode().insertAfter(afterRun, run);
    afterRun.setText(run.getText().substring(position));
    run.setText(run.getText().substring(0, position));
    return afterRun;
}
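
The code above only prints a marker whenever a new line starts. To turn that into an actual line number, one option is to count those transitions and reset the count whenever the page changes. Below is a minimal sketch of that counting step in plain Java. The LineNumberTracker class and its lineNumberFor method are hypothetical names, not part of Aspose.Words, and the sketch assumes that runs are visited in document order, as in the loop above.

```java
/**
 * Hypothetical helper (not part of Aspose.Words): derives a 1-based line
 * number on the current page from (page index, line key) pairs that are
 * supplied in document order.
 */
final class LineNumberTracker {
    private int currentPage = -1;      // page index seen most recently
    private int lineOnPage = 0;        // lines counted so far on that page
    private Object currentLine = null; // key of the line seen most recently

    /**
     * @param pageIndex page on which the node starts
     * @param lineKey   any object identifying the layout line of the node
     * @return 1-based number of that line among the lines seen on the page
     */
    int lineNumberFor(int pageIndex, Object lineKey) {
        if (pageIndex != currentPage) {
            // A new page starts: restart the line counter.
            currentPage = pageIndex;
            lineOnPage = 0;
            currentLine = null;
        }
        if (currentLine == null || !currentLine.equals(lineKey)) {
            // The line key changed: a new line has started.
            currentLine = lineKey;
            lineOnPage++;
        }
        return lineOnPage;
    }

    public static void main(String[] args) {
        LineNumberTracker tracker = new LineNumberTracker();
        // Made-up sample data: three runs on page 1 (two lines), one run on page 2.
        int[] pages = {1, 1, 1, 2};
        String[] lines = {"lineA", "lineA", "lineB", "lineC"};
        for (int i = 0; i < pages.length; i++) {
            System.out.println("page " + pages[i] + ", line "
                    + tracker.lineNumberFor(pages[i], lines[i]));
        }
    }
}
```

Inside the bookmark loop, the page index could be taken from collector.getStartPageIndex(bk.getBookmarkStart()) and the line key from enumerator.getCurrent() once the enumerator has been moved up to the LINE entity. Note that this only counts lines that contain at least one tracked run, so empty paragraphs and content in headers, footers, or text boxes are not counted, and the result may differ from the line numbers MS Word displays.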