[Screenshot: input PDF]
[Screenshot: output PDF]
I need help with text positioning when recreating a PDF. I have extracted all text using a text stripper and am able to draw it on a new PDF with the correct font and font size, but I am not able to draw the text at its correct position.
```java
// This is how I extract the TextPosition data
protected void processTextPosition(TextPosition text) {
    textPositionPDGraphicsStatesMap.put(text, getGraphicsState());
    PDGraphicsState state = getGraphicsState();
    PDTextState textState = state.getTextState();
    float fontSize = textState.getFontSize();
    float horizontalScaling = textState.getHorizontalScaling() / 100f;
    float charSpacing = textState.getCharacterSpacing();
    // put the text state parameters into matrix form
    Matrix parameters = new Matrix(
            fontSize * horizontalScaling, 0, // 0
            0, fontSize,                     // 0
            0, textState.getRise());         // 1
    // text rendering matrix (text space -> device space)
    Matrix ctm = state.getCurrentTransformationMatrix();
    Matrix textRenderingMatrix = parameters.multiply(text.getTextMatrix()).multiply(ctm);
    TextPositionsInfo txtInfo = new TextPositionsInfo();
    txtInfo.xDir = text.getXDirAdj();
    txtInfo.yDir = text.getYDirAdj();
    txtInfo.x = textRenderingMatrix.getTranslateX();
    txtInfo.y = textRenderingMatrix.getTranslateY();
    txtInfo.textMatrix = textRenderingMatrix;
    txtInfo.height = text.getHeightDir();
    txtInfo.width = text.getWidthDirAdj();
    txtInfo.unicode = text.getUnicode();
    txtInfo.fontName = text.getFont().getFontDescriptor().getFontName();
    txtInfo.fontSize = getActualFontSize(text, getGraphicsState());
    pdfGraphicContent.textPositions.add(txtInfo);
}
```
```java
// Here I place each character and write it to the content stream
private void addTextCharByChar(String string, List<TextPositionsInfo> textinfoList, TextBBoxinfo textBBoxinfo, PDPage page) throws IOException {
    PDResources res = page.getResources();
    currentContentStream.beginText();
    if (textBBoxinfo._ElementType.toLowerCase().equals("h2")) {
        beginMarkedConent(COSName.P);
        for (TextPositionsInfo textInfo : textinfoList) {
            PDFont font = getFont(res, textInfo.fontName);
            currentContentStream.setFont(font, textInfo.fontSize);
            Matrix _tm = textInfo.textMatrix;
            currentContentStream.newLineAtOffset(_tm.getTranslateX(), _tm.getTranslateY());
            currentContentStream.setTextMatrix(_tm);
            currentContentStream.showText(textInfo.unicode);
        }
        currentContentStream.endMarkedContent();
        addContentToCurrentSection(COSName.P, StandardStructureTypes.H2);
    } else if (textBBoxinfo._ElementType.toLowerCase().equals("h1")) {
        beginMarkedConent(COSName.P);
        for (TextPositionsInfo textInfo : textinfoList) {
            PDFont font = getFont(res, textInfo.fontName);
            currentContentStream.setFont(font, textInfo.fontSize);
            currentContentStream.newLineAtOffset(textInfo.textMatrix.getTranslateX(),
                    textInfo.textMatrix.getTranslateY());
            currentContentStream.setTextMatrix(textInfo.textMatrix);
            currentContentStream.showText(textInfo.unicode);
        }
        currentContentStream.endMarkedContent();
        addContentToCurrentSection(COSName.P, StandardStructureTypes.H1);
    }
    currentContentStream.endText();
}
```
There is an error in the code you show. But as I couldn't reproduce the error as it shows in your screenshots, I assume that there is another error somewhere in the code you don't show.
In detail
Unfortunately you neither provided self-contained code nor your example PDF. To check it, therefore, I had to change the code a bit to make it runnable. Furthermore, I had to select a test document of my own; I actually found one looking very much like your screenshot.
The error
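In processTextPosition you try to calculate the text rendering matrix from the font size, the horizontal scaling, the rise, text.getTextMatrix() and the current transformation matrix, in the block ending with Matrix textRenderingMatrix = parameters.multiply(text.getTextMatrix()).multiply(ctm);

This looks like the right way to calculate the text rendering matrix from the available data, EXCEPT if you read the documentation of the TextPosition.getTextMatrix method: despite its name, it does not return the text matrix set by the "Tm" operator but the effective text rendering matrix, which already depends on the current transformation matrix (set by "cm"), the text matrix (set by "Tm"), the font size (set by "Tf") and the page crop box.

Thus, text.getTextMatrix() already is the matrix you want to calculate here. So you can either replace the whole block above by Matrix textRenderingMatrix = text.getTextMatrix(); or (if you really want to calculate the text rendering matrix yourself) use state.getTextMatrix() instead of text.getTextMatrix() in that block.

The changed code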
I changed your processTextPosition override (CopyFormattedPageText, method copyTextLikeNitishKumar). The main change here is that I fixed the error described above and added the font to the target page resources; as you do not show how you fill the target page resources and match fonts with names, this was the easiest way to improvise. Beware: this is not a good way to do it, as it may lose quite some font information.
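If your getFont helper matches fonts by the name taken from getFontDescriptor().getFontName(), keep in mind that embedded subset fonts carry a tag of six uppercase letters followed by a plus sign (for example ABCDEF+Arial-BoldMT), so two subsets of the same font have different names. The following is a minimal, stdlib-only sketch of normalizing such names before a lookup; FontNames, stripSubsetTag and the lookup map are hypothetical and not part of PDFBox.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for matching fonts by name; not part of PDFBox. */
public class FontNames {

    /** Removes a subset tag such as "ABCDEF+" (exactly six uppercase letters and '+'). */
    static String stripSubsetTag(String name) {
        if (name != null && name.length() > 7 && name.charAt(6) == '+'
                && name.substring(0, 6).chars().allMatch(ch -> ch >= 'A' && ch <= 'Z')) {
            return name.substring(7);
        }
        return name; // no subset tag: keep the name unchanged
    }

    public static void main(String[] args) {
        // Assumed lookup table from normalized font names to a resource name.
        Map<String, String> fontsByName = new HashMap<>();
        fontsByName.put(stripSubsetTag("ABCDEF+Arial-BoldMT"), "F1");
        // A different subset of the same font finds the same entry.
        System.out.println(fontsByName.get(stripSubsetTag("GHIJKL+Arial-BoldMT")));
    }
}
```

Normalizing only helps to find a matching font; a subset still contains only the glyphs used in the source document, so it may be unsuitable for other text.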
I then changed your addTextCharByChar. The main change here is dropping code paths that depend on extra information you didn't share, like textBBoxinfo. Furthermore, I use 1 as the font size in currentContentStream.setFont, because you thereafter set the text matrix to the original text rendering matrix, which already contains the scaling by the font size.

The result
Running the code on this file (which looks very much like your screenshot) results in the following output:
[Screenshot: output PDF]
So it looks like the code you shared works after fixing the text rendering matrix and font size.
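Both fixes follow from the text rendering matrix formula of the PDF specification, Trm = [Tfs·Th 0 0; 0 Tfs 0; 0 Trise 1] × Tm × CTM. The dependency-free sketch below uses PDF's row-vector convention; the Affine record stands in for PDFBox's Matrix class, and the example values (a 12 pt font, a glyph at 72/700, an identity CTM) are assumptions for illustration. It shows that feeding an already computed Trm back into the formula scales by the font size twice, while font size 1 reproduces the saved Trm.

```java
/**
 * Hypothetical, dependency-free sketch of the PDF text rendering matrix
 * Trm = [Tfs*Th 0 0; 0 Tfs 0; 0 Trise 1] x Tm x CTM (row-vector convention).
 * Affine(a, b, c, d, e, f) stands for the 3x3 matrix [a b 0; c d 0; e f 1].
 */
public class TextRenderingMatrixDemo {

    record Affine(double a, double b, double c, double d, double e, double f) {
        /** Returns this x other, i.e. first apply this, then other. */
        Affine multiply(Affine o) {
            return new Affine(
                    a * o.a + b * o.c, a * o.b + b * o.d,
                    c * o.a + d * o.c, c * o.b + d * o.d,
                    e * o.a + f * o.c + o.e, e * o.b + f * o.d + o.f);
        }
    }

    static final Affine IDENTITY = new Affine(1, 0, 0, 1, 0, 0);

    /** Text state parameters: font size, horizontal scaling in percent, rise. */
    static Affine textParameters(double fontSize, double horizontalScalingPercent, double rise) {
        double th = horizontalScalingPercent / 100.0;
        return new Affine(fontSize * th, 0, 0, fontSize, 0, rise);
    }

    public static void main(String[] args) {
        // Assumed values: Tf 12, 100 % horizontal scaling, no rise,
        // Tm placing the glyph at (72, 700), identity CTM.
        Affine tm = new Affine(1, 0, 0, 1, 72, 700);
        Affine trm = textParameters(12, 100, 0).multiply(tm).multiply(IDENTITY);

        // Mistake: treating an existing Trm (what TextPosition.getTextMatrix()
        // returns) as Tm applies the font size a second time (12 * 12 = 144).
        Affine twice = textParameters(12, 100, 0).multiply(trm).multiply(IDENTITY);

        // Fix: with font size 1 and the text matrix set to the saved Trm, the new
        // Trm equals the saved one (no rise, 100 % scaling, identity CTM assumed).
        Affine sizeOne = textParameters(1, 100, 0).multiply(trm).multiply(IDENTITY);

        System.out.println("saved Trm:          " + trm);
        System.out.println("size applied twice: " + twice);
        System.out.println("font size 1:        " + sizeOne);
    }
}
```

With a non-identity CTM the original code would also transform the translation part a second time, so the glyph positions would shift in addition to the letters being too large.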
Possible additional issues
The output in your screenshot does not match the identified error: that error would usually produce much too large letters, not a few letters at the correct size with all the others missing.
Thus, there appear to be other errors in the code you don't show. I can only guess their causes. My guesses would be:
- textBBoxinfo._ElementType is only "H1" or "H2" for a few letters, the ones you see in your output. As there is no code path for drawing text with other element type values, most of the letters aren't drawn at all.
- getFont only returns a useful font for the letters you see in your output.
- getActualFontSize returns a sensible size only for a few letters, the ones you see in the output. The other letters are too small or not drawn at all in the output.
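To find out which of these guesses applies, you could count, before drawing, how many extracted glyphs would take each path. The stdlib-only sketch below does that; GlyphInfo, the set of handled element types, the set of available font names and the minimum size threshold are hypothetical stand-ins for your TextPositionsInfo, textBBoxinfo._ElementType, getFont and getActualFontSize.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Hypothetical diagnostic: counts glyphs a copy routine like addTextCharByChar
 * would silently skip (unhandled element type) or render badly (font, size).
 */
public class DroppedGlyphReport {

    record GlyphInfo(String elementType, String fontName, float fontSize, String unicode) {}

    /** Element types for which the copy routine has a drawing code path. */
    static final Set<String> HANDLED_TYPES = Set.of("h1", "h2");

    static Map<String, Integer> report(List<GlyphInfo> glyphs, Set<String> availableFonts, float minFontSize) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted for stable output
        for (GlyphInfo g : glyphs) {
            String reason;
            if (g.elementType() == null
                    || !HANDLED_TYPES.contains(g.elementType().toLowerCase(Locale.ROOT))) {
                reason = "unhandled element type";
            } else if (g.fontName() == null || !availableFonts.contains(g.fontName())) {
                reason = "font not found";
            } else if (g.fontSize() < minFontSize) {
                reason = "font size too small";
            } else {
                reason = "ok";
            }
            counts.merge(reason, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed sample data, one glyph per category.
        List<GlyphInfo> glyphs = List.of(
                new GlyphInfo("H1", "Arial-BoldMT", 18f, "T"),
                new GlyphInfo("P", "ArialMT", 11f, "e"),
                new GlyphInfo("H2", "Unknown", 14f, "x"),
                new GlyphInfo("H2", "ArialMT", 0.5f, "t"));
        System.out.println(report(glyphs, Set.of("ArialMT", "Arial-BoldMT"), 1f));
    }
}
```

If most glyphs land in "unhandled element type", adding a default branch (for example tagging such text as a paragraph) would be the first thing to try.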