I am using iText 5 to generate a PDF from HTML input. As part of PDF accessibility, I call pdfWriter.setTagged().
However, both empty and non-empty HTML tags end up tagged in the PDF. Can you please help me avoid tagging the empty HTML tags?
You can do it directly with pdfHTML (basically the solution for HTML to PDF conversion in iText 7).
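    // Package names assume pdfHTML 2.1+ (iText 7.1.3+); ORIG and DEST are the input HTML and output PDF paths.
    import com.itextpdf.html2pdf.ConverterProperties;
    import com.itextpdf.html2pdf.HtmlConverter;
    import com.itextpdf.html2pdf.attach.ITagWorker;
    import com.itextpdf.html2pdf.attach.ProcessorContext;
    import com.itextpdf.html2pdf.attach.impl.DefaultTagWorkerFactory;
    import com.itextpdf.html2pdf.attach.impl.tags.SpanTagWorker;
    import com.itextpdf.html2pdf.attach.impl.tags.TdTagWorker;
    import com.itextpdf.html2pdf.html.TagConstants;
    import com.itextpdf.kernel.pdf.PdfDocument;
    import com.itextpdf.kernel.pdf.PdfWriter;
    import com.itextpdf.styledxmlparser.node.IElementNode;
    import java.io.FileInputStream;

    ConverterProperties props = new ConverterProperties();
    props.setTagWorkerFactory(new DefaultTagWorkerFactory() {
        @Override
        public ITagWorker getCustomTagWorker(
                IElementNode tag, ProcessorContext context) {
            if (tag.name().equals(TagConstants.TD)) {
                // Non-empty cells keep the regular TD worker; empty cells get a span worker instead.
                if (!tag.childNodes().isEmpty()) {
                    return new TdTagWorker(tag, context);
                } else {
                    return new SpanTagWorker(tag, context);
                }
            }
            return null; // null falls back to the default worker for every other tag
        }
    });
    PdfDocument doc = new PdfDocument(new PdfWriter(DEST));
    doc.setTagged();
    HtmlConverter.convertToPdf(new FileInputStream(ORIG), doc, props);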
In the code above, setTagWorkerFactory lets you define custom behavior for your tags, as detailed in the documentation. In this specific case, I'm simply turning empty TD tags into a Span element, which achieves the desired behavior (the superfluous TD tag disappears).
(To be completely honest, this relies on the TR worker being unable to process the SPAN tag, so it just jumps ship. I'll update the answer if I come up with a more elegant solution.)
I suppose one way around it would be to walk the StructTree of the output PDF document, find the tags you are looking for that have no kids, and remove them from their parent. I do not use iText 5 anymore, as it has been deprecated (only security fixes are issued), but with iText 7, you could do something like:
It's not the most elegant thing, but I used pdfHTML to convert an HTML file in which I had an empty td,
and then I used the code to go through the result and remove the empty tags (or rather, tags without children). Maybe there is a way to do it directly with XML Worker (I am assuming this is what you are using to convert the HTML document), or a better post-processing alternative to my suggestion.
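The post-processing idea described above (walk the structure tree, find tags of a given role that have no kids, and detach them from their parent) can be sketched independently of iText. The Tag class, its hasContent flag, and pruneEmpty below are hypothetical stand-ins for illustration only; they are not iText's API. With iText 7 you would apply the same recursion to the structure elements of the tagged document instead.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Library-independent model of a tag tree. Hypothetical names, NOT iText's API.
class TagPruneSketch {

    /** Stand-in for a structure element: a role, its kid tags, and a content flag. */
    static class Tag {
        final String role;
        final boolean hasContent; // true if the tag directly marks page content
        final List<Tag> kids = new ArrayList<>();

        Tag(String role, boolean hasContent) {
            this.role = role;
            this.hasContent = hasContent;
        }

        Tag add(Tag kid) {
            kids.add(kid);
            return this;
        }
    }

    /**
     * Removes every descendant of parent that has the given role and
     * neither kids nor content. Returns the number of tags removed.
     */
    static int pruneEmpty(Tag parent, String role) {
        int removed = 0;
        Iterator<Tag> it = parent.kids.iterator();
        while (it.hasNext()) {
            Tag kid = it.next();
            // Clean the subtree first, so a tag that only held empty tags is caught too.
            removed += pruneEmpty(kid, role);
            if (kid.role.equals(role) && kid.kids.isEmpty() && !kid.hasContent) {
                it.remove(); // detach the empty tag from its parent
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Tag row = new Tag("TR", false)
                .add(new Tag("TD", false).add(new Tag("P", true))) // cell with a paragraph
                .add(new Tag("TD", false));                        // empty cell
        Tag table = new Tag("Table", false).add(row);

        int removed = pruneEmpty(table, "TD");
        System.out.println(removed + " removed, " + row.kids.size() + " TD left");
    }
}
```

Removing through the iterator avoids a ConcurrentModificationException while the kid list is being traversed. Only the requested role is pruned, so an empty TR or Table would be left alone unless you call pruneEmpty again for those roles.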