I need to create a PDF/UA-compliant document in iText7. The most important requirement is that all content is tagged. When tagging is enabled (by calling the PdfDocument.SetTagged() method), most elements added to the document get the correct tags.
The issue is with the tagging of table header cells. According to ISO 32000-1:2008, table header cells must be tagged as TH and table data cells must be tagged as TD (14.8.4.3.4, Table elements, Table 337).
iText lets you distinguish between header cells and regular cells through the Table.AddHeaderCell() and Table.AddCell() methods. This mechanism correctly creates THead and TBody tags for the groups of rows. Unfortunately, the cells themselves are always tagged as TD.
Here is sample code for generating a table:
//var pdfDoc = new PdfDocument(...)
pdfDoc.SetTagged();
var doc = new Document(pdfDoc);
var table = new Table(2);
table.AddHeaderCell("Header 0");
table.AddHeaderCell("Header 1");
table.AddCell("Data 0");
table.AddCell("Data 1");
doc.Add(table);
doc.Close();
Here is an example of tagging structure we are getting:
<Table>
  <THead>
    <TR>
      <TD> //must be TH!
        <P>
          "Header 0"
      <TD>
        <P>
          "Header 1"
  <TBody>
    <TR>
      <TD> //TD is correct here
        <P>
          "Data 0"
      <TD>
        <P>
          "Data 1"
Is it possible to have iText generate TH tags when the AddHeaderCell() method is used?
I am using iText 7.0.0 for .NET (Community edition)
EDIT: The initial answer was mistakenly given in the context of pdfHTML rather than iText7 proper.
Header cells end up tagged as TD because the current implementation treats a TH in the same way as a TD.
For iText7
Set the role of the header-cells to TH before adding them to the table:
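A minimal sketch of this for iText 7.0.x on .NET, assuming Cell exposes SetRole(PdfName) through IAccessibleElement as it does in the 7.0 line. The class name TaggedTableExample and the helper HeaderCell are hypothetical and only serve the illustration. In iText 7.1 and later the role is set through the accessibility properties instead, as noted in the comment.

```csharp
using iText.Kernel.Pdf;
using iText.Layout;
using iText.Layout.Element;

public static class TaggedTableExample
{
    public static void Write(string path)
    {
        var pdfDoc = new PdfDocument(new PdfWriter(path));
        pdfDoc.SetTagged();
        var doc = new Document(pdfDoc);

        var table = new Table(2);
        // Pass explicit Cell objects so the role can be changed before adding.
        table.AddHeaderCell(HeaderCell("Header 0"));
        table.AddHeaderCell(HeaderCell("Header 1"));
        table.AddCell("Data 0");
        table.AddCell("Data 1");

        doc.Add(table);
        doc.Close();
    }

    // Hypothetical helper: builds a cell whose structure role is TH instead of the default TD.
    private static Cell HeaderCell(string text)
    {
        var cell = new Cell().Add(new Paragraph(text));
        // Assumption: iText 7.0.x API. On 7.1+ this would be
        // cell.GetAccessibilityProperties().SetRole(StandardRoles.TH);
        cell.SetRole(PdfName.TH);
        return cell;
    }
}
```

The role has to be set on the Cell instance itself, which is why the string overload of AddHeaderCell() is not used for the header row.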
For pdfHTML
While it's possible to access the elements after conversion and before adding them to the document, you would need to traverse the tree of iText elements to find and identify the tables and their header cells. It's easier to override the conversion behavior for specific tags with a custom tag worker. The following code is taken from the accessibility example. For a primer on custom tag workers, have a look at the configuration blog post.
Start by creating a custom tag worker that inherits from TdTagWorker but overwrites the role right before returning the element result.
Create a CustomTagWorkerFactory that maps this tag worker to the TH tag.
Then set the ConverterProperties to use this custom factory.
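The three steps can be sketched roughly as follows for the .NET port of pdfHTML. This is an illustration under assumptions, not the code from the accessibility example: the class names ThTagWorker and PdfHtmlExample are hypothetical, the namespaces follow the pdfHTML 1.0.x layout (IElementNode moved to another namespace in later releases), and SetRole(PdfName) is the iText 7.0.x way of changing a cell's role.

```csharp
using System;
using iText.Html2pdf;
using iText.Html2pdf.Attach;
using iText.Html2pdf.Attach.Impl;
using iText.Html2pdf.Attach.Impl.Tags;
using iText.Html2pdf.Html.Node; // assumption: pdfHTML 1.0.x namespace
using iText.Kernel.Pdf;
using iText.Layout;
using iText.Layout.Element;

// Step 1: reuse the TD conversion, but change the role of the resulting cell.
public class ThTagWorker : TdTagWorker
{
    public ThTagWorker(IElementNode element, ProcessorContext context)
        : base(element, context)
    {
    }

    public override IPropertyContainer GetElementResult()
    {
        var cell = (Cell)base.GetElementResult();
        // Assumption: iText 7.0.x API; newer versions use GetAccessibilityProperties().
        cell.SetRole(PdfName.TH);
        return cell;
    }
}

// Step 2: route <th> elements to the custom worker; everything else
// falls back to the default mapping when null is returned.
public class CustomTagWorkerFactory : DefaultTagWorkerFactory
{
    public override ITagWorker GetCustomTagWorker(IElementNode tag, ProcessorContext context)
    {
        if ("th".Equals(tag.Name(), StringComparison.OrdinalIgnoreCase))
        {
            return new ThTagWorker(tag, context);
        }
        return null;
    }
}

// Step 3: hand the factory to the converter.
public static class PdfHtmlExample
{
    public static void Convert(string html, string path)
    {
        var pdfDoc = new PdfDocument(new PdfWriter(path));
        pdfDoc.SetTagged();

        var props = new ConverterProperties();
        props.SetTagWorkerFactory(new CustomTagWorkerFactory());

        HtmlConverter.ConvertToPdf(html, pdfDoc, props);
    }
}
```

Subclassing TdTagWorker keeps all of the existing cell handling (spans, styling) and changes only the structure role, which is the one thing that differs between TH and TD here.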