I need to get the text with the <br /> tags intact. I was using getTextContent, but it strips the inner tags.
Code:
var nodeList = root.getElementsByTagName("tr");
var nodeCount = nodeList.getLength();
for (var row = 0; row < nodeCount; row++) {
    var node = nodeList.item(row);
    // Legacy Data Key e.g. "ENV"
    var DOORSKey = new java.lang.String(node.getElementsByTagName("td").item(0).getTextContent().trim());
    var DOORSKeyCount = DOORSKey.length();
    // DOORSVal e.g. "ALL"
    var DOORSVal = new java.lang.String(node.getElementsByTagName("td").item(1).getNodeValue());
}
Sample HTML:
<table border="1" cellpadding="3" cellspacing="0">
<tbody>
<tr>
<td>Customer</td>
<td></td>
</tr>
<tr>
<td>ENV</td>
<td>ALL</td>
</tr>
<tr>
<td>Module</td>
<td>6DOF</td>
</tr>
<tr>
<td>Object Level</td>
<td>5</td>
</tr>
<tr>
<td>XML Profile</td>
<td>DHS_CBP_HW<br />DHS_CBP_TRAIN<br />GE_B0_HW<br />GE_B0_TRAIN<br />GE_B1_HW<br />GE_B1_JSIL_TRAIN<br />GE_B1_TRAIN<br />GE-ER_HW<br />GE-ER_TRAIN<br />GTS_MQ9<br />ITALY_HW<br />ITALY_TRAIN<br />MQ1_HW<br />MQ1_PMATS_TRAIN<br />MQ1_TRAIN<br />MQ9_BLOCK5_BW_HW<br />MQ9_BLOCK5_BW_TRAIN<br />MQ9_BLOCK5_HW<br />MQ9_BLOCK5_JSIL_TRAIN<br />MQ9_BLOCK5_PMATS_TRAIN<br />MQ9_BLOCK5_TRAIN<br />MQ9_BW_HW<br />MQ9_BW_PMATS_TRAIN<br />MQ9_BW_TRAIN<br />MQ9_HW<br />MQ9_IKHANA_TRAIN<br />MQ9_JSIL_TRAIN<br />MQ9_PMATS_TRAIN<br />MQ9_SPECIAL_HW<br />MQ9_SPECIAL_TRAIN<br />MQ9_TAMLG_TRAIN<br />MQ9_TEST<br />MQ9_TRAIN<br />ORGANIC_DEPOT_HW<br />ORGANIC_DEPOT_BLOCK5_HW<br />ORGANIC_DEPOT_TRAIN<br />ORGANIC_DEPOT_BLOCK5_TRAIN<br />PREDA_ITALY_TRAIN<br />PREDB_ITALY_TRAIN<br />PREDC_AC2_HW<br />PREDC_AC2_TRAIN<br />PREDC_HW<br />PREDC_TRAIN<br />PREDEP_TRAIN<br />PREDXP_HW<br />PREDXP_TRAIN<br />RITI_HW<br />RITI_TRAIN<br />WARRIOR_A_HW<br />WARRIOR_A_JSIL_TRAIN<br />WARRIOR_A_TRAIN</td>
</tr>
</tbody>
</table>
I tried to get the child tags using .getNodeValue, but received an error from the database.
var DOORSVal = new java.lang.String(node.getElementsByTagName("td").item(1).getNodeValue());
As you’ve discovered, getTextContent cannot be used for this. You will need to use XSLT in order to preserve both text and elements. (Only XSLT 1.0 is supported by Java SE, currently, but this is more than sufficient for your task.)
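You’ll want a template that always processes <td> elements, copies only text and <br> child elements, and ignores everything else.
Java uses the Transformer class to apply an XSLT document. The usual pattern is to build a Transformer from the stylesheet, then transform the <td> node into a StringWriter and read the result back as a String.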
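The approach described above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the class name `CellMarkup` and the methods `parseXml` and `innerMarkup` are hypothetical names chosen for this example, and the stylesheet is one plausible way to write "copy only text and `<br>` children of a `<td>`". It assumes the table is well-formed XML, as in the sample.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper; class and method names are illustrative only. */
final class CellMarkup {

    // XSLT 1.0 stylesheet: for a <td>, process only its text and <br>
    // children, and copy each of those to the output unchanged.
    private static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='td'>"
      + "<xsl:apply-templates select='text()|br'/>"
      + "</xsl:template>"
      + "<xsl:template match='text()|br'><xsl:copy/></xsl:template>"
      + "</xsl:stylesheet>";

    /** Parses a well-formed XML/XHTML string into a DOM Document. */
    static Document parseXml(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse input", e);
        }
    }

    /** Returns the text of a <td> node with its <br> elements preserved. */
    static String innerMarkup(Node td) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(td), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }
}
```

In the loop from the question, the second cell would then be read with something like `CellMarkup.innerMarkup(node.getElementsByTagName("td").item(1))` instead of `getNodeValue()`. If many rows are processed, it is probably worth compiling the stylesheet once with `TransformerFactory.newTemplates` and creating a Transformer from the resulting `Templates` object per call, rather than re-parsing the stylesheet each time.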
By the way, there is no reason to use new java.lang.String, since all String objects are immutable and can be safely shared. new String(otherString) accomplishes nothing.