I've got several million lines of xml to parse. For one application I am looking to extract 3 pieces of data for use in other scripts.
The xml looks something like the following (several dozen tags have been removed per grouping) I can change one of the name tags if it helps; though not desirable it will require some intermediate processing. Not all node groups have the extended attributes.
<?xml version="1.0" encoding="IBM437"?>
<topo>
<node>
<name>device1Name</name>
<extendedAttributes>
<attribute>
<name>tagCategoryName</name>
<value>tagValue</value>
</attribute>
</extendedAttributes>
</node>
<node>
<name>device2Name</name>
<extendedAttributes>
<attribute>
<name>tagCategoryName</name>
<value>tagValue</value>
</attribute>
</extendedAttributes>
</node>
<node>
<name>device3Name</name>
</node>
...
...
</topo>
The output I am looking for each node is
deviceName tagCategoryName tagValue
I've attempted several approaches and have been unable to find an elegant solution. Started with
$xml = [xml](get-content prodnodes.txt)
Tried some Select-Xml with xpath, with direct $xml.topo.node addressing piping to select object using property names. I was unable to target the names effectively with the following.
$xml.topo.node | select-object -property name, extendedAttributes.attribute.name, extendedAttributes.attribute.value
It would return only the name The following worked to get me an additional attribute but I couldn't extend it without issues.
$munge = $xml.topo.node | select-object -property name, {$_.extendedAttributes.attribute.name}
Attempting to extend it looked like this
$munge = $xml.topo.node | select-object -property name, {$_.extendedAttributes.attribute.name, $_.extendedAttributes.attribute.value}
which gave output like this
deviceName1 {tagCategoryName1, tagValue1}
deviceName2 {tagCategoryName1, tagValue2}
deviceName3 {$null, $null}
deviceName4 {tagCategoryName2, tagValue3}
...
...
Is there a way to clean this up, or another approach that is more effective?
Your first approach was nearly correct. That being said, in order to dig into properties like that, you need to use calculated properties.
Calculated properties are represented by a hashtable containing a name element, which will be your column name, and an expression element that contain a scriptblock to do something more than what you can with a simple select.
Here's how you need to do it in your scenario.
The statement
The result
More information on this topic
Microsoft - Select-Object
4sysops - Add a calculated property with select object in powershell