Powershell - XML - How do I extract multiple values from various depths per node family

1.3k Views Asked by At

I've got several million lines of xml to parse. For one application I am looking to extract 3 pieces of data for use in other scripts.

The xml looks something like the following (several dozen tags have been removed per grouping) I can change one of the name tags if it helps; though not desirable it will require some intermediate processing. Not all node groups have the extended attributes.

<?xml version="1.0" encoding="IBM437"?>
<topo>
    <node>
        <name>device1Name</name>
         <extendedAttributes>
            <attribute>
                <name>tagCategoryName</name>
                <value>tagValue</value>
            </attribute>
        </extendedAttributes>
     </node>
    <node>
        <name>device2Name</name>
        <extendedAttributes>
            <attribute>
                <name>tagCategoryName</name>
                <value>tagValue</value>
            </attribute>
        </extendedAttributes>
    </node>
    <node>
        <name>device3Name</name>
    </node>
...
...
</topo>

The output I am looking for each node is

deviceName   tagCategoryName   tagValue

I've attempted several approaches and have been unable to find an elegant solution. Started with

$xml = [xml](get-content prodnodes.txt)

Tried some Select-Xml with xpath, with direct $xml.topo.node addressing piping to select object using property names. I was unable to target the names effectively with the following.

$xml.topo.node | select-object -property name, extendedAttributes.attribute.name, extendedAttributes.attribute.value

It would return only the name The following worked to get me an additional attribute but I couldn't extend it without issues.

$munge = $xml.topo.node | select-object -property name, {$_.extendedAttributes.attribute.name}

Attempting to extend it looked like this

$munge = $xml.topo.node | select-object -property name, {$_.extendedAttributes.attribute.name, $_.extendedAttributes.attribute.value}

which gave output like this

deviceName1   {tagCategoryName1, tagValue1}
deviceName2   {tagCategoryName1, tagValue2}
deviceName3   {$null, $null}
deviceName4   {tagCategoryName2, tagValue3}
...
...

Is there a way to clean this up, or another approach that is more effective?

1

There are 1 best solutions below

0
Sage Pourpre On BEST ANSWER

Your first approach was nearly correct. That being said, in order to dig into properties like that, you need to use calculated properties.

Calculated properties are represented by a hashtable containing a name element, which will be your column name, and an expression element that contain a scriptblock to do something more than what you can with a simple select.

Here's how you need to do it in your scenario.

The statement

$xml.topo.node | select-object -property name, 
@{'Name' = 'TagName' ; 'Expression' = { $_.extendedAttributes.attribute.name } },
@{'Name' = 'TagValue' ; 'Expression' = {$_.extendedAttributes.attribute.value}}

The result

name        TagName         TagValue
----        -------         --------
device1Name tagCategoryName tagValue
device2Name tagCategoryName tagValue
device3Name

More information on this topic

Microsoft - Select-Object

4sysops - Add a calculated property with select object in powershell