How to convert an xml to csv dynamically

389 Views Asked by At

I want to build a tool that convert xml's to csv. I am using python, but can move to a different tool if better.

these xml's are not always follow the same schema, so i need to automatically convert the structure to the csv. without always knowing the tree structures. the main tags are known and always the same, some xml might use all tags and some only few. I tried using xml.etree and managed to work with the xml, but not with a dynamic xml input. is that even possible ?

here is a sample of my xml input file content:

<Process>
    <ProcessName>Vault-2-A</ProcessName>
    <ProcessEnabled>True</ProcessEnabled>
    <ProcessType>N2N</ProcessType>
    <NonDuplicationMethod>Delete</NonDuplicationMethod>
    <OnFileExistsInDest>Overwrite</OnFileExistsInDest>
    <ProcessScheduling>ExternalActivation</ProcessScheduling>
    <ExternalActivationLevel>Process</ExternalActivationLevel>
    <ProcessRecursive>True</ProcessRecursive>
    <FileSelectionPattern>*</FileSelectionPattern>
    <Rules>
      <Rule1>
         <RuleName>V2A</RuleName>
         <SourcePort>
            <Name>xxx</Name>
            <Type>Vault</Type>
            <VaultName>yyy</VaultName>
            <UserName>user</UserName>
            <FolderName>Root\</FolderName>
         </SourcePort>
         <DestPort>
            <Name>MyFileSystem</Name>
            <Type>FileSystem</Type>
            <FolderName>D:\xxx\</FolderName>
         </DestPort>
      </Rule1>
      <Rule2>
         <RuleName>A2V</RuleName>
         <SourcePort>
            <Name>xxx</Name>
            <Type>Vault</Type>
            <VaultName>yyyn</VaultName>
            <UserName>user</UserName>
            <SafeName>userTest</SafeName>
            <FolderName>Root\</FolderName>
         </SourcePort>
         <DestPort>
            <Name>sftp</Name>
            <Type>sftp</Type>
            <FolderName>D:\Accellion Tests\DCA-IN</FolderName>
            <ArchiveFolder>\arc</ArchiveFolder>
         </DestPort>
      </Rule2>
      <Rule3>
         <RuleName>Vault-2-Accellion</RuleName>
         <NOND>true</NOND>
         <SourcePort>
            <Name>A</Name>
            <Type>Vault</Type>
            <VaultName>Am</VaultName>
            <UserName>g</UserName>
            <SafeName>test</SafeName>
            <FolderName>Root\</FolderName>
         </SourcePort>
         <DestPort>
            <Name>MyFileSystem</Name>
            <Type>FileSystem</Type>
            <FolderName>D:\Tests\DCA-IN</FolderName>
         </DestPort>
      </Rule3>
   </Rules>
   <UserExits>
   </UserExits> 
</Process>

thanks david

1

There are 1 best solutions below

0
On

As per the comment; you can use xmltodict to convert xml to dictionary. Then you can uses CSV to output the results, using DictWriter()

You'll need to think about how to display the data in CSV. By default <Rules> data, will be outputted as in 1 cell as an OrdeDict. You may want to flattened the dictionary or allow for repeating data?

As an example:

import csv
import xmltodict

def save_dict_to_csv(filename, dict):
    with open(filename, 'w') as csvfile: 
        w = csv.DictWriter(csvfile, dict.keys())
        w.writeheader()
        w.writerow(dict)

xml = r"""
<Process>
    <ProcessName>Vault-2-A</ProcessName>
    <ProcessEnabled>True</ProcessEnabled>
    <ProcessType>N2N</ProcessType>
    <NonDuplicationMethod>Delete</NonDuplicationMethod>
    <OnFileExistsInDest>Overwrite</OnFileExistsInDest>
    <ProcessScheduling>ExternalActivation</ProcessScheduling>
    <ExternalActivationLevel>Process</ExternalActivationLevel>
    <ProcessRecursive>True</ProcessRecursive>
    <FileSelectionPattern>*</FileSelectionPattern>
    <Rules>
      <Rule1>
         <RuleName>V2A</RuleName>
         <SourcePort>
            <Name>xxx</Name>
            <Type>Vault</Type>
            <VaultName>yyy</VaultName>
            <UserName>user</UserName>
            <FolderName>Root\</FolderName>
         </SourcePort>
         <DestPort>
            <Name>MyFileSystem</Name>
            <Type>FileSystem</Type>
            <FolderName>D:\xxx\</FolderName>
         </DestPort>
      </Rule1>
      <Rule2>
         <RuleName>A2V</RuleName>
         <SourcePort>
            <Name>xxx</Name>
            <Type>Vault</Type>
            <VaultName>yyyn</VaultName>
            <UserName>user</UserName>
            <SafeName>userTest</SafeName>
            <FolderName>Root\</FolderName>
         </SourcePort>
         <DestPort>
            <Name>sftp</Name>
            <Type>sftp</Type>
            <FolderName>D:\Accellion Tests\DCA-IN</FolderName>
            <ArchiveFolder>\arc</ArchiveFolder>
         </DestPort>
      </Rule2>
      <Rule3>
         <RuleName>Vault-2-Accellion</RuleName>
         <NOND>true</NOND>
         <SourcePort>
            <Name>A</Name>
            <Type>Vault</Type>
            <VaultName>Am</VaultName>
            <UserName>g</UserName>
            <SafeName>test</SafeName>
            <FolderName>Root\</FolderName>
         </SourcePort>
         <DestPort>
            <Name>MyFileSystem</Name>
            <Type>FileSystem</Type>
            <FolderName>D:\Tests\DCA-IN</FolderName>
         </DestPort>
      </Rule3>
   </Rules>
   <UserExits>
   </UserExits>
</Process>"""

my_dict = xmltodict.parse(xml)
save_dict_to_csv('test.csv', next(iter(my_dict.values()))) # pass value for Process

.