How do I combine data from two XML files into the same structure?


I have two XML files whose data I'd like to merge into a single file with the same structure, as in the example below. The actual files are larger and more complex, so copying and pasting is not an efficient option.

Is there any way that this can be done quickly?

File1.xml:

<part1>
<g1> abc. 
</g1></part1>
<part2>
<g2> def.
</g2></part2>

File2.xml:

<part1>
<g1> 123.
</g1></part1>
<part2>
<g2> 456.
</g2></part2>

Combined.xml:

<part1>
<g1> abc. 123.
</g1></part1>
<part2>
<g2> def. 456.
</g2></part2>

Yes, there are lots of ways to 'merge' XML. But to do it you'll need an XML parser, because XML is a structured data format.

Which one you use is very much a question of which language you prefer.

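Me? I like Perl and XML::Twig. Your sample files each have two top-level elements, which isn't well-formed XML, so I've wrapped each snippet in a <root> element: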

#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

my $snippet1 = '<root><part1>
<g1> abc. 
</g1></part1>
<part2>
<g2> def.
</g2></part2></root>';

my $snippet2 = '<root><part1>
<g1> 123.
</g1></part1>
<part2>
<g2> 456.
</g2></part2></root>';


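# Parse the first snippet up front so the handler can look up matching elements in it.
my $first = XML::Twig->new()->parse($snippet1);

# Handler called for every element of the second snippet as it is parsed.
sub merge {
    my ( $twig, $element ) = @_;

    # Only handle elements whose tag starts with "g" (<g1>, <g2>).
    return unless $element->tag =~ m/^g/;

    # Build a relative path such as "part1/g1/" by walking up the tree,
    # stopping before the <root> element (which has no parent).
    my $cur   = $element;
    my $xpath = '';
    while ( $cur->parent ) {
        $xpath = $cur->tag . "/" . $xpath;
        $cur   = $cur->parent;
    }

    # print "/",$xpath,"\n";

    # Find the element at the same path in the first snippet.
    if ( my $other = $first->get_xpath( $xpath, 0 ) ) {
        if (    $element->text_only
            and $other->text_only )
        {
            # Put the first file's text before the second's and strip newlines
            # (the s///r modifier needs Perl 5.14 or later).
            $element->set_text(
                ( $other->text_only . " " . $element->text_only ) =~ s/\n//rg );
        }
    }
}

# Parse the second snippet, merging as we go, then pretty-print the result.
XML::Twig->new(
    pretty_print  => 'indented_a',
    twig_handlers => { '_all_' => \&merge }
)->parse($snippet2)->print;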

This'll take your source text and turn it into:

<root>
  <part1>
    <g1> abc.   123.</g1>
  </part1>
  <part2>
    <g2> def.  456.</g2>
  </part2>
</root>
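The output above keeps the stray spaces from the source text, and the script works on inline snippets rather than your files. A minimal sketch of a variant that reads the two files from disk, wraps each one in a <root> element, and collapses the whitespace might look like this. Instead of calling get_xpath, it indexes the leaf elements of the first file in a hash keyed by a slash-separated tag path. The helper names read_wrapped, path_of, is_leaf and merge_xml are hypothetical, and it assumes each path occurs at most once per file (repeated sibling tags would need a position added to the path) and that the files have no <?xml ...?> declaration, since that can't sit inside the added <root>.

```perl
#!/usr/bin/perl
# Sketch: merge the text of matching leaf elements from two XML files.
# Assumption: each tag path (e.g. root/part1/g1) occurs at most once per file.
use strict;
use warnings;

use XML::Twig;

# Read a file and wrap it in <root>, because the sample files have two
# top-level elements. Assumption: no <?xml ...?> declaration in the file.
sub read_wrapped {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;    # slurp the whole file at once
    my $content = <$fh>;
    close $fh;
    return "<root>$content</root>";
}

# Slash-separated tag path from the root element down to $elt.
sub path_of {
    my ($elt) = @_;
    return join '/', reverse map { $_->tag } $elt, $elt->ancestors;
}

# True if $elt has no element children (text nodes have tags like '#PCDATA').
sub is_leaf {
    my ($elt) = @_;
    return !grep { $_->tag !~ /^#/ } $elt->children;
}

sub merge_xml {
    my ( $xml1, $xml2 ) = @_;

    # Index the text of every leaf element in the first document by its path.
    my %text_for;
    my $first = XML::Twig->new->parse($xml1);
    for my $elt ( grep { $_->tag !~ /^#/ } $first->root->descendants ) {
        $text_for{ path_of($elt) } = $elt->text if is_leaf($elt);
    }

    # For each leaf in the second document, prepend the matching first-document text.
    my $second = XML::Twig->new( pretty_print => 'indented' )->parse($xml2);
    for my $elt ( grep { $_->tag !~ /^#/ } $second->root->descendants ) {
        next unless is_leaf($elt);
        my $other = $text_for{ path_of($elt) };
        next unless defined $other;
        my $merged = "$other " . $elt->text;
        $merged =~ s/\s+/ /g;     # collapse runs of spaces and newlines
        $merged =~ s/^ | $//g;    # trim the leading/trailing space
        $elt->set_text($merged);
    }
    return $second->sprint;
}

# Only run on files when two paths are given on the command line.
if ( @ARGV == 2 ) {
    print merge_xml( read_wrapped( $ARGV[0] ), read_wrapped( $ARGV[1] ) );
}
```

You would run it as perl merge.pl File1.xml File2.xml > Combined.xml (the script name is just an example). With your sample data the leaves should come out as <g1>abc. 123.</g1> and <g2>def. 456.</g2>. Leaves that appear in only the second file are left unchanged, and leaves that appear only in the first file are dropped, so you may want to extend it if the two files don't share the same shape.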

But I'm sure there are better routes you could take, and other languages you could use.