SSIS - Open XML file, find string between 2 strings and left the found string

158 Views Asked by At

I have an XML file that has several records. I have an implemented process that takes that file and inserts it into a table. The problem is that there is an xml record in the Description column that is almost 20000 characters long. What I wanted to do was to have a process that would open the file, look for the string longer than 255 characters between the and strings and make a left of the string for 255. And finally I saved the file with that change made.

Exemple:

I have in the file:

<Row>
    <Attributes>
        <Column>
            <Name>Column1</Name>
            <Value>111111</Value>
        </Column>
        <Column>
            <Name>Column2</Name>
            <Value>AAAAAA</Value>
        </Column>
        <Column>
            <Name>Column3</Name>
            <Value>XXXX</Value>
        </Column>
        <Column>
            <Name>Column4</Name>
            <Value>XXXXXX</Value>
        </Column>
        <Column>
            <Name>Column5</Name>
            <Value>FGGGGG</Value>
        </Column>
        <Column>
            <Name>Description</Name>
            <Value>PING10,10,10,10TESTEREMOTE,PING20,20,20,20TESTEREMOTE,PING30,30,30,30TESTEREMOTEPING10,10,10,10TESTEREMOTE,PING20,20,20,20TESTEREMOTE,PINGPING10,10,10,10TESTEREMOTE,PING20,20,20,20TESTEREMOTE,PING30,30,30,30TESTEREMOTE30,30,30,30TESTEREMOTE</Value>
        </Column>
        <Column>
            <Name>COLUMN6</Name>
            <Value>XX</Value>
        </Column>
    </Attributes>
</Row>

I want:

Find big string between < Value > and < /Value > and left string to 250 characters.

< Row >< Attributes >< Column >< Name >Column1< /Name >< Value >111111< /Value >< /Column >< Column >< Name >Column2< /Name >< Value >AAAAAA< /Value >< /Column >< Column >< Name >Column3< /Name >< Value >XXXX< /Value >< /Column >< Column >< Name >Column4< /Name >< Value >XXXXXX< /Value >< /Column >< Column >< Name >Column5< /Name >< Value >FGGGGG< /Value >< /Column >< Column >< Name >Description< /Name >< Value >PING 10,10,10,10 TESTE REMOTE, PING 20,20,20,20 TESTE REMOTE, PING 30,30,30,30 TESTE REMOTE PING 10,10,10,10 TESTE REMOTE, PING 20,20,20,20 TESTE REMOTE, PING PING 10,10,10,10 TESTE REMOTE, PING 20,20,20,20 TESTE REMOTE, PING 30,30,30,30 TESTE REMOTE< /Value >< /Column >< Column >< Name >COLUMN6< /Name >< Value >XX< /Value >< /Column >< /Attributes >< /Row >

Thank You

1

There are 1 best solutions below

3
On

I always defer to C# when dealing with XML. So here is that solution:

Create script component source: set your two output columns name/value both string length 50

Enter the code:

add the following namespaces:

using System.Xml.Serialization;
using System.Collections.Generic;

create a class object to let the XML serialize into

[XmlRoot(ElementName = "Column")]
public class Column
{
    [XmlElement(ElementName = "Name")]
    public string Name { get; set; }
    [XmlElement(ElementName = "Value")]
    public string Value { get; set; }
}

[XmlRoot(ElementName = "Attributes")]
public class Attributes
{
    [XmlElement(ElementName = "Column")]
    public List<Column> Column { get; set; }
}

[XmlRoot(ElementName = "Row")]
public class Row
{
    [XmlElement(ElementName = "Attributes")]
    public List<Attributes> Attributes { get; set; }
}

And in create new ouput rows add this:

    string xml = @"<Row>
                            <Attributes>
                                <Column>
                                    <Name>Column1</Name>
                                    <Value>111111</Value>
                                </Column>
                                <Column>
                                    <Name>Column2</Name>
                                    <Value>AAAAAA</Value>
                                </Column>
                                <Column>
                                    <Name>Column3</Name>
                                    <Value>XXXX</Value>
                                </Column>
                                <Column>
                                    <Name>Column4</Name>
                                    <Value>XXXXXX</Value>
                                </Column>
                                <Column>
                                    <Name>Column5</Name>
                                    <Value>FGGGGG</Value>
                                </Column>
                                <Column>
                                    <Name>Description</Name>
                                    <Value>PING10,10,10,10TESTEREMOTE,PING20,20,20,20TESTEREMOTE,PING30,30,30,30TESTEREMOTEPING10,10,10,10TESTEREMOTE,PING20,20,20,20TESTEREMOTE,PINGPING10,10,10,10TESTEREMOTE,PING20,20,20,20TESTEREMOTE,PING30,30,30,30TESTEREMOTE30,30,30,30TESTEREMOTE</Value>
                                </Column>
                                <Column>
                                    <Name>COLUMN6</Name>
                                    <Value>XX</Value>
                                </Column>
                            </Attributes>
                        </Row>";

    System.IO.MemoryStream stream = new System.IO.MemoryStream(System.Text.Encoding.UTF8.GetBytes(xml));

    XmlSerializer serializer = new XmlSerializer(typeof(Row));
    var x = (Row)serializer.Deserialize(stream);

    foreach (var att in x.Attributes)
        foreach (var col in att.Column)
        {
            Output0Buffer.AddRow();
            Output0Buffer.name = col.Name;
            Output0Buffer.value = col.Value.Substring(0, Math.Min(30, col.Value.Length));
        }

Notes:

  1. I chose a length of 30 you can change that to whatever you want
  2. You can easily load a file into memorystream instead of converting a string

Final output (based on 30)

enter image description here