I have so far successfully used BIML to auto generate SSIS package (from CSV to SQL Server). But I got into problems where ever I have Varchar(MAX) columns in the Flat File Format.
The problem is If I define a column of type AnsiString with size -1 in the Flat file format, the output SSIS package shows the below warning
The metadata of the following output columns does not match the metadata of the external columns with which the output columns are associated.
If I click Yes, the problem is fixed by itself, but that would be my last option as I have 150 packages.
When I checked the Advanced options of Flat File Source Component I can see a difference in data type for the column Comments, External Columns show as DT_TEXT where as the Output Columns show DT_STR. :(
What I don't understand is why the Output columns showing a different data type only for Varchar(Max) when all others are working fine. Aren't the output columns generated from External columns?
Please see the biml code below.
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<FileFormats>
<FlatFileFormat Name="MetadataFileFormat" RowDelimiter="LF" ColumnNamesInFirstDataRow="true" IsUnicode="false">
<Columns>
<Column Name="Category" DataType="AnsiString" Length="128" Delimiter="|" CodePage="1252" />
<Column Name="Comments" DataType="AnsiString" Length="-1" Delimiter="|" />
<Column Name="DisplayName" DataType="AnsiString" Length="256" Delimiter="CRLF" />
</Columns>
</FlatFileFormat>
</FileFormats>
<Connections>
<FlatFileConnection Name="FF_Test" FilePath="C:\Data\Sample.csv" FileFormat="MetadataFileFormat">
</FlatFileConnection>
</Connections>
<Packages>
<Package Name="FFTest" ConstraintMode="Linear">
<Tasks>
<Dataflow Name="DFT Load Data">
<Transformations>
<FlatFileSource Name="FF_SRC" ConnectionName="FF_Test">
</FlatFileSource>
</Transformations>
</Dataflow>
</Tasks>
</Package>
</Packages>
</Biml>
Within a dataflow a DT_STR is bounded between lengths of 0 to 8000. The Flat File Connection Manager is happy to let you specify a length greater than 8k.
However, when you try to use that in a data flow, the component is going to report that it's not a valid length
And it makes sense if you know the concepts of how SSIS gets the performance out of data flow. It preallocates memory and does all the transformations in that memory space. How much memory would you allocate for a MAX type? Exactly...
So, you're going to need to use one of the stream data types: DT_TEXT or DT_NTEXT. Those allow for unlimited length strings.
Biml
I'm actually stumped on this, hopefully Scott can chime in. The emitted DTSX will look as the before screenshot with a data type of DT_STR and length of zero. It runs fine, just looks bad. When you double click to let the editor fix it, it changes to DT_TEXT as it should.
I thought it was just going to be a matter of providing a data type override as we can in an Execute SQL Task, but to no avail, it's not a property on the Columns collection in the flat file source.
Perhaps this was a situation where I needed to mess with the Dataflow overrides property...
But no, that gave me no better result.
Fine, I gave up and "cheated" by using Mist/BimlOnline to reverse engineer the corrected package back into Biml.
And now I simply Generate SSIS package and... Well, I suppose it's progress. Comments is identified as DT_TEXT but I still get the warning.
Deep dive into the dtsx
In the data flow's flat file source, the external metadata collection for this column is defined as follows
In the on we let the editor adjust
and the one emitted from VS 2013 using the original code, we get
It might be distasteful but perhaps a bit of XSLT could find any of the instances where you have this named column and data type of
str
and transform it totext