Find missing files (dates) in folders using C# Script Task

132 Views Asked by At

I have some flat files with filename includes date in yyyyMMdd format. Here are some example:

Folder Path: C:\Source

FileNames:

myfile1_20220104983423.txt myfile1_20220104983423.txt myfile1_20220104983423.txt myfile1_20220104983423.txt myfile1_20220104983423.txt myfile1_20220104983423.txt

I am writing the following script in a C# Script Task to loop through the set of files and create a missing files list (lets say in yyyyMMdd format) based on the Min and Max Date range for the files and save that into a SSIS Object Type Variable.

What I have tried :

//define initial folder and extension
string initialdir = @"C:\Source";
string fileExt = "*.txt";
//needed to extract date from file name
System.Globalization.CultureInfo cu = System.Globalization.CultureInfo.InvariantCulture;

//get files and missing dates
var filesAndDates = Directory.EnumerateFiles(initialdir, fileExt, SearchOption.AllDirectories)
    //group files by directory name
    .GroupBy(x => Path.GetDirectoryName(x))
    //select files and corresponding dates from file name
    .Select(grp => new
        {
            folder = grp.Key,
            files = grp.Select(f=>Path.GetFileNameWithoutExtension(f)).ToList(),
            dates = grp.Select(f=>DateTime.ParseExact(Path.GetFileNameWithoutExtension(f).Replace("myfile_", ""), "yyyyMMdd", cu)).ToList(),
        })
    //find missing dates in folders
    .SelectMany(x =>
        //create date range between min and max in current folder - get all dates
        //exclude dates which already exist
        Enumerable.Range(0, (int)(x.dates.Max() - x.dates.Min()).TotalDays + 1)
            .Select(i => x.dates.Min().AddDays(i))
            .Except(x.dates)
            .Select(y=> new
                {
                    folder = x.folder,
                    missingDate = y
                })
        );

foreach(var fd in filesAndDates)
{
    dts.Variables["User::MissingFileList"].Value = fd.missingDate.ToString("yyyyMMdd", cu));
}

But I'm getting this error "String was not recognized as valid datetime". What I'm missing over here. Any help will be appreciated. Thanks.

1

There are 1 best solutions below

4
On

There are two problems with this code I believe:

Firstly, have to change this line:

dates = grp.Select(f=>DateTime.ParseExact(Path.GetFileNameWithoutExtension(f).Replace("myfile_", ""), "yyyyMMdd", cu)).ToList(),

to the following line:

dates = grp.Select(f => DateTime.ParseExact(DoFormat(f), "yyyyMMdd", cu)).ToList(),

which 'DoFormat' method is as below (I refactored as this way to be more clear) :

        private static ReadOnlySpan<char> DoFormat(string f)
    {
        var data =  Path.GetFileNameWithoutExtension(f);
        data = data.Replace("myfile1_", "");

        return data;
    }

This solves the problem of Replacing `myfile1_' with an empty string.

The second problem occurs when you try to format 20220104983423 by using yyyyMMdd and you have to consider changing the date format, renaming the file, or changing the way that you generate the files.