Group Files into 500MB chunks


I have a List<FileInfo> of files

List<FileInfo> files = GetFiles();

totaling about 2 GB. Now I need to chunk these files into 500MB parts, so in this case the result would be 4 List<FileInfo>, each with a total size below 500MB. I have no idea how to apply Sum() here; my attempt only splits the files by whether each one is individually under 500MB:

List<List<FileInfo>> result = files.GroupBy(x => x.Length / 1024 / 1024 < 500)
                                   .Select(x => x.ToList()).ToList();

2 Answers

Accepted Answer

Here is something that works.

List<FileInfo> files = GetFiles();
List<List<FileInfo>> listOfLists = new List<List<FileInfo>>();
files.ForEach(x =>
{
    // First-fit: put the file into the first list that still has room for it.
    // Note: this re-sums every list for every file, so it is O(n²) in the
    // number of files, which is fine for a few thousand entries.
    var match = listOfLists.FirstOrDefault(lf => lf.Sum(f => f.Length) + x.Length < 500L * 1024 * 1024);
    if (match != null)
        match.Add(x);
    else
        listOfLists.Add(new List<FileInfo>() { x });
});
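To see what the first-fit placement does, here is a minimal standalone sketch of the same logic on plain numbers instead of FileInfo (the sizes in MB are made up for illustration). Note that first-fit can place a file into an *earlier* group than its neighbors, so groups do not preserve the original file order:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class FirstFitDemo
{
    static void Main()
    {
        // Made-up file sizes in MB, limit of 500 MB per group.
        long[] sizesMb = { 300, 300, 300, 100, 100 };
        long maxMb = 500;

        var groups = new List<List<long>>();
        foreach (var size in sizesMb)
        {
            // First-fit: add to the first group that still has room.
            var match = groups.FirstOrDefault(g => g.Sum() + size < maxMb);
            if (match != null)
                match.Add(size);
            else
                groups.Add(new List<long> { size });
        }

        foreach (var g in groups)
            Console.WriteLine($"[{string.Join(", ", g)}] -> {g.Sum()} MB");
        // Prints:
        // [300, 100] -> 400 MB
        // [300, 100] -> 400 MB
        // [300] -> 300 MB
    }
}
```

The two 100 MB items jump back into the first two groups, which keeps the group count low but shuffles the order of the files.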

Here is a generic BatchBySize extension method that you could use:

/// <summary>
/// Batches the source sequence into sized buckets.
/// </summary>
public static IEnumerable<TSource[]> BatchBySize<TSource>(
    this IEnumerable<TSource> source,
    Func<TSource, long> sizeSelector,
    long maxSize)
{
    var buffer = new List<TSource>();
    long sumSize = 0;
    foreach (var item in source)
    {
        long itemSize = sizeSelector(item);
        if (buffer.Count > 0 && checked(sumSize + itemSize) > maxSize)
        {
            // Emit full batch before adding the new item
            yield return buffer.ToArray(); buffer.Clear(); sumSize = 0;
        }
        buffer.Add(item); sumSize += itemSize;
        if (sumSize >= maxSize)
        {
            // Emit full batch after adding the new item
            yield return buffer.ToArray(); buffer.Clear(); sumSize = 0;
        }
    }
    if (buffer.Count > 0) yield return buffer.ToArray();
}

Usage example:

List<FileInfo[]> result = files
    .BatchBySize(x => x.Length, 500_000_000) // or 500L * 1024 * 1024 for binary megabytes
    .ToList();
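For a quick sanity check, the same method can be exercised on plain numbers instead of FileInfo objects (the sizes below are made up for illustration). Unlike the first-fit answer, this batcher is purely sequential: it fills a buffer in input order and cuts a new batch whenever the next item would overflow the limit:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class BatchDemo
{
    // BatchBySize reproduced from the answer above so this compiles standalone.
    public static IEnumerable<TSource[]> BatchBySize<TSource>(
        this IEnumerable<TSource> source,
        Func<TSource, long> sizeSelector,
        long maxSize)
    {
        var buffer = new List<TSource>();
        long sumSize = 0;
        foreach (var item in source)
        {
            long itemSize = sizeSelector(item);
            if (buffer.Count > 0 && checked(sumSize + itemSize) > maxSize)
            {
                // Emit full batch before adding the new item
                yield return buffer.ToArray(); buffer.Clear(); sumSize = 0;
            }
            buffer.Add(item); sumSize += itemSize;
            if (sumSize >= maxSize)
            {
                // Emit full batch after adding the new item
                yield return buffer.ToArray(); buffer.Clear(); sumSize = 0;
            }
        }
        if (buffer.Count > 0) yield return buffer.ToArray();
    }

    static void Main()
    {
        // Made-up sizes in MB, 500 MB limit per batch.
        long[] sizesMb = { 200, 200, 150, 400, 100, 50 };
        foreach (var batch in sizesMb.BatchBySize(s => s, 500))
            Console.WriteLine($"[{string.Join(", ", batch)}] -> {batch.Sum()} MB");
        // Prints:
        // [200, 200] -> 400 MB
        // [150] -> 150 MB
        // [400, 100] -> 500 MB
        // [50] -> 50 MB
    }
}
```

Because items stay in input order, a batch is cut as soon as the running sum would exceed the limit, so a large item arriving after small ones (the 400 above) starts a fresh batch rather than being slotted backwards.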