How do I compress files that are over 2 GB using PowerShell?


I am working on a project to compress files that range anywhere from a couple of MB to several GB in size, and I am trying to use PowerShell to compress them into a .zip. The main problem I am having is that Compress-Archive has a 2 GB cap on individual file size, and I was wondering if there is another method to compress them.

Edit:

For this project we are looking to implement a system that takes .pst files from Outlook, compresses them into a .zip and uploads them to a server. Once they are uploaded, they will be pulled down from a new device and extracted into a .pst file again.

1 Answer

NOTE

Further updates to this function will be published to the official GitHub repo as well as to the PowerShell Gallery. The code in this answer will no longer be maintained.

Contributions are more than welcome. If you wish to contribute, fork the repo and submit a pull request with the changes.


To explain the limitation noted in the PowerShell Docs for Compress-Archive:

The Compress-Archive cmdlet uses the Microsoft .NET API System.IO.Compression.ZipArchive to compress files. The maximum file size is 2 GB because there's a limitation of the underlying API.

This happens because, presumably (the 5.1 version is closed source), the cmdlet uses a MemoryStream to hold all the zip archive entries in memory before writing the archive to a file. Inspecting the InnerException produced by the cmdlet, we can see:

System.IO.IOException: Stream was too long.
   at System.IO.MemoryStream.Write(Byte[] buffer, Int32 offset, Int32 count)
   at CallSite.Target(Closure , CallSite , Object , Object , Int32 , Object )
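
One way to surface that inner exception after a failed Compress-Archive call, assuming the failure is the most recent error in the session:

$Error[0].Exception.InnerException   # the IOException thrown by the underlying MemoryStream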

We would see a similar issue if we attempted to read all the bytes of a file larger than 2 GB:

Exception calling "ReadAllBytes" with "1" argument(s): "The file is too long.
This operation is currently limited to supporting files less than 2 gigabytes in size."
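
A minimal way to reproduce that message, assuming a file larger than 2 GB exists at this hypothetical path:

# throws: "The file is too long. This operation is currently limited to
# supporting files less than 2 gigabytes in size."
[System.IO.File]::ReadAllBytes('C:\temp\bigger-than-2gb.bin')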

Not coincidentally, the same limitation applies to System.Array (a MemoryStream is backed by a byte array):

.NET Framework only: By default, the maximum size of an Array is 2 gigabytes (GB).


There is also another limitation, pointed out in this question: Compress-Archive can't compress a file if another process has a handle on it.

How to reproduce?

# cd to a temporary folder, create a 'temp' subfolder and
# start a Job which keeps writing to a file in it
New-Item .\temp -ItemType Directory -Force | Out-Null
$job = Start-Job {
    # background jobs don't inherit the caller's location in Windows PowerShell,
    # hence $using:PWD to build an absolute path to the file
    0..1000 | ForEach-Object {
        "Iteration ${_}:" + ('A' * 1kb)
        Start-Sleep -Milliseconds 200
    } | Set-Content (Join-Path $using:PWD 'temp\test.txt')
}

Start-Sleep -Seconds 1
# attempt to compress
Compress-Archive .\temp\test.txt -DestinationPath test.zip
# Exception:
# The process cannot access the file '..\test.txt' because it is being used by another process.
$job | Stop-Job -PassThru | Remove-Job
Remove-Item .\temp -Recurse

To overcome this issue, and also to emulate Explorer's behavior when compressing files that are in use by another process, the function posted below defaults to [FileShare] 'ReadWrite, Delete' when opening a FileStream.
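
As a quick illustration of what that open looks like (the path is hypothetical), a file that another process is writing to can still be read this way:

# sketch: read a file without requesting exclusive access,
# letting the owning process keep writing to (or even delete) it
$fs = [System.IO.File]::Open(
    'C:\temp\in-use.log',
    [System.IO.FileMode]::Open,
    [System.IO.FileAccess]::Read,
    [System.IO.FileShare] 'ReadWrite, Delete')
$fs.Dispose()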


To get around the 2 GB limitation there are two workarounds:

  • The easy workaround is to use the ZipFile.CreateFromDirectory method. There are 3 limitations when using this static method:
    1. The source must be a directory; a single file cannot be compressed.
    2. All files in the source folder (recursively) will be compressed; we can't pick or filter the files to compress.
    3. It's not possible to update the entries of an existing zip archive.

Worth noting, if you need to use the ZipFile class in Windows PowerShell (.NET Framework), there must be a reference to System.IO.Compression.FileSystem. See the inline comments.

# Only needed if using Windows PowerShell (.NET Framework):
Add-Type -AssemblyName System.IO.Compression.FileSystem

# $sourceDirectory is the folder to compress, $destinationArchive the path of the .zip to create
[IO.Compression.ZipFile]::CreateFromDirectory($sourceDirectory, $destinationArchive)
  • The code-it-yourself workaround is to use a function that does the manual work of creating the ZipArchive and the corresponding ZipEntries.

This function should be able to handle compression the same way as the ZipFile.CreateFromDirectory method, but it also allows filtering the files and folders to compress while keeping the file / folder structure untouched.

Documentation as well as a usage example can be found here.

using namespace System.IO
using namespace System.IO.Compression
using namespace System.Collections.Generic

Add-Type -AssemblyName System.IO.Compression

function Compress-ZipArchive {
    [CmdletBinding(DefaultParameterSetName = 'Path')]
    [Alias('zip', 'ziparchive')]
    param(
        [Parameter(ParameterSetName = 'PathWithUpdate', Mandatory, Position = 0, ValueFromPipeline)]
        [Parameter(ParameterSetName = 'PathWithForce', Mandatory, Position = 0, ValueFromPipeline)]
        [Parameter(ParameterSetName = 'Path', Mandatory, Position = 0, ValueFromPipeline)]
        [string[]] $Path,

        [Parameter(ParameterSetName = 'LiteralPathWithUpdate', Mandatory, ValueFromPipelineByPropertyName)]
        [Parameter(ParameterSetName = 'LiteralPathWithForce', Mandatory, ValueFromPipelineByPropertyName)]
        [Parameter(ParameterSetName = 'LiteralPath', Mandatory, ValueFromPipelineByPropertyName)]
        [Alias('PSPath')]
        [string[]] $LiteralPath,

        [Parameter(Position = 1, Mandatory)]
        [string] $DestinationPath,

        [Parameter()]
        [CompressionLevel] $CompressionLevel = [CompressionLevel]::Optimal,

        [Parameter(ParameterSetName = 'PathWithUpdate', Mandatory)]
        [Parameter(ParameterSetName = 'LiteralPathWithUpdate', Mandatory)]
        [switch] $Update,

        [Parameter(ParameterSetName = 'PathWithForce', Mandatory)]
        [Parameter(ParameterSetName = 'LiteralPathWithForce', Mandatory)]
        [switch] $Force,

        [Parameter()]
        [switch] $PassThru
    )

    begin {
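        # resolve the destination to an absolute path and make sure it has a .zip extension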
        $DestinationPath = $PSCmdlet.GetUnresolvedProviderPathFromPSPath($DestinationPath)
        if([Path]::GetExtension($DestinationPath) -ne '.zip') {
            $DestinationPath = $DestinationPath + '.zip'
        }

        # map -Force / -Update to the FileMode used to open the destination archive:
        # Create overwrites, OpenOrCreate allows updating, CreateNew fails if the file already exists
        if($Force.IsPresent) {
            $fsMode = [FileMode]::Create
        }
        elseif($Update.IsPresent) {
            $fsMode = [FileMode]::OpenOrCreate
        }
        else {
            $fsMode = [FileMode]::CreateNew
        }

        $ExpectingInput = $null
    }
    process {
        $isLiteral  = $false
        $targetPath = $Path

        if($PSBoundParameters.ContainsKey('LiteralPath')) {
            $isLiteral  = $true
            $targetPath = $LiteralPath
        }

        # open the destination archive only once, when the first pipeline input is received
        if(-not $ExpectingInput) {
            try {
                $destfs = [File]::Open($DestinationPath, $fsMode)
                $zip    = [ZipArchive]::new($destfs, [ZipArchiveMode]::Update)
                $ExpectingInput = $true
            }
            catch {
                $zip, $destfs | ForEach-Object Dispose
                $PSCmdlet.ThrowTerminatingError($_)
            }
        }

        # traverse each input path breadth-first, creating entries relative
        # to the parent folder of the input item
        $queue = [Queue[FileSystemInfo]]::new()

        foreach($item in $ExecutionContext.InvokeProvider.Item.Get($targetPath, $true, $isLiteral)) {
            $queue.Enqueue($item)

            $here = $item.Parent.FullName
            if($item -is [FileInfo]) {
                $here = $item.Directory.FullName
            }

            while($queue.Count) {
                try {
                    $current = $queue.Dequeue()
                    if($current -is [DirectoryInfo]) {
                        $current = $current.EnumerateFileSystemInfos()
                    }
                }
                catch {
                    $PSCmdlet.WriteError($_)
                    continue
                }

                foreach($item in $current) {
                    try {
                        if($item.FullName -eq $DestinationPath) {
                            continue
                        }

                        $relative = $item.FullName.Substring($here.Length + 1)
                        $entry    = $zip.GetEntry($relative)

                        if($item -is [DirectoryInfo]) {
                            $queue.Enqueue($item)
                            if(-not $entry) {
                                $entry = $zip.CreateEntry($relative + '\', $CompressionLevel)
                            }
                            continue
                        }

                        if(-not $entry) {
                            $entry = $zip.CreateEntry($relative, $CompressionLevel)
                        }

                        # open the source with a permissive share so that files
                        # held open by other processes can still be read
                        $sourcefs = $item.Open([FileMode]::Open, [FileAccess]::Read, [FileShare] 'ReadWrite, Delete')
                        $entryfs  = $entry.Open()
                        $sourcefs.CopyTo($entryfs)
                    }
                    catch {
                        $PSCmdlet.WriteError($_)
                    }
                    finally {
                        $entryfs, $sourcefs | ForEach-Object Dispose
                    }
                }
            }
        }
    }
    end {
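        # disposing the ZipArchive and the FileStream flushes the archive to disk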
        $zip, $destfs | ForEach-Object Dispose

        if($PassThru.IsPresent) {
            $DestinationPath -as [FileInfo]
        }
    }
}
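
For completeness, a quick usage sketch for Compress-ZipArchive along the lines of the question's scenario (the paths are hypothetical). The function accepts pipeline input, so the files to compress can be filtered with Get-ChildItem:

# compress only the .pst files found under a folder into a single archive,
# overwriting the destination if it already exists
Get-ChildItem 'C:\Users\SomeUser\Documents\Outlook Files' -Recurse -Filter *.pst |
    Compress-ZipArchive -DestinationPath 'C:\temp\pst-backup.zip' -Force

# wildcards are also supported through -Path; -Update adds entries to an existing archive
Compress-ZipArchive -Path 'C:\backups\*.pst' -DestinationPath 'C:\temp\pst-backup.zip' -Update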