Create a small file (.txt or .TMP) from a huge .TMP file

A 'System.OutOfMemoryException' error occurred while creating a small file from a big file.

I usually use the PowerShell command below to create a small version of a huge file:

Get-Content input_file_name.Tmp -TotalCount 100 | Out-File -Encoding Default "output_file_name_100.Tmp"

However, this is throwing a 'System.OutOfMemoryException' error. Any advice on this?

Note: The same command has worked before with bigger files, so I don't think the size of the file is the problem.

There is 1 best solution below

I know that you personally think the size of the file may not be the actual problem, but it's worth revisiting the fundamentals for the benefit of other readers.

Get-Content, when used in a pipeline, reads lines from a file one at a time.

This object-by-object processing is a core feature of PowerShell's pipeline and acts as a memory throttle (there is no need to read all input into memory at once).

There are only three scenarios where Get-Content reads the whole file into memory:

  • If you capture Get-Content's output in a variable ($content = Get-Content ...), in which case the variable receives an array comprising all lines.

  • If you enclose the Get-Content call in (...), $(...), or @(...), which also returns an array of all lines.

  • If you use the -Raw switch, which makes Get-Content return a single, multi-line string.
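
To make the three cases concrete, here is a minimal sketch. The file name sample_get_content_demo.txt and its 1,000 generated lines are assumptions made purely for illustration; they are not part of the question.

```powershell
# Hypothetical sample file with 1,000 short lines, created only for this demo.
$path = Join-Path ([IO.Path]::GetTempPath()) 'sample_get_content_demo.txt'
1..1000 | ForEach-Object { "line $_" } | Set-Content -Path $path

# Case 1: capturing the output in a variable collects ALL lines in an array.
$content = Get-Content -Path $path
$content.Count            # 1000

# Case 2: enclosing the call in (...) also collects all lines before continuing.
$viaParens = (Get-Content -Path $path)
$viaParens.Count          # 1000

# Case 3: -Raw returns the whole file as one multi-line string.
$raw = Get-Content -Path $path -Raw
$raw.GetType().Name       # String

# By contrast, in a pipeline the lines are passed on one by one as they are read.
Get-Content -Path $path | Where-Object { $_ -like '*999' }   # line 999
```

With a genuinely huge file, only the last command keeps memory use low; the first three are the ones that can plausibly exhaust memory.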


Using -TotalCount 100 (or -First 100) doesn't change this fundamental behavior: after 100 lines have been read, Get-Content stops reading and closes the file.
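
As a small sketch of the equivalent ways to request only the leading lines (again using a hypothetical demo file, totalcount_demo.txt, and 3 lines instead of 100 to keep the output short):

```powershell
# Hypothetical sample file, created only for this demo.
$path = Join-Path ([IO.Path]::GetTempPath()) 'totalcount_demo.txt'
1..1000 | ForEach-Object { "line $_" } | Set-Content -Path $path

# -TotalCount and its aliases (-First, -Head) limit the read inside Get-Content.
$a = Get-Content -Path $path -TotalCount 3
$b = Get-Content -Path $path -Head 3

# Select-Object -First limits it from the pipeline side instead:
# once 3 objects have arrived, it stops the upstream command.
$c = Get-Content -Path $path | Select-Object -First 3

$a   # line 1, line 2, line 3 (one per line)
```

All three variables end up holding the same three lines, and only those three lines are kept in memory.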

The code in your question therefore doesn't explain your symptom - you shouldn't run out of memory - at least not because the input file is large; if it still happens, you may be seeing a bug.

If you have a reproducible case, I encourage you to file a bug in the Windows PowerShell UserVoice forum or, if you can (also) reproduce the bug in PowerShell [Core] v6+, at the PowerShell Core GitHub repo.


In the meantime, you can consider using .NET directly, which is also generally faster than using PowerShell's cmdlets:

[Linq.Enumerable]::Take([IO.File]::ReadLines("$PWD/input_file_name.Tmp"), 100) |
  Out-File -Encoding Default output_file_name_100.Tmp

Note:
• "$PWD/" is used as part of the input file path because .NET's working directory typically differs from PowerShell's, so relative paths passed to .NET methods may resolve to the wrong location.
• In PowerShell type literals ([...]), the System. part of the full type name can be omitted; thus [Linq.Enumerable] refers to System.Linq.Enumerable, and [IO.File] to System.IO.File.
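
The working-directory point can be illustrated with a rough sketch. The directory pwd_demo_dir and the file demo_input.txt are hypothetical names chosen for this demo, and using Convert-Path to build the full path is my own suggestion as an alternative to prefixing "$PWD/"; it requires the file to exist already.

```powershell
# Hypothetical demo directory and file, created only for this illustration.
$dir = Join-Path ([IO.Path]::GetTempPath()) 'pwd_demo_dir'
$null = New-Item -ItemType Directory -Path $dir -Force

Push-Location -LiteralPath $dir
try {
    'a', 'b', 'c' | Set-Content -Path 'demo_input.txt'

    # Changing PowerShell's location usually leaves the process-wide
    # .NET working directory untouched, so the two values often differ.
    "PowerShell location: $($PWD.ProviderPath)"
    ".NET working dir:    $([Environment]::CurrentDirectory)"

    # Resolve the relative path to a full file-system path before handing it to .NET.
    $fullPath = Convert-Path -LiteralPath 'demo_input.txt'

    # @(...) forces the lazy LINQ sequence to be enumerated into an array.
    $firstTwo = @([Linq.Enumerable]::Take([IO.File]::ReadLines($fullPath), 2))
    $firstTwo.Count   # 2
}
finally {
    Pop-Location
}
```

Passing a full path means the .NET call no longer depends on which of the two directories happens to be current.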