How can I modify this PowerShell script to name the output files after each program?


I have a single text file that contains 60K+ lines. Those lines are actually around 50 programs written in Natural, and I need to break them apart into individual programs. I have a script that works perfectly except for one flaw: the naming of the output files.

Every program starts with "Module Name=", followed by the actual name of the program. I need to split the programs and save them using the actual program names.

Using the example below, I would like to create two files called Program1.txt and Program2.txt, each containing only its own lines. I have a script, also below, that separates the files correctly, but I cannot work out how to capture the program name and use it as the name of the output file.

Example:

Module Name=Program1
....
....
....
END

Module Name=Program2
....
....
....
END

Code:

$InputFile = "C:\Natural.txt"
$Reader = New-Object System.IO.StreamReader($InputFile)
$a = 1
While (($Line = $Reader.ReadLine()) -ne $null) {
    If ($Line -match "Module Name=") {
        $OutputFile = "MySplittedFileNumber$a.txt"
        $a++
    }    
    Add-Content $OutputFile $Line
}

There are 2 best solutions below

Best answer

Use a switch statement, which reads a file line by line efficiently with -File and matches each line against one or more regexes with -Regex, together with a System.IO.StreamWriter instance to write the output files efficiently:

$outStream = $null

switch -Regex -File C:\Natural.txt {
  '\bModule Name=(\w+)' {   # a module start line
    if ($outStream) { $outStream.Close() }
    $programName = $Matches[1] # Extract the program name.
    # Create a new output file.
    # Important: use a *full* path.
    $outStream = [System.IO.StreamWriter] "C:\$programName.txt"
    # Write the line at hand.
    $outStream.WriteLine($_)
  }
  default {                 # all other lines
    # Write the line at hand to the current output file.
    $outStream.WriteLine($_)    
  }
}
if ($outStream) { $outStream.Close() }

Note:

  • The code assumes that the very first line in the input file is a Module Name=... line.

  • The regex matching is case-insensitive by default, as PowerShell generally is; add -CaseSensitive, if needed.

  • The automatic $Matches variable is used to extract the program name from the matching result.
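
To illustrate the last two notes, here is a minimal sketch of how -match fills $Matches and how the case-sensitive variant -cmatch differs. The sample lines are made up for the illustration and are not taken from the real input file:

```powershell
# Hypothetical sample lines, for illustration only.
$sampleLines = 'Module Name=Program1', 'module name=Program2', '....', 'END'

$names = foreach ($line in $sampleLines) {
    # -match is case-insensitive; on success it fills the automatic $Matches variable.
    if ($line -match '\bModule Name=(\w+)') {
        $Matches[1]    # capture group 1: the program name
    }
}

$caseSensitiveNames = foreach ($line in $sampleLines) {
    # -cmatch is the case-sensitive variant of -match.
    if ($line -cmatch '\bModule Name=(\w+)') {
        $Matches[1]
    }
}
```

Here $names collects both program names, while $caseSensitiveNames collects only the one whose line is spelled exactly "Module Name=". Note that \w matches only letters, digits, and underscores, so if a program name can contain other characters (a hyphen, for example), the capture would stop early and the pattern would need adjusting.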


Thank you Jeff!

Here is my solution using the string Split method:

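$InputFile = "C:\Temp\EMNCP\Natural.txt"
$Reader = New-Object System.IO.StreamReader($InputFile)
$OutputFile = $null

try {
    While ($null -ne ($Line = $Reader.ReadLine())) {
        If ($Line -match "Module Name=") {
            # The program name is the text after the "=".
            $OPName = $Line.Split("=")
            $FileName = $OPName[1].Trim()
            Write-Host "Found ... $FileName" -ForegroundColor Green
            # Note: this path is relative to the current location.
            $OutputFile = "$FileName.txt"
        }
        # Skip any lines that precede the first "Module Name=" line.
        If ($OutputFile) { Add-Content $OutputFile $Line }
    }
}
finally {
    # Release the input file handle even if an error occurs.
    $Reader.Close()
}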
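
The best answer stresses using a full path, whereas the script above writes relative to the current location. A minimal sketch of one way to build a full output path from a module line follows. The function name Get-ModuleOutputPath and its parameters are hypothetical, and replacing invalid file-name characters with "_" is an assumption about what would be acceptable, not something either answer does:

```powershell
# Hypothetical helper, not part of either answer above.
function Get-ModuleOutputPath {
    param(
        [string] $Line,             # one line read from the input file
        [string] $OutputDirectory   # full path of the target folder (assumed to exist)
    )

    # Only "Module Name=" lines yield a file name.
    if ($Line -notmatch 'Module Name=') { return $null }

    # The text after the "=" is the program name, as in the Split approach.
    $name = $Line.Split('=')[1].Trim()

    # Replace characters that are not allowed in file names with "_".
    foreach ($c in [System.IO.Path]::GetInvalidFileNameChars()) {
        $name = $name.Replace([string] $c, '_')
    }

    # Combine the folder and file name into a single path.
    return [System.IO.Path]::Combine($OutputDirectory, "$name.txt")
}
```

Inside the loop, the result could be assigned to $OutputFile whenever it is not $null. A full path matters most with .NET types such as StreamWriter, because .NET resolves relative paths against the process working directory, which can differ from PowerShell's current location.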