PowerShell: Erase sections in a file matched by a regex


I want to edit plain text files (MT940 Standard).

Here is an example file with dummy data

-
:20:296535/00000010
:21:ABNADK2AXXX
:25:ABNADK2AXXX/DK88ABNA0496434500
:28C:42/00002
:60M:C230228EUR124792,65
:61:2302280228C1750,88NTRFC1165-23-00120//656
:86:/TRTP/SEPA OVERBOEKING/IBAN/DK47ABNA0243508514/BIC/ABNADK2A/NAME/
LOOP BV/REMI/AV-RUN 24022023/202301918/EREF/C1165-23-00120
:61:2302280228C4695,98NTRF6381310605374038//656
:86:/TRTP/SEPA OVERBOEKING/IBAN/DK14ABNA0456766324/BIC/ABNADK2A/NAME/
DEV BV/REMI/ID16145 DEB. 1657139 FACT. 202303668 20
2303685 202303689/EREF/638131060537403857-311-2
:61:2302280228C1349,25NTRFNOTPROVIDED//658
:86:/TRTP/SEPA OVERBOEKING/IBAN/DK46ABNA0513892443/BIC/ABNADK2A/NAME/
EXAMPLE COM/REMI/202303656/EREF/NOTPROVIDED
:61:2302280228C55845,96NTRFNOTPROVIDED//658
:86:/TRTP/SEPA OVERBOEKING/IBAN/DK35ABNA0442867689/BIC/ABNADK2A/NAME/
BATH COMPANY DK/REMI/INV. 202228255-8426, OUR REF 2022611
73-79/EREF/NOTPROVIDED
:61:2302280228D105000,NTRFNOTPROVIDED//658
:86:/TRTP/SEPA OVERBOEKING/IBAN/DK98INGB0657624985/BIC/INGBDK2A/NAME/
TEST/REMI/OVERBOEKING/EREF/NOTPROVIDED
:62F:C230228EUR83434,72
:64:C230228EUR83434,72
:86:/ACSI/ABNADK2AXXX
-
:20:STARTUMS TA FW
:25:28020050/0521322890
:28C:017/01
:60F:C230228GBP1473111,27
:61:2302280228D1919,29N020NONREF
:86:206?00AUSL-ZAHLUNG?100004649?20COMPANY?21S LT
D?22TRN AZV2023022800746?23URSP.-BETR.1.900,00 GBP?24KURS 0,87716
0 EUR ZU GBP?25GEGENWERT      2.00,08 EUR?26PROVISION FIX      7
,50 EUR?27SWIFT-/TELE-SPESEN 2,00 EUR?28FREMDE GEB.       12,50 E
UR?2917.02 413337?3028020050?310537246190?32HOMETESTEXAMPLE?33S 
LTD?34003
:61:2302280228D16988,81N020NONREF
:86:206?00AUSL-ZAHLUNG?100004649?20BODO GU COM?21NOT A TEST?22TRN
AZV2023022800749?23URSP.-BETR.16.980,48 GBP?24
KURS 0,877160 EUR ZU GBP?25GEGENWERT     19.358,48 EUR?26PROVISIO
N FIX      7,50 EUR?27SWIFT-/TELE-SPESEN 2,00 EUR?2830.01 INV-278
0?29*LOREM*?3028020050?310537246190?32GOLL
?33GOL COM?34003?60INFO 0800-1234
*GEB-FREI*
:61:2302280228D867,06N020NONREF
:86:206?00AUSL-ZAHLUNG?100004649?20NOTACOMPANY?21LTD?2
2TRN AZV2023022800752?23URSP.-BETR.858,73 GBP?24KURS 0,877160 EUR
 ZU GBP?25GEGENWERT        978,99 EUR?26PROVISION FIX      7,50 E
UR?27SWIFT-/TELE-SPESEN 2,00 EUR?2828.01 A221322?3028020050?31053
7246190?32KOLL?33LTD?34003
:62F:C230228GBP1453336,11
-

The script should search for lines that start with :86: and are not immediately followed by a slash, four characters, and another slash.

The regex for this is: ^:86:(?!/..../)

From this matched line, the script should go up to the nearest preceding line that contains only a "-" and mark it as the start of the section to erase. From the matched line it should also go down to the next line that contains only a "-" and use it (including the "-") as the end marker of the section to erase.

This algorithm should be applied repeatedly through the whole file.

I have this script, and it works almost perfectly. BUT it does not use the "-" before the matched pattern. Instead, it uses the pattern line itself as the start of the section to erase.

Can someone tell me what the problem is?

# Specify the path to the input file
$inputFilePath = "V:\Temp\finance\TestKopie.A01"

# Specify the path to the output file
$outputFilePath = "V:\Temp\finance\Hacked.A01"

# Function to remove sections based on pattern and "-"
function RemoveSections($content) {
    $outputContent = @()
    $eraseMode = $false
    $previousLine = ""

    for ($i = 0; $i -lt $content.Length; $i++) {
        $line = $content[$i]

        if ($line -match "^:86:(?!/..../)") {
            $eraseMode = $true

            # Find the previous "-" line
            $previousLineIndex = $i - 1
            while ($previousLineIndex -ge 0 -and $content[$previousLineIndex] -ne "-") {
                $previousLineIndex--
            }
            if ($previousLineIndex -ge 0) {
                $outputContent += $content[$previousLineIndex]
            }
        }

        if ($eraseMode -and $line -eq "-") {
            $eraseMode = $false

            # Find the next "-" line
            $nextLineIndex = $i + 1
            while ($nextLineIndex -lt $content.Length -and $content[$nextLineIndex] -ne "-") {
                $nextLineIndex++
            }
            if ($nextLineIndex -lt $content.Length) {
                $i = $nextLineIndex + 1  # Skip the section between "-" lines, including the next "-"
                continue
            }
        }

        if (!$eraseMode) {
            $outputContent += $line
        }
    }

    return $outputContent
}

# Read the input file content
$inputContent = Get-Content $inputFilePath

# Initialize variables
$iteration = 0
$linesRemoved = 0

# Remove sections based on pattern and "-" until no more changes occur
do {
    $iteration++
    Write-Host "Iteration: $iteration"
    Write-Host "Lines removed: $linesRemoved"
    $linesRemoved = 0

    # Remove sections and count the lines removed
    $outputContent = RemoveSections $inputContent
    $linesRemoved = ($inputContent.Length - $outputContent.Length)

    # Output progress
    Write-Host "Lines removed in this iteration: $linesRemoved"
    Write-Host "----------------------------"

    # Update the input content for the next iteration
    $inputContent = $outputContent
} while ($linesRemoved -gt 0)

# Save the modified content to the output file
$outputContent | Out-File $outputFilePath -Force
Write-Host "Process complete. Modified content saved to $outputFilePath"

EDIT: Here is the working script based on the regex-pattern of @wiktor-stribiżew :-)

# Specify the path to the input file
$inputFilePath = "V:\Temp\finance\Test.A01"

# Specify the path to the output file
$outputFilePath = "V:\Temp\finance\Hacked.A01"

# Read the input file content
$inputContent = Get-Content $inputFilePath -Raw

# Perform the replacement
$modifiedContent = $inputContent -replace '(?sm)^-(?:(?!^-\r?$).)*?^:86:(?!/..../)(?:(?!^-\r?$).)*'

# Save the modified content to the output file
$modifiedContent | Set-Content $outputFilePath -Force
There are 2 best solutions below

Best answer

You can use

(?sm)^-(?:(?!^-\r?$).)*?^:86:(?!/..../)(?:(?!^-\r?$).)*

to simply remove the whole block of text from the entire text contents once you load it into memory as a single string.

Details:

  • (?sm) - regex flags: m makes ^ and $ match at the start/end of any line, and s makes . match newlines, too
  • ^ - matches start of a line
  • - - a - char
  • (?:(?!^-\r?$).)*? - any char, zero or more but as few as possible occurrences, that is not a single - on an entire line
  • ^:86: - start of a line and :86:
  • (?!/..../) - immediately to the right, there must be no / + four any chars + /
  • (?:(?!^-\r?$).)* - any char, zero or more but as many as possible occurrences, that is not a single - on an entire line.
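To see the pieces above working together, here is a minimal sketch that applies the pattern to a tiny made-up sample (the block contents `AAA`, `BBB`, `CCC` are placeholders, not real MT940 data). Only the middle block has a :86: line that is not followed by /xxxx/, so only that block should disappear.

```powershell
# Three blocks, each introduced by a line containing only "-".
# The sample data is invented for illustration.
$sample = @(
    '-'
    ':20:AAA'
    ':86:/TRTP/SEPA'
    '-'
    ':20:BBB'
    ':86:206?00AUSL'
    ':62F:C230228GBP1,00'
    '-'
    ':20:CCC'
    ':86:/ACSI/XYZ'
    '-'
) -join "`n"

$pattern = '(?sm)^-(?:(?!^-\r?$).)*?^:86:(?!/..../)(?:(?!^-\r?$).)*'

# -replace with no replacement string removes every match.
$result = $sample -replace $pattern
$result
```

The match stops right before the next "-" line, so that "-" stays in place and serves as the start of the following block.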

In PowerShell, you can use

# Specify the path to the input file
$inputFilePath = "V:\Temp\finance\Test.A01"

# Specify the path to the output file
$outputFilePath = "V:\Temp\finance\Hacked.A01"

# Read the input file content
$inputContent = Get-Content $inputFilePath -Raw

# Perform the replacement
$modifiedContent = $inputContent -replace '(?sm)^-(?:(?!^-\r?$).)*?^:86:(?!/..../)(?:(?!^-\r?$).)*'

# Save the modified content to the output file
$modifiedContent | Set-Content $outputFilePath -Force

NOTE: Since the s flag is in use, you probably want to replace /..../ with /[^\r\n]{4}/ so that the four chars can be neither carriage returns nor line feeds.
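The difference described in the note can be checked with a small, contrived input. In this sketch the string `$text` is a hypothetical edge case in which the four characters after the first slash run across a line break.

```powershell
# Contrived input: "/" + "AB", a newline, "C" + "/" spans two lines.
$text = ":86:/AB`nC/REST"

# With the s flag, "." also matches the newline, so /..../ matches "/AB<LF>C/"
# and the negative lookahead rejects the line.
$withDot = $text -match '(?sm)^:86:(?!/..../)'

# With [^\r\n]{4} the four chars must stay on one line, so the lookahead
# does not match and the :86: line is reported.
$withClass = $text -match '(?sm)^:86:(?!/[^\r\n]{4}/)'

"dot: $withDot, class: $withClass"

# The full pattern with that substitution applied:
$pattern = '(?sm)^-(?:(?!^-\r?$).)*?^:86:(?!/[^\r\n]{4}/)(?:(?!^-\r?$).)*'
```

Whether such an input can occur in real MT940 files is not something the sample data shows, so treat this as a defensive tweak.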

Second answer

I modified your script so that it iterates through every file in a folder, runs the replacement, and then saves each file under the same name in another folder.

It runs forever on just 2 files of 92 KB and 23 KB. Is this a problem with the -Raw import again?

# Specify the path to the input folder
$inputFolderPath = "V:\Temp\finance\input"

# Specify the path to the output folder
$outputFolderPath = "V:\Temp\finance\output"

# Get all files in the input folder
$inputFiles = Get-ChildItem -Path $inputFolderPath -File

foreach ($inputFile in $inputFiles) {
    # Construct the output file path
    $outputFilePath = Join-Path -Path $outputFolderPath -ChildPath $inputFile.Name

    # Read the input file content
    $inputContent = Get-Content -Path $inputFile.FullName -Raw

    # Perform the replacement
    $modifiedContent = $inputContent -replace '(?m)^-(?:\r?\n(?!-\r?$).*)*?^:86:(?!/..../).*(?:\n(?!-\r?$).*)*'

    # Save the modified content to the output file
    $modifiedContent | Set-Content -Path $outputFilePath -Force

    Write-Output "Modified content saved to: $outputFilePath"
}

Write-Output "Process complete."

Here are the 2 sample files I used. Example1.A01

-
:20:296535/00000010
:21:ABNADK2AXXX
:25:ABNADK2AXXX/DK88ABNA0496434500
:28C:42/00002
:60M:C230228EUR124792,65
:61:2302280228C1750,88NTRFC1165-23-00120//656
:86:/TRTP/SEPA OVERBOEKING/IBAN/DK47ABNA0243508514/BIC/ABNADK2A/NAME/
LOOP BV/REMI/AV-RUN 24022023/202301918/EREF/C1165-23-00120
:61:2302280228C4695,98NTRF6381310605374038//656
:86:/TRTP/SEPA OVERBOEKING/IBAN/DK14ABNA0456766324/BIC/ABNADK2A/NAME/
DEV BV/REMI/ID16145 DEB. 1657139 FACT. 202303668 20
2303685 202303689/EREF/638131060537403857-311-2
:61:2302280228C1349,25NTRFNOTPROVIDED//658
:86:/TRTP/SEPA OVERBOEKING/IBAN/DK46ABNA0513892443/BIC/ABNADK2A/NAME/
EXAMPLE COM/REMI/202303656/EREF/NOTPROVIDED
:61:2302280228C55845,96NTRFNOTPROVIDED//658
:86:/TRTP/SEPA OVERBOEKING/IBAN/DK35ABNA0442867689/BIC/ABNADK2A/NAME/
BATH COMPANY DK/REMI/INV. 202228255-8426, OUR REF 2022611
73-79/EREF/NOTPROVIDED
:61:2302280228D105000,NTRFNOTPROVIDED//658
:86:/TRTP/SEPA OVERBOEKING/IBAN/DK98INGB0657624985/BIC/INGBDK2A/NAME/
TEST/REMI/OVERBOEKING/EREF/NOTPROVIDED
:62F:C230228EUR83434,72
:64:C230228EUR83434,72
:86:/ACSI/ABNADK2AXXX
-
:20:STARTUMS TA FW
:25:28020050/0521322890
:28C:017/01
:60F:C230228GBP1473111,27
:61:2302280228D1919,29N020NONREF
:86:206?00AUSL-ZAHLUNG?100004649?20COMPANY?21S LT
D?22TRN AZV2023022800746?23URSP.-BETR.1.900,00 GBP?24KURS 0,87716
0 EUR ZU GBP?25GEGENWERT      2.00,08 EUR?26PROVISION FIX      7
,50 EUR?27SWIFT-/TELE-SPESEN 2,00 EUR?28FREMDE GEB.       12,50 E
UR?2917.02 413337?3028020050?310537246190?32HOMETESTEXAMPLE?33S 
LTD?34003
:61:2302280228D16988,81N020NONREF
:86:206?00AUSL-ZAHLUNG?100004649?20BODO GU COM?21NOT A TEST?22TRN
AZV2023022800749?23URSP.-BETR.16.980,48 GBP?24
KURS 0,877160 EUR ZU GBP?25GEGENWERT     19.358,48 EUR?26PROVISIO
N FIX      7,50 EUR?27SWIFT-/TELE-SPESEN 2,00 EUR?2830.01 INV-278
0?29*LOREM*?3028020050?310537246190?32GOLL
?33GOL COM?34003?60INFO 0800-1234
*GEB-FREI*
:61:2302280228D867,06N020NONREF
:86:206?00AUSL-ZAHLUNG?100004649?20NOTACOMPANY?21LTD?2
2TRN AZV2023022800752?23URSP.-BETR.858,73 GBP?24KURS 0,877160 EUR
 ZU GBP?25GEGENWERT        978,99 EUR?26PROVISION FIX      7,50 E
UR?27SWIFT-/TELE-SPESEN 2,00 EUR?2828.01 A221322?3028020050?31053
7246190?32KOLL?33LTD?34003
:62F:C230228GBP1453336,11
-

Example1.A02

:20:2303191/10060276
:25:TERPPLPP123/PL97150011711211700657640000
:28C:2382/1
:60F:C230317PLN47131,36
:62F:C230317PLN47131,36
:64:C230317PLN47131,36
:65:C230317PLN47131,36
-
:20:6576249850000001
:25:KOP123/DK98INGB0657624985EUR
:28C:78/1
:60F:D230318EUR294657,59
:62F:D230319EUR294657,59
:64:D230319EUR294657,59
:65:D230320EUR294657,59
:65:D230321EUR294657,59
:86:/NAME/ROK//BIC/KOP//SUM/0/0/0,00/0,00/
-
:20:2303201/10060276
:25:TERPPLPP123/PL97150011711211700657640000
:28C:2383/1
:60F:C230319PLN47131,36
:62F:C230319PLN47131,36
:64:C230319PLN47131,36
:65:C230319PLN47131,36
-
:20:0096803070000001
:25:KOP123/DK12INGB0009680307EUR
:28C:78/1
:60F:C230318EUR536088,75
:62F:C230319EUR536088,75
:64:C230319EUR536088,75
:65:C230320EUR536088,75
:65:C230321EUR536088,75
:86:/NAME/NO COMP//BIC/KOP//SUM/0/0/0,00/0,00
/
-
:20:2303191/10060276
:25:TERPPLPP123/PL55150011711211700657510000
:28C:2382/1
:60F:C230317PLN4202368,10
:61:2303150317DN566,25NTRFNONREF//074/23031900001
VB LOREM IPSUM
:86: VB ELECTR. 5151468 JUST A BANK TEST
Ref:2033249
:61:230317DN8709,91NTRFNONREF//074/23031900002
z/389/02/2023
:86:82105017641000002272840402 10501764 BANK
Eurotrans Sp. z o.o. z/389/02/2023 Ref:806492323
:61:230317DN25533,32NTRFNONREF//074/23031900003
fc2301166,2301169,2301170,2301
:86:72175011520000000020335181 17501152 BNPPL O./GEPP
fc2301166,2301169,2301170,2301176 Ref:806492325
:61:230317DN140,22NTRFNONREF//074/23031900004
31/12521324
:86:15175013125650000001578188 17501312 EXAMPLE COMP 31/12521324
Ref:806492326
:62M:C230317PLN4167418,40
:64:C230317PLN4167418,40
:65:C230317PLN4167418,40
-
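The slowdown is probably not caused by -Raw itself. One plausible cause (an assumption, not measured here) is backtracking in the rewritten pattern: `.*` can also consume a trailing `\r`, so on CRLF files each line can be matched in two ways, and a long block that ends up not matching may be retried in a very large number of combinations. A way to sidestep this is to skip the multi-line regex and process the file line by line. The sketch below uses a hypothetical helper name, `Remove-UnstructuredBlocks`, and assumes that a block runs from one "-" line up to (not including) the next. Unlike the regex, it also treats lines before the first "-" as a block.

```powershell
# Hypothetical helper: drops every block that contains a :86: line
# not followed by "/" + four chars + "/".
function Remove-UnstructuredBlocks {
    param([string[]]$Lines)

    $kept  = [System.Collections.Generic.List[string]]::new()
    $block = [System.Collections.Generic.List[string]]::new()
    $drop  = $false

    foreach ($line in $Lines) {
        if ($line -eq '-') {
            # A "-" line closes the current block and opens the next one.
            if (-not $drop) { $kept.AddRange($block) }
            $block.Clear()
            $drop = $false
        }
        $block.Add($line)
        if ($line -match '^:86:(?!/..../)') { $drop = $true }
    }

    # Flush the last block (usually just the closing "-").
    if (-not $drop) { $kept.AddRange($block) }

    # The leading comma keeps the result an array even with 0 or 1 lines.
    return ,$kept.ToArray()
}

# Small made-up sample: the second block must be removed.
$sample  = '-', ':20:A', ':86:/TRTP/x', '-', ':20:B', ':86:206?00', '-'
$cleaned = Remove-UnstructuredBlocks -Lines $sample
$cleaned -join '|'
```

Inside the folder loop this could replace the -replace step: read the file with Get-Content without -Raw (which returns one string per line), pass the lines to the helper, and write the result with Set-Content -Path $outputFilePath -Value $cleaned. Each line is tested once against a short single-line pattern, so the work grows linearly with file size.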