I have a text file with a large number of log messages. I want to extract the messages between two string patterns, and I want each extracted message to appear exactly as it does in the text file.
I tried the following method. It works, but it doesn't support `Get-Content`'s `-Wait` and `-Tail` options. Also, the extracted result is displayed on a single line rather than laid out as in the text file. Any input is welcome :-)
Sample Code
```powershell
function GetTextBetweenTwoStrings($startPattern, $endPattern, $filePath){
    # Get content from the input file
    $fileContent = Get-Content $filePath
    # Regular expression (Regex) of the given start and end patterns
    $pattern = "$startPattern(.*?)$endPattern"
    # Perform the Regex operation
    $result = [regex]::Match($fileContent,$pattern).Value
    # Finally return the result to the caller
    return $result
}

# Clear the screen
Clear-Host
$input = "THE-LOG-FILE.log"
$startPattern = 'START-OF-PATTERN'
$endPattern = 'END-OF-PATTERN'
# Call the function
GetTextBetweenTwoStrings -startPattern $startPattern -endPattern $endPattern -filePath $input
```
Below is an improved script based on Theo's answer. The following points still need to be addressed:
- The beginning and end of the output are somehow trimmed, even though I adjusted the buffer size in the script.
- How can I wrap each matched result in the START and END strings?
- I still can't figure out how to use the `-Wait` and `-Tail` options.
Updated Script
```powershell
# Clear the screen
Clear-Host

# Adjust the buffer size of the window
$bw = 10000
$bh = 300000
if ($host.name -eq 'ConsoleHost') # or -notmatch 'ISE'
{
    [console]::bufferwidth = $bw
    [console]::bufferheight = $bh
}
else
{
    $pshost = get-host
    $pswindow = $pshost.ui.rawui
    $newsize = $pswindow.buffersize
    $newsize.height = $bh
    $newsize.width = $bw
    $pswindow.buffersize = $newsize
}

function Get-TextBetweenTwoStrings ([string]$startPattern, [string]$endPattern, [string]$filePath){
    # Get content from the input file
    $fileContent = Get-Content -Path $filePath -Raw
    # Regular expression (Regex) of the given start and end patterns
    $pattern = '(?is){0}(.*?){1}' -f [regex]::Escape($startPattern), [regex]::Escape($endPattern)
    # Perform the Regex operation and output
    [regex]::Match($fileContent,$pattern).Groups[1].Value
}

# Input file path
$inputFile = "THE-LOG-FILE.log"
# The patterns
$startPattern = 'START-OF-PATTERN'
$endPattern = 'END-OF-PATTERN'
Get-TextBetweenTwoStrings -startPattern $startPattern -endPattern $endPattern -filePath $inputFile
```
You need to perform streaming processing of your `Get-Content` call, in a pipeline, such as with `ForEach-Object`, if you want to process lines as they're being read.

This is a must with `Get-Content -Wait`, because such a call doesn't terminate by itself (it keeps waiting for new lines to be added to the file, indefinitely), but inside a pipeline its output can be processed as it is being received, even before the command terminates.

You're trying to match across multiple lines, which with `Get-Content` output would only work if you used the `-Raw` switch; by default, `Get-Content` reads its input file(s) line by line. However, `-Raw` is incompatible with `-Wait`, so you have to stick with line-by-line processing: match the start and end patterns separately, and keep track of whether the current line is inside a block.

Here's a proof of concept, but note the following:
`-Tail 100` is hard-coded; adjust it as needed or make it another parameter.

The use of `-Wait` means that the function will run indefinitely, waiting for new lines to be added to `$filePath`, so you'll need to use Ctrl-C to stop it.

While you can use a `Get-TextBetweenTwoStrings` call itself in a pipeline for object-by-object processing, assigning its result to a variable (`$result = ...`) won't work when terminating with Ctrl-C, because this method of termination also aborts the assignment operation.

To work around this limitation, the function below is defined as an advanced function, which automatically enables support for the common `-OutVariable` parameter; that variable is populated even if the command is terminated with Ctrl-C. Your sample call should therefore use `-OutVariable` rather than an assignment (and, as Theo notes, don't use the automatic `$input` variable as a custom variable).

Per your feedback, you want each block of lines to encompass the full lines on which the start and end patterns match, so the regexes below are enclosed in `.*`.
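A quick way to see what the `.*` enclosure buys: after a successful `-match`, the automatic `$Matches[0]` holds whatever the regex matched, so without `.*` it is only the marker text, while with `.*` on both sides it is the whole line. The log line below is made up for illustration.

```powershell
# Made-up log line containing the start marker from the question.
$line = 'INFO START-OF-PATTERN job 42'
$startPattern = 'START-OF-PATTERN'

# Without .*: $Matches[0] is only the marker itself.
$null = $line -match [regex]::Escape($startPattern)
$markerOnly = $Matches[0]    # 'START-OF-PATTERN'

# With .* on both sides: the match spans the whole line.
$null = $line -match ('.*' + [regex]::Escape($startPattern) + '.*')
$fullLine = $Matches[0]      # 'INFO START-OF-PATTERN job 42'

$markerOnly
$fullLine
```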
The word "pattern" in your `$startPattern` and `$endPattern` parameter names is a bit ambiguous, in that it suggests that they themselves are regexes that can be used as-is, or embedded as-is in a larger regex, on the RHS of the `-match` operator.

However, the solution below assumes that they are to be treated as literal strings, which is why they are escaped with `[regex]::Escape()`; simply omit these calls if these parameters are indeed regexes themselves.

The solution also assumes that blocks don't overlap and that, within a given block, the start and end patterns are on separate lines.
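To make the literal-versus-regex distinction concrete, here is a small sketch using a hypothetical marker `[START]` that contains regex metacharacters. Escaped, it only matches the exact text; used as-is, the square brackets turn it into a character class that matches far more lines than intended.

```powershell
# Hypothetical marker that contains regex metacharacters ([ and ]).
$startPattern = '[START]'

# Treated as a literal string: [regex]::Escape() turns '[' into '\[',
# so only the exact text [START] matches.
$literalRegex = '.*' + [regex]::Escape($startPattern) + '.*'

# Treated as a regex: used as-is, [START] is a character class that matches
# any single S, T, A or R (case-insensitively, since -match ignores case).
$rawRegex = '.*' + $startPattern + '.*'

$literalHit  = 'INFO [START] job 42' -match $literalRegex   # True
$literalMiss = 'INFO ST job' -match $literalRegex           # False
$rawHit      = 'INFO ST job' -match $rawRegex               # True
```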
Each block found is output as a single, multi-line string, using LF (`` "`n" ``) as the newline character; if you want CRLF newline sequences instead, use `` "`r`n" ``; for the platform-native newline format (CRLF on Windows, LF on Unix-like platforms), use `[Environment]::NewLine`.
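A minimal sketch of a function along the lines described above, under the same assumptions: it streams `Get-Content -Tail 100 -Wait` through a pipeline, treats the patterns as literal strings, collects whole lines (so the start and end lines are kept in full), and emits each block as one LF-joined string. It is an illustration, not a definitive implementation. The helper `Select-LineBlock` is a hypothetical name introduced here so that the block-collecting logic can also be fed lines that don't come from a waiting `Get-Content` call.

```powershell
# Hypothetical helper: groups piped-in lines into blocks that start with a line
# matching $StartRegex and end with a line matching $EndRegex.
function Select-LineBlock {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)] [AllowEmptyString()] [string] $Line,
        [Parameter(Mandatory)] [string] $StartRegex,
        [Parameter(Mandatory)] [string] $EndRegex
    )
    begin {
        $inBlock = $false
        $block = [System.Collections.Generic.List[string]]::new()
    }
    process {
        # Note: -match is case-insensitive; use -cmatch for case-sensitive matching.
        if (-not $inBlock) {
            if ($Line -match $StartRegex) {
                $inBlock = $true
                $block.Add($Line)      # keep the full start line
            }
        }
        else {
            $block.Add($Line)
            if ($Line -match $EndRegex) {
                # Emit the block (including the full end line) as one LF-joined string.
                $block -join "`n"
                $block.Clear()
                $inBlock = $false
            }
        }
    }
}

function Get-TextBetweenTwoStrings {
    # Advanced function: enables common parameters such as -OutVariable.
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [string] $startPattern,
        [Parameter(Mandatory)] [string] $endPattern,
        [Parameter(Mandatory)] [string] $filePath
    )
    # Patterns are treated as literal strings; omit [regex]::Escape() if they are regexes.
    $startRegex = [regex]::Escape($startPattern)
    $endRegex   = [regex]::Escape($endPattern)

    # -Tail 100: start with the last 100 lines; -Wait: keep reading new lines until Ctrl-C.
    Get-Content -LiteralPath $filePath -Tail 100 -Wait |
        Select-LineBlock -StartRegex $startRegex -EndRegex $endRegex
}

# Example call (runs until Ctrl-C; -OutVariable result still collects the blocks):
# Get-TextBetweenTwoStrings -OutVariable result -startPattern 'START-OF-PATTERN' -endPattern 'END-OF-PATTERN' -filePath 'THE-LOG-FILE.log'
```

Splitting the line grouping into its own pipeline-aware function keeps the waiting `Get-Content` call separate from the block logic, which is also why the helper can be exercised with an ordinary array of strings.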