Pipeline in PowerShell


I was reading about how the pipeline works in PowerShell in about_Pipelines, and learned that the pipeline delivers one object at a time.

So, this

Get-Service | Format-Table -Property Name, DependentServices

Is different from this

Format-Table -InputObject (Get-Service) -Property Name, DependentServices

So, going by the explanation, in the first case Format-Table works on one object at a time, and in the second case Format-Table works on an array of objects. Please correct me if I am wrong.

If this is the case, then I wonder how Sort-Object and other cmdlets that need to work on whole collections of data work with the pipe character.

When I do:

Get-Service | Sort-Object

How is Sort-Object able to sort if it only gets to work with one object at a time? Assume there are 100 service objects to be passed to Sort-Object. Will Sort-Object be called 100 times (once for each object)? And how does that yield the sorted results that I see on the screen?


There are 2 best solutions below


Sort-Object (and other cmdlets that need to evaluate all input objects before outputting anything) work by collecting the input objects one by one, and then not doing any actual work until the upstream cmdlet (Get-Service in this case) is done sending input.

How does this work? Well, let's try and recreate Sort-Object with a PowerShell function.

To do so, we first need to understand that a cmdlet consists of 3 separate routines:

  • Begin - the Begin routines of each cmdlet in a pipeline are invoked once before anything else occurs
  • Process - this routine is invoked on each cmdlet every time input is received from an upstream command
  • End - this is invoked once the upstream command has called End and there are no more input items for Process to process

(These are the block label names used in PowerShell function definitions. In a binary cmdlet you'd instead override the BeginProcessing, ProcessRecord, and EndProcessing methods of your cmdlet.)
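The ordering described above can be observed with a small sketch. Trace-Stage and its -Name and -Log parameters are hypothetical helpers made up for this illustration (they are not built-in commands), and the example assumes PowerShell 5 or later for the ::new() syntax. Two copies of the function are chained so the log records when each stage's blocks run:

```powershell
# Hypothetical helper: records which block of which pipeline stage is running.
function Trace-Stage {
    param(
        [Parameter(Mandatory)]
        [string]$Name,

        [Parameter(Mandatory)]
        [System.Collections.Generic.List[string]]$Log,

        [Parameter(ValueFromPipeline)]
        $InputObject
    )

    begin   { $Log.Add("$Name begin") }
    process {
        $Log.Add("$Name process $InputObject")
        $InputObject   # pass the item on to the next stage right away
    }
    end     { $Log.Add("$Name end") }
}

$log = [System.Collections.Generic.List[string]]::new()
$result = 1..2 | Trace-Stage -Name A -Log $log | Trace-Stage -Name B -Log $log
$log
# A begin
# B begin
# A process 1
# B process 1
# A process 2
# B process 2
# A end
# B end
```

Both begin blocks run before any item is processed, each item travels through every stage before the next item enters, and the end blocks run last.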

So, to "collect" every input item, we need to add some logic to the Process block of our command, and then we can put the code that operates on all the items in the End block:

function Sort-ObjectCustom
{
  param(
    [Parameter(Mandatory, ValueFromPipeline)]
    [object[]]$InputObject
  )

  begin {
    # Let's use the `begin` block to create a list that'll hold all the input items
    $list = [System.Collections.Generic.List[object]]::new()

    Write-Verbose "Begin was called"
  }

  process {
    # Here we simply collect all input to our list
    $list.AddRange($InputObject)

    Write-Verbose "Process was called [InputObject: $InputObject]"
  }

  end {
    # The `end` block is only ever called _after_ we've collected all input
    # Now we can safely sort it
    $list.Sort()

    Write-Verbose "End was called"

    # and output the results
    return $list
  }
}

If we invoke our new command with -Verbose, we will see how the input is collected one by one:

PS ~> 10..1 |Sort-ObjectCustom -Verbose
VERBOSE: Begin was called
VERBOSE: Process was called [InputObject: 10]
VERBOSE: Process was called [InputObject: 9]
VERBOSE: Process was called [InputObject: 8]
VERBOSE: Process was called [InputObject: 7]
VERBOSE: Process was called [InputObject: 6]
VERBOSE: Process was called [InputObject: 5]
VERBOSE: Process was called [InputObject: 4]
VERBOSE: Process was called [InputObject: 3]
VERBOSE: Process was called [InputObject: 2]
VERBOSE: Process was called [InputObject: 1]
VERBOSE: End was called
1
2
3
4
5
6
7
8
9
10

For more information on how to implement pipeline input processing routines for binary cmdlets, see the "How to Override Input Processing" documentation.

For more information on how to take advantage of the same pipeline semantics in functions, see about_Functions_Advanced_Methods and related help topics.
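As for the first part of the question (piping versus passing -InputObject), the same block structure makes the difference visible. This is only a sketch: Measure-ProcessCall is a hypothetical function written for this illustration, and it simply counts how often its process block runs.

```powershell
# Hypothetical helper: counts how many times the process block is invoked.
function Measure-ProcessCall {
    param(
        [Parameter(ValueFromPipeline)]
        $InputObject
    )

    begin   { $calls = 0 }
    process { $calls++ }
    end     { $calls }
}

# Piped input is unrolled: process runs once per element.
$viaPipeline = 1..5 | Measure-ProcessCall

# A parameter argument is bound as a whole: process runs once, with the array.
$viaParameter = Measure-ProcessCall -InputObject (1..5)

"Pipeline: $viaPipeline call(s), parameter: $viaParameter call(s)"
# Pipeline: 5 call(s), parameter: 1 call(s)
```

So in the Format-Table -InputObject (Get-Service) case the command receives the whole array as a single input object, whereas the piped form hands it the services one at a time.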


To complement the answer from Mathias: you can visualize the processing order of an existing cmdlet with the Write-Host cmdlet, which writes immediately to the display (rather than to the pipeline):

$Data = ConvertFrom-Csv @'
Id, Name
 4, Four
 2, Two
 3, Three
 1, One
'@

Select-Object example

$Data |
    Foreach-Object { Write-Host 'in:' ($_ |ConvertTo-Json -Compress); $_ } |
    Select-Object * |
    Foreach-Object { Write-Host 'out:' ($_ |ConvertTo-Json -Compress); $_ }

Shows:

in: {"Id":"4","Name":"Four"}
out: {"Id":"4","Name":"Four"}

in: {"Id":"2","Name":"Two"}
out: {"Id":"2","Name":"Two"}
in: {"Id":"3","Name":"Three"}
out: {"Id":"3","Name":"Three"}
in: {"Id":"1","Name":"One"}
out: {"Id":"1","Name":"One"}
Id Name
-- ----
4  Four
2  Two
3  Three
1  One

Sort-Object example

$Data |
    Foreach-Object { Write-Host 'in:' ($_ |ConvertTo-Json -Compress); $_ } |
    Sort-Object * |
    Foreach-Object { Write-Host 'out:' ($_ |ConvertTo-Json -Compress); $_ }

Shows:

in: {"Id":"4","Name":"Four"}
in: {"Id":"2","Name":"Two"}
in: {"Id":"3","Name":"Three"}
in: {"Id":"1","Name":"One"}
out: {"Id":"1","Name":"One"}

out: {"Id":"2","Name":"Two"}
out: {"Id":"3","Name":"Three"}
out: {"Id":"4","Name":"Four"}
Id Name
-- ----
1  One
2  Two
3  Three
4  Four

In general, PowerShell cmdlets Write Single Records to the Pipeline wherever possible (one of the advantages of this encouraged guideline is that it reduces memory consumption). As implied by your question, Sort-Object can't do this, because the last record it receives might have to be the first one it outputs. There are also exceptions where writing single records according to the encouraged guideline would be technically possible, but the cmdlet does not do it. See e.g.: #11221 Select-Object -Unique is unnecessary slow and exhaustive
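One practical consequence of streaming versus collecting can be sketched as follows. The example assumes PowerShell 3.0 or later, where Select-Object -First stops the upstream commands once it has enough items; the variable names are made up for this illustration.

```powershell
# Streaming pipeline: Select-Object -First 3 stops upstream after three items,
# so the ForEach-Object stage only ever sees three of the hundred inputs.
$seenStreaming = [System.Collections.Generic.List[int]]::new()
$firstStreaming = 1..100 |
    ForEach-Object { $seenStreaming.Add($_); $_ } |
    Select-Object -First 3

# With Sort-Object in between, nothing reaches Select-Object until Sort-Object
# has collected every input, so all hundred items pass through first.
$seenSorted = [System.Collections.Generic.List[int]]::new()
$firstSorted = 1..100 |
    ForEach-Object { $seenSorted.Add($_); $_ } |
    Sort-Object |
    Select-Object -First 3

"Streaming saw $($seenStreaming.Count) items, sorted saw $($seenSorted.Count) items"
# Streaming saw 3 items, sorted saw 100 items
```

Both pipelines return 1, 2, 3 here, but only the streaming one avoids processing the remaining input.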