Sort-Object -inputObject


I can sort an array using the pipeline, like this: $array | Sort-Object. However, I have found that the pipeline is SLOW and mostly makes sense in quick one-liners. When writing a program, or even a quick script, I find it's better not to use the pipeline, for both readability and performance reasons. So I went to SS64 and found this page on Sort-Object, which mentions the -InputObject parameter. I would have thought that Sort-Object -InputObject:$array would do the same thing. But no, that page specifically says of -InputObject:

The objects to be sorted.

When the -InputObject parameter is used to submit a collection of items, Sort-Object receives one object that represents the collection. Because one object cannot be sorted, Sort-Object returns the entire collection unchanged.

To sort objects, pipe them to Sort-Object.

So, what on earth is -InputObject for then? It seems like the only possible use case is the one that doesn't work. And, IS there a way to use Sort-Object on an existing array, without resorting to the pipeline?

There are 3 best solutions below

0
On

Theo's helpful answer shows a fast, in-place way to sort an array that's already in memory and stored in a variable, using a .NET API.

If you want to sort and also eliminate duplicates (the equivalent of Sort-Object -Unique), you can use a System.Collections.Generic.SortedSet<T> instance:

PS> [System.Collections.Generic.SortedSet[string]]::new(
      [string[]] ('foo', 'bar', 'baz', 'foo'), 
      [System.StringComparer]::InvariantCultureIgnoreCase
    )

bar
baz
foo

Note: If you use [object] as SortedSet's generic type argument, you don't need the [string[]] cast, but it's generally preferable to use specific types.

Note: The result isn't an array, but it can be enumerated like one with a foreach statement and in the pipeline. To copy the results to an array, such as when you need to apply indexing (e.g., [0]), preallocate a target array of the same type and use the .CopyTo() method to fill it: $arr = [string[]]::new($sortedSet.Count); $sortedSet.CopyTo($arr)
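As a minimal sketch of the copy step described above, spelled out in full (the variable names $sortedSet and $arr are only illustrative):

```powershell
# Build a case-insensitively sorted, duplicate-free set from a string array.
$sortedSet = [System.Collections.Generic.SortedSet[string]]::new(
  [string[]] ('foo', 'bar', 'baz', 'foo'),
  [System.StringComparer]::InvariantCultureIgnoreCase
)

# Preallocate a target array of matching type and size, then fill it.
$arr = [string[]]::new($sortedSet.Count)
$sortedSet.CopyTo($arr)

# Indexing now works as with any array.
$arr[0]      # bar
$arr.Count   # 3
```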

An alternative is to sort the array with duplicates first, and then apply System.Linq.Enumerable.Distinct() afterwards:

[string[]] $arr = 'foo', 'bar', 'baz', 'foo'
[Array]::Sort($arr, [System.StringComparer]::InvariantCultureIgnoreCase)
[Linq.Enumerable]::Distinct($arr)

Note: The result is a lazy enumerable, not an array, but it can be enumerated like one with a foreach statement and in the pipeline. Call the .ToArray() method to create an array explicitly, such as when you need to apply indexing (e.g., [0]).


As for your question:

So, what on earth is -InputObject for then?

For the majority of cmdlets, unfortunately, the -InputObject parameter is just an implementation detail: its purpose is to enable input via the pipeline, and its direct use with arrays (collections) is pointless, such as in the case of Sort-Object.

  • GitHub issue #4242 asks that -InputObject be clearly documented as such, and that the documentation also list the cmdlets that do meaningfully support direct use of -InputObject. In those cmdlets, direct use isn't an alternative to pipeline input but has different semantics: the array (collection) is operated on as a whole. E.g., 1, 'foo' | Get-Member works (meaningfully) differently from Get-Member -InputObject (1, 'foo'): the former reports the types of the array's elements, the latter the type of the array itself.

  • Among data-processing cmdlets (as opposed to formatting cmdlets), it is effectively only Write-Output and Out-String (which in part is a formatting cmdlet as well) that support direct -InputObject use with arrays; e.g.:

    # Both commands produce the same output.
    1, 2 | Write-Output
    Write-Output -InputObject 1, 2
    
    • Even there, however, the behavior differs with nested arrays, because the enumeration depths differ between the two methods:

      # NOT the same, due to nesting.
      1, (2, (3, 4)) | Write-Output # -> 1, 2, (3, 4)
      Write-Output -InputObject 1, (2, (3, 4)) # -> 1, (2, (3, 4))
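The differences described in the two bullet points above can be checked with a short sketch like the following. The variable names are illustrative, and the counts in the comments assume the enumeration behavior described above:

```powershell
# Get-Member: pipeline input reports the element types ...
$elementTypes = (1, 'foo' | Get-Member).TypeName | Select-Object -Unique
# ... whereas direct -InputObject use reports the type of the array itself.
$arrayType = (Get-Member -InputObject (1, 'foo')).TypeName | Select-Object -Unique

$elementTypes   # System.Int32, System.String
$arrayType      # System.Object[]

# Write-Output: count the output objects produced for a nested array.
$viaPipeline  = @(1, (2, (3, 4)) | Write-Output)
$viaParameter = @(Write-Output -InputObject 1, (2, (3, 4)))

$viaPipeline.Count    # 3, i.e. 1, 2, (3, 4)
$viaParameter.Count   # 2, i.e. 1, (2, (3, 4))
```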
      

This lack of support for direct, array-valued -InputObject arguments is unfortunate, because such support can greatly speed things up, especially with data that is already in memory in full:

Bypassing the one-by-one streaming that invariably occurs in the pipeline (which requires a handshake of sorts between the sending and the receiving command for each object) can greatly boost performance.

An example is the -Value parameter of the Set-Content cmdlet, which does accept direct array-valued arguments in lieu of pipeline input; using -Value directly greatly speeds up the operation:

# Write 100,000 (1e5) numbers to a file:
# Via the pipeline.
1..1e5 | Set-Content temp.txt
# Via -Value - this is much, much faster.
# E.g. on my macOS machine with PowerShell 7.1, about 100(!) times faster.
Set-Content temp.txt -Value (1..1e5)
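To measure the difference on your own machine, a sketch along these lines could be used. The scratch file name is an arbitrary choice, and the timings will vary by platform and PowerShell version, so no specific numbers are implied:

```powershell
# Hypothetical scratch file in the temp directory; any writable path would do.
$file = Join-Path ([System.IO.Path]::GetTempPath()) 'set-content-demo.txt'

# Time the pipeline version, then confirm how many lines were written.
$pipelineMs = (Measure-Command { 1..1e5 | Set-Content $file }).TotalMilliseconds
$pipelineLines = (Get-Content $file).Count

# Time the direct -Value version, then confirm the line count again.
$valueMs = (Measure-Command { Set-Content $file -Value (1..1e5) }).TotalMilliseconds
$valueLines = (Get-Content $file).Count

'Pipeline: {0:N0} ms; -Value: {1:N0} ms' -f $pipelineMs, $valueMs

# Clean up the scratch file.
Remove-Item $file
```

Both variants write the same 100,000 lines; only the way the data reaches Set-Content differs.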

Potential improvements:

Note:

  • A cmdlet is in theory free to implement its own array support for -InputObject, but that (a) is cumbersome, because it requires extra logic, and (b) requires either declaring the parameter as an array type (which is inefficient, because even single objects received via the pipeline are then wrapped in arrays) or declaring it as object, which potentially forfeits type safety.
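For illustration, here is a minimal sketch of what such a do-it-yourself implementation might look like in an advanced function, using the array-typed variant. The function name ConvertTo-LongManual is hypothetical and chosen only for this example:

```powershell
function ConvertTo-LongManual {

  [CmdletBinding()]
  param(
    # Array-typed, so that a direct argument such as 1, 2, 3 binds as a whole.
    # A single object arriving via the pipeline gets wrapped in a one-element array.
    [Parameter(Mandatory, ValueFromPipeline)]
    [int[]] $InputObject
  )

  process {
    # The extra logic: enumerate the array ourselves.
    foreach ($item in $InputObject) {
      [long] $item
    }
  }

}

# Both calls output 1, 2, 3 as [long] values.
$viaPipeline  = 1, 2, 3 | ConvertTo-LongManual
$viaParameter = ConvertTo-LongManual -InputObject 1, 2, 3
```

With pipeline input the process block runs three times, each time with a one-element array; with the direct argument it runs once and the loop does the enumeration.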

Ideally, PowerShell itself should provide this support, along the following lines:

  • Extend the [Parameter] attribute with a new, Boolean EnumerateArgument property that can be combined with the existing ValueFromPipeline and ValueFromPipelineByPropertyName properties, which, when set to $true, would instruct PowerShell:

    • to implicitly accept arrays of the specified (scalar) parameter type with direct use of the parameter, e.g., [int[]] for an [int]-typed parameter

    • and to enumerate those arrays just like in the pipeline, calling the process block (for advanced functions, i.e., cmdlets implemented in PowerShell) or the .ProcessRecord() method (for binary cmdlets) once for each enumerated object.

A hypothetical (contrived) example:

function ConvertTo-Long {

  [CmdletBinding()]
  param(
    # WISHFUL THINKING: implicit array support for direct -InputObject arguments
    [Parameter(ValueFromPipeline, EnumerateArgument)]
    [int] $InputObject
  )

  process {
    [long] $InputObject   
  }

}

# The following calls would then be equivalent:
1, 2, 3 | ConvertTo-Long
ConvertTo-Long -InputObject 1, 2, 3

Note:

  • The improvement would be available to any pipeline-binding parameter (not just -InputObject).

  • Arguably, EnumerateArgument should be $true by default, and those rare cmdlets where passing an array as an argument has a different meaning, such as Get-Member, should opt out. However, changing the default would be a backward-compatibility concern.

  • Since the proposed enhancement would still involve calling the process block / the .ProcessRecord() method for each enumerated object, the speed-up won't be as dramatic as with a custom implementation that itself performs the enumeration in a single call. However, to me the prospect of unifying the behavior between the pipeline and direct -InputObject use alone makes this improvement worthwhile.

0
On

Sort-Object -InputObject is meant to be used from the pipeline, not invoked as a direct argument. Use -InputObject like so:

1, 5, 10, 3, 2 | Sort-Object # ==> 1, 2, 3, 5, 10

When you pass a collection via the pipeline, each element is sent to the cmdlet as it becomes available. The collection is unrolled and each element is processed individually by the cmdlet.

When you pass the argument directly, the collection is not unrolled. So in the case of Sort-Object, rather than sorting the elements individually, it sees a single object and returns just that object. Yes, it's a collection, but the cmdlet is designed to rely on the unrolling mechanic of the PowerShell pipeline; it is not designed to unroll the collection itself.
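A small sketch can make the difference between pipeline input and a direct argument visible. Show-Input is a hypothetical helper function written only for this illustration:

```powershell
# Hypothetical helper: reports what each invocation of its process block receives.
function Show-Input {
  param(
    [Parameter(ValueFromPipeline)]
    $InputObject
  )
  process { "process block received: $InputObject" }
}

$array = 3, 1, 2

# Via the pipeline, the array is unrolled: the process block runs once per element.
$perElement = @($array | Show-Input)
$perElement.Count   # 3

# As a direct argument, the array arrives as a single object.
$asWhole = @(Show-Input -InputObject $array)
$asWhole.Count      # 1

# The same applies to Sort-Object: only the pipeline form actually sorts.
$sorted   = $array | Sort-Object
$unsorted = Sort-Object -InputObject $array
"$sorted"     # 1 2 3
"$unsorted"   # 3 1 2
```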

It can be a bit confusing and, to make it clear as mud, many cmdlets work fine when the pipeline-bound parameter is passed directly (it is often -InputObject, but other names are also used). But as a rule of thumb, if -InputObject (or whatever the ValueFromPipeline parameter is named) takes a collection, I invoke it using the pipeline.


If you want a faster sort method, you can use the Array.Sort static method instead:

[Array]::Sort($array)
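Note that [Array]::Sort() sorts the array in place and returns nothing. To compare it against Sort-Object on your own machine, a rough sketch like the following could be used; the test data and the fixed seed are arbitrary choices, and the timings will vary:

```powershell
# Illustrative test data: 100,000 pseudo-random integers (fixed seed for repeatability).
$rng = [System.Random]::new(42)
$data = foreach ($i in 1..100000) { $rng.Next() }

# Sort a copy in place, so that $data itself stays unsorted for the second measurement.
$copy = $data.Clone()
$arraySortMs = (Measure-Command { [Array]::Sort($copy) }).TotalMilliseconds

# Sort via the pipeline for comparison, discarding the output.
$pipelineMs = (Measure-Command { $null = $data | Sort-Object }).TotalMilliseconds

'[Array]::Sort(): {0:N0} ms; Sort-Object: {1:N0} ms' -f $arraySortMs, $pipelineMs
```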

The .ForEach() and .Where() array methods also generally perform faster than the ForEach-Object and Where-Object cmdlets (the foreach statement is typically faster still). The trade-off with using methods is that you break the pipeline chain, as you can't pipe data into a .NET method.


To address your concern about relying on Sort-Object -Unique, you can pipe the result of Array.Sort to Select-Object -Unique instead. For example:

$myArray = 1, 3, 1, 2, 5, 7, 6, 3, 8, 9

# Array.Sort sorts in-place so we need to sort the array, then uniqify it
[Array]::Sort($myArray)
$myArray | Select-Object -Unique

will output a sorted collection with unique values:

1
2
3
5
6
7
8
9
0
On

Depending on what is inside your $array, you can probably use .NET's Array.Sort() for faster performance when sorting the elements of a one-dimensional array.

[array]::Sort($array)

Type [array]::Sort without the parentheses and press Enter to see all available OverloadDefinitions.


Good point by mklement0

The above sorts case-sensitively, while PowerShell's Sort-Object works case-insensitively.

To get the same effect when using .NET's [array]::Sort(), you can add a second parameter:

[array]::Sort($array, [System.StringComparer]::OrdinalIgnoreCase)
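As a small sketch with made-up sample strings, the comparer-based call behaves like this:

```powershell
# Illustrative mixed-case input, typed as a string array.
[string[]] $array = 'banana', 'Cherry', 'apple', 'Banana'

# Sort in place, ignoring case.
[Array]::Sort($array, [System.StringComparer]::OrdinalIgnoreCase)

# 'apple' now comes first and 'Cherry' last. 'banana' and 'Banana' compare
# as equal, so their relative order isn't guaranteed.
$array
```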