I can sort an array using the pipeline, like this: $array | Sort-Object
However, I have found that the pipeline is SLOW and mostly makes sense in quick one-liners. When writing a program, or even a quick script, I find it's better not to use the pipeline, for both readability and performance reasons. So I went to SS64 and found its page on Sort-Object, which mentions the -InputObject parameter. I would have thought that Sort-Object -InputObject:$array would do the same. But no, that page specifically says of -InputObject:
The objects to be sorted.
When the -InputObject parameter is used to submit a collection of items, Sort-Object receives one object that represents the collection. Because one object cannot be sorted, Sort-Object returns the entire collection unchanged. To sort objects, pipe them to Sort-Object.
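The documented behavior is easy to confirm. The following is only a minimal sketch with made-up sample values; the variable names are illustrative:

```powershell
# Hypothetical sample data.
$array = 3, 1, 2

# Passing the array as an argument binds the whole array as ONE object,
# so nothing is sorted and the elements come back in their original order.
$viaParameter = Sort-Object -InputObject $array

# Piping enumerates the array, so Sort-Object receives three separate objects.
$viaPipeline = $array | Sort-Object

"Via -InputObject: $viaParameter"
"Via pipeline:     $viaPipeline"
```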
So, what on earth is -InputObject for, then? It seems like the only possible use case is the one that doesn't work.
And IS there a way to use Sort-Object on an existing array without resorting to the pipeline?
Theo's helpful answer shows a fast, in-place way to sort an array that's already in memory and stored in a variable, using a .NET API.
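As a rough sketch of the pipeline-free options: the block below assumes that the .NET API in question is [Array]::Sort() (the referenced answer is not reproduced here), and it also sketches the two duplicate-eliminating variants described in the paragraphs that follow. The sample data and variable names are made up for illustration.

```powershell
# Hypothetical sample data; a typed array keeps the generic calls below simple.
[string[]] $array = 'pear', 'apple', 'fig', 'apple'

# 1. In-place sort via .NET, no pipeline involved.
#    Assumption: [Array]::Sort() is the kind of API meant above.
[Array]::Sort($array)

# 2. Sort AND eliminate duplicates with SortedSet<T>.
#    Note: the default string comparer is case-sensitive, unlike Sort-Object.
$sortedSet = [System.Collections.Generic.SortedSet[string]]::new([string[]] $array)

# Copy the set into a preallocated array so that indexing ([0]) works.
$arr = [string[]]::new($sortedSet.Count)
$sortedSet.CopyTo($arr)

# 3. Alternative: apply Distinct() to the already-sorted array.
#    Distinct() returns a lazy enumerable; @(...) enumerates it into an
#    [object[]], and the type constraint converts that to a [string[]].
[string[]] $distinctArr = @([System.Linq.Enumerable]::Distinct([string[]] $array))

"Sorted in place:    $array"
"Via SortedSet:      $arr"
"Via Sort + Distinct: $distinctArr"
```

Materializing with @(...) is a choice made for this sketch; it sidesteps any question of how a method call on the lazy result is resolved, at the cost of an extra conversion.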
If you want to sort and also eliminate duplicates (the equivalent of Sort-Object -Unique), you can pass the array, cast to [string[]], to the constructor of a System.Collections.Generic.SortedSet<T> instance.
Note: If you use [object] as SortedSet's generic type argument, you don't need the [string[]] cast, but it's generally preferable to use specific types.
Note: The result isn't an array, but it can be enumerated like one with a foreach statement and in the pipeline. To copy the results to an array, such as when you need to apply indexing (e.g., [0]), preallocate a target array of the same type and use the .CopyTo() method to fill it ($arr = [string[]]::new($sortedSet.Count); $sortedSet.CopyTo($arr)).
An alternative is to sort the array with duplicates first, and then apply System.Linq.Enumerable.Distinct() afterwards.
Note: The result is a lazy enumerable, not an array, but it can be enumerated like one with a foreach statement and in the pipeline. Call the .ToArray() method to create an array explicitly, such as when you need to apply indexing (e.g., [0]).
As for your question:
For the majority of cmdlets, unfortunately, the -InputObject parameter is just an implementation detail: its purpose is to enable input via the pipeline, and its direct use with arrays (collections) is pointless, as in the case of Sort-Object.
GitHub issue #4242 asks that -InputObject be clearly documented as such, and that the documentation also list the cmdlets that do meaningfully support direct use of -InputObject. Those cmdlets support it not as an alternative to pipeline input, however, but with different semantics, operating on an array (a collection) as a whole when -InputObject is used; e.g., 1, 'foo' | Get-Member works (meaningfully) differently from Get-Member -InputObject (1, 'foo'): the former reports the types of the array's elements, the latter the type of the array itself.
Among data-processing cmdlets (as opposed to formatting cmdlets), it is effectively only Write-Output and Out-String (which in part is a formatting cmdlet as well) that support direct -InputObject use with arrays. Even there, however, the behavior differs with nested arrays, because the enumeration depths differ between the two methods.
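The differences described above can be explored with a few one-liners. This is only a sketch with made-up values; the nested-array counts are deliberately printed rather than stated here, since they are the point of the comparison and may vary between PowerShell versions.

```powershell
# Get-Member: pipeline input and direct -InputObject use have different semantics.
$elementTypes = (1, 'foo' | Get-Member).TypeName | Select-Object -Unique
$arrayType = (Get-Member -InputObject (1, 'foo')).TypeName | Select-Object -Unique
"Piped:  $elementTypes"   # types of the array's elements
"Direct: $arrayType"      # type of the array itself

# Write-Output with a flat array: both forms emit the individual elements.
$flatDirect = (Write-Output -InputObject 1, 2 | Measure-Object).Count
$flatPiped = (1, 2 | Write-Output | Measure-Object).Count
"Flat array, direct vs. piped: $flatDirect vs. $flatPiped"

# Write-Output with a nested array: compare how deeply each form enumerates.
$nested = (1, 2), (3, 4)
$nestedDirect = (Write-Output -InputObject $nested | Measure-Object).Count
$nestedPiped = ($nested | Write-Output | Measure-Object).Count
"Nested array, direct vs. piped: $nestedDirect vs. $nestedPiped"
```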
This lack of support for direct, array-valued -InputObject arguments is unfortunate, because such support could greatly speed things up, especially with data already in memory in full: bypassing the one-by-one streaming that invariably occurs in the pipeline (which requires a handshake of sorts between the sending and the receiving command for each object) can greatly boost performance.
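To make the one-by-one streaming visible, here is a small sketch. Measure-ProcessCalls is a hypothetical helper written for this illustration, not an existing cmdlet; it simply counts how often its process block runs.

```powershell
# Hypothetical helper: reports how many times its process block was invoked.
function Measure-ProcessCalls {
    [CmdletBinding()]
    param(
        [Parameter(ValueFromPipeline)]
        [object] $InputObject
    )
    begin   { $calls = 0 }
    process { $calls++ }
    end     { $calls }
}

$array = 1..1000

# Pipeline input: the array is enumerated, one process call per element.
$viaPipeline = $array | Measure-ProcessCalls

# Direct argument: a single process call that receives the whole array as one object.
$viaArgument = Measure-ProcessCalls -InputObject $array

"Process calls via pipeline: $viaPipeline"
"Process calls via argument: $viaArgument"
```

The single call in the second case is why a cmdlet such as Sort-Object cannot do anything useful with an array passed this way, and also why a cmdlet that did handle the array itself could avoid the per-object overhead.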
An example is the -Value parameter of the Set-Content cmdlet, which does accept direct array-valued arguments in lieu of pipeline input; using -Value directly greatly speeds up the operation.
Potential improvements:
Note: You can design your own functions and cmdlets to accept arrays directly via -InputObject, but that is (a) cumbersome, due to requiring extra logic, and (b) requires either declaring the parameter as an array type (which is inefficient, because even single objects received via the pipeline are then wrapped in arrays) or declaring it as object, which potentially forfeits type safety.
Ideally, PowerShell itself should provide this support, along the following lines:
Extend the [Parameter] attribute with a new, Boolean EnumerateArgument property that can be combined with the existing ValueFromPipeline and ValueFromPipelineByPropertyName properties, which, when set to $true, would instruct PowerShell:
to implicitly accept arrays of the specified (scalar) parameter type with direct use of the parameter, e.g., [int[]] for an [int]-typed parameter
and to enumerate those arrays just like in the pipeline, calling the process block (for advanced functions, i.e., cmdlets implemented in PowerShell) or the .ProcessRecord() method (for binary cmdlets) for each enumerated object.
A hypothetical (contrived) example:
Note:
The improvement would be available to any pipeline-binding parameter (not just -InputObject).
Arguably, EnumerateArgument should be $true by default, and those rare cmdlets where passing an array as an argument has a different meaning, such as Get-Member, should opt out; however, that would be a backward-compatibility concern.
Since the proposed enhancement would still involve calling the process block / the .ProcessRecord() method for each enumerated object, the speed-up won't be as dramatic as with a custom implementation that itself performs the enumeration in a single call. However, to me the prospect of unifying the behavior between the pipeline and direct -InputObject use alone makes this improvement worthwhile.