I've written a script to help me identify duplicate files. For some reason, if I split these commands and export/import to CSV, it runs much faster than if I leave everything in memory. Here is my original code, which is god-awful slow:
Get-ChildItem M:\ -recurse | where-object {$_.length -gt 524288000} | select-object Directory, Name | Group-Object directory | ?{$_.count -gt 1} | %{$_.Group} | export-csv -notypeinformation M:\Misc\Scripts\Duplicates.csv
If I split this into 2 commands and export to CSV in the middle it runs about 100x faster. I'm hoping someone could shed some light on what I'm doing wrong.
Get-ChildItem M:\ -recurse | where-object {$_.length -gt 524288000} | select-object Directory, Name | Export-Csv -notypeinformation M:\Misc\Scripts\Duplicates\4.csv
import-csv M:\Misc\Scripts\Duplicates\4.csv | Group-Object directory | ?{$_.count -gt 1} | %{$_.Group} | export-csv -notypeinformation M:\Misc\Scripts\Duplicates\Duplicates.csv
remove-item M:\Misc\Scripts\Duplicates\4.csv
appreciate any suggestions,
~TJ
It's not `Group-Object` that is slow, it's your grouping condition. You're asking it to group `FileInfo` objects by their `.Directory` property, which holds a `DirectoryInfo` instance representing the parent folder. In other words, you're asking the cmdlet to use a very complex object as the grouping key. Instead, you could use the `.DirectoryName` property, which is the parent directory's `FullName` (a simple string), or `.Directory.Name`, which is the parent folder's `Name` (also a simple string).

To summarize, the main reason exporting to a CSV is faster in this case is that when `Export-Csv` receives your objects from the pipeline, it calls the `ToString()` method on each object's property values. The `Directory` instance is therefore converted to its string representation (calling `ToString()` on this instance ends up returning the folder's `FullName`).

As for your code, if you want to keep it as efficient as possible without overcomplicating it:
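A minimal sketch of what that could look like, grouping on the string property `DirectoryName` rather than on the `DirectoryInfo` object. The function name `Get-LargeFileGroup` and its parameters are hypothetical, introduced here only for illustration, and the `-File` switch assumes PowerShell 3.0 or later:

```powershell
# Hypothetical helper (not from the original post): lists files larger than
# $MinLength that share a parent folder with at least one other such file.
function Get-LargeFileGroup {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [long] $MinLength = 524288000   # threshold taken from the question
    )

    Get-ChildItem -LiteralPath $Path -Recurse -File |
        Where-Object { $_.Length -gt $MinLength } |
        # DirectoryName is a plain string, so it is cheap to use as a key
        Group-Object -Property DirectoryName |
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object { $_.Group } |
        Select-Object DirectoryName, Name
}

# Usage with the paths from the question:
# Get-LargeFileGroup -Path M:\ |
#     Export-Csv -NoTypeInformation M:\Misc\Scripts\Duplicates.csv
```

The only change that should matter for speed is the grouping key; `Select-Object` is moved to the end so the CSV still contains a folder column and a file name column.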
If you want to group them by the parent's `Name` instead of its `FullName`, you could use:
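A sketch of that variant, using a script block as the grouping key so that `Group-Object` compares the folder's `Name` string. As before, the function name `Get-LargeFileGroupByParentName`, its parameters, and the `Parent` output column are hypothetical, and `-File` assumes PowerShell 3.0 or later:

```powershell
# Hypothetical helper (not from the original post): groups large files by the
# name of their parent folder instead of the folder's full path.
function Get-LargeFileGroupByParentName {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [long] $MinLength = 524288000   # threshold taken from the question
    )

    Get-ChildItem -LiteralPath $Path -Recurse -File |
        Where-Object { $_.Length -gt $MinLength } |
        # The script block returns a string, so the key stays simple
        Group-Object -Property { $_.Directory.Name } |
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object { $_.Group } |
        Select-Object @{ Name = 'Parent'; Expression = { $_.Directory.Name } }, Name
}

# Usage with the paths from the question:
# Get-LargeFileGroupByParentName -Path M:\ |
#     Export-Csv -NoTypeInformation M:\Misc\Scripts\Duplicates.csv
```

Keep in mind that this puts files from different folders into the same group whenever those folders happen to share a name, which may or may not be what you want when hunting for duplicates.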