In the following:
cat file | cut -f 1,5,6 | sort | uniq
It seems intuitive to me that uniq needs to know the whole dataset before proceeding.
From HERE I understand that sort writes temporary files to disk for large sets of data.
Does uniq write temporary files to disk for large datasets? If so, where?
Thank you!
uniq only needs to read a line at a time and compare the current line to the previous one, so it can start working as soon as it starts receiving lines. It does not need to read all of its input before producing any output, and so it has no need for temporary files.
Basically, it reads a line and compares it to the previous line. If they're the same, it increments a counter. If not, it prints the previous line (with the count, if requested). Then it saves the current line as the previous one and repeats.
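One visible consequence of remembering only the previous line is that uniq collapses only adjacent duplicates, which is why the pipeline in the question runs sort first. A small sketch to check this, assuming a POSIX shell; the three-line input and the variable names `unsorted` and `sorted` are made up for illustration:

```shell
# uniq compares each line only with the one before it,
# so duplicates that are not adjacent are left alone.
unsorted=$(printf 'a\nb\na\n' | uniq)        # the two "a" lines are not adjacent
sorted=$(printf 'a\nb\na\n' | sort | uniq)   # sort makes the duplicates adjacent first

printf 'without sort:\n%s\n' "$unsorted"     # prints a, b, a (nothing removed)
printf 'with sort:\n%s\n' "$sorted"          # prints a, b
```

If uniq had to look at the whole dataset, the first pipeline would also drop the second "a"; it does not, because only one previous line is ever kept in memory.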
Here's a bare bones version written in C you can use as an example: