Imagine we have a text file like the following:
Input:
a1 D1
b1 D1
c1 D1
a1 D2
a1 D3
c1 D3
I want to count how many times each element in the first column appears while also keeping the information from the second column in some way. Two possible output formats are shown below, but any coherent alternative is also acceptable:
Possible output 1:
3 a1 D1,D2,D3
1 b1 D1
2 c1 D1,D3
Possible output 2:
3 a1 D1
1 b1 D1
2 c1 D1
3 a1 D2
3 a1 D3
1 c1 D3
How can I do this? I guess a combination such as sort -k 1 input | uniq -c <keep col2>, or perhaps awk, could work, but I was not able to write anything that works. All answers are welcome.
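I would use GNU AWK for this task in the following way. Let file.txt contain the input shown in the question; then a two-pass awk command over that file gives output in the format of Possible output 2, with the lines kept in their original order.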
Explanation: this is a 2-pass solution (observe that file.txt is repeated). The first pass counts the number of occurrences of each first-column value, storing that data in the array arr; the second pass prints the computed number from the array, followed by the whole line. (tested in GNU Awk 5.0.1)