AWK: Get the total count of records for each group of a column


I have a variable which splits the results of a column based on a condition (a group by, in other programming languages).

I'm trying to have a variable that counts the records (the NR) of each group. If we sum the counts of all the groups, we should get the NR of the file.

When I try to use NR in the calculation, for example NR[variable that splits], I get a fatal error along the lines of "you tried to use scalar as matrix".

Any ideas how to use NR as a variable, but not counting all the records, only those from each group?

sex, weight

male,50
female,49
female,48
male,66
male,78
female,98
male,74
male,54
female,65

In this case NR would be 9, but what I really want is a way to get a count of 5 for male and 4 for female.

I have the total sum of the weight column but struggle to get the average:

sex = $(f["sex"])
ccWeight[sex] += $(f["weight"])
avgWeight = ccWeight[sex] / ¿?

Important: I don't need to print the result for now, just to store this number in a variable.


There are 2 best solutions below


One awk idea:

awk -F, '
NR>1 { counts[$1]++              # keep count of each distinct sex
       counts_total++            # replace dependency on NR
       weight[$1]+=$2            # keep sum of weights by sex
     }
END  { for (i in counts) {
           printf "%s: (count) %s of %s (%.2f%%)\n",i,counts[i],counts_total,(counts[i]/counts_total*100)
           printf "%s: (avg weight) %.2f ( %s / %s )\n",i,(weight[i]/counts[i]),weight[i],counts[i]
       }
     }
' sample.dat

NOTE:

  • OP can add additional code to verify total counts and weights are not zero (so as to keep from generating a 'divide by zero' error)
  • perhaps print a different message if there are no (fe)male records to process?

This generates:

female: (count) 4 of 9 (44.44%)
female: (avg weight) 65.00 ( 260 / 4 )
male: (count) 5 of 9 (55.56%)
male: (avg weight) 64.40 ( 322 / 5 )
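
For the "store it, don't print it" part of the question, here is a minimal sketch of the same idea. NR is a built-in scalar and cannot be indexed, so a separate array keeps one counter per group. It reuses f and ccWeight from the question; cnt, the avgWeight array, and the group_avg shell wrapper are names assumed here for illustration. It also assumes the first line is the header and trims the space in "sex, weight".

```shell
# Hypothetical helper: group_avg FILE prints "sex count avg" for each group.
group_avg() {
    awk -F, '
    NR == 1 {                          # header row: map column names to positions
        for (i = 1; i <= NF; i++) {
            name = $i
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim spaces, e.g. " weight"
            f[name] = i
        }
        next
    }
    NF < 2 { next }                    # skip blank or incomplete lines
    {
        sex = $(f["sex"])
        cnt[sex]++                     # per-group record count (the "NR" of each group)
        ccWeight[sex] += $(f["weight"])
    }
    END {
        for (sex in cnt) {
            if (cnt[sex] > 0)          # explicit guard against dividing by zero
                avgWeight[sex] = ccWeight[sex] / cnt[sex]
            # printing is only for demonstration; the values stay in the arrays
            printf "%s %d %.2f\n", sex, cnt[sex], avgWeight[sex]
        }
    }
    ' "$1" | sort                      # for-in order is unspecified, so sort the output
}
```

Called as group_avg sample.dat on the sample data from the question, this should print "female 4 65.00" and "male 5 64.40". Inside END, cnt[sex] and avgWeight[sex] hold the stored values for any further calculation.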

GNU datamash might be what you are looking for, e.g.:

<infile datamash -Hst, groupby 1 count 1 sum 2 mean 2 | column -s, -t

Output:

GroupBy(sex)  count(sex)  sum(weight)  mean(weight)
female        4           260          65
male          5           322          64.4