I am trying to find the bandwidth used by the most prevalent IP addresses making requests in nginx access logs. This is what I have started out with:
$ cat /path/to/access.log |awk '{print $1}' |sort |uniq -c |sort -n |tail
($1 is the IP address, and $10 is the number of bytes for the request.) This outputs:
# of requests | IP Address
1220 xxx.xxx.xxx.xxx
1347 xxx.xxx.xxx.xxx
1420 xxx.xxx.xxx.xxx
2104 xxx.xxx.xxx.xxx
etc...
What I am trying to accomplish is to identify how much bandwidth each one of these addresses is requesting. For example:
# of requests | IP Address | total bytes requested (unique to ip)
1220 xxx.xxx.xxx.xxx 45626026
1347 xxx.xxx.xxx.xxx 49565157
1420 xxx.xxx.xxx.xxx 56689122
2104 xxx.xxx.xxx.xxx 76665299
etc...
I have few restrictions. If the solution needs more than one command to reach the final result (i.e. total bandwidth by IP), so be it. Thanks for any help provided!
With a single GNU awk solution:
Sample access.log for demonstration purposes:
The job:
PROCINFO["sorted_in"]="@val_num_desc"
- compares array values numerically, so the array is traversed by IP address frequency in descending order
if(++c>10)
- ensures that only the first 10 items are iterated over, which emulates the tail command (which gets the last 10 lines). The loop starts from the most frequent IP address.
The output:
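For reference, here is a minimal sketch of a job along these lines. It assumes the default nginx "combined" log format, where $1 is the client IP and $10 is the bytes field. The function name `top_ips`, the sample log lines, and the POSIX awk plus sort fallback for systems without gawk are illustrative assumptions, not part of the original answer.

```shell
#!/bin/sh
# top_ips FILE: print "requests ip total_bytes" for the 10 most frequent client IPs.
# Assumes nginx "combined" format: $1 = client IP, $10 = bytes sent.
top_ips() {
    if command -v gawk >/dev/null 2>&1; then
        gawk '
            { cnt[$1]++; bytes[$1] += $10 }   # per-IP request count and byte total
            END {
                # gawk-only: traverse cnt by value, numerically, descending
                PROCINFO["sorted_in"] = "@val_num_desc"
                for (ip in cnt) {
                    if (++c > 10) break       # stop after the 10 most frequent IPs
                    print cnt[ip], ip, bytes[ip]
                }
            }' "$1"
    else
        # Assumed fallback without gawk: aggregate with any awk, then sort externally.
        awk '{ cnt[$1]++; bytes[$1] += $10 }
             END { for (ip in cnt) print cnt[ip], ip, bytes[ip] }' "$1" |
            sort -rn | head -n 10
    fi
}

# Hypothetical sample log (addresses and sizes are made up for illustration).
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /a HTTP/1.1" 200 1000 "-" "curl/8.0"
198.51.100.7 - - [10/Oct/2023:13:55:37 +0000] "GET /b HTTP/1.1" 200 300 "-" "curl/8.0"
192.0.2.10 - - [10/Oct/2023:13:55:38 +0000] "GET /c HTTP/1.1" 200 2000 "-" "curl/8.0"
203.0.113.5 - - [10/Oct/2023:13:55:39 +0000] "GET /d HTTP/1.1" 200 4096 "-" "curl/8.0"
198.51.100.7 - - [10/Oct/2023:13:55:40 +0000] "GET /e HTTP/1.1" 200 700 "-" "curl/8.0"
192.0.2.10 - - [10/Oct/2023:13:55:41 +0000] "GET /f HTTP/1.1" 200 500 "-" "curl/8.0"
EOF

top_ips "$log"
rm -f "$log"
```

With this made-up sample, the job prints `3 192.0.2.10 3500`, then `2 198.51.100.7 1000`, then `1 203.0.113.5 4096`, matching the "requests, IP address, total bytes" layout asked for in the question, most frequent IP first. If the log uses a custom `log_format`, the field numbers would need adjusting.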