Finding bandwidth per IP address in NGINX logs with awk

I am trying to find the bandwidth used by the IP addresses that make the most requests in my nginx access logs. This is what I have started with:

$ cat /path/to/access.log |awk '{print $1}' |sort |uniq -c |sort -n |tail

($1 is the IP address and $10 is the byte count for the request.) This outputs:

# of requests | IP Address
1220 xxx.xxx.xxx.xxx
1347 xxx.xxx.xxx.xxx
1420 xxx.xxx.xxx.xxx
2104 xxx.xxx.xxx.xxx
etc...    

What I am trying to accomplish is to identify how much bandwidth each one of these addresses is requesting. For example:

# of requests | IP Address | total bytes requested (unique to ip)
1220 xxx.xxx.xxx.xxx 45626026
1347 xxx.xxx.xxx.xxx 49565157
1420 xxx.xxx.xxx.xxx 56689122
2104 xxx.xxx.xxx.xxx 76665299
etc...

I am flexible about the approach: if finding the total bandwidth per IP takes more than one command, so be it. Thanks for any help!

There is 1 best solution below

This can be done with a single GNU awk command.

Sample access.log for demonstration purposes:

127.0.0.1 - - [15/Aug/2017:09:38:35 +0300] "GET / HTTP/1.1" 200 111 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:54.0) Gecko/20100101 Firefox/54.0"
127.0.0.1 - - [15/Aug/2017:09:38:46 +0300] "GET / HTTP/1.1" 200 171 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:54.0) Gecko/20100101 Firefox/54.0"
127.0.0.1 - - [15/Aug/2017:09:59:38 +0300] "GET /favicon.ico HTTP/1.1" 404 152 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:54.0) Gecko/20100101 Firefox/54.0"
127.0.0.1 - - [15/Aug/2017:09:59:39 +0300] "GET /favicon.ico HTTP/1.1" 404 1502 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:54.0) Gecko/20100101 Firefox/54.0"
127.0.0.1 - - [15/Aug/2017:11:04:45 +0300] "GET / HTTP/1.1" 200 23976 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:54.0) Gecko/20100101 Firefox/54.0"
127.0.0.2 - - [15/Aug/2017:09:38:35 +0300] "GET / HTTP/1.1" 200 14111 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:54.0) Gec$
127.0.0.2 - - [15/Aug/2017:09:38:46 +0300] "GET / HTTP/1.1" 200 1414 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:54.0) Gec$
127.0.0.2 - - [15/Aug/2017:09:59:38 +0300] "GET /favicon.ico HTTP/1.1" 404 1522 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; r$
127.0.0.2 - - [15/Aug/2017:09:59:39 +0300] "GET /favicon.ico HTTP/1.1" 404 1332 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; r$
127.0.0.3 - - [15/Aug/2017:11:04:45 +0300] "GET / HTTP/1.1" 200 23976 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:54.0) G$
127.0.0.1 - - [15/Aug/2017:09:38:35 +0300] "GET / HTTP/1.1" 200 141 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:54.0) Gec$
127.0.0.1 - - [15/Aug/2017:09:38:46 +0300] "GET / HTTP/1.1" 200 1041 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:54.0) Gec$
127.0.0.3 - - [15/Aug/2017:09:59:38 +0300] "GET /favicon.ico HTTP/1.1" 404 1529 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; r$
127.0.0.1 - - [15/Aug/2017:09:59:39 +0300] "GET /favicon.ico HTTP/1.1" 404 1026 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; r$
127.0.0.1 - - [15/Aug/2017:11:04:45 +0300] "GET / HTTP/1.1" 200 23976 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:54.0) G$
127.0.0.3 - - [15/Aug/2017:09:38:35 +0300] "GET / HTTP/1.1" 200 1414 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:54.0) Gec$
127.0.0.1 - - [15/Aug/2017:09:38:46 +0300] "GET / HTTP/1.1" 200 13341 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:54.0) Gec$
127.0.0.3 - - [15/Aug/2017:09:59:38 +0300] "GET /favicon.ico HTTP/1.1" 404 172 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; r$
127.0.0.3 - - [15/Aug/2017:09:59:39 +0300] "GET /favicon.ico HTTP/1.1" 404 1502 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; r$
127.0.0.3 - - [15/Aug/2017:11:04:45 +0300] "GET / HTTP/1.1" 200 23976 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:54.0) G$

The job:

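awk 'BEGIN { PROCINFO["sorted_in"] = "@val_num_desc" }  # gawk only: traverse arrays by value, highest first
     { a[$1]++; b[$1] += $10 }                           # a: requests per IP, b: bytes per IP
     END {
         for (i in a) { if (++c > 10) break; print i, b[i] }  # top 10 IPs by request count
     }' /path/to/access.log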

  • PROCINFO["sorted_in"]="@val_num_desc" - a GNU awk feature that makes for(i in a) visit the elements in descending numeric order of their values, i.e. from the most frequent IP address to the least frequent

  • if(++c>10) - stops the loop after the first 10 items. Because the loop starts from the most frequent IP address, this emulates the tail command in the question, which keeps the last 10 lines of an ascending sort

The output:

127.0.0.1 65437
127.0.0.3 52569
127.0.0.2 18379
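
The script above prints only the IP address and its byte total, and it relies on GNU awk's PROCINFO["sorted_in"]. As a rough sketch of the three-column layout asked for in the question (requests, IP address, total bytes), the counting can stay in awk while sort and head handle the ordering, which should also work with other awk implementations. The function name bandwidth_by_ip is made up for illustration, and the field positions ($1 for the address, $10 for the bytes) assume the same log layout as the sample above.

```shell
#!/bin/sh
# bandwidth_by_ip is a hypothetical helper name, not part of the original answer.
# Assumes $1 is the client address and $10 the byte count, as in the sample log.
bandwidth_by_ip() {
    awk '
        { hits[$1]++; bytes[$1] += $10 }    # per-IP request count and byte total
        END {
            # %.0f keeps large totals out of exponent notation in some awks
            for (ip in hits) printf "%d %s %.0f\n", hits[ip], ip, bytes[ip]
        }
    ' "$@" |
        sort -k1,1nr |    # most requests first
        head -n 10        # keep the 10 busiest addresses
}
```

Called as bandwidth_by_ip /path/to/access.log, this should print the following for the sample log above:

10 127.0.0.1 65437
6 127.0.0.3 52569
4 127.0.0.2 18379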