Packet profile from netflow


I have NetFlow data from the previous month, stored in one file per 5 minutes, and I would like to build a packet profile of all this traffic. I need the percentage of 1-packet flows, 2-packet flows, and so on. Grouping into categories such as 1-packet flows, 1-100 packet flows, 100 and more would also be fine; the exact categories are not so important. My question is how to do it. How do I compute a percentage breakdown of data that I can't add together? Should I compute the percentages for every file and then take some kind of average of them?


There are 2 best solutions below


It sounds like you're describing a histogram: you create 'bins' of the sizes you describe and fill them with raw flow counts. The sum of the counts over all bins is the total number of flows. To get each bin's percentage of all flows, you just normalize by dividing each bin's count by the total flow count.

So, if you do a two-bin histogram where the first bin counts all flows with fewer than 100 packets and the second counts all flows with 100 or more packets (note that there can't be gaps or overlaps between bins), and it works out to 30 flows in the former and 60 in the latter, then the total number of flows is 90, and 33% of the flows have fewer than 100 packets.
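As a minimal sketch of that normalization step, here is the same two-bin example in Python. The function name `to_percentages` and the bin labels are made up for illustration.

```python
def to_percentages(counts):
    """Convert raw per-bin flow counts into percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        # No flows at all: report 0% everywhere instead of dividing by zero.
        return {label: 0.0 for label in counts}
    return {label: 100.0 * n / total for label, n in counts.items()}


# Raw counts from the example: 30 flows under 100 packets, 60 flows at 100+.
counts = {"<100 packets": 30, "100+ packets": 60}
percentages = to_percentages(counts)

for label, pct in percentages.items():
    print(f"{label}: {pct:.1f}%")
```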

When working with multiple files, the trick is to always use the same bin delineations and to store and work with the raw counts as long as possible and only derive the %s as the very last step. You can add together histograms with no trouble as long as their bins mean the same thing, and then when you normalize the result, you have for each bin the total percent for all files. If you're going to need to add a file, just keep track of the raw counts so that you can re-normalize when there's new data.
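To illustrate why the raw counts matter, here is a rough Python sketch with two hypothetical files of very different sizes. The counts and bin labels are invented for the example; the point is only that summing counts and normalizing once gives a different (and correct) answer compared with averaging the per-file percentages.

```python
from collections import Counter


def merge_histograms(per_file_counts):
    """Add up histograms that all use the same bin labels."""
    total = Counter()
    for counts in per_file_counts:
        total.update(counts)  # Counter.update adds counts, it does not replace
    return total


def normalize(counts):
    """Turn raw counts into percentages; do this only as the last step."""
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical per-file histograms with identical bins.
file_a = Counter({"1": 900, "2-99": 80, "100+": 20})  # 1000 flows
file_b = Counter({"1": 10, "2-99": 30, "100+": 60})   # 100 flows

merged = merge_histograms([file_a, file_b])
overall = normalize(merged)

# Averaging the per-file percentages ignores how many flows each file holds.
naive = (normalize(file_a)["1"] + normalize(file_b)["1"]) / 2

print(f"1-packet flows, merged counts: {overall['1']:.1f}%")
print(f"1-packet flows, averaged percentages: {naive:.1f}%")
```

Here the merged result is about 82.7% while the averaged percentages give 50%, because the small file is weighted as heavily as the large one.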

You can do this in a tool like MATLAB pretty easily, but be careful, because many of these tools will very kindly auto-determine bin widths for you. So, the histogram for one file might have bins {x < 100, 100 <= x < 200, x >= 200} and the histogram for another file might have bins {x < 90, 90 <= x < 180, x >= 180}, and you won't be able to add the results together.
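One way to avoid auto-selected bins is to define the edges once and reuse them for every file. The following is a sketch in plain Python; the edges, the labels and the helper names `bin_label` and `histogram` are assumptions chosen for illustration, not something prescribed by any tool.

```python
from bisect import bisect_right
from collections import Counter

# Fixed bin boundaries, shared by every file. EDGES holds the lower bound
# of the second and third bins, so the bins are: 1, 2-99, 100 and more.
EDGES = [2, 100]
LABELS = ["1 packet", "2-99 packets", "100+ packets"]


def bin_label(packets):
    """Map a flow's packet count to its bin label (assumes packets >= 1)."""
    return LABELS[bisect_right(EDGES, packets)]


def histogram(packet_counts):
    """Build a raw-count histogram for one file's packet counts."""
    return Counter(bin_label(p) for p in packet_counts)


# Hypothetical packet counts for a handful of flows.
sample = [1, 1, 2, 99, 100, 5000]
print(histogram(sample))
```

Because every file goes through the same `EDGES`, the resulting counters can be added together safely.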


What do you mean by "I can't add together"? You can actually do that with nfdump itself. The manual describes the -R option as: "-R expr /dir/file1:file2 Read all files from file1 to file2." For instance,

nfdump -R /yournetflowfolder/nfcapd.201204051609:nfcapd.201204051639

will gather NetFlow information from 16:09 to 16:39. You can then run whatever query you need on that data.
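As one possible follow-up, here is a hedged Python sketch that tallies packet counts into bins. It assumes you have already arranged for nfdump to print one plain integer packet count per line (for example through a custom output format; check the nfdump manual for the exact options, which are not shown here). The function name `tally` and the bin labels are hypothetical.

```python
from collections import Counter


def tally(lines):
    """Count flows per bin from text lines holding one packet count each."""
    counts = Counter()
    for line in lines:
        field = line.strip()
        if not field.isdigit():
            continue  # skip headers, summary lines and blanks
        packets = int(field)
        if packets <= 1:
            counts["1 packet"] += 1
        elif packets < 100:
            counts["2-99 packets"] += 1
        else:
            counts["100+ packets"] += 1
    return counts


# Hypothetical captured output: a header line followed by packet counts.
sample_lines = ["Packets\n", "1\n", "1\n", "7\n", "250\n", "\n"]
counts = tally(sample_lines)
total = sum(counts.values())

for label, n in counts.items():
    print(f"{label}: {100.0 * n / total:.1f}%")
```

Since `tally` accepts any iterable of lines, the same function could be fed `sys.stdin` when the nfdump output is piped into the script.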