How to replace values with binary (0/1) in pandas for network data?

I have a CSV file of captured network traffic with 75 columns and 300k rows. I am preparing the data for ML and need to convert the IP addresses to 0 or 1 depending on whether they are internal or external. So if it is

10.0.2.* > 0
others > 1

Is there an easy way to do this? So far I have been replacing the values manually:

df['SrcAddr'] = df['SrcAddr'].replace(['10.0.2.15', '10.0.2.2'], 0)

IIUC, you can use:

df['SrcAddr'] = df['SrcAddr'].str.startswith('10.0.2.').rsub(1)

or with a regex (fullmatch has to match the whole string, hence the trailing \d+):

df['SrcAddr'] = df['SrcAddr'].str.fullmatch(r'10\.0\.2\.\d+').rsub(1)

How it works: each match yields True and each non-match yields False. rsub(1) then computes 1 - True -> 0 for matches and 1 - False -> 1 for non-matches.
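To make the rsub step concrete, here is a minimal sketch on a small made-up frame. The sample addresses and the extra column names SrcAddr2 and SrcAddr3 are assumptions for illustration only; the real data would come from your CSV.

```python
import pandas as pd

# Made-up sample data; in practice this comes from the captured-traffic CSV.
df = pd.DataFrame({'SrcAddr': ['10.0.2.42', '8.8.8.8', '10.0.2.2', '192.168.1.5']})

# Boolean mask: True for addresses in 10.0.2.*
is_internal = df['SrcAddr'].str.startswith('10.0.2.')

# rsub(1) computes 1 - mask, so True -> 0 and False -> 1
df['SrcAddr2'] = is_internal.rsub(1)

# Same result, spelled out: negate the mask and cast it to int
df['SrcAddr3'] = (~is_internal).astype(int)

print(df)
```

Both new columns should hold 0 for the 10.0.2.* rows and 1 for everything else; the negate-and-cast form may be easier to read if rsub is unfamiliar.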

Alternative with np.where, which lets you assign any pair of values instead of 0 and 1:

df['SrcAddr'] = np.where(df['SrcAddr'].str.startswith('10.0.2.'), 0, 1)

Example (result written to a new column, SrcAddr2):

     SrcAddr  SrcAddr2
0  10.0.2.42         0
1    8.8.8.8         1
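The table above can be reproduced with a short self-contained script. This is only a sketch: the two-row frame is the sample from the table, and passing na=False is an assumption on my side that missing addresses should count as external (1), which may or may not suit your data.

```python
import numpy as np
import pandas as pd

# The two sample rows from the example table
df = pd.DataFrame({'SrcAddr': ['10.0.2.42', '8.8.8.8']})

# na=False treats missing addresses as non-matches (assumption: NaN -> external)
mask = df['SrcAddr'].str.startswith('10.0.2.', na=False)

# np.where(condition, value_if_true, value_if_false)
df['SrcAddr2'] = np.where(mask, 0, 1)

# Any other pair of values works the same way, for example labels
df['SrcLabel'] = np.where(mask, 'internal', 'external')

print(df)
```

If you have other address columns to encode, the same mask-then-np.where pattern can be applied to each of them in a loop.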