How to use dill in another script?

I have a function in one Python file, file1.py, which I dumped using dill. When I try to load it in a second file, file2.py, I get the error below:

TypeError: 'bytes' object is not callable

file1.py

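import pickle

import dill

def handle_nulls(df):
    # Drop rows where either required column is missing.
    df = df[df['account_status'].notna()]

    df = df[df['probability'].notna()]

    # Fill remaining gaps in the day count with the column maximum.
    max = df['am_daysincelast_txn'].max()
    df['am_daysincelast_txn'].fillna(max, inplace=True)

    return df

# Serialize the function object itself to disk.
with open("handle_nulls.dill", "wb") as f:
    dill.dump(handle_nulls, f, protocol=pickle.HIGHEST_PROTOCOL)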

file2.py

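import dill

# Load the function that file1.py wrote to disk.
with open("handle_nulls.dill", "rb") as f:
    handle_nulls = dill.load(f)

# df is an existing pandas DataFrame (its construction is not shown here).
df = handle_nulls(df)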

My approach does not seem to work.

There is 1 best solution below

It's hard to give you a full diagnosis without the full code (there is no data frame, for example) or version information, especially if you are, say, moving the file from one computer to another. That said, your code above seems to work as is, at least in Python 3.6, where I tested it:

>>> import dill
>>>
>>> def handle_nulls(df):
...     df = df[df['account_status'].notna()]
...     df = df[df['probability'].notna()]
...     max = df['am_daysincelast_txn'].max()
...     df['am_daysincelast_txn'].fillna(max, inplace=True)
...     return df
... 
>>> with open("handle_nulls.dill", "wb") as f:
...         dill.dump(handle_nulls, f, protocol=dill.HIGHEST_PROTOCOL)

Then, in a new session:

>>> import dill
>>> import pandas as pd
>>>
>>> with open('handle_nulls.dill', 'rb') as f:
...         handle_nulls = dill.load(f)
>>>
>>> import numpy as np
>>> df = pd.DataFrame(dict(account_status=pd.Series([np.nan,1,2,3,4,np.nan,6])))
>>> df['probability'] = pd.Series([0,1,np.nan,np.nan,4,5,6])
>>> df['am_daysincelast_txn'] = pd.Series([10,23,12,9,30,12,11])
>>>
>>> df
   account_status  probability  am_daysincelast_txn
0             NaN          0.0                   10
1             1.0          1.0                   23
2             2.0          NaN                   12
3             3.0          NaN                    9
4             4.0          4.0                   30
5             NaN          5.0                   12
6             6.0          6.0                   11
>>> 
>>> handle_nulls(df)
   account_status  probability  am_daysincelast_txn
1             1.0          1.0                   23
4             4.0          4.0                   30
6             6.0          6.0                   11
>>>
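
If the TypeError is raised on the line that calls handle_nulls, then the object that dill.load returned was a bytes object rather than a function. The posted code does not show how that happened, so the following is only a sketch of one possible cause, assumed here for illustration: the output of dill.dumps was itself written with dill.dump, so the file holds pickled bytes instead of the function. The names load_callable and double are hypothetical, and double stands in for handle_nulls so the sketch does not need pandas. It also prints the Python and dill versions, which is the information worth comparing between the two scripts or machines.

```python
import os
import sys
import tempfile

import dill

# Version details worth comparing between the dumping and loading environments.
print("python", sys.version.split()[0], "dill", dill.__version__)


def load_callable(path):
    """Load a dill file and fail early if the result is not callable."""
    with open(path, "rb") as f:
        obj = dill.load(f)
    # Assumed failure mode: the file holds pickled bytes, so unpickle once more.
    # This raises an unpickling error if the bytes are not a pickle at all.
    if isinstance(obj, (bytes, bytearray)):
        obj = dill.loads(obj)
    if not callable(obj):
        raise TypeError(f"expected a callable, got {type(obj).__name__}")
    return obj


def double(x):
    # Hypothetical stand-in for handle_nulls.
    return 2 * x


tmpdir = tempfile.mkdtemp()
good_path = os.path.join(tmpdir, "good.dill")
nested_path = os.path.join(tmpdir, "nested.dill")

# Correct usage: dump the function object directly.
with open(good_path, "wb") as f:
    dill.dump(double, f)

# Assumed mistake: dump the bytes produced by dill.dumps.
with open(nested_path, "wb") as f:
    dill.dump(dill.dumps(double), f)

# A plain load of the nested file returns bytes, and calling it would fail.
with open(nested_path, "rb") as f:
    raw = dill.load(f)

print(type(raw).__name__)              # bytes
print(load_callable(good_path)(21))    # 42
print(load_callable(nested_path)(21))  # 42
```

Checking type(handle_nulls) right after dill.load in file2.py is the quickest way to confirm or rule out this explanation; if it prints a function type, the cause lies elsewhere, for example in a stale handle_nulls.dill file written by an earlier version of file1.py.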