Saving and Reloading a ydata-profiling / pandas-profiling ProfileReport object for later use


I am using the ydata-profiling library to generate profile reports of my pandas DataFrame. I would like to save the entire ProfileReport object, so I can load it later without having to regenerate the report.

Questions:

Is there a way to serialize the entire ProfileReport object for later use? If not, is there a recommended way to save the generated profile (maybe as JSON or another format) and later load it back as a ProfileReport object, so I can manipulate or view it?

I tried serializing the object using pickle, but ran into errors; it appears that some internal components of the report are not serializable.
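pickle fails as soon as any attribute it reaches cannot be pickled (locks, open file handles, lambdas and similar). The question does not say which attribute of ProfileReport causes the failure, so the sketch below uses a hypothetical Report class holding a threading.Lock to show the general failure mode and the usual fix: implementing __getstate__/__setstate__ to drop the unpicklable attribute when pickling and rebuild it when unpickling.

```python
import pickle
import threading


class Report:
    """Hypothetical stand-in for an object with one unpicklable attribute."""

    def __init__(self, stats):
        self.stats = stats              # plain data: picklable
        self._lock = threading.Lock()   # locks cannot be pickled


class PicklableReport(Report):
    """Same object, but it excludes the lock from its pickled state."""

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["_lock"]              # drop the unpicklable attribute
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._lock = threading.Lock()   # recreate it after unpickling


def try_pickle(obj):
    """Return the pickled bytes, or None if pickling raises TypeError."""
    try:
        return pickle.dumps(obj)
    except TypeError:
        return None


if __name__ == "__main__":
    print(try_pickle(Report({"rows": 3})) is None)  # the lock breaks pickling
    restored = pickle.loads(try_pickle(PicklableReport({"rows": 3})))
    print(restored.stats)
```

This only illustrates the mechanism; it does not patch ProfileReport itself.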

I also considered saving the report as a JSON and loading it back, but there doesn't seem to be a direct method to convert a saved JSON report back into a ProfileReport object.
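Even without a from_json method, the exported report.json is ordinary JSON and can still be read back as a dictionary for inspection, just not as a ProfileReport. A minimal sketch, assuming the file has a top-level "variables" object keyed by column name whose entries carry a "type" field (recent ydata-profiling versions appear to emit this layout, but check your own file); the helper name column_types is hypothetical.

```python
import json
import tempfile
from pathlib import Path


def column_types(report_path):
    """Map each column name to its inferred type in a saved JSON report.

    Assumes a top-level "variables" object keyed by column name whose
    entries carry a "type" field; missing fields map to None.
    """
    data = json.loads(Path(report_path).read_text(encoding="utf-8"))
    return {
        name: info.get("type")
        for name, info in data.get("variables", {}).items()
    }


if __name__ == "__main__":
    # Hypothetical miniature report, only to exercise the helper.
    sample = {"variables": {"age": {"type": "Numeric"}, "city": {"type": "Categorical"}}}
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "report.json"
        path.write_text(json.dumps(sample), encoding="utf-8")
        print(column_types(path))
```

With a real export you would call column_types("report.json") directly.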

Here's a simple setup for context:

import pickle
from ydata_profiling import ProfileReport

profile = ProfileReport(df)

# Saving the report as JSON
profile.to_file("report.json")

# Attempt to load the report from JSON (raises AttributeError: ProfileReport has no from_json method)
profile_json = ProfileReport.from_json("report.json")  # This line will cause an error
profile_json.to_file("profile_json.html")

# Serializing the profile report using pickle
output_path = "profile.pkl"
with open(output_path, "wb") as ppfile:
    pickle.dump(profile, ppfile)

# Deserializing the profile report from the pickled file
with open(output_path, "rb") as ppfile:
    profile_pp = pickle.load(ppfile)
    profile_pp.to_file("profile_pp.html")

Answer by Ananth Babu

Actually, it is possible to save and load a report with the existing dump() and load() methods. However, the loaded report cannot be used with compare(), because it is not attached to a DataFrame and compare() checks for one:

if not all(is_df_available):
    raise ValueError("Reports where not initialized with a DataFrame.")

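from ydata_profiling import ProfileReport

config_to_use = "config.yaml"       # placeholder: path to your own config file
output_profile_pp = "my_report.pp"  # dump() always writes with a .pp suffix
profile = ProfileReport(df, config_file=config_to_use)
profile.get_description()  # the report is lazy: compute the statistics before dumping
profile.dump(output_profile_pp)
loaded_profile = ProfileReport()  # create an empty instance
loaded_profile.load(output_profile_pp)
print(type(loaded_profile))  # should print <class 'ydata_profiling.profile_report.ProfileReport'>
loaded_profile.to_file("loaded_report.html")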
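The dump()/load() pair works because it pickles only the report's computed state rather than the whole object. The sketch below is a simplified, hypothetical model of that pattern (ToyReport and its methods are illustrative, not the ydata-profiling classes): it stores a data hash, the config and the computed description, always writes with a .pp suffix the way dump() appears to, and refuses to load into an instance whose data has a different hash. It also shows why the statistics must exist before dumping: a lazy report that was never computed has nothing to save.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


class ToyReport:
    """Hypothetical lazy report illustrating state-only dump/load."""

    def __init__(self, rows=None, config=None):
        self.rows = rows                 # stand-in for the profiled DataFrame
        self.config = config or {}
        self._description = None         # computed lazily

    def data_hash(self):
        """Hash of the underlying data, or None for an empty instance."""
        if self.rows is None:
            return None
        return hashlib.sha256(repr(self.rows).encode()).hexdigest()

    def get_description(self):
        """Compute the summary once, then return the cached result."""
        if self._description is None:
            if self.rows is None:
                raise ValueError("No data to describe")
            self._description = {"n": len(self.rows), "max": max(self.rows)}
        return self._description

    def dump(self, path):
        """Pickle only hash, config and description; force a .pp suffix."""
        target = Path(path).with_suffix(".pp")
        state = [self.data_hash(), self.config, self._description]
        target.write_bytes(pickle.dumps(state))
        return target

    def load(self, path):
        """Restore dumped state into an empty or data-matching instance."""
        data_hash, config, description = pickle.loads(Path(path).read_bytes())
        if self.rows is not None and self.data_hash() != data_hash:
            raise ValueError("Data does not match the dumped report")
        self.config = config
        self._description = description
        return self


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        report = ToyReport([3, 1, 4], config={"title": "demo"})
        report.get_description()                      # compute before dumping
        saved = report.dump(Path(tmp) / "my_report")  # written as my_report.pp
        restored = ToyReport().load(saved)            # empty instance, no data
        print(saved.name, restored.get_description())
```

In this model the empty instance is just a container that load() fills in, which matches how both answers call load() on a fresh ProfileReport(). Note that Path.with_suffix replaces an existing suffix, so if dump() behaves like the toy, a name such as report.v1 would be written as report.pp; pass the .pp name to load().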
Answer by Ananth Babu

This solution worked for me:

from ydata_profiling import ProfileReport

profile = ProfileReport(df)
profile.to_file("report.html")  # triggers the computation; alternatively, profile.to_json() computes it without writing a file
profile.dump("my_report")  # serializes with pickle to my_report.pp

loaded_profile = ProfileReport().load("my_report.pp")  # note: load() is called on an empty ProfileReport instance

loaded_profile.df = df.head(1)  # or an empty DataFrame with the same columns and proper dtypes
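The comment on the last line mentions attaching an empty DataFrame that keeps the original columns and dtypes instead of real rows. In pandas, df.head(0) returns exactly that: zero rows, same column names and dtypes. A small sketch with a made-up DataFrame (the column names and values are hypothetical):

```python
import pandas as pd

# Hypothetical DataFrame standing in for the one that was profiled.
df = pd.DataFrame({
    "age": [34, 51, 27],
    "city": pd.Series(["Oslo", "Lima", "Pune"], dtype="category"),
    "joined": pd.to_datetime(["2021-01-04", "2022-06-30", "2023-03-15"]),
})

# Zero rows, but the same column names and dtypes as df.
placeholder = df.head(0)

if __name__ == "__main__":
    print(placeholder.empty)
    print(placeholder.dtypes)
```

In the answer's code, that frame would then be assigned with loaded_profile.df = placeholder.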