How to Save/Load Optimized GPy Regression Model


I'm trying to save my optimized Gaussian process model for use in a different script. My current plan is to store the model information in a JSON file, using GPy's built-in to_dict and from_dict methods. Something along the lines of:

import GPy
import numpy as np
import json

X = np.random.uniform(-3.,3.,(20,1))
Y = np.sin(X) + np.random.randn(20,1)*0.05
kernel = GPy.kern.RBF(input_dim=1, variance=1., lengthscale=1.)

m = GPy.models.GPRegression(X, Y, kernel)

m.optimize(messages=True)
m.optimize_restarts(num_restarts=10)

jt = json.dumps(m.to_dict(save_data=False), indent=4)
with open("j-test.json", 'w') as file:
    file.write(jt)

This step works with no issues, but I run into problems when I try to load the model information using:

with open("j-test.json", 'r') as file:
    d = json.load(file)  # d is a dictionary

m2 = GPy.models.GPClassification.from_dict(d, data=None)

which gives me an AssertionError pointing at the check "data is not None", and data is indeed None here, or at least I think so.

I'm really new to GPy and to working with JSON, so I'm not sure where I've gone astray. I looked through the documentation, but it is a bit vague and I couldn't find an example of these functions in use. Is there a step or concept that I missed? Also, is this the best way to store and reload my model? Any help would be greatly appreciated. Thanks!
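
A note on the assertion itself: as far as I can tell from GPy's source, a dictionary written with save_data=False contains no X or Y, so the loader requires the training data to be supplied again through the data argument, and the failing check is assert data is not None. The sketch below is a simplified stand-in for that logic, not GPy's actual code; build_from_dict and the dictionary keys are hypothetical names used only for illustration.

```python
def build_from_dict(input_dict, data=None):
    """Hypothetical stand-in for the loader: rebuild a model description from a dict."""
    restored = dict(input_dict)  # copy so the caller's dict is left untouched
    if restored.get("X") is None or restored.get("Y") is None:
        # The dict carries no training data, so the caller has to supply it.
        assert data is not None, "model was saved without data; pass data=(X, Y)"
        restored["X"], restored["Y"] = data
    return restored


saved = {"name": "gp_regression", "X": None, "Y": None}  # as if save_data=False

# Loading without data trips the assertion, as in the question.
try:
    build_from_dict(saved, data=None)
    outcome = "loaded"
except AssertionError:
    outcome = "assertion failed"

# Supplying the training data again lets the load go through.
X, Y = [[0.0], [1.0]], [[0.1], [0.9]]
restored = build_from_dict(saved, data=(X, Y))
```

If that reading is right, the corresponding GPy call would be something like GPy.models.GPRegression.from_dict(d, data=(X, Y)), where X and Y are the arrays the model was trained on. Saving with save_data=True instead should make data=None acceptable, at the cost of a larger JSON file.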

houseofleft (best answer)

The module pickle is your friend here!

import pickle
with open('save.pkl', 'wb') as file:
    pickle.dump(m, file)

You can load it back in a later script with:

with open('save.pkl', 'rb') as file:
    loaded_model = pickle.load(file)

theprogressor

Pickle is not the recommended way to do this. See here, in the section towards the end. The example from that section is reproduced below.

# let X, Y be data loaded above
# Model creation:
m = GPy.models.GPRegression(X, Y)
m.optimize()
# 1: Saving a model:
np.save('model_save.npy', m.param_array)
# 2: loading a model
# Model creation, without initialization:
m_load = GPy.models.GPRegression(X, Y, initialize=False)
m_load.update_model(False) # do not call the underlying expensive algebra on load
m_load.initialize_parameter() # Initialize the parameters (connect the parameters up)
m_load[:] = np.load('model_save.npy') # Load the parameters
m_load.update_model(True) # Call the algebra only once
print(m_load)
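
Because this approach rebuilds the model from X and Y before loading the parameters, the second script needs the training data as well. One option, sketched here with plain NumPy, is to bundle everything into a single .npz file. The params array is a stand-in for m.param_array (three made-up values, matching the RBF variance, RBF lengthscale and Gaussian noise variance of the model above), and the file name is arbitrary.

```python
import os
import tempfile

import numpy as np

# Toy data in the same shape as the question's example.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, (20, 1))
Y = np.sin(X) + rng.normal(0.0, 0.05, (20, 1))

# Stand-in for m.param_array after optimization (values are made up).
params = np.array([1.2, 0.8, 0.0025])

# Saving script: store the parameters and the training data together.
path = os.path.join(tempfile.mkdtemp(), "model_save.npz")
np.savez(path, params=params, X=X, Y=Y)

# Loading script: read all three arrays back.
with np.load(path) as bundle:
    params_loaded = bundle["params"]
    X_loaded = bundle["X"]
    Y_loaded = bundle["Y"]

# X_loaded and Y_loaded would then go into
# GPy.models.GPRegression(X_loaded, Y_loaded, initialize=False), and
# params_loaded would be assigned with m_load[:] = params_loaded,
# following the steps in the example above.
```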