Most significant input dimensions for GPy.GPCoregionalizedRegression?


I have successfully trained a multi-output Gaussian process model using a GPy.models.GPCoregionalizedRegression model from the GPy package. The model has ~25 inputs and 6 outputs.

The underlying kernel is a GPy.util.multioutput.ICM kernel consisting of a RationalQuadratic kernel (GPy.kern.RatQuad) and the GPy.kern.Coregionalize kernel.

I am now interested in the feature importance for each individual output. The RatQuad kernel provides an ARD=True (Automatic Relevance Determination) keyword, which makes it possible to read off feature importances from the per-dimension lengthscales of a single-output model (this is also what the get_most_significant_input_dimensions() method of a GPy model exploits).

However, calling get_most_significant_input_dimensions() on the GPy.models.GPCoregionalizedRegression model gives me a single list of indices, which I assume to be the most significant inputs for all outputs combined.

How can I calculate/obtain the lengthscale values or most significant features for each individual output of the model?

Best Answer

The problem is the model itself. The intrinsic coregionalization model (ICM) is set up such that all outputs are driven by a single shared latent Gaussian process. Thus, calling get_most_significant_input_dimensions() on a GPy.models.GPCoregionalizedRegression model can only give you one set of input dimensions significant to all outputs together.
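To see why, it helps to write out the covariance: in an ICM, the covariance between output i at x and output j at x' is B[i, j] * k(x, x'), so one latent kernel k (and hence one set of ARD lengthscales) is shared by all outputs. A minimal NumPy sketch of that structure (illustrative only, not GPy's internals; the kernel choice and numbers are assumptions):

```python
import numpy as np

def ard_rbf(X1, X2, lengthscales):
    # ARD RBF kernel: one lengthscale per input dimension
    d = (X1[:, None, :] - X2[None, :, :]) / lengthscales
    return np.exp(-0.5 * np.sum(d**2, axis=-1))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))        # 5 points, 3 input dimensions
ell = np.array([1.0, 10.0, 0.5])   # shared per-dimension lengthscales
K = ard_rbf(X, X, ell)             # latent-GP covariance, shape (5, 5)

# Coregionalization matrix B = W W^T + diag(kappa) for 2 outputs
W = rng.normal(size=(2, 1))
kappa = np.array([0.1, 0.2])
B = W @ W.T + np.diag(kappa)

# ICM covariance over all (output, point) pairs: Kronecker product B x K.
# Every output block is a scaled copy of the SAME K, so the ARD
# lengthscales in ell cannot differ between outputs.
K_icm = np.kron(B, K)              # shape (10, 10)
```

Because each output's block is just B[i, j] * K, there is only one set of lengthscales to rank, which is exactly why the model can only report one global set of significant inputs.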

The solution is to use a GPy.util.multioutput.LCM kernel, which is defined as a sum of ICM kernels, each with its own (latent) GP kernel. It works as follows:

import GPy

# Your data
# x = ...
# y = ...

# # ICM case
# kernel = GPy.util.multioutput.ICM(input_dim=x.shape[1],
#                                   num_outputs=y.shape[1],
#                                   kernel=GPy.kern.RatQuad(input_dim=x.shape[1], ARD=True))

# LCM case: one ARD RatQuad kernel per output
k_list = [GPy.kern.RatQuad(input_dim=x.shape[1], ARD=True) for _ in range(y.shape[1])]
rank = 1  # rank of the coregionalization matrices W
kernel = GPy.util.multioutput.LCM(input_dim=x.shape[1], num_outputs=y.shape[1],
                                  W_rank=rank, kernels_list=k_list)

The data needs to be reshaped (this is also necessary for the ICM model and thus independent of the scope of this question; see here for details):

# Reshape the data to fit GPCoregionalizedRegression
xx = reshape_for_coregionalized_regression(x)
yy = reshape_for_coregionalized_regression(y)

m = GPy.models.GPCoregionalizedRegression(xx, yy, kernel=kernel)
m.optimize()
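The reshaping helper above is a placeholder. As a rough sketch of what it has to produce, assuming the index-augmented stacking that GPy uses internally (the helper name and combined x/y signature are my own; this is not GPy code):

```python
import numpy as np

def reshape_for_coregionalized_regression(x, y):
    # Hypothetical helper: tile the inputs once per output and append an
    # output-index column, then stack the outputs into one column vector.
    # x has shape (n, d); y has shape (n, num_outputs).
    n, num_outputs = y.shape
    xx = np.vstack([np.hstack([x, np.full((n, 1), i, dtype=float)])
                    for i in range(num_outputs)])   # shape (n * num_outputs, d + 1)
    yy = y.T.reshape(-1, 1)                         # shape (n * num_outputs, 1)
    return xx, yy
```

Note that GPCoregionalizedRegression also accepts lists of per-output X and Y arrays and performs this stacking itself.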

After the optimization has converged, one can call get_most_significant_input_dimensions() on an individual latent GP (here for output 0):

sig_inputs_0 = m.sum.ICM0.get_most_significant_input_dimensions()

or loop over all kernels:

sig_inputs = []
for part in m.kern.parts:
    sig_inputs.append(part.get_most_significant_input_dimensions())
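For intuition, the ranking behind such a method essentially amounts to sorting the ARD lengthscales: a smaller lengthscale means the output varies faster along that dimension, i.e. the dimension is more relevant. A standalone sketch of that idea (my own illustration, not GPy's implementation):

```python
import numpy as np

def rank_input_dimensions(lengthscales):
    # With ARD, relevance is commonly taken as the inverse lengthscale:
    # small lengthscale -> fast variation -> more significant dimension.
    relevance = 1.0 / np.asarray(lengthscales, dtype=float)
    return np.argsort(relevance)[::-1]  # most significant dimension first

# Example: dimension 2 (lengthscale 0.5) is most relevant,
# dimension 1 (lengthscale 10.0) least.
order = rank_input_dimensions([1.0, 10.0, 0.5])
```

With the LCM model, applying this ranking to each latent kernel's lengthscales gives a separate significance ordering per output, which is exactly what the question asked for.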