GPflow 2: Multi-output kernels (MOK) with partially shared kernels


I have a question regarding multi-output kernels in GPflow 2. For the application I am working on, I want to create an independent multi-output kernel that shares kernels across some output dimensions, but not all. Two relevant classes in GPflow are the SharedIndependent and SeparateIndependent multi-output kernel classes. These, however, use either one shared kernel for all P output dimensions, or P separate kernels for P output dimensions.

The code attached below is a small adaptation of the multi-output kernels notebook (https://gpflow.readthedocs.io/en/master/notebooks/advanced/multioutput.html). The 4 output dimensions can be divided into 2 groups of 2 output dimensions each: one group with high flexibility and the other with low flexibility. I would like to use 2 kernels for this task, so that 2 different length-scale parameters can be learned for the SquaredExponential kernel. For completeness I have included the implementations from the example notebook for the SharedIndependent and SeparateIndependent classes.

I have tried to combine the two classes as follows: SeparateIndependent([SharedIndependent(SquaredExponential() + Linear(), output_dim=2) for _ in range(2)]). This leads to the following error (15 is the number of inducing points):

ValueError: Dimensions must be equal, but are 2 and 15 for '{{node add_8}} = AddV2[T=DT_DOUBLE](stack, mul_13)' with input shapes: [2,15,2,15,2], [1,15,15].

I have also tried to create a new multi-output kernel class for this purpose, one that largely mimics the SeparateIndependent class with some alterations to the list comprehensions in the K(X, X2) and K_diag(X) methods. This was not successful, because I do not know how to register the correct Kuu(), Kuf() and conditional() functions for the new multi-output kernel class. An attempt to create a class identical to SeparateIndependent, with only a different name, failed as well.

I would be happy to know whether it is possible to combine the SharedIndependent and SeparateIndependent classes, or whether constructing a new MOK class is the preferable solution to the problem. If so, what is the best approach to solving the issues with registering the conditional expressions (Kuu(), Kuf() and conditional())?

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import gpflow
from gpflow.kernels import SquaredExponential, Linear, SharedIndependent, SeparateIndependent
from gpflow.inducing_variables import SharedIndependentInducingVariables, InducingPoints
from gpflow.utilities import print_summary
from gpflow.ci_utils import ci_niter
gpflow.config.set_default_float(np.float64)
np.random.seed(0)
MAXITER = ci_niter(2000)

N = 100  # number of points
D = 1  # number of input dimensions
M = 15  # number of inducing points
L = P = 4  # number of latent GPs, number of observations = output dimension

def generate_data(N=100):
    X = np.random.rand(N)[:, None] * 10 - 5  # Inputs = N x D
    G = np.hstack((0.5 * np.sin(X/2) + X, 3.0 * np.cos(X/2) - X,0.5 * np.sin(4 * X) + X, 3.0 * np.cos(4*X) - X))  # G = N x L
    W = np.array([[0.5, -0.3, 0, 0], [0.5, -0.3, 0, 0], [0, 0, -0.4, 0.6],[0.0, 0.0, 0.6, -0.4]])  # L x P
    F = np.matmul(G, W)  # N x P
    Y = F + np.random.randn(*F.shape) * [0.2, 0.2, 0.2, 0.2]

    return X, Y
X, Y = data = generate_data(N)
print(X.shape, Y.shape)
Zinit = np.linspace(-5, 5, M)[:, None]

def plot_model(m, name, lower=-7.0, upper=7.0):
    pX = np.linspace(lower, upper, 100)[:, None]
    pY, pYv = m.predict_y(pX)
    if pY.ndim == 3:
        pY = pY[:, 0, :]
    plt.plot(X, Y, "x")
    plt.gca().set_prop_cycle(None)
    plt.plot(pX, pY)
    for i in range(pY.shape[1]):
        top = pY[:, i] + 2.0 * pYv[:, i] ** 0.5
        bot = pY[:, i] - 2.0 * pYv[:, i] ** 0.5
        plt.fill_between(pX[:, 0], top, bot, alpha=0.3)
    plt.xlabel("X")
    plt.ylabel("f")
    plt.title(f"{name} kernel.")
    plt.show()

# initialization of inducing input locations (M equally spaced points in the input range)
Z = Zinit.copy()
# create multi-output inducing variables from Z
iv = SharedIndependentInducingVariables(InducingPoints(Z))

def optimize_model_with_scipy(model):
    optimizer = gpflow.optimizers.Scipy()
    optimizer.minimize(
        model.training_loss_closure(data),
        variables=model.trainable_variables,
        method="l-bfgs-b",
        options={"disp": True, "maxiter": MAXITER},
    )


# create the multi-output kernels to compare
kernels = [
    (SeparateIndependent([SquaredExponential() + Linear() for _ in range(P)]), 'Separate Independent'),
    (SharedIndependent(SquaredExponential() + Linear(), output_dim=P), 'Shared Independent'),
    (SeparateIndependent([SharedIndependent(SquaredExponential() + Linear(), output_dim=2) for _ in range(2)]), 'Partially shared independent'),  # raises the ValueError above
]
for (kernel, name) in kernels:
    m = gpflow.models.SVGP(kernel, gpflow.likelihoods.Gaussian(), inducing_variable=iv, num_latent_gps=P)
    print_summary(m)
    optimize_model_with_scipy(m)
    print_summary(m)
    plot_model(m, name)

There are 2 answers below.

Answer 1:

Update: I have solved this issue by creating a custom multi-output kernel class that contains one SharedIndependent multi-output kernel per group of outputs. The registered conditional iterates over these and applies each one to its group of output dimensions. Note that this is only implemented for SharedIndependentInducingVariables.

import tensorflow as tf
from gpflow import covariances
from gpflow.config import default_jitter
from gpflow.kernels import Combination, MultioutputKernel, SharedIndependent
from gpflow.conditionals import conditional
from gpflow.conditionals.util import base_conditional, expand_independent_outputs
from gpflow.inducing_variables import SharedIndependentInducingVariables


class CustomMultiOutput(MultioutputKernel, Combination):
    def __init__(self, kernels, nu, name=None):
        # wrap each base kernel in a SharedIndependent covering `nu` output dimensions
        kernels = [SharedIndependent(k, output_dim=nu) for k in kernels]
        super().__init__(kernels=kernels, name=name)
        self.nu = nu  # number of output dimensions per group

    @property
    def num_latent_gps(self):
        return len(self.kernels)

    @property
    def latent_kernels(self):
        """The underlying kernels in the multioutput kernel"""
        return tuple(self.kernels)
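
    # NOTE (assumption, not part of the original answer): MultioutputKernel declares
    # K and K_diag as abstract methods, so the class also needs concrete implementations
    # before it can be instantiated. A minimal sketch following SeparateIndependent's
    # pattern, where each SharedIndependent group contributes `nu` identical output blocks:
    def K(self, X, X2=None, full_output_cov=True):
        Ks = tf.concat([k.K(X, X2, full_output_cov=False) for k in self.kernels], axis=0)  # [P, N, N2]
        if full_output_cov:
            return tf.transpose(tf.linalg.diag(tf.transpose(Ks, [1, 2, 0])), [0, 2, 1, 3])  # [N, P, N2, P]
        return Ks  # [P, N, N2]

    def K_diag(self, X, full_output_cov=False):
        Kd = tf.concat([k.K_diag(X, full_output_cov=False) for k in self.kernels], axis=-1)  # [N, P]
        return tf.linalg.diag(Kd) if full_output_cov else Kd  # [N, P, P] or [N, P]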

@conditional.register(object, SharedIndependentInducingVariables, CustomMultiOutput, object)
def custom_shared_independent_conditional(
    Xnew,
    inducing_variable,
    kernel,
    f,
    *,
    full_cov=False,
    full_output_cov=False,
    q_sqrt=None,
    white=False,
):
    N = Xnew.shape[0]
    nu = kernel.nu  # number of output dimensions per group
    fmeans, fvars = [], []

    for i, k in enumerate(kernel.kernels):
        # slice out the variational parameters that belong to this group of outputs
        start, end = i * nu, (i + 1) * nu
        f_i = f[:, start:end]  # [M, nu]
        q_sqrt_i = q_sqrt[start:end, :]  # [nu, M, M]
        Kmm = covariances.Kuu(inducing_variable, k, jitter=default_jitter())  # [M, M]
        Kmn = covariances.Kuf(inducing_variable, k, Xnew)  # [M, N]
        Knn = k.kernel(Xnew, full_cov=full_cov)  # [N, N] or [N]

        fmean, fvar = base_conditional(
            Kmn, Kmm, Knn, f_i, full_cov=full_cov, q_sqrt=q_sqrt_i, white=white
        )  # [N, nu], and [nu, N, N] or [N, nu]
        fmeans.append(tf.transpose(fmean, perm=[1, 0]))  # [nu, N]
        if full_cov:
            fvars.append(tf.transpose(fvar, perm=[0, 2, 1]))  # [nu, N, N]
        else:
            fvars.append(tf.transpose(fvar, perm=[1, 0]))  # [nu, N]

    fmeans = tf.transpose(tf.reshape(fmeans, shape=(-1, N)))  # [N, P]
    if full_cov:
        fvars = tf.reshape(fvars, shape=(-1, N, N))  # [P, N, N]
    else:
        fvars = tf.transpose(tf.reshape(fvars, shape=(-1, N)))  # [N, P]

    return fmeans, expand_independent_outputs(fvars, full_cov, full_output_cov)
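
With this in place, a minimal usage sketch for the 2-groups-of-2 setup from the question (reusing iv, P, optimize_model_with_scipy and plot_model from the script above; nu=2 is the group size introduced in __init__ above):

kernel = CustomMultiOutput([SquaredExponential() + Linear() for _ in range(2)], nu=2)
m = gpflow.models.SVGP(kernel, gpflow.likelihoods.Gaussian(), inducing_variable=iv, num_latent_gps=P)
optimize_model_with_scipy(m)
plot_model(m, "Partially shared independent (custom)")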

Answer 2:

You should be able to tie some, but not all, kernel parameters by re-using kernel objects (without any custom classes):

k_group1 = SquaredExponential() + Linear()
k_group2 = SquaredExponential() + Linear()
kernel_list = [k_group1, k_group1, k_group2, k_group2]
# or, for larger groups: [k_group1 for _ in range(num_group1)] + [k_group2 for _ in range(num_group2)]
mo_kernel = SeparateIndependent(kernel_list)
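
Because k_group1 appears twice in the list, outputs 0 and 1 use the very same kernel object (and hence the same length-scale and variance parameters), and likewise for k_group2 on outputs 2 and 3. Dropped into the question's script (reusing iv, P and the helper functions defined there), the model would then be built roughly as:

m = gpflow.models.SVGP(mo_kernel, gpflow.likelihoods.Gaussian(), inducing_variable=iv, num_latent_gps=P)
optimize_model_with_scipy(m)
print_summary(m)  # only two distinct SquaredExponential length-scales are trained, one per group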