I'm experimenting with approaches to speeding up the calculation of the MSE gradient function. I thought a good way to save time would be to precalculate some of the matrix operations, which I've attempted in the function mse_grad_2 below. However, when I compare to a naive function the results differ (only by a very small amount, but I want to figure out why they aren't exactly the same).
set.seed(1)
y <- rnorm(10)
X <- matrix(rnorm(10*15), ncol=15, nrow=10)
beta <- rnorm(15)
mse_grad <- function(y, X, input, n) {
r <- -(1/n)*t(X) %*% (y - X %*% input)
return(r)
}
mse_grad_2 <- function(XTY, XTX, input, n) {
r <- -(1/n)*(XTY - XTX %*% input)
return(r)
}
mean(mse_grad(y, X, beta, 10) - mse_grad_2(t(X) %*% y, t(X) %*% X, beta, 10))
# [1] 4.810966e-17
This is due to finite-precision arithmetic issues (see e.g. https://en.wikipedia.org/wiki/Floating-point_arithmetic). For the same reasons
See below: