Matlab to Python translation of design matrix function

813 Views Asked by At

Last year I've written a code in Matlab for a design matrix in linear regression program. It works just fine. Now, I need to translate it to Python and run in Pycharm. I've been at it for days, and while I'm really new to Python, I can't find any mistakes in my translation, but I get an error while the code is run with the rest of the program.

Code in matlab:

function DesignMatrix = design_matrix( xTrain, M )
% This function calculates the Design Matrix for
% a M-th degree polynomial
% xTrain - training set Nx1
% M - polynomial degree 0,1,2,...

N = size(xTrain,1);
DesignMatrix = zeros(N,M+1); 
for i=1:M+1
  DesignMatrix(:,i)=xTrain.^(i-1)
end
end

and my translation in Python (np stands for numpy, which is imported):

def design_matrix(x_train,M):
    '''
    :param x_train: input vector Nx1
    :param M: polynomial degree 0,1,2,...
    :return: Design Matrix Nx(M+1) for M degree polynomial
    '''
    desm = np.zeros(shape =(len(x_train), M+1))
    for i in range(1, M+1):
        desm[:,i] = np.power(x_train, (i-1))
    return desm
    pass

The error points to this line: desm[:,i] = np.power(x_train, (i-1)) and it's a value error. I tried using the online translator ompc but it seems to be outdated since it didn't work for me. Could anyone kindly explain to me if there're any obvious mistakes in my translation? I know it's a part of a bigger program, but what I'm asking is just the syntax translation itself. If it's correct, I'll try to find any other mistakes, though I didn't come up with any so far. Thank you.

Edit: Traceback

ERROR: test_design_matrix (test.TestDesignMatrix)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "...\test.py", line 61, in test_design_matrix
    dm_computed = design_matrix(x_train, M)
  File "...\content.py", line 34, in design_matrix
    desm[:,i] = np.power(x_train, (i-1))
ValueError: could not broadcast input array from shape (20,1) into shape (20)

I'm not able to change the test.py file, it's provided to me and can't be changed, so I'm only relying on the second error.

Excerpt from test.py of the function that gives the error:

def test_design_matrix(self):
    x_train = TEST_DATA['design_matrix']['x_train']
    M = TEST_DATA['design_matrix']['M']
    dm = TEST_DATA['design_matrix']['dm']
    dm_computed = design_matrix(x_train, M)
    max_diff = np.max(np.abs(dm - dm_computed))
    self.assertAlmostEqual(max_diff, 0, 8)
3

There are 3 best solutions below

7
Robert Valencia On BEST ANSWER

Can you try this:

def design_matrix(x_train,M):
    '''
    :param x_train: input vector Nx1
    :param M: polynomial degree 0,1,2,...
    :return: Design Matrix Nx(M+1) for M degree polynomial
    '''
    x_train = np.asarray(x_train)
    desm = np.zeros(shape =(len(x_train), M+1))
    for i in range(0, M+1):
        desm[:,i] = np.power(x_train, i).reshape(x_train.shape[0],)
    return desm

The error comes from incompatible Numpy array dimensions. desm[:,i] has the shape (n,), but the value you are trying to store to it has the shape (n,1), so you need to reshape it to (n,). Also, as GLR mentioned, Python indexing starts at 0 so you need to modify your indices, and function execution stops at the return line, so the pass line is not reached at all.

6
GLR On

I see three mistakes:

  • In Python, the indexing starts in zero.

  • To power all the items of an array, it is possible to use the ** operator.

  • pass does nothing, as it is put after the return statement. The function never reaches this point.

I would try this one:

def design_matrix(x_train,M):
    '''
    :param x_train: input vector Nx1
    :param M: polynomial degree 0,1,2,...
    :return: Design Matrix Nx(M+1) for M degree polynomial
    '''
    desm = np.zeros(shape =(len(x_train), M+1))
    for i in range(0, M+1):
        desm[:,i] = x_train.squeeze() ** (i-1)
    return desm
0
Bill Bell On

You might be interested to know that you can created orthogonal design matrices for polynomial regression using the patsy language and module.

>>> import numpy as np
>>> from patsy import dmatrices, dmatrix, demo_data, Poly
>>> data = demo_data("a", "b", "x1", "x2", "y", "z column")
>>> dmatrix('C(x1, Poly)', data)
DesignMatrix with shape (8, 8)
Columns:
['Intercept', 'C(x1, Poly).Linear', 'C(x1, Poly).Quadratic', 'C(x1, Poly).Cubic', 'C(x1, Poly)^4', 'C(x1, Poly)^5', 'C(x1, Poly)^6', 'C(x1, Poly)^7']
Terms:
'Intercept' (column 0), 'C(x1, Poly)' (columns 1:8)
(to view full data, use np.asarray(this_obj))
>>> dm = dmatrix('C(x1, Poly)', data)
>>> np.asarray(dm)
array([[ 1.        ,  0.23145502, -0.23145502, -0.43082022, -0.12087344,
         0.36376642,  0.55391171,  0.35846409],
       [ 1.        , -0.23145502, -0.23145502,  0.43082022, -0.12087344,
        -0.36376642,  0.55391171, -0.35846409],
       [ 1.        ,  0.07715167, -0.38575837, -0.18463724,  0.36262033,
         0.32097037, -0.30772873, -0.59744015],
       [ 1.        ,  0.54006172,  0.54006172,  0.43082022,  0.28203804,
         0.14978617,  0.06154575,  0.01706972],
       [ 1.        ,  0.38575837,  0.07715167, -0.30772873, -0.52378493,
        -0.49215457, -0.30772873, -0.11948803],
       [ 1.        , -0.54006172,  0.54006172, -0.43082022,  0.28203804,
        -0.14978617,  0.06154575, -0.01706972],
       [ 1.        , -0.07715167, -0.38575837,  0.18463724,  0.36262033,
        -0.32097037, -0.30772873,  0.59744015],
       [ 1.        , -0.38575837,  0.07715167,  0.30772873, -0.52378493,
         0.49215457, -0.30772873,  0.11948803]])