Rolling linear regression for use with groupby operation on a cuDF dataframe

I would like to calculate the rolling slope of y_value over x_value using cuML LinearRegression.

Sample data (cuDF dataframe):

| date       | x_value | y_value |
| ---------- | ------- | ------- |
| 2020-01-01 | 900     | 10      |
| 2020-01-01 | 905     | 15      |
| 2020-01-01 | 910     | 15      |
| 2020-01-01 | 915     | 15      |
| 2020-01-02 | 900     | 30      |
| 2020-01-02 | 905     | 40      |
| 2020-01-02 | 910     | 50      |

A simple function to use LinearRegression:

from cuml.linear_model import LinearRegression

def RollingOLS(x, y):
    # Fit y = a * x + b and return the fitted slope a
    lr = LinearRegression(fit_intercept=True, normalize=False, algorithm='svd')
    reg = lr.fit(x, y)

    return reg.coef_

What I would like to do:

data.groupby('date').rolling(2).apply(RollingOLS, x=x_value, y=y_value)

However, I am getting an error: NotImplementedError: Handling UDF with null values is not yet supported. Is there any way to overcome this error? Thank you.

There is 1 best solution below

To resolve the error "NotImplementedError: Handling UDF with null values is not yet supported", either replace the None/null values with another value (for example with fillna) or remove the rows that contain None/null values from your DataFrame (for example with dropna) before applying the function.
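
To make the idea concrete, here is a minimal sketch in plain Python rather than cuDF/cuML. It assumes the goal is the slope of a simple least-squares line over each window of 2 rows within a date, and it uses the closed-form slope (covariance of x and y divided by the variance of x) in place of LinearRegression. The helper names ols_slope and rolling_slopes are hypothetical and exist only for this illustration. Rows with a null x or y are dropped first, which is the "remove the samples" option described above.

```python
from collections import defaultdict


def ols_slope(xs, ys):
    """Closed-form least-squares slope of ys over xs; None if x has no variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None  # all x values equal: the slope is undefined
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def rolling_slopes(rows, window=2):
    """rows: iterable of (date, x, y). Returns {date: [slope or None, ...]}."""
    groups = defaultdict(list)
    for date, x, y in rows:
        if x is None or y is None:
            continue  # drop rows with nulls before any windowing
        groups[date].append((x, y))

    result = {}
    for date, points in groups.items():
        slopes = []
        for i in range(len(points)):
            if i + 1 < window:
                slopes.append(None)  # not enough rows yet for a full window
                continue
            win = points[i + 1 - window : i + 1]
            slopes.append(ols_slope([p[0] for p in win], [p[1] for p in win]))
        result[date] = slopes
    return result


# Sample data from the question
rows = [
    ("2020-01-01", 900, 10),
    ("2020-01-01", 905, 15),
    ("2020-01-01", 910, 15),
    ("2020-01-01", 915, 15),
    ("2020-01-02", 900, 30),
    ("2020-01-02", 905, 40),
    ("2020-01-02", 910, 50),
]
slopes = rolling_slopes(rows)
print(slopes)
```

The first entry for each date is None because a window of 2 is not yet full. This sketch only illustrates the computation and the null handling; it does not show how to pass two columns to a rolling apply in cuDF.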