RAPIDS cuML linear regression running slower than statsmodels.api equivalent?

This is my first time posting here, so apologies if this is the wrong place to ask or if I've left out information. I have the following code, which fits the same linear regression model with statsmodels and with cuML. I expected the RAPIDS version to be much quicker since it runs on the GPU, but the measured time is actually slower. My drivers and libraries are all up to date, and I confirmed the GPU is in use while the code is running. Does anyone have any idea why this might be the case? Any help would be appreciated.

This is the code:

import numpy as np
import cudf
import cuml
import statsmodels.api as sm
import time

# Generate some sample data
n_samples = 1000
n_features = 1
X = np.random.rand(n_samples, n_features)
y = np.random.rand(n_samples)
X_ = cudf.DataFrame(X)
y_ = cudf.Series(y)

start = time.time()
# Fit OLS model using statsmodels
ols_model = sm.OLS(y, X)
ols_results = ols_model.fit()
end = time.time()
print(f'ols runtime:{end-start}s')

# Fit linear regression model using cuML
reg_model = cuml.LinearRegression(fit_intercept=False)
reg_model.fit(X_, y_)
end2 = time.time()
print(f'cuml runtime:{end2-end}s')

# Print the results
print('OLS coefficients:', ols_results.params)
print('cuML coefficients:', reg_model.coef_)

And this is the output:

ols runtime:0.001081705093383789s
cuml runtime:1.3555335998535156s
OLS coefficients: [0.75085789]
cuML coefficients: 0    0.750858
dtype: float64
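
One thing worth ruling out is that each timing above comes from a single, first call. On a GPU, the first call typically also pays one-time startup costs (for example CUDA context creation), and with only 1000 samples and one feature there is very little work to offset fixed overhead. Below is a minimal sketch of a fairer timing harness, assuming a hypothetical helper named `time_call` (not part of cuML or statsmodels) that runs an untimed warm-up call and then reports the median over several repeats:

```python
import statistics
import time


def time_call(fn, warmup=1, repeats=5):
    """Return the median wall-clock time of fn() in seconds.

    time_call is a hypothetical helper for illustration only.
    The warm-up calls are not timed, so one-time setup cost is excluded.
    """
    for _ in range(warmup):
        fn()  # untimed: absorbs one-time setup cost
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```

With the variables from the code above, this could be called as `time_call(lambda: sm.OLS(y, X).fit())` and `time_call(lambda: cuml.LinearRegression(fit_intercept=False).fit(X_, y_))`. I have not verified how much of the 1.36 s this would account for; it only separates first-call cost from steady-state cost.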