Polynomialize dataset for selected columns of pd.DataFrame


I am new to regression concepts. I have a dataset that includes a text column.


I am using CatBoostRegressor to predict the transaction time from the other 3 features:

  1. Ticket text
  2. Created Hour (Between 0-23 hours)
  3. Business day (0 for weekends and 1 for weekdays)

Since there is no linear relationship between my predictor column (Created Hour) and response column (Transaction time), I am trying to do a polynomial regression with just "Created Hour" feature but keeping the other features intact.

I have the below code

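import catboost
from catboost import CatBoostRegressor
from sklearn.preprocessing import PolynomialFeatures

text_features = ["unprocessed_text"]
poly = PolynomialFeatures(degree=2, include_bias=True)
x_train_trans = poly.fit_transform(train_data[["Created Hour"]])
# Apply the already-fitted transform to the test split
x_test_trans = poly.transform(test_data[["Created Hour"]])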

train_dataset = catboost.Pool(x_train_trans, train_data[target], text_features=text_features)
test_dataset = catboost.Pool(x_test_trans, test_data[target], text_features=text_features)


#Fit the model
model = CatBoostRegressor(verbose=0, text_features=text_features, loss_function='RMSE', eval_metric = 'R2')
model.fit(train_dataset)

And the code continues.

However, I am not able to merge the other 2 features (unprocessed_text and Business Day) into my polynomial dataframe.

I have 2 questions here.

  1. Am I doing it right by adding polynomial terms for just 1 feature (because I cannot transform the text feature), so that the model can capture a non-linear relationship between that feature and the response variable?

  2. If this is the right direction, how can I concatenate the polynomial dataframe with the other 2 features?
