XGBoost train_model function fails when adding a learning rate hyperparameter (using Snowpark)

Asked by Rafid Sarker on 07 June 2025 at 14:11 (116 views)

I'm trying to add a learning rate hyperparameter to my train_model function, which uses XGBoost for regression in a Snowflake (Snowpark) environment. Whenever I include the learning rate parameter, the process fails with an error. Without the learning rate everything works, so I suspect the problem is in the way I'm adding it.

The main train_model function:

import os

import joblib
import numpy as np
from snowflake.snowpark import Session
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from xgboost import XGBRegressor


def train_model(session: Session,
                table: str,
                features: list,
                target_variable: str,
                cat_cols: list,
                num_cols: list) -> float:
    """
    Trains an XGBoost model using the provided Snowflake session, table, features, and target variable.

    Args:
        session (snowflake.snowpark.Session): Snowflake session object for connecting to Snowflake.
        table (str): Name of the table in Snowflake containing the training data.
        features (list): List of feature column names to be used for training.
        target_variable (str): Name of the target variable column.
        cat_cols (list): List of categorical column names in the feature set.
        num_cols (list): List of numerical column names in the feature set.

    Returns:
        float: Root mean squared error (RMSE) of the trained XGBoost model on the validation set.
    """

    # Load the Snowflake table
    snowdf = session.table(table)

    # Split the data into training and validation sets
    snowdf_train, snowdf_valid = snowdf.random_split([0.75, 0.25], seed=123)

    # Save the train and validation sets in Snowflake
    snowdf_train.write.mode("overwrite").save_as_table("lapse_data_train")
    snowdf_valid.write.mode("overwrite").save_as_table("lapse_data_valid")

    # Prepare the training and validation data as pandas DataFrames
    train_x = snowdf_train[features].to_pandas()  # Feature columns only (target excluded)
    train_y = snowdf_train.select(target_variable).to_pandas()
    valid_x = snowdf_valid[features].to_pandas()
    valid_y = snowdf_valid.select(target_variable).to_pandas()

    # Define the preprocessing steps for numerical and categorical features
    num_pipeline = Pipeline([
        ('imputer', SimpleImputer(strategy="median")),  # Impute missing values with median
        ('std_scaler', StandardScaler()),  # Scale the numerical features
    ])

    preprocessor = ColumnTransformer(
        transformers=[
            ('num', num_pipeline, num_cols),  # Apply the numerical pipeline to numerical features
            ('encoder', OneHotEncoder(handle_unknown="ignore"), cat_cols),  # One-hot encode categorical features
        ]
    )

    # Construct the pipeline with preprocessing and XGBoost model
    pipe = Pipeline([
        ('preprocessor', preprocessor),
        ('xgboost', XGBRegressor(learning_rate=0.01)),  # XGBoost regression model
    ])

    # Train the model
    pipe.fit(train_x, train_y)

    # Make predictions on the validation set
    valid_preds = pipe.predict(valid_x)

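    # Calculate the root mean squared error (RMSE) of the predictions
    # (the squared=False argument was removed in scikit-learn 1.6, so take the square root explicitly)
    rmse = float(np.sqrt(mean_squared_error(valid_y, valid_preds)))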
    
    # Save the trained model to a file
    model_file = os.path.join('/tmp', 'model.joblib')
    joblib.dump(pipe, model_file)
    session.file.put(model_file, "@SANDBOX_SGATE", overwrite=True)

    return rmse
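The RMSE step is just the square root of the mean squared error. The squared keyword of scikit-learn's mean_squared_error was deprecated in version 1.4 and removed in 1.6 (root_mean_squared_error is its replacement), so taking the square root explicitly is the more portable option. A dependency-free sketch of the calculation, using a hypothetical rmse helper that is not part of the original code:

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Hypothetical helper: square root of the mean of the squared residuals."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return sqrt(mse)


# Residuals 1, -1, 2, 0 give MSE = (1 + 1 + 4 + 0) / 4 = 1.5, so RMSE = sqrt(1.5)
example = rmse([3.0, 5.0, 2.0, 7.0], [2.0, 6.0, 0.0, 7.0])
print(example)
```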

The code that registers the function as a stored procedure in Snowflake, and the error it raises:

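import snowflake.snowpark.functions as F

# Now create a stored procedure from the train function and register it in Snowflake
train_model_sp = F.sproc(train_model,
                         session=session,
                         replace=True,
                         is_permanent=True,
                         name="xgboost_sproc",
                         stage_location="@SANDBOX_SGATE",
                         # Third-party packages the procedure needs at run time inside Snowflake
                         packages=["snowflake-snowpark-python", "scikit-learn", "xgboost", "joblib"])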

ProgrammingError: 091003 (22000): Failure using stage area. Cause: [SANDBOX_SGATE GET and PUT commands are not supported with external stage]

What I've tried so far:
- I've reviewed the XGBoost Python documentation and confirmed that learning_rate is a valid XGBRegressor parameter.
- I've checked my imports and made sure all required libraries, including the Snowpark Python library, are installed.
- I've tried several ways of passing the learning rate, but every attempt still produces the error.
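To see what the learning rate controls, independently of Snowflake, here is a minimal pure-Python sketch of squared-error gradient boosting with one-split trees (stumps), where each tree's output is multiplied by learning_rate before it is added to the prediction. It is a simplified stand-in, not XGBoost's implementation (XGBoost also uses regularization, second-order gradient statistics and deeper trees), and the helpers fit_stump and boost are hypothetical. It illustrates that learning_rate only scales each tree's contribution: with the same number of rounds, a small value such as 0.01 fits the training data more slowly than XGBoost's default of 0.3.

```python
from math import sqrt
from statistics import mean


def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    return sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def fit_stump(xs, residuals):
    """Hypothetical helper: least-squares depth-1 tree (one threshold split) fitted to residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # split points that leave both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_val, right_val = mean(left), mean(right)
        sse = sum((r - left_val) ** 2 for r in left) + sum((r - right_val) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_val, right_val)
    if best is None:  # all xs identical: fall back to a constant prediction
        constant = mean(residuals)
        return lambda x: constant
    _, threshold, left_val, right_val = best
    return lambda x: left_val if x <= threshold else right_val


def boost(xs, ys, learning_rate, n_estimators):
    """Hypothetical helper: squared-error boosting where each stump is scaled by learning_rate."""
    base = mean(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(n_estimators):
        # For squared error, the residuals are the negative gradient of the loss
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


# Same data and number of rounds, two different learning rates
xs = [float(x) for x in range(20)]
ys = [2.0 * x for x in xs]
slow_model = boost(xs, ys, learning_rate=0.01, n_estimators=100)
fast_model = boost(xs, ys, learning_rate=0.3, n_estimators=100)

baseline_rmse = rmse(ys, [mean(ys)] * len(ys))  # predicting the mean only
slow_rmse = rmse(ys, [slow_model(x) for x in xs])
fast_rmse = rmse(ys, [fast_model(x) for x in xs])
print(baseline_rmse, slow_rmse, fast_rmse)
```

With only 100 rounds, learning_rate=0.01 typically leaves noticeably more training error than 0.3, which is why a lower learning rate is usually paired with more boosting rounds (n_estimators in XGBRegressor).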
