I have a pipeline that runs preprocessing and then a Random Survival Forest from the scikit-survival package. I am trying to use scikit-survival's as_concordance_index_ipcw_scorer class, found here.
My pipeline looks like the following:
Pipeline(steps=[
    ('columntransformer',
     ColumnTransformer(transformers=[
         ('num',
          Pipeline(steps=[('imputer', SimpleImputer(strategy='median')),
                          ('scaler', StandardScaler())]),
          Index(['IntVar1', 'IntVar2', 'IntVar3', 'IntVar4'], dtype='object')),
         ('cat',
          Pipeline(steps=[('imputer', SimpleImputer(fill_value='missing', strategy='constant')),
                          ('onehot', OneHotEncoder(handle_unknown='ignore', sparse=False))]),
          Index(['CharVar1', 'CharVar2', 'CharVar3'], dtype='object'))])),
    ('randomsurvivalforest',
     RandomSurvivalForest(max_features='sqrt', min_samples_leaf=0.005,
                          min_samples_split=0.01, n_estimators=150,
                          n_jobs=-1, oob_score=True, random_state=200))])
This is the Python code leading up to the pipeline and the fitting of the pipeline:
print("Importing global DF")
print("Creating X & Y set")
X = df.iloc[:,:-2].copy()
y = Surv.from_dataframe("AliveStatus","Target_Age",df.iloc[:,-2:].copy()) ## Creates structured array for Scikit Surv
print("Defining feature categories by data type")
numerical_features = X.select_dtypes(include=['int64', 'float64']).columns
categorical_features = X.select_dtypes(include=['object']).columns
print("Splitting dataset")
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5) #SkLearn splitter
print("Defining preprocessing steps using SciKitLearn pipeline...")
## Pipeline Steps
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(sparse=False, handle_unknown='ignore'))])  ## sparse=False because random forests need a dense matrix, not a sparse one
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numerical_features),
        ('cat', categorical_transformer, categorical_features)])
## Pipeline definition
print("Defining Random Survival Forest pipeline from SciKit Survival")
rsf = make_pipeline(
    preprocessor,
    RandomSurvivalForest(n_estimators=150,        ## Default 100
                         min_samples_split=0.01,  ## Default 6
                         min_samples_leaf=0.005,  ## Default 3
                         max_features="sqrt",     ## Defaults to None when not defined
                         n_jobs=-1,               ## Default None
                         oob_score=True,
                         random_state=200)        ## Random state 200
)
##Fitting & Scoring
print("Fitting dataframe to RSF Pipeline")
rsf.fit(X_train,y_train)
print("Fitting completed.")
After the fitting is completed I try to run the following:
as_concordance_index_ipcw_scorer(rsf).score(X_test,y_test)
I then get the following error:
AttributeError Traceback (most recent call last)
<ipython-input-97-9a92b22d8026> in <module>
----> 1 as_concordance_index_ipcw_scorer(rsf).score(X_test,y_test)
C:\ProgramData\Anaconda3\lib\site-packages\sksurv\metrics.py in score(self, X, y)
788 score : float
789 """
--> 790 estimate = self._do_predict(X)
791 score = self._score_func(
792 survival_train=self._train_y,
C:\ProgramData\Anaconda3\lib\site-packages\sksurv\metrics.py in _do_predict(self, X)
768
769 def _do_predict(self, X):
--> 770 predict_func = getattr(self.estimator_, self._predict_func)
771 return predict_func(X)
772
AttributeError: 'as_concordance_index_ipcw_scorer' object has no attribute 'estimator_'
One option I tried was passing only the RSF step of the pipeline, without success:
as_concordance_index_ipcw_scorer(rsf[1]).score(X_test,y_test)
Any suggestions?
Apologies for the length or any missing information; I'm new to pipelines and scikit-survival and wanted to give as much detail as I can.
Thanks
The estimator instance from as_concordance_index_ipcw_scorer needs to be fitted; having fitted the underlying estimator doesn't help in this case. From the source code (of the mixin class), fitting one of these wrappers fits the underlying estimator, saving it in the new attribute estimator_ (which is what your error complains about being missing), and also saves the training labels. So you might be able to create those attributes directly without adverse effects, but you'd be going around the expected process.
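As a minimal sketch of that expected process (assuming the rsf pipeline and the X_train/y_train/X_test/y_test split from the question), wrap the pipeline, fit the wrapper itself, and then score on the held-out data:

from sksurv.metrics import as_concordance_index_ipcw_scorer

scorer = as_concordance_index_ipcw_scorer(rsf)  ## wrap the pipeline; fitting it beforehand is not required
scorer.fit(X_train, y_train)                    ## fits the pipeline internally, setting estimator_ and storing the training labels
print(scorer.score(X_test, y_test))             ## IPCW concordance index on the test set

The same wrapped estimator can also be passed directly to model-selection tools such as GridSearchCV, which is the main use case these scorer wrappers were designed for.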