HDBSCAN: clustering, persistence and approximate_predict()


I want to cache my model results so that I can make predictions without redoing the clustering.
I read that I can do that with the memory parameter in HDBSCAN.
Instead, I saved the model with joblib, because I wanted the file in the same directory as my script rather than in '/tmp/joblib' as described here ((HDBSCAN cluster caching and persistance)):

import hdbscan
import joblib

clusterer = hdbscan.HDBSCAN(min_cluster_size=30, prediction_data=True).fit(data)
# save the model to disk
filename = 'finalized_model.joblib'
joblib.dump(clusterer, filename)

I then tried to load the model in a different file:

from joblib import load 

# load the model
model = load('finalized_model.joblib')
# make predictions
test_labels, strengths = model.approximate_predict(model, test_points)

But I got this error:

AttributeError: 'HDBSCAN' object has no attribute 'approximate_predict'

Last time I got this error, it was because prediction_data was not set to True, but what's the problem now?


There is 1 best solution below


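approximate_predict() is a module-level function of the hdbscan package, not a method of the HDBSCAN object. You call it as hdbscan.approximate_predict(...) and pass the fitted model as the first argument.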

Here's what you need to do:

from joblib import load
import hdbscan

# load the model
model = load('finalized_model.joblib')
# make predictions
test_labels, strengths = hdbscan.approximate_predict(model, test_points)

API Reference: