Serving "Frankenstein" (combined) models at scale

I have a TensorFlow model that's combined with a clustering algorithm (HDBSCAN). Both have been trained/fitted separately, but they work together (tf -> hdbscan). I'm looking to serve predictions on GCP at scale.

Currently, I've created a custom serving container that stitches the models together in Python, but as you can imagine, this isn't very performant, especially since the TF model is loaded in eager mode. Are there canonical solutions to this problem?

One idea I have is to run the canonical TF server detached inside the container and put an outward-facing server in front of it that intercepts each request, passes it to the local TF server, and then runs the clustering algorithm on the TF server's response. I'm not sure how well this will work, or whether there are better ways.
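
To make the idea concrete, here is a minimal sketch of the front-end logic. It assumes the local TF server is TensorFlow Serving with its REST API on the default port 8501, and that the HDBSCAN object was fitted with `prediction_data=True` so that `hdbscan.approximate_predict` can label new points. The model name `embedder` and the helpers `post_json`, `hdbscan_labeler`, and `predict_clusters` are hypothetical names made up for illustration, not part of either library.

```python
import json
from typing import Callable, Sequence
from urllib import request

# Assumption: TensorFlow Serving is running locally on its default REST port
# (8501). The model name "embedder" is hypothetical.
TF_SERVING_URL = "http://localhost:8501/v1/models/embedder:predict"


def post_json(url: str, payload: dict, timeout: float = 5.0) -> dict:
    """POST a JSON payload and decode the JSON response."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def hdbscan_labeler(clusterer) -> Callable[[Sequence[Sequence[float]]], list]:
    """Wrap a fitted HDBSCAN object as a function from embeddings to labels.

    Assumption: the clusterer was fitted with prediction_data=True, which
    hdbscan.approximate_predict requires.
    """
    def label(embeddings):
        # Third-party imports are done lazily so the rest of the module
        # can be imported and tested without them.
        import hdbscan
        import numpy as np

        labels, _strengths = hdbscan.approximate_predict(
            clusterer, np.asarray(embeddings)
        )
        return [int(x) for x in labels]

    return label


def predict_clusters(instances, cluster_fn, transport=post_json,
                     url=TF_SERVING_URL):
    """Forward inputs to the TF server, then cluster the returned outputs."""
    # TF Serving's REST predict API takes {"instances": [...]} and
    # answers with {"predictions": [...]}.
    response = transport(url, {"instances": list(instances)})
    return cluster_fn(response["predictions"])
```

The outward-facing server would call `predict_clusters(batch, hdbscan_labeler(clusterer))` per request. Keeping `transport` and `cluster_fn` injectable means the stitching logic can be exercised without a running TF server or a fitted clusterer.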
