I have a simple `tf.function` that computes the sum of the entries in a mask. On my laptop, each call to the function for serving takes around 0.25 s (after warmup).
When I return `0` instead of `my_var`, the runtime drops to ~0.003 s, roughly two orders of magnitude lower. Is it safe to assume this is because returning the tensor makes TF convert the graph output to a value, and that conversion is what takes the time?
More importantly, any thoughts on improving the runtime while still being able to access the value when the function is called for serving?

Thank you.
```python
@tf.function(input_signature=[tf.TensorSpec(shape=None, dtype=tf.float32)])
def my_serve(self, img_file):
    mask = self.model(img_file)[0]
    my_var = tf.cast(tf.reduce_sum(mask[..., 0]), tf.float16)
    return my_var
```
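To compare the two variants (returning `my_var` vs. returning `0`) on equal footing, it can help to time both with the same harness. Below is a minimal sketch in plain Python; `time_after_warmup` is a hypothetical helper, not a TensorFlow API, and the `sync` argument is an assumption about how you want to force the result to be materialized (for an eager tensor, calling `.numpy()` does that).

```python
import statistics
import time
from typing import Any, Callable, Optional


def time_after_warmup(
    fn: Callable[[], Any],
    n_warmup: int = 3,
    n_runs: int = 10,
    sync: Optional[Callable[[Any], Any]] = None,
) -> float:
    """Return the median wall-clock time (seconds) of fn() after warmup calls.

    `sync` (hypothetical parameter) is applied to each result inside the timed
    region, e.g. `lambda t: t.numpy()` to include fetching a tensor's value.
    """
    if n_runs < 1:
        raise ValueError("n_runs must be >= 1")
    # Warmup calls absorb one-off costs such as tracing a tf.function.
    for _ in range(n_warmup):
        result = fn()
        if sync is not None:
            sync(result)
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        result = fn()
        if sync is not None:
            sync(result)
        timings.append(time.perf_counter() - start)
    # The median is less sensitive to occasional slow outliers than the mean.
    return statistics.median(timings)
```

For example, `time_after_warmup(lambda: module.my_serve(img), sync=lambda t: t.numpy())`, where `module` and `img` stand in for your model wrapper and a sample input (both hypothetical names). Keeping `.numpy()` inside the timed region means the measurement covers both running the function and copying the value back to the host.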