Runtime ~100x higher when returning a tensor from a tf.function used for serving


I have a simple tf.function that computes the sum of the entries of a mask. I noticed that the runtime is around 0.25 s on my laptop each time the function is called for serving (after warmup). When I return 0 instead of my_var, the runtime is about two orders of magnitude lower, ~0.003 s. Is it safe to assume this is because returning the tensor makes TF perform a graph-to-value conversion, which in turn takes time?

More importantly, any thoughts on improving the runtime while still being able to access the value when the function is called for serving?

Thank you.

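import tensorflow as tf

# Method of the serving class; self.model is assumed to be the loaded model.
@tf.function(input_signature=[tf.TensorSpec(shape=None, dtype=tf.float32)])
def my_serve(self, img_file):
    # Take the first model output as the mask.
    mask = self.model(img_file)[0]
    # Sum channel 0 of the mask and cast the scalar to float16.
    my_var = tf.cast(tf.reduce_sum(mask[..., 0]), tf.float16)
    return my_var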
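
One way to test the hypothesis above is to time both variants on a stand-in model. The sketch below is not the original code: `fake_model`, `weights`, `serve_value`, `serve_zero`, and `mean_seconds` are hypothetical names, and the stack of matmuls only imitates an expensive `self.model`. It rests on the documented behavior that a tf.function generally skips stateless ops whose results do not feed any output. If that applies here, returning 0 would be fast because the model is never executed, rather than because a tensor-to-value conversion is avoided.

```python
import time

import tensorflow as tf

# Hypothetical stand-in for self.model: a stack of matmuls (an assumption,
# chosen only so that the forward pass has a measurable cost).
weights = tf.random.normal([512, 512])


def fake_model(x):
    for _ in range(20):
        x = tf.tanh(tf.matmul(x, weights))
    return x


spec = [tf.TensorSpec(shape=[None, 512], dtype=tf.float32)]


@tf.function(input_signature=spec)
def serve_value(x):
    # The output depends on the model, so the forward pass must run.
    mask = fake_model(x)
    return tf.cast(tf.reduce_sum(mask[..., 0]), tf.float16)


@tf.function(input_signature=spec)
def serve_zero(x):
    # The output is a constant; the stateless model ops feed no output,
    # so TensorFlow is free to skip them at execution time.
    mask = fake_model(x)
    _ = tf.reduce_sum(mask[..., 0])
    return 0


def mean_seconds(fn, x, repeats=20):
    """Average wall-clock seconds per call, after one warm-up call."""
    fn(x)  # warm-up: triggers tracing
    start = time.perf_counter()
    for _ in range(repeats):
        # .numpy() forces the result to be materialized on the host.
        tf.convert_to_tensor(fn(x)).numpy()
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    x = tf.random.normal([64, 512])
    print("return value:", mean_seconds(serve_value, x))
    print("return 0:    ", mean_seconds(serve_zero, x))
```

If the two timings differ widely on the stand-in as well, the 0.25 s is most likely the cost of the model's forward pass itself. In that case the return statement is not the bottleneck, and any speedup would have to come from the model (smaller input, lighter architecture, hardware acceleration).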