I have built a model and can successfully prune it using tf.contrib's model pruning module with the default params and a target sparsity of 90%. The problem is that the pruned model still takes the same execution time as the original model. My guess is that, instead of running only the pruned version, TensorFlow is running the entire graph with masked weights, and that's why there is no improvement even after pruning.
So how do I export the pruned model, with only the pruned subgraph and its respective weights, and use it?
The strip_pruning_vars utility might be what you're looking for.
From the README: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/model_pruning#adding-pruning-ops
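To illustrate what stripping the pruning variables does and does not change, here is a minimal conceptual sketch in plain Python (no TensorFlow). As I understand the README, the utility removes the auxiliary mask and threshold variables and leaves the masked weights in the graph. The helpers `fold_mask` and `dense_matvec` are hypothetical names made up for this example, not part of any TensorFlow API, and the sketch assumes the folded weights are still stored as ordinary dense tensors.

```python
# Conceptual sketch only: plain Python lists stand in for tensors.
# fold_mask and dense_matvec are hypothetical helpers, not TensorFlow APIs.

def fold_mask(weights, mask):
    """Bake a 0/1 pruning mask into the weights (elementwise product)."""
    return [[w * m for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(weights, mask)]

def dense_matvec(matrix, vector):
    """Dense matrix-vector product; returns (result, multiply_count)."""
    result, multiplies = [], 0
    for row in matrix:
        acc = 0.0
        for w, x in zip(row, vector):
            acc += w * x  # executed even when w is zero
            multiplies += 1
        result.append(acc)
    return result, multiplies

weights = [[0.5, -1.2, 0.3], [2.0, 0.1, -0.7]]
mask = [[1, 0, 0], [1, 0, 1]]  # 1 = keep the weight, 0 = pruned
x = [1.0, 2.0, 3.0]

folded = fold_mask(weights, mask)
y_dense, dense_ops = dense_matvec(weights, x)
y_pruned, pruned_ops = dense_matvec(folded, x)
nonzeros = sum(1 for row in folded for w in row if w != 0)

print("non-zero weights after folding:", nonzeros)
print("multiplies (original, pruned):", dense_ops, pruned_ops)
```

Under that assumption, folding the mask makes the graph smaller and simpler to export, but a dense kernel still multiplies by every zero. A real speedup would additionally need a sparse representation or kernels that skip zeros, so it is worth measuring after stripping rather than expecting it.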
Would you mind sharing your code?