I am profiling a tensorflow GPU application with NVIDIA's command line Visual Profiler nvprof, and one of the kernels that was launched and is very active in the profiling is something called redzone_checker? I cannot for the life of me find any useful information on what this means anywhere on the internet...
Any help or tips would be greatly appreciated.
redzone_checker kernel is implemented in TensorFlow (v2.3.0) https://github.com/tensorflow/tensorflow/blob/master/tensorflow/stream_executor/gpu/redzone_allocator.cc line 138
According to the comment in the code, the redzone_checker kernel checks that every byte in input_buffer is equal to redzone_pattern.
Sorry for uncertain information, I guess the term (redzone) is brought from the redzone on memory protection. The redzones around stack or global object to detect overflows and underflows.
I profiled using nvprof (with --print-gpu-trace option) mnist examples(https://www.tensorflow.org/xla) with XLA JIT compilation and without it. redzone_checker invocations are shown only mnist with XLA JIT compilation but there is no redzone_checker invocation in the other profiled result.
My conclusion is that modification on kernel provided by TensorFlow (even optimized by XLA compiler) leads invocations of redzone_checker to protect memory.