What is redzone_checker? Profiling my tensorflow application on a GPU


I am profiling a TensorFlow GPU application with nvprof, NVIDIA's command-line profiler, and one of the most active kernels in the profile is something called redzone_checker. I cannot for the life of me find any useful information anywhere on the internet about what it is or what it does.

Any help or tips would be greatly appreciated.

redzone_checker kernel in nvprof



The redzone_checker kernel is implemented in TensorFlow itself (I looked at v2.3.0): see https://github.com/tensorflow/tensorflow/blob/master/tensorflow/stream_executor/gpu/redzone_allocator.cc, line 138.

According to the comment in the code, the redzone_checker kernel checks that every byte in input_buffer is equal to redzone_pattern.

I am not certain about this, but I believe the term "redzone" comes from memory protection, where redzones are guard regions placed around stack or global objects to detect overflows and underflows.
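To make the idea concrete, here is a minimal CPU-side sketch in Python of how a redzone check works in general. It is not TensorFlow's implementation: the pattern value, the guard size, and every function name below are made up for illustration. The buffer is laid out as guard, user data, guard; the guards are filled with a known pattern, and after some code has written to the buffer, every guard byte is compared against that pattern.

```python
REDZONE_BYTE = 0xCC   # hypothetical pattern value, chosen for illustration
REDZONE_SIZE = 8      # hypothetical guard size in bytes


def allocate_with_redzones(user_size):
    """Return a buffer laid out as [redzone | user data | redzone]."""
    buf = bytearray([REDZONE_BYTE]) * (2 * REDZONE_SIZE + user_size)
    # Zero the user region so that only the guards carry the pattern.
    buf[REDZONE_SIZE:REDZONE_SIZE + user_size] = bytes(user_size)
    return buf


def check_redzones(buf, user_size):
    """Return the offsets of guard bytes that no longer equal the pattern."""
    guard_offsets = list(range(REDZONE_SIZE)) + list(
        range(REDZONE_SIZE + user_size, len(buf)))
    return [i for i in guard_offsets if buf[i] != REDZONE_BYTE]


def buggy_fill(buf, user_size, value):
    """Stand-in for a kernel with an off-by-one bug: writes one byte too many."""
    for i in range(user_size + 1):
        buf[REDZONE_SIZE + i] = value


buf = allocate_with_redzones(16)
clean = check_redzones(buf, 16)      # guards are intact before the write
buggy_fill(buf, 16, 0x01)
corrupted = check_redzones(buf, 16)  # the overflow clobbered one guard byte
print(clean, corrupted)              # prints: [] [24]
```

A non-empty result from the check means something wrote outside the region it was given, which is the kind of bug a kernel like redzone_checker would be looking for on the GPU.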

I profiled the MNIST example (https://www.tensorflow.org/xla) with nvprof (using the --print-gpu-trace option), once with XLA JIT compilation and once without. redzone_checker invocations appear only in the run with XLA JIT compilation; the run without it shows none.

My conclusion is that when the kernels provided by TensorFlow are modified (for example, optimized by the XLA compiler), redzone_checker is invoked to protect memory, that is, to detect writes outside the intended buffers.