I was training a deep learning model when a cuDNN-related exception occurred. I tried to terminate the process with kill -9 PID, but its status changed to zombie instead. My guess is that the [cuda-EvtHandlr] thread inside the zombie process is still running, so the process cannot be reaped by init (PID 1). How can I handle this other than by rebooting?
I've tried sending a SIGCHLD signal to the init process, but that didn't work :(
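
To check the guess about the leftover thread, the per-thread states can be read from /proc. The following is only a minimal sketch, assuming Linux with /proc mounted; the helper name thread_states is hypothetical, and the PID has to be replaced with that of the stuck process. If a thread such as cuda-EvtHandlr showed up in state D (uninterruptible sleep) while the main thread is Z, that would be consistent with the process not being reapable yet.

```python
import os
from pathlib import Path


def thread_states(pid):
    """Return (tid, name, state) for every thread of `pid`, read from Linux /proc."""
    rows = []
    tasks = sorted(Path(f"/proc/{pid}/task").iterdir(), key=lambda p: int(p.name))
    for task in tasks:
        fields = {}
        try:
            # Each status file holds "Key:\tvalue" lines, e.g. "State:\tZ (zombie)".
            for line in (task / "status").read_text().splitlines():
                key, _, value = line.partition(":")
                fields[key] = value.strip()
        except OSError:
            continue  # the thread went away while we were reading
        rows.append((int(task.name), fields.get("Name", "?"), fields.get("State", "?")))
    return rows


if __name__ == "__main__":
    pid = os.getpid()  # assumption: replace with the PID of the stuck process
    for tid, name, state in thread_states(pid):
        print(tid, name, state)
```

The State field uses the usual one-letter codes (R running, S sleeping, D uninterruptible sleep, Z zombie), so a mix of Z for the main thread and another state for a remaining thread would show which thread is still holding the process.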