How can I stop seeing JVM Full Thread dumps in my AWS EMR Spark job stdout logs?

I have PySpark jobs running in AWS EMR. Recently, I upgraded to AWS EMR 6.4 (Spark 3.1.2) and switched to running the job in a Docker container. Ever since, sporadic thread dumps have appeared in the stdout logs, each starting with "Full thread dump OpenJDK 64-Bit Server VM (25.312-b07 mixed mode)".

I've been unable to figure out why they occur. There are no associated errors or warnings in stderr, and the job is unaffected. However, these thread dumps make the stdout logs difficult to read. I have tried previous versions of AWS EMR and even simpler EMR configurations, because I suspect that AWS EMR is sending SIGQUIT somewhere: I did not find anything in the Spark source that would trigger a dump, except for thread dumps initiated by the Spark UI and by the Spark task reaper, which is disabled.

At a loss for what else to do, I would settle for instructing the JVM to redirect these thread dumps, or even to ignore the signal that triggers them, if that's an option. I am open to alternative suggestions.

I am aware of -Xrs but I suspect it's not what I want, since it would likely kill the process on the first SIGQUIT.

There are 1 best solutions below

I have a solution for viewing the logs on the instance itself or in another Unix environment.

By piping the output through a mawk filter, we can remove the thread dumps when reading or tailing the logs.

On Amazon Linux this requires installing the mawk package from the EPEL repository.

sudo amazon-linux-extras install epel -y
sudo yum install mawk -y

Create a function that makes a temporary file, tails the input file through the filter, and writes the result to the temporary file. It then opens the temporary file with less and, once the user closes less, stops the background job and removes the temporary file. The filter drops every line from one matching ^Full thread dump up to, but not including, the next line matching ^\[[0-9]. That works for me because my log lines start with [2023-09-7 ...

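less_log() {
  tmp_file=$(mktemp)
  # Follow the log from its first line, drop thread dumps, write the rest to the temp file.
  # -W interactive keeps mawk from buffering its output.
  tail -f -n +1 "$1" | mawk -W interactive '/^Full thread dump/{f=1} /^\[[0-9]/{f=0} !f' > "$tmp_file" &
  # When less exits, stop the background pipeline and remove the temp file.
  less "$tmp_file"; kill %; rm "$tmp_file"
}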

You can now view the logs like this:

less_log /var/log/hadoop-yarn/containers/application_1693901863825_0025/container_1693901863825_0025_01_000001/stdout
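
To check what the filter keeps and drops before pointing it at a real container log, it can be run once, without tail or less, against a small sample. The following is only a sketch: the sample log lines are made up for illustration, filter_dumps is a hypothetical helper name, and plain awk is used on the assumption that the same program behaves identically there (swap in mawk if you prefer).

```shell
# Made-up sample log; the bracketed timestamp prefix is an assumption
# about the log format, matching the filter's ^\[[0-9] pattern.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
[2023-09-07 10:00:00] job started
Full thread dump OpenJDK 64-Bit Server VM (25.312-b07 mixed mode):
"main" #1 prio=5 os_prio=0 runnable
   java.lang.Thread.State: RUNNABLE
[2023-09-07 10:00:05] stage 1 finished
EOF

# Hypothetical helper: one-shot version of the filter used in less_log.
# f is set when a dump starts and cleared at the next timestamped line;
# lines are printed only while f is unset.
filter_dumps() {
  awk '/^Full thread dump/{f=1} /^\[[0-9]/{f=0} !f' "$1"
}

filtered=$(filter_dumps "$sample_log")
printf '%s\n' "$filtered"
rm -f "$sample_log"
```

Only the two timestamped lines should come out. If your log lines do not start with a bracketed timestamp, the second pattern has to be adapted, otherwise everything after the first dump is suppressed.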
