Zookeeper docker container stuck

We've been using ZooKeeper as part of our Kafka deployment, and for other purposes as well (via docker-compose).

Occasionally the container stops functioning, to the point where docker stop zookeeper does not return (ZooKeeper keeps running). docker kill -s SIGTERM does not kill it either.

When that happens, attempts to run zkCli from within the container also hang (the zkCli.sh command doesn't return).

The only way to recover is to kill the Docker service (on Mac, the Docker app), and even then only after deleting the container while ZooKeeper is still down.

Any idea how to troubleshoot this? What could cause such a scenario?

Answer by Nadav:

The container logs show this error repeatedly when it's in that state:

log4j:ERROR Failed to flush writer,
java.io.IOException: Invalid argument
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:326)
    at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
    at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
    at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
    at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
    at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
    at org.apache.log4j.helpers.QuietWriter.flush(QuietWriter.java:59)
    at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:324)
    at org.apache.log4j.RollingFileAppender.subAppend(RollingFileAppender.java:276)
    at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
    at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
    at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
    at org.apache.log4j.Category.callAppenders(Category.java:206)
    at org.apache.log4j.Category.forcedLog(Category.java:391)
    at org.apache.log4j.Category.log(Category.java:856)
    at org.slf4j.impl.Log4jLoggerAdapter.debug(Log4jLoggerAdapter.java:210)
    at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:89)
    at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:169)

Google told me it is likely caused by a bad ulimit configuration.
I added this section to the zookeeper service in the compose file:

ulimits: 
  nofile:
    soft: 20000
    hard: 40000

So far, so good.
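
For context, here is a minimal sketch of where that section sits in a compose file. Only the ulimits values come from the setup described above. The version key, the image name, the container name (chosen to match the docker stop zookeeper command) and the port mapping are assumptions for illustration, so adjust them to your own file.

```yaml
# Hypothetical compose file; only the ulimits block is taken from the answer.
version: "3"                  # assumed; older docker-compose releases need a version key
services:
  zookeeper:
    image: zookeeper          # assumed image name, use whichever image you already run
    container_name: zookeeper # assumed, so that "docker stop zookeeper" targets it
    ports:
      - "2181:2181"           # assumed; 2181 is ZooKeeper's default client port
    ulimits:
      nofile:                 # limit on open file descriptors inside the container
        soft: 20000
        hard: 40000
```

After recreating the container, running docker exec zookeeper sh -c 'ulimit -n' should print the soft limit (20000 here), which is a quick way to confirm the setting was applied.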