Node.js + Beanstalkd (Nodestalker) ECONNRESET


I've been working on a large, multi-server Node.js deployment. The tech stack:

Server 1 (Ubuntu 12.04):

  • Node.js API Server (Express app, used for input)
  • Node.js Push Server (100 workers, used to send out results)
  • Redis
  • Beanstalkd

Servers 2-4 (Ubuntu 12.04):

  • Node.js Engine Server (150 workers per server, used for computation)

All Node.js apps are using Nodestalker as their Beanstalkd client.

Upon starting up all the servers, one or more of the Node.js apps will crash repeatedly with this error (LongJohn output):

Error: read ECONNRESET
    at errnoException (net.js:901:11)
    at onread (net.js:556:19)
---------------------------------------------
    at Readable.on (_stream_readable.js:681:33)
    at BeanstalkClient.command (/opt/app_deployment/engine/node_modules/nodestalker/lib/beanstalk_client.js:248:13)
    at BeanstalkClient.watch (/opt/app_deployment/engine/node_modules/nodestalker/lib/beanstalk_client.js:285:14)
    at consumer (/opt/app_deployment/engine/scrape.js:52:12)
    at listOnTimeout (timers.js:110:15)
---------------------------------------------
    at Array.<anonymous> (/opt/app_deployment/engine/compute.js:215:9)
    at fire (/opt/app_deployment/engine/node_modules/jquery/lib/node-jquery.js:999:)
    at self.fireWith (/opt/app_deployment/engine/node_modules/jquery/lib/node-jquery.js:1109:7)
    at Object.<anonymous> (/opt/app_deployment/engine/node_modules/jquery/lib/node-jquery.js:1236:16)
    at fire (/opt/app_deployment/common/node_modules/jquery/lib/node-jquery.js:999:)
    at self.fireWith (/opt/app_deployment/common/node_modules/jquery/lib/node-jquery.js:1109:7)
    at self.fire (/opt/app_deployment/common/node_modules/jquery/lib/node-jquery.js116:10)
    at /opt/app_deployment/common/results.js:18:19
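
For context on why a single ECONNRESET can take down a whole process: in Node.js, an 'error' event emitted on an EventEmitter (sockets included) with no 'error' listener attached is thrown as an exception. The sketch below uses a plain EventEmitter as a stand-in for the client socket. Whether Nodestalker attaches such a listener itself is not something I have verified, so treat this as an illustration of the mechanism only; the helper names are made up for the example.

```javascript
const { EventEmitter } = require('events');

// Build the same kind of error the trace shows (hypothetical helper).
function resetError() {
  const err = new Error('read ECONNRESET');
  err.code = 'ECONNRESET';
  return err;
}

// Emit the error and report how it surfaced: 'thrown' or 'handled'.
function emitReset(socketLike) {
  try {
    socketLike.emit('error', resetError());
    return 'handled';
  } catch (err) {
    return 'thrown';
  }
}

// No 'error' listener: emit() throws, which would crash an unguarded process.
const bare = new EventEmitter();
console.log(emitReset(bare)); // thrown

// With a listener: the error is delivered to the handler instead.
const guarded = new EventEmitter();
const seen = [];
guarded.on('error', (err) => seen.push(err.code));
console.log(emitReset(guarded), seen); // handled [ 'ECONNRESET' ]
```

This only keeps the process alive; it does not explain why the connection was reset in the first place.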

The servers that successfully open all connections work flawlessly until manually restarted.

Each Engine server has 2 open Beanstalkd clients per worker, and each Push worker has a Beanstalkd client as well. This results in ~1000 open connections to Beanstalkd at any given time.
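
The ~1000 figure can be checked with a few lines of arithmetic. The numbers come from the setup described above; the function and field names are just for illustration, and any clients opened by the API server are left out because their count is not stated.

```javascript
// Topology figures as described in the question.
const topology = {
  engineServers: 3,            // Servers 2-4
  workersPerEngineServer: 150,
  clientsPerEngineWorker: 2,
  pushWorkers: 100,
  clientsPerPushWorker: 1,
};

// Hypothetical helper: count the client connections Beanstalkd should see.
function expectedBeanstalkdConnections(t) {
  const engine =
    t.engineServers * t.workersPerEngineServer * t.clientsPerEngineWorker;
  const push = t.pushWorkers * t.clientsPerPushWorker;
  return { engine, push, total: engine + push };
}

console.log(expectedBeanstalkdConnections(topology));
// { engine: 900, push: 100, total: 1000 }
```

The Beanstalkd process also needs a few descriptors of its own (standard streams, the listening socket), so 1000 client sockets leaves very little headroom under a 1024 limit.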

After some research, it seemed that I had hit the open file descriptor limit (default 1024). However, no matter how high I raised the limit, the error still occurred almost immediately after restarting the processes. A quick lsof showed no connection leaks.

As root, I ran ulimit -n 4096 for each user that runs the processes, and a ulimit -n immediately afterwards accurately reflects the new value.
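
One caveat with this check: ulimit -n changes the limit of the shell it is run in and of processes later started from that shell, so a daemon that is already running keeps whatever limit it started with. On Linux, the limit a process actually has can be read from /proc/<pid>/limits. Below is a minimal sketch of doing that from Node; the function names are hypothetical, and it assumes the usual "Max open files" row layout of that file.

```javascript
const fs = require('fs');

// Convert a limits field to a number; "unlimited" becomes Infinity.
function toLimit(value) {
  return value === 'unlimited' ? Infinity : Number(value);
}

// Parse the "Max open files" row out of a /proc/<pid>/limits dump.
// Returns null if the row is missing.
function parseMaxOpenFiles(limitsText) {
  const label = 'Max open files';
  const row = limitsText.split('\n').find((line) => line.startsWith(label));
  if (!row) return null;
  const [soft, hard] = row.slice(label.length).trim().split(/\s+/);
  return { soft: toLimit(soft), hard: toLimit(hard) };
}

// Read the effective limits of a running process (Linux only).
function readMaxOpenFiles(pid) {
  return parseMaxOpenFiles(fs.readFileSync(`/proc/${pid}/limits`, 'utf8'));
}

// Print the limits of this Node process when /proc is available.
if (fs.existsSync('/proc/self/limits')) {
  console.log('this process:', readMaxOpenFiles('self'));
}
```

Calling readMaxOpenFiles with the PID of the running beanstalkd process (or of a Node worker) shows the limit that process is really subject to, which may differ from what an interactive shell reports.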

I have also edited the soft and hard nofile limits in limits.conf for all relevant users. It may or may not be a coincidence, but these values are not applied to the users after a server reboot.

My limits.conf:

beanstalkd soft nofile 4096
beanstalkd hard nofile 4096

After a server reboot, running su beanstalkd followed by ulimit -n still shows 1024. I have session required pam_limits.so uncommented in /etc/pam.d/common-session and in all other pam.d files.

In short, all signs point to hitting a file descriptor wall, but no matter how high the limit is raised, the errors still occur. Thanks in advance!
