I have a runit
service I use to run a rails
app using unicorn
.
Its restart command uses a signal (USR2) to handle a zero-downtime restart. Basically, it waits until the new process is ready before the old ones die.
This causes a very long (40 seconds) restart time, in which service myservice restart
doesn't return until the end.
While I can give runit a longer timeout (which I already do), I want to make this restart a fire-and-forget kind of action so it'll return instantly (or after the USR2 signal was fired, but without waiting for it to complete.
The entire logic is taken from multiple blog posts about zero-downtime rails
deployments with unicorn
restarts:
- https://gist.github.com/czarneckid/4639793
- https://gist.github.com/JeanMertz/8996796
- https://nulogy.com/who-we-are/company-blog/articles/zero-downtime-deployments-with-chef-nginx-and-unicorn/
This is the runit
script (generated by chef):
#!/bin/bash
#
# This file is managed by Chef, using the <%= node.name %> cookbook.
# Editing this file by hand is highly discouraged!
#
exec 2>&1
#
# Since unicorn creates a new pid on restart/reload, it needs a little extra
# love to manage with runit. Instead of managing unicorn directly, we simply
# trap signal calls to the service and redirect them to unicorn directly.
#
RUNIT_PID=$$
APPLICATION_NAME=<%= @options[:application_name] %>
APPLICATION_PATH=<%= File.join(@options[:path], 'current') %>
BUNDLE_CMD="<%= @options[:bundle_command] ? "#{@options[:bundle_command]} exec" : '' %>"
UNICORN_CMD=<%= @options[:unicorn_command] ? @options[:unicorn_command] : 'unicorn' %>
UNICORN_CONF=<%= @options[:unicorn_config_path] ? @options[:unicorn_config_path] : File.join(@options[:path], 'current', 'config', 'unicorn.rb') %>
RAILS_ENV=<%= @options[:rails_env] %>
CUR_PID_FILE=<%= @options['pid'] ? @options['pid'] : File.join(@options[:path], 'current', 'shared', 'pids', "#{@options[:application_name]}.pid") %>
ENV_PATH=<%= @options[:env_dir] %>
OLD_PID_FILE=$CUR_PID_FILE.oldbin
echo "Runit service restarted (PID: $RUNIT_PID)"
function is_unicorn_alive {
set +e
if [ -n $1 ] && kill -0 $1 >/dev/null 2>&1; then
echo "yes"
fi
set -e
}
if [ -e $OLD_PID_FILE ]; then
OLD_PID=$(cat $OLD_PID_FILE)
echo "Old master detected (PID: $OLD_PID), waiting for it to quit"
while [ -n "$(is_unicorn_alive $OLD_PID)" ]; do
sleep 5
done
fi
if [ -e $CUR_PID_FILE ]; then
CUR_PID=$(cat $CUR_PID_FILE)
if [ -n "$(is_unicorn_alive $CUR_PID)" ]; then
echo "Detected running Unicorn instance (PID: $CUR_PID)"
RUNNING=true
fi
fi
function start {
unset ACTION
if [ $RUNNING ]; then
restart
else
echo 'Starting new unicorn instance'
cd $APPLICATION_PATH
exec chpst -e $ENV_PATH $BUNDLE_CMD $UNICORN_CMD -c $UNICORN_CONF -E $RAILS_ENV
sleep 3
CUR_PID=$(cat $CUR_PID_FILE)
fi
}
function stop {
unset ACTION
echo 'Initializing graceful shutdown'
kill -QUIT $CUR_PID
while [ -n "$(is_unicorn_alive $CUR_PID)" ]; do
echo '.'
sleep 2
done
echo 'Unicorn stopped, exiting Runit process'
kill -9 $RUNIT_PID
}
function restart {
unset ACTION
echo "Restart request captured, swapping old master (PID: $CUR_PID) for new master with USR2"
kill -USR2 $CUR_PID
sleep 2
echo 'Restarting Runit service to capture new master PID'
exit
}
function alarm {
unset ACTION
echo 'Unicorn process interrupted'
}
trap 'ACTION=stop' STOP TERM KILL
trap 'ACTION=restart' QUIT USR2 INT
trap 'ACTION=alarm' ALRM
[ $RUNNING ] || ACTION=start
if [ $ACTION ]; then
echo "Performing \"$ACTION\" action and going into sleep mode until new signal captured"
elif [ $RUNNING ]; then
echo "Going into sleep mode until new signal captured"
fi
if [ $ACTION ] || [ $RUNNING ]; then
while true; do
[ "$ACTION" == 'start' ] && start
[ "$ACTION" == 'stop' ] && stop
[ "$ACTION" == 'restart' ] && restart
[ "$ACTION" == 'alarm' ] && alarm
sleep 2
done
fi
This is a super weird way to use Runit, move your reload logic to the
control/h
script and usesv hup
(or since it doesn't seem to be anything more than sending USR2sv 2
). The main run script shouldn't be involved.