How do we configure the Puma 5 application server for graceful restarts, and what is the difference between regular, hot, and phased restarts?
Application restarts are necessary when things go wrong or whenever we need to push a new application version. But a regular restart isn’t usually anything more than stopping and starting the server again. To keep clients connected or even keep serving requests, we need a better strategy.
Because Puma 5 is a multi-process application server, it allows for a few strategies we can choose from when we run Puma in a multi-process mode (which we almost always should):
- regular restarts: connections are lost, and new ones are established after booting the new application process
- hot restarts: connections are not lost but wait for the new application server workers to boot
- phased restarts: current connections finish with old workers and new workers handle new ones
With hot restarts, Puma will try to finish current requests and restart itself with new workers, whereas with phased restarts, Puma will keep processing requests with the old workers until it’s possible to send them to the newly updated workers. Hot restarts will incur some extra latency for the new requests while the process restarts. Phased restarts will take longer to finish. We call both graceful restarts, because they try to finish the requests gracefully.
While the hot restart works in a single mode (one process), to understand phased restarts, we also have to know how Puma forks worker processes in the cluster mode.
Forking is a Unix mechanism (available on Linux) for creating new processes out of existing ones. When we start Puma (and, therefore, its main process), Puma can create separate workers that handle web requests from clients. It does so by calling the fork() system call and keeping track of these new workers. Each new request is then picked up by one of these workers for processing.
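To make the forking model concrete, here is a minimal sketch in plain Ruby of a master process forking a few workers and then waiting for them. It is not Puma's implementation; spawn_workers and wait_for_workers are hypothetical helper names, and the "worker" does nothing but exit with its index. It only runs on Unix-like systems where fork is available.

```ruby
# Hypothetical helpers illustrating a master/worker fork; not Puma's code.

# Fork `count` children and return their PIDs (as seen by the parent).
def spawn_workers(count)
  count.times.map do |index|
    # Process.fork returns the child's PID in the parent;
    # the block runs only inside the child process.
    Process.fork do
      # A real worker would accept and process requests here.
      # exit! ends the child immediately, using the index as exit status.
      exit!(index)
    end
  end
end

# Wait for each worker and return their exit statuses in order.
def wait_for_workers(pids)
  pids.map do |pid|
    _, status = Process.wait2(pid)
    status.exitstatus
  end
end

pids = spawn_workers(3)
puts "master #{Process.pid} forked workers #{pids.inspect}"
puts "worker exit statuses: #{wait_for_workers(pids).inspect}"
```

The parent keeps the list of child PIDs, which is the "keeping track" part: it is what lets a master process later signal, replace, or wait for individual workers.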
This is important to realize when we start talking about preloading the application with the preload_app! option. Preloading the application can keep memory usage down because all new workers can share memory pages with the parent process in the beginning – thanks to copy-on-write, which avoids copying the parent process memory until a page is modified. This is not what we want when we are upgrading the application underneath, though.
To initiate a hot restart in single or cluster mode, send the SIGUSR2 signal to Puma’s master process. Alternatively, run pumactl restart or request /restart if you have a Puma control server running.
$ kill -SIGUSR2 25197
You should take advantage of the on_restart hook to clean up everything before the restart takes place.
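As an illustration, a Puma configuration file could register the hook roughly like this. It is a sketch of a config fragment that Puma evaluates, not a standalone script; the pidfile path and the cleanup body are assumptions made for the example.

```ruby
# config/puma.rb (fragment evaluated by Puma; not a standalone script)

# Assumed path; writing a pidfile makes the master PID easy to look up.
pidfile "tmp/pids/puma.pid"

on_restart do
  # Runs before Puma restarts itself.
  # Placeholder cleanup: close log files, sockets, or connections opened at boot.
  puts "Hot restart requested, cleaning up"
end
```

What belongs in the block depends on the application; anything opened at boot that should not survive into the restarted process is a candidate.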
To initiate a phased restart in cluster mode, send the SIGUSR1 signal to Puma’s master process. Alternatively, run pumactl phased-restart or request /phased-restart if you have a Puma control server running.
$ kill -SIGUSR1 25197
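If we want to trigger restarts from a deploy script instead of typing kill by hand, the same signals can be sent from Ruby. The following is a minimal sketch; RESTART_SIGNALS and request_restart are hypothetical names, and the pidfile path in the usage comment is an assumption.

```ruby
# Hypothetical helper: map a restart type to the signal described above.
RESTART_SIGNALS = { hot: "USR2", phased: "USR1" }.freeze

# Send the matching signal to the given PID and return the signal name.
def request_restart(pid, type)
  signal = RESTART_SIGNALS.fetch(type) do
    raise ArgumentError, "unknown restart type: #{type.inspect}"
  end
  Process.kill(signal, pid)
  signal
end

# Usage, assuming Puma writes its master PID to tmp/pids/puma.pid:
#   request_restart(Integer(File.read("tmp/pids/puma.pid")), :phased)
```

Reading the PID from a pidfile avoids hard-coding a number like 25197, which changes every time the master process is started.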
Phased restarts work with the preload_app! option but won’t upgrade the application. To be able to upgrade the application, you cannot take advantage of preload_app! and have to run with the prune_bundler option. Either way, phased restarts won’t upgrade Puma (and its dependencies).
The on_restart hook won’t run, but you can take advantage of the directory setting to point to a new application directory (and keep the old path around).
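Put together, a configuration aimed at upgrading the application through phased restarts might look like the fragment below. It is a sketch only: the worker count and the release path are assumptions, and the file is meant to be evaluated by Puma rather than run directly.

```ruby
# config/puma.rb (fragment evaluated by Puma; not a standalone script)

workers 2                           # phased restarts need cluster mode
prune_bundler                       # lets workers load an updated bundle
directory "/var/www/myapp/current"  # hypothetical symlink to the active release

# preload_app! is deliberately left out so that each new worker
# boots the application code found in the directory above.
```

Pointing directory at a symlink is one common way to switch releases: the deploy updates the symlink, and the workers started by the next phased restart boot from the new target.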
Puma 5 introduces an experimental cluster-mode option that keeps the copy-on-write benefits for phased restarts and application upgrades. The fork_worker option (--fork-worker on the command line) forks additional workers from worker 0 instead of the Puma master process. The preload_app! option cannot be used with it (but it’s not necessary anyway).
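In the configuration file, enabling this mode could be as short as the following fragment. The worker count is an assumption, and since the feature is experimental in Puma 5, treat this as a sketch to try out rather than a recommended setup.

```ruby
# config/puma.rb (fragment evaluated by Puma; not a standalone script)

workers 2     # cluster mode
fork_worker   # experimental in Puma 5: fork additional workers from worker 0
```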
We know what different restarts are, their implications, and even how to initiate them. But what if we want to test that things really work as they should? To know Puma behaves as we expect, we should be hitting the application with requests and request a restart in the middle. Keep Puma busy:
$ while true; do wget localhost:3000; done
And request a restart:
$ kill -SIGUSR2 25197
While you could spot that wget will suddenly stop for a second and wait for a restart, it’s hard to be sure requests were not lost. To that end, it’s better to use a load benchmarking tool like siege.
Run it without a restart:
$ siege -t5s -i http://0.0.0.0:3000
Lifting the server siege...
Transactions: 1792 hits
Availability: 100.00 %
Elapsed time: 4.53 secs
Data transferred: 0.07 MB
Response time: 0.05 secs
Transaction rate: 395.58 trans/sec
Throughput: 0.02 MB/sec
Concurrency: 18.83
Successful transactions: 1792
Failed transactions: 0
Longest transaction: 0.10
Shortest transaction: 0.03
And with a hot restart in the middle:
$ siege -t5s -i http://0.0.0.0:3000
Lifting the server siege...
Transactions: 418 hits
Availability: 100.00 %
Elapsed time: 4.90 secs
Data transferred: 0.02 MB
Response time: 0.05 secs
Transaction rate: 85.31 trans/sec
Throughput: 0.00 MB/sec
Concurrency: 4.13
Successful transactions: 418
Failed transactions: 0
Longest transaction: 0.28
Shortest transaction: 0.02
With siege, we can see no requests were lost. They all finished, although we were slowed down and served only 418 requests.
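If a benchmarking tool is not at hand, a rough version of the same check can be scripted in Ruby. The sketch below is not a replacement for a real load tester: it sends requests one at a time from a single thread, and probe is a hypothetical helper name. It counts every non-2xx response or connection error as a failed request.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: request `url` repeatedly for `seconds`
# and count successful and failed requests.
def probe(url, seconds)
  uri = URI(url)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + seconds
  results = { ok: 0, failed: 0 }

  while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
    begin
      response = Net::HTTP.get_response(uri)
      key = response.is_a?(Net::HTTPSuccess) ? :ok : :failed
      results[key] += 1
    rescue StandardError
      # Refused, reset, or timed-out connections count as lost requests.
      results[:failed] += 1
    end
  end

  results
end

# Usage, assuming the application listens on localhost:3000:
#   p probe("http://localhost:3000/", 5)
```

Running it while requesting a restart from another terminal should leave the failed count at zero if the restart is graceful.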
There are various strategies to choose from when it comes to Puma restarts – each with its implications. It’s also worth noting that we can solve application upgrades on a higher level. We can keep connections open while Puma restarts with systemd socket activation, serve old and new applications side by side with Docker, or solve the entire upgrade on a load balancer level. Most applications will likely do with standard hot restarts.