Notes to self

Puma graceful restarts

How do you configure the Puma 5 application server for graceful restarts, and what is the difference between regular, hot, and phased restarts?

Application restarts are necessary when things go wrong or whenever we need to push a new application version. But a regular restart isn’t usually anything more than stopping and starting the server again. To keep clients connected or even keep serving requests, we need a better strategy.

Graceful restarts

Because Puma 5 is a multi-process application server, it allows for a few strategies we can choose from when we run Puma in a multi-process mode (which we almost always should):

  • regular restarts: connections are lost, and new ones are established after booting the new application process
  • hot restarts: connections are not lost but wait for the new application server workers to boot
  • phased restarts: current connections finish with old workers and new workers handle new ones

With hot restarts, Puma will try to finish current requests and restart itself with new workers, whereas with phased restarts, Puma will keep processing requests with the old workers until it’s possible to send them to the newly updated workers. Hot restarts will incur some extra latency for the new requests while the process restarts. Phased restarts will take longer to finish. We call both graceful restarts, because they try to finish the requests gracefully.

While the hot restart also works in single mode (one process), to understand phased restarts, we have to know how Puma forks worker processes in cluster mode.

Forking is a Unix mechanism for creating new processes out of existing processes. When we start Puma (and, therefore, its master process), Puma can create separate workers that handle web requests from clients. It does so by calling the fork() system call and keeping track of the new workers. The workers share the listening socket opened by the master process, so whenever a new connection arrives, one of the workers accepts it for processing.
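To make the forking model concrete, here is a minimal Ruby sketch of a parent process forking workers and keeping track of them. It is not Puma's actual implementation; the method names spawn_workers and collect_reports are made up for this illustration, and it assumes a platform where fork is available (Linux, macOS).

```ruby
# Sketch only: a parent ("master") forks workers and tracks their PIDs.
# Each worker reports back through a pipe and then exits.
def spawn_workers(count)
  count.times.map do |index|
    reader, writer = IO.pipe
    pid = fork do
      # Code in this block runs in the child (worker) process.
      reader.close
      writer.puts "worker #{index} pid=#{Process.pid} parent=#{Process.ppid}"
      writer.close
      exit! 0 # leave the child without running the parent's at_exit hooks
    end
    writer.close # the parent keeps only the read end
    [pid, reader]
  end
end

# Read each worker's report and reap the finished process.
def collect_reports(workers)
  workers.map do |pid, reader|
    line = reader.read.strip
    reader.close
    Process.wait(pid)
    line
  end
end

puts collect_reports(spawn_workers(2))
```

Every line printed shows a different worker PID but the same parent PID, which is the relationship Puma's master process has with its workers.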

This is important to realize when we start talking about preloading the application with the preload_app! option. Preloading the application can keep the memory usage down because all new workers initially share the memory pages of the master process, thanks to copy-on-write, which avoids copying the parent process memory until a page is modified. This is not what we want when we are upgrading the application underneath, though.
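For illustration, a minimal config/puma.rb that runs Puma in cluster mode with preloading might look like the sketch below. The worker and thread counts are arbitrary example values, not recommendations.

```ruby
# config/puma.rb (sketch; the numbers are example values)
workers 2      # cluster mode: the master forks two worker processes
threads 1, 5   # each worker runs between 1 and 5 threads
preload_app!   # load the application in the master before forking workers
```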


To initiate a hot restart in single or cluster mode, send the SIGUSR2 signal to Puma’s master process. Alternatively, run pumactl restart or request /restart if you have a Puma control server running.

$ kill -SIGUSR2 25197

You should take advantage of the on_restart hook to clean up everything before the restart takes place.
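As a sketch, an on_restart block in config/puma.rb could look like this. What needs cleaning up depends on the application, so the body below is only a placeholder.

```ruby
# config/puma.rb (sketch)
on_restart do
  # Runs in the master process right before a hot restart.
  # Close anything that should not leak into the restarted server,
  # such as open log files or connections (application specific).
  puts "Puma is restarting, cleaning up..."
end
```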


To initiate a phased restart in cluster mode, send the SIGUSR1 signal to Puma’s master process. Alternatively, run pumactl phased-restart or request /phased-restart if you have a Puma control server running.

$ kill -SIGUSR1 25197
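Both signals can also be sent from Ruby, for example from a deploy script. The sketch below assumes Puma writes its master PID to a pidfile (the pidfile setting). The request_restart helper and the RESTART_SIGNALS constant are names invented for this example.

```ruby
# Map the restart types to the signals Puma's master process listens for.
RESTART_SIGNALS = { hot: "USR2", phased: "USR1" }.freeze

# Read the master PID from a pidfile and send the matching signal.
# Returns the PID that was signalled.
def request_restart(pidfile, type)
  signal = RESTART_SIGNALS.fetch(type) do
    raise ArgumentError, "unknown restart type: #{type}"
  end
  pid = Integer(File.read(pidfile).strip)
  Process.kill(signal, pid)
  pid
end

# Usage (the path is hypothetical and must match your Puma configuration):
#   request_restart("tmp/pids/puma.pid", :phased)
```

Reading the PID from a pidfile avoids hard-coding a process ID such as 25197, which changes whenever the master process itself is restarted from scratch.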

Phased restarts cannot be combined with the preload_app! option: the preloaded application code lives in the master process, which is not replaced, so Puma falls back to a normal restart instead. To be able to upgrade the application with a phased restart, run without preload_app! and with the prune_bundler option. Even then, phased restarts won’t upgrade Puma itself (and its dependencies).

The on_restart hook won’t run during a phased restart, but you can take advantage of the directory setting to point to a new application directory (and keep the old path around).
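A configuration aimed at phased restarts with application upgrades might look like the following sketch. The application path is hypothetical and assumes a deploy layout where a symlink is switched to the new release.

```ruby
# config/puma.rb (sketch for phased restarts with application upgrades)
workers 2
prune_bundler                        # lets workers load updated gems
directory "/var/www/myapp/current"   # hypothetical symlink switched on deploy
# Note: preload_app! is intentionally left out.
```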

Puma 5 introduces an experimental cluster mode option that keeps the copy-on-write benefits across phased restarts and application upgrades. The fork_worker option (--fork-worker on the command line) forks additional workers from worker 0 instead of the Puma master process. The preload_app! option cannot be used with it (but it’s not necessary anyway).
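Enabling this mode in the configuration file is a one-line change. The sketch below uses an arbitrary worker count.

```ruby
# config/puma.rb (sketch; fork_worker is experimental in Puma 5)
workers 4
fork_worker   # remaining workers are forked from worker 0, not from the master
```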


We know what the different restarts are, their implications, and even how to initiate them. But what if we want to test that things really work as they should? To know Puma behaves as we expect, we should hit the application with requests and trigger a restart in the middle. Keep Puma busy:

$ while true; do wget localhost:3000; done

And request a restart:

$ kill -SIGUSR2 25197

You may spot wget pausing for a second while it waits for the restart, but it’s hard to be sure that no requests were lost. To that end, it’s better to use a load benchmarking tool like siege.

Run it without a restart:

$ siege -t5s -i
Lifting the server siege...
Transactions:           1792 hits
Availability:         100.00 %
Elapsed time:           4.53 secs
Data transferred:         0.07 MB
Response time:            0.05 secs
Transaction rate:       395.58 trans/sec
Throughput:           0.02 MB/sec
Concurrency:           18.83
Successful transactions:        1792
Failed transactions:             0
Longest transaction:          0.10
Shortest transaction:         0.03

And with a hot restart in the middle:

$ siege -t5s -i
Lifting the server siege...
Transactions:            418 hits
Availability:         100.00 %
Elapsed time:           4.90 secs
Data transferred:         0.02 MB
Response time:            0.05 secs
Transaction rate:        85.31 trans/sec
Throughput:           0.00 MB/sec
Concurrency:            4.13
Successful transactions:         418
Failed transactions:             0
Longest transaction:          0.28
Shortest transaction:         0.02

With siege, we can see no requests were lost. They all finished, although we were slowed down and served only 418 requests.
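If siege is not at hand, a similar check can be scripted in Ruby with the standard library. This is a rough sequential sketch, not a replacement for a real load testing tool. The probe method is a name made up for this example, and the URL in the usage comment assumes the application listens on localhost:3000 as above.

```ruby
require "net/http"
require "uri"

# Send `count` sequential GET requests and tally successes and failures.
# A refused connection, a timeout, or a non-2xx response counts as failed.
def probe(url, count)
  uri = URI(url)
  tally = { ok: 0, failed: 0 }
  count.times do
    begin
      response = Net::HTTP.get_response(uri)
      tally[response.is_a?(Net::HTTPSuccess) ? :ok : :failed] += 1
    rescue StandardError
      tally[:failed] += 1
    end
  end
  tally
end

# Usage: start this, then trigger a restart while it is running.
#   p probe("http://localhost:3000/", 500)
```

If the restart is graceful, the failed count should stay at zero even though some requests take noticeably longer.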


There are various strategies to choose from when it comes to Puma restarts, each with its own implications. It’s also worth noting that we can solve application upgrades on a higher level. We can keep connections open while Puma restarts with systemd socket activation, serve old and new applications side by side with Docker, or solve the entire upgrade on a load balancer level. Most applications will likely do fine with standard hot restarts.

by Josef Strzibny