Here’s what you should know about Kamal healthchecks, namely the Docker healthcheck and the new Kamal 1.6 web barrier.
Docker healthcheck
Every running Docker container can come with a healthcheck. A typical web
role container running with Kamal might have the following healthcheck:
$ docker inspect [CONTAINER_ID] --format "{{json .Config.Healthcheck}}"
{"Test":["CMD-SHELL","(curl -f http://localhost:3000/up || exit 1) && (stat /tmp/kamal-cord/cord > /dev/null || exit 1)"],"Interval":5000000000}
This healthcheck marks the container as either healthy or unhealthy.
There are two things going on:
- First, there is a test for an /up endpoint on port 3000. That's an application-specific check.
- Second, there is Kamal's cord check.
Application check
A standard Rails 7.1 application runs on port 3000 with a healthcheck path mounted at /up, so that's also Kamal's default. You can change this if you need to under the healthcheck settings:
# config/deploy.yml
healthcheck:
  path: /up
  port: 3000
Cord check
Kamal creates a special cord file on the host and bind mounts it into the container (under /tmp/kamal-cord in the output above). By extending the application check with a cord check, Kamal can make a container unhealthy at any given time by cutting the cord (deleting the bind-mounted file).
Kamal does this during deploys. Once the cord is cut, Traefik sees the old container as unhealthy and stops routing new requests to it, while requests already dispatched to it get time to finish.
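The cord half of the check is easy to reproduce locally. The following is only a sketch: it uses a temporary directory as a stand-in for the bind-mounted cord directory, and the cord_check function is a made-up name that mirrors the stat part of the Test command shown earlier.

```shell
# Sketch of the cord check; the directory below is a temporary stand-in
# for the bind-mounted cord directory, not a real Kamal path.

# Returns 0 (healthy) while the cord file exists and 1 (unhealthy) otherwise,
# mirroring `stat <cord> > /dev/null || exit 1` from the Test command.
cord_check() {
  stat "$1" > /dev/null 2>&1 || return 1
}

cord_dir="$(mktemp -d)"
touch "$cord_dir/cord"          # the cord is in place

if cord_check "$cord_dir/cord"; then before="healthy"; else before="unhealthy"; fi

rm "$cord_dir/cord"             # "cutting the cord"

if cord_check "$cord_dir/cord"; then after="healthy"; else after="unhealthy"; fi

echo "before cut: $before"
echo "after cut:  $after"

rmdir "$cord_dir"               # clean up the stand-in directory
```

Nothing inside the container has to change for the status to flip. Removing the file is enough to make the next healthcheck run fail.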
We can change the cord file location or disable this check entirely:
# config/deploy.yml
healthcheck:
  cord: /var/run/kamal-cord
  # cord: false
By disabling the cord we give up zero-downtime deploys.
Interval
If you paid close attention at the start, you noticed that the docker inspect output also mentioned an interval. This interval specifies how often Docker runs the Test command for the healthcheck. Docker's own default is 30 seconds.
In Kamal, the interval of this check is set with interval. Here's a 20-second interval:
# config/deploy.yml
healthcheck:
  interval: 20s
Note that Docker reports the interval in nanoseconds, which is why the inspect output shows such a large number (5000000000) for just 5 seconds.
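The conversion is a division by one billion. Here is a small sketch for reading these values; the helper names ns_to_seconds and seconds_to_ns are made up for illustration and are not part of Docker or Kamal.

```shell
# Convert a Docker healthcheck duration from nanoseconds to whole seconds.
ns_to_seconds() {
  echo $(( $1 / 1000000000 ))
}

# Convert seconds back to the nanosecond value Docker would report.
seconds_to_ns() {
  echo $(( $1 * 1000000000 ))
}

ns_to_seconds 5000000000   # the Interval value from the inspect output
seconds_to_ns 20           # what a 20s interval looks like on the Docker side
```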
Maximum attempts
There is one more setting related to the container healthcheck: the maximum number of checks Kamal will make before giving up on a deploy.
When Kamal deploys a new revision, it waits for the container to become healthy. It asks for the container status 7 times by default, but we can change that with max_attempts:
# config/deploy.yml
healthcheck:
  max_attempts: 7
The attempts are not evenly spaced. The first try comes after 1s, the second after another 2s, the third after another 3s, and so on.
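To get a feel for how long Kamal waits in total, the schedule can be tabulated. This sketch assumes the wait before attempt N is exactly N seconds, as described above. The function name attempt_schedule is made up for illustration, and Kamal's real timing may differ slightly.

```shell
# Print when each attempt happens, assuming the wait before attempt N
# is N seconds (1s, then 2s more, then 3s more, and so on).
attempt_schedule() {
  max_attempts="$1"
  elapsed=0
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    elapsed=$(( elapsed + attempt ))
    echo "attempt $attempt at ${elapsed}s"
    attempt=$(( attempt + 1 ))
  done
}

attempt_schedule 7
```

Under that assumption the default of 7 attempts gives a container 28 seconds to become healthy, since 1 + 2 + ... + 7 = 28.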
Per-role check
If your other roles require a specific healthcheck, you can nest the above settings under a specific role:
# config/deploy.yml
servers:
  job:
    cmd: bin/jobs
    ...
    healthcheck:
      cmd: bin/check
      interval: 60s
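What goes into a script like bin/check is up to the application. Docker only looks at the exit status: 0 means healthy, 1 means unhealthy. As a purely hypothetical sketch, a job role could count as healthy while the process recorded in a pidfile is still alive. Both the check_pidfile function and the pidfile convention are assumptions for illustration, not something Kamal provides.

```shell
# Hypothetical body for a bin/check script: healthy while the PID stored
# in the given pidfile belongs to a running process.
check_pidfile() {
  pidfile="$1"
  [ -f "$pidfile" ] || return 1              # no pidfile: unhealthy
  kill -0 "$(cat "$pidfile")" 2>/dev/null    # status 0 only if the process exists
}

# Usage example with a throwaway pidfile that points at the current shell.
demo_pidfile="$(mktemp)"
echo "$$" > "$demo_pidfile"

if check_pidfile "$demo_pidfile"; then echo "healthy"; else echo "unhealthy"; fi

rm "$demo_pidfile"
```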
Web barrier
Originally, a new healthcheck container named healthcheck-* was booted on port 3999 to ensure the new version could serve traffic.
Kamal 1.6 removed this healthcheck and replaced it with a so-called web barrier.
Now non-web roles (which might lack a healthcheck of their own) always wait for at least one web container to pass the Docker healthcheck before shutting down their old containers.