A short report on how upgrading to OTP 23 surfaced "Unknown CA" (unknown certificate authority) errors in requests made through Hackney-based HTTP libraries.
One morning, we started to receive errors similar to the one below:
An error occurred while making the network request. The HTTP client returned the following reason: {:tls_alert, {:unknown_ca, 'TLS client: In state certify at ssl_handshake.erl:1895 generated CLIENT ALERT: Fatal - Unknown CA\\n'}}
We use HTTPoison, which in turn uses Erlang’s Hackney HTTP library. Suddenly, none of our outgoing requests could validate the certificate authority of the SSL connection.
The error wasn’t very detailed:
CLIENT ALERT: Fatal - Unknown CA
Because we needed the service to be operational immediately, we deployed a hotfix that turns off the SSL certificate checks. Since we use HTTPoison, it looks like this:
# Temporary hotfix only: :insecure makes Hackney skip certificate verification.
HTTPoison.post(
  @api_url,
  params,
  headers,
  pool: :custom_pool,
  hackney: [:insecure]
)
If you use Hackney pools, note that these options are reused for connections in the pool.
Once production was safe from collapse, I needed to find out what had happened and how it could have happened to us at all.
My investigation revealed that the issue occurs on OTP 23, which we were now running in production. I knew it had something to do with the new OTP and Hackney, but not what exactly. After a bit of searching, I found it: a change in OTP 23 to TLS hostname validation when a custom verify_fun function is provided (which Hackney indeed does).
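For context, here is a hedged sketch of what explicit certificate and hostname verification can look like when passed through HTTPoison's :ssl option, instead of disabling checks with :insecure. It is not Hackney's actual code. It assumes OTP 21 or newer and that the certifi package (a Hackney dependency) is available; the module name MyApp.VerifiedClient is hypothetical. It relies on OTP's customize_hostname_check option rather than a custom verify_fun.

```elixir
defmodule MyApp.VerifiedClient do
  # Hypothetical wrapper; module and function names are illustrative only.

  # TLS options that keep peer verification on and use OTP's built-in
  # HTTPS hostname matching instead of a custom verify_fun.
  def ssl_options do
    [
      verify: :verify_peer,
      # CA bundle shipped with the certifi package (assumed to be installed).
      cacertfile: :certifi.cacertfile(),
      # Generous upper bound on the length of the certificate chain.
      depth: 99,
      customize_hostname_check: [
        match_fun: :public_key.pkix_verify_hostname_match_fun(:https)
      ]
    ]
  end

  def post(url, body, headers \\ []) do
    # HTTPoison hands the :ssl option over to Hackney for the TLS connection.
    HTTPoison.post(url, body, headers, ssl: ssl_options())
  end
end
```

Depending on the library versions in use, options passed this way may replace Hackney's defaults rather than merge with them, which is why the list spells out everything it needs.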
From there, it wasn’t difficult to find that the problem is already fixed in the latest Hackney release, 1.17.0. So, upgrade, people!
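Since Hackney usually arrives as a transitive dependency of HTTPoison, one way to make sure the fixed version is used is to state the requirement explicitly. A minimal mix.exs fragment, assuming the rest of the dependency list stays as it is:

```elixir
# mix.exs (fragment)
defp deps do
  [
    # ...existing dependencies, including HTTPoison, stay unchanged...

    # Require a Hackney release that contains the OTP 23 fix (1.17.0 or later, below 2.0).
    {:hackney, "~> 1.17"}
  ]
end
```

Running mix deps.update hackney afterwards refreshes the lock file.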
Finally, it’s worth mentioning how an OTP upgrade can happen out of the blue. We use a multi-stage Docker build, and the first stage builds the release from the alpine-elixir:latest image. Because the latest tag moves whenever a new image is published, any build can silently pick up a newer OTP. I didn’t write the Dockerfile, so I had no idea we were upgrading OTP that way.
I am not saying it’s a bad practice per se, but I don’t advise it without some additional safeguards.
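One possible safeguard, besides pinning the base image to a specific tag, is to have the application verify at boot which OTP release it is running on. The sketch below is only an illustration: the module name OtpGuard and the error tuple shape are hypothetical, and it compares major releases only.

```elixir
defmodule OtpGuard do
  @moduledoc "Hypothetical boot-time check that the VM runs the expected OTP release."

  # Major OTP release of the running VM as a string, for example "23".
  def running_release do
    :otp_release |> :erlang.system_info() |> List.to_string()
  end

  # Compares the expected release with the actual one.
  # `actual` defaults to the running VM and can be overridden in tests.
  def check(expected, actual \\ running_release()) do
    if actual == expected do
      :ok
    else
      {:error, {:unexpected_otp_release, expected, actual}}
    end
  end
end
```

Calling `:ok = OtpGuard.check("23")` from the application's start callback would make an unexpected OTP upgrade fail loudly at startup instead of surfacing later as a TLS error.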