If you are building AI-powered applications in Ruby on Rails, you might have come across the Informers and Transformers.rb gems for transformer inference. Here’s how to improve the production deployment of their models in Kamal setups.
Informers & Transformers.rb
Informers and Transformers.rb are excellent gems for creating document embeddings. These embeddings can then be used to search your documents, build chatbots, and more.
Here’s an example from the README:
sentences = ["This is an example sentence", "Each sentence is converted"]
model = Informers.pipeline("embedding", "sentence-transformers/all-MiniLM-L6-v2")
embeddings = model.(sentences)
As you can see, you can pass the name of any model you want directly to the gem. However, these models are quite big and don’t ship with the gems themselves. Conveniently, they get downloaded on first use.
This auto-download works amazingly well in development but quickly becomes a major issue in production. If you are using Kamal or another Docker-based deployment, you would be re-downloading these models with every new release.
Production setup
For production, we need to download these models just once to a single location, then point the gems to read the models from there.
The first step is creating a location for them on the server:
# run on the server after logging in with SSH
# or ideally make it part of server configuration
mkdir -p /models
chmod 700 /models
chown 1000:1000 /models
Setting the ownership to user 1000 is important since we’ll be downloading the models from within the Rails containers.
The next step is to figure out where the models get saved. After a bit of searching on GitHub, I found out that both gems construct the cache path from the XDG_CACHE_HOME environment variable.
So we’ll need to point this variable to our location and make sure the models are downloaded before we run the application.
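For reference, here’s roughly how that lookup works (a simplified sketch, not the gems’ exact code; the subdirectory name is an implementation detail, so verify it against your installed versions):
cache_home = ENV["XDG_CACHE_HOME"] || File.join(Dir.home, ".cache")
# Each gem keeps models in its own subdirectory; with our setup, Informers
# would cache under "/rails/models/informers".
models_dir = File.join(cache_home, "informers")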
Kamal bits
To expose our /models location, we’ll add a new volume definition:
# config/deploy.yml
service: [SERVICE_NAME]
image: [DOCKER_USERNAME]/[SERVICE_NAME]
volumes:
- "/storage:/rails/storage"
- "/models:/rails/models"
Now if we read from and save to /rails/models inside the application image, we are actually reading from and saving to a permanent location on the host.
So now we can set the XDG_CACHE_HOME env variable to point to /rails/models:
# config/deploy.yml
env:
clear:
XDG_CACHE_HOME: "/rails/models"
...
This way the downloaded models get reused, but we still have to make sure they are available before we run our embedding tasks.
Pre-download
We have a few options for populating the new /models cache. We can grab the locally downloaded models and scp them to the expected location (note the full paths with the subdirectory). We can also run a one-off Rails console session with kamal console and trigger the downloads there, or create a Rake task.
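Such a Rake task could look something like this (a minimal sketch; the models:preload name is my own, and you should list whichever models your app actually uses):
# lib/tasks/models.rake
namespace :models do
  desc "Pre-download the ML models used by the application"
  task preload: :environment do
    # Instantiating a pipeline downloads the model into XDG_CACHE_HOME
    # if it isn't cached there yet.
    Informers.pipeline("embedding", "thenlper/gte-base")
    puts "Models cached in #{ENV["XDG_CACHE_HOME"]}"
  end
end
You could then run it on the server with something like kamal app exec 'bin/rails models:preload'.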
We can also make this a firm part of deployment by moving this task into bin/docker-entrypoint like this:
...
if [ "${@: -1:1}" == "solid_queue:start" ]; then
  echo "Caching models into $XDG_CACHE_HOME"
  bundle exec rails runner '
    model = Informers.pipeline("embedding", "thenlper/gte-base")
    puts "Successfully downloaded thenlper/gte-base model"
  '
fi

echo "Running ${@}"
exec "${@}"
Since I run Solid Queue, I used solid_queue:start as the argument. You might need to adjust this to fit your background job processing setup.
Note that you need to pre-download all of the models you’ll actually use.
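If your application uses more than one model, you can warm them all in one go, for example (the model names here just reuse the ones mentioned above):
[
  ["embedding", "thenlper/gte-base"],
  ["embedding", "sentence-transformers/all-MiniLM-L6-v2"],
].each do |task, model|
  # Each call downloads the model into XDG_CACHE_HOME if it's missing.
  Informers.pipeline(task, model)
end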