
Predownloading embedding models in Rails with Kamal

If you are building AI-powered applications in Ruby on Rails, you might have come across the Informers or Transformers.rb gems for transformer inference. Here’s how to improve the production deployment of their models in Kamal setups.

Informers & Transformers.rb

Informers and Transformers.rb are excellent gems for creating document embeddings. These embeddings can then be used to search your documents, build chatbots, and more.

Here’s an example from the Informers README:

sentences = ["This is an example sentence", "Each sentence is converted"]

model = Informers.pipeline("embedding", "sentence-transformers/all-MiniLM-L6-v2")
embeddings = model.(sentences)

As you can see, you can pass the name of the model you want directly to the gem. However, these models are quite big and don’t ship with the gems themselves. Conveniently, they are downloaded on first use.

This auto-download works amazingly well in development but quickly becomes a major issue in production. If you are using Kamal or another Docker-based deployment, you would be downloading these models again and again with every new release, since each new container starts from a fresh filesystem.

Production setup

For production, we need to download these models just once to a single location, then point the gems to read the models from there.

The first step is creating a location for them on the server:

# run on the server after logging in with SSH
# or ideally make it part of server configuration
mkdir -p /models
chmod 700 /models
chown 1000:1000 /models

Setting the ownership to user 1000 is important since we’ll be downloading the models from the Rails containers, and the Dockerfile that Rails generates runs the application as a rails user with UID 1000.
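For reference, the relevant lines of a freshly generated Rails Dockerfile look roughly like this (your app’s Dockerfile may differ slightly):

# excerpt from a Rails-generated Dockerfile
# the application runs as the rails user with UID/GID 1000
RUN groupadd --system --gid 1000 rails && \
    useradd rails --uid 1000 --gid 1000 --create-home --shell /bin/bash
USER 1000:1000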

The next step is figuring out where the models get saved. After a bit of searching on GitHub, I found out that both gems construct their cache path from the XDG_CACHE_HOME environment variable.

So we’ll need to point this variable to our location and make sure the models are downloaded before we run the application.
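In other words, the gems resolve their cache directory along these lines (a simplified sketch of what I found in the sources, not their exact code):

# simplified sketch of how the cache path is resolved
cache_dir = File.join(
  ENV.fetch("XDG_CACHE_HOME", File.join(Dir.home, ".cache")),
  "informers"
)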

Kamal bits

To expose our /models location, we’ll add a new volume definition:

# config/deploy.yml

service: [SERVICE_NAME]
image: [DOCKER_USERNAME]/[SERVICE_NAME]

volumes:
  - "/storage:/rails/storage"
  - "/models:/rails/models"

Now when we read from and write to /rails/models inside the application container, we are actually reading from and writing to a permanent location on the host.

So now we can set the XDG_CACHE_HOME environment variable to point to /rails/models:

# config/deploy.yml

env:
  clear:
    XDG_CACHE_HOME: "/rails/models"
    ...

This way the downloaded models will get reused, but we still have to ensure they are available before we run our embedding tasks.
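Once deployed, we can quickly verify the mount from our machine (kamal app exec runs the command in a one-off app container):

# run from your development machine
kamal app exec "ls -la /rails/models"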

Pre-download

We have a few options for populating the new /models cache.

We can grab the locally downloaded models and scp them to the expected location (note that the gems store them under a subdirectory of the cache path, so copy the full paths). We can also run a one-off Rails console with kamal console and trigger the downloads from there, or create a Rake task, as sketched below.
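Here’s what such a Rake task could look like, as a minimal sketch; the task name and model are illustrative:

# lib/tasks/models.rake (hypothetical)
namespace :models do
  desc "Download embedding models into the cache"
  task preload: :environment do
    Informers.pipeline("embedding", "sentence-transformers/all-MiniLM-L6-v2")
    puts "Models cached in #{ENV["XDG_CACHE_HOME"]}"
  end
end

Running it once with kamal app exec "bin/rails models:preload" then populates the cache.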

We can also make this a firm part of the deployment by moving this task into bin/docker-entrypoint like this:

...

if [ "${@: -1:1}" == "solid_queue:start" ]; then
  echo "Caching models into $XDG_CACHE_HOME"

  bundle exec rails runner '
  begin
    model = Informers.pipeline("embedding", "thenlper/gte-base")
    puts "Successfully downloaded thenlper/gte-base model"
  end
  '
fi

echo "Running ${@}"
exec "${@}"

Since I run Solid Queue, I match on solid_queue:start as the last argument. You might need to adjust this to fit your background job processing system.

Note that you need to load all of the models you’ll actually use.
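If you use several models, the runner script can simply loop over them (the model names here are illustrative):

bundle exec rails runner '
  ["thenlper/gte-base", "sentence-transformers/all-MiniLM-L6-v2"].each do |name|
    Informers.pipeline("embedding", name)
    puts "Cached #{name}"
  end
'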
