machine learning

How to ML - Deploying

Roland Szabo

20 Jan 2021 — 2 min read

So the ML engineer presented the model to the business stakeholders and they agreed that it performed well enough on the key metrics in testing that it's time to deploy it to production.

So now we have to make sure the model runs reliably in production. We have to answer a few more questions in order to make some trade-offs.

How important is latency? Is the model making an inference in response to a user action, so that it's crucial to have the answer within tens of milliseconds? Then it's time to optimize the model: quantize the weights, distill its knowledge into a smaller model, prune weights, and so on. Hopefully, your metrics won't go down due to the optimization.
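To make the quantization idea concrete, here is a minimal sketch in plain Python. It assumes the simplest scheme, mapping the min/max range of a weight list onto 8-bit codes; real frameworks do this per tensor or per channel and with more care. The function names are made up for illustration.

```python
def quantize_int8(weights):
    """Map floats onto int8 codes using the min/max range of the weights."""
    lo, hi = min(weights), max(weights)
    # One step of the 8-bit grid; fall back to 1.0 if all weights are equal
    scale = (hi - lo) / 255 or 1.0
    # Shift to [0, 255], round to the nearest step, then recenter to [-128, 127]
    codes = [max(-128, min(127, round((w - lo) / scale) - 128)) for w in weights]
    return codes, scale, lo


def dequantize_int8(codes, scale, lo):
    """Recover approximate floats from the int8 codes."""
    return [(c + 128) * scale + lo for c in codes]


weights = [-1.2, -0.42, 0.0, 0.13, 0.9]
codes, scale, lo = quantize_int8(weights)
restored = dequantize_int8(codes, scale, lo)
```

Each weight now takes one byte instead of four, and the round trip is off by at most half a step (`scale / 2`) per weight. Whether that error hurts your metrics is exactly what you have to measure.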

Can the results be precomputed? For example, if you want to make movie recommendations, maybe a batch job can run every night, do the inference for every user, and store the results in a database. Then, when the user makes a request, the recommendations are simply loaded from the database. This is possible only if you have a finite range of predictions to make.
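A rough sketch of that nightly job, using an in-memory SQLite database so it runs anywhere. The `score` function is a hypothetical stand-in for the real model, and the table and function names are assumptions for the sake of the example.

```python
import sqlite3


def score(user_id, movie_id):
    """Hypothetical stand-in for the trained model's prediction."""
    return ((user_id * 31 + movie_id * 17) % 100) / 100


def nightly_batch(conn, user_ids, movie_ids, top_n=3):
    """Run inference for every user and store the top-N movies per user."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS recommendations ("
        "user_id INTEGER, pos INTEGER, movie_id INTEGER, "
        "PRIMARY KEY (user_id, pos))"
    )
    for user_id in user_ids:
        ranked = sorted(movie_ids, key=lambda m: score(user_id, m), reverse=True)
        rows = [(user_id, pos, movie) for pos, movie in enumerate(ranked[:top_n])]
        conn.executemany(
            "INSERT OR REPLACE INTO recommendations VALUES (?, ?, ?)", rows
        )
    conn.commit()


def recommend(conn, user_id):
    """Request path: a single keyed lookup, no model call."""
    cur = conn.execute(
        "SELECT movie_id FROM recommendations WHERE user_id = ? ORDER BY pos",
        (user_id,),
    )
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")
nightly_batch(conn, user_ids=[1, 2, 3], movie_ids=[10, 11, 12, 13, 14])
```

The expensive part happens once per night, and `recommend(conn, 1)` only reads three rows. A user the batch job has never seen gets an empty list, so you still need a fallback for them.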

Where are you running the model? On big beefy servers with a GPU? On mobile devices, which are much less powerful? Or on some edge devices that don't even have an OS? Depending on the answer, you might have to convert the model to a different format or optimize it to be able to fit in memory.
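A quick back-of-the-envelope check helps here. The sketch below only counts the bytes needed to store the weights and ignores activations and runtime overhead, so treat it as a lower bound; the parameter count and memory budget are invented numbers.

```python
# Storage cost per parameter for a few common numeric formats
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def model_size_mb(num_params, dtype):
    """Size of the weights alone, in megabytes (10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


def formats_that_fit(num_params, budget_mb):
    """List the formats whose weights fit in the given memory budget."""
    return [d for d in BYTES_PER_PARAM if model_size_mb(num_params, d) <= budget_mb]


# Hypothetical 25M-parameter model on a device with 60 MB to spare
fitting = formats_that_fit(25_000_000, budget_mb=60)
```

With these numbers the float32 weights need 100 MB, so only the float16 and int8 versions fit, and that already tells you a conversion step is on the to-do list.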

Even in the easy case where you are running the model on servers and latency can be several seconds, you still have to do the whole dance of making it work there. "Works on my machine" is all too often a problem. Maybe production runs a different version of Linux, which has a different BLAS library, and the security team won't let you update things. Simple, just use Docker, right? Right, but you'd better hope you are good friends with the DevOps team, so they can help you set up the CI/CD pipelines.
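One cheap way to catch "works on my machine" early is to record a fingerprint of the training environment and compare it at startup in production. This is only a sketch using the standard library; it would not catch something like a different BLAS build, and the function names are made up.

```python
import platform


def environment_fingerprint():
    """Collect a few facts about the machine the code is running on."""
    return {
        "python": platform.python_version(),
        "system": platform.system(),
        "machine": platform.machine(),
    }


def mismatches(expected, actual):
    """Return {key: (expected, actual)} for every value that differs."""
    return {
        key: (expected[key], actual.get(key))
        for key in expected
        if expected[key] != actual.get(key)
    }


# In practice `expected` would be saved next to the model at training time
expected = environment_fingerprint()
problems = mismatches(expected, environment_fingerprint())
```

If `problems` is not empty when the service starts, you can log it or refuse to load the model, which beats debugging strange predictions later.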

But once you've killed all the dragons, it's time to keep watch... aka monitoring the model's performance in production.

