machine learning

How to ML - Metrics

Roland Szabo

28 Dec 2020 — 1 min read

We saw that machine learning algorithms process large amounts of data to find patterns. But how exactly do they do that?

The first step in a machine learning project is establishing metrics. What exactly do we want to do and how do we know we're doing it well?

Are we trying to predict a number, such as how much Bitcoin will cost next year? That's a regression problem. Are we trying to predict who will win the election? That's a binary classification problem (at least in the USA). Are we trying to recognize objects in an image? That's a multi-class classification problem.
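To make the link between problem type and metric concrete, here is a minimal sketch in plain Python. The choice of mean absolute error and accuracy is my assumption for illustration, not the only option, and all the numbers and labels below are made up.

```python
def mean_absolute_error(y_true, y_pred):
    """Regression metric: average absolute gap between truth and prediction."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, y_pred):
    """Classification metric: fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Regression: actual vs. predicted prices (hypothetical values)
price_error = mean_absolute_error([100.0, 200.0, 300.0], [110.0, 190.0, 330.0])

# Binary classification: only two possible labels
binary_acc = accuracy(["win", "lose", "win", "win"], ["win", "win", "win", "lose"])

# Multi-class classification: more than two possible labels
multi_acc = accuracy(["cat", "dog", "car", "dog"], ["cat", "dog", "dog", "dog"])

print(price_error, binary_acc, multi_acc)
```

The same accuracy function works for both binary and multi-class problems; what changes is how many distinct labels the model has to choose from.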

Another question that has to be answered is which kinds of mistakes are worse. Machine learning is not all-knowing, so it will make mistakes, and there are trade-offs to be made. Maybe we are building a system to find tumors in X-rays: in that case it might be better to cry wolf too often and have false positives than to miss a tumor. Or maybe it's the opposite: we are trying to implement a facial recognition system. If the system wrongly identifies someone as a burglar, then the wrong person will get sent to jail, which is a very bad consequence for a mistake made by "THE algorithm".
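This trade-off is commonly measured with precision (how many of the flagged cases were real) and recall (how many of the real cases were flagged). The sketch below assumes a hypothetical model that outputs a score between 0 and 1 and shows how moving the decision threshold trades one kind of mistake for the other. The labels and scores are invented for illustration.

```python
def precision_recall(y_true, scores, threshold):
    """Precision and recall when scores >= threshold are treated as positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and t for p, t in zip(predicted, y_true))        # true positives
    fp = sum(p and not t for p, t in zip(predicted, y_true))    # false positives
    fn = sum((not p) and t for p, t in zip(predicted, y_true))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical ground truth and model scores
y_true = [True, True, True, False, False, False]
scores = [0.9, 0.6, 0.3, 0.7, 0.2, 0.1]

# Low threshold: misses nothing, but raises a false alarm (tumor-style setting)
cautious = precision_recall(y_true, scores, threshold=0.25)

# High threshold: no false alarms, but misses real cases (face-recognition-style setting)
strict = precision_recall(y_true, scores, threshold=0.8)

print(cautious, strict)
```

With the low threshold every real positive is caught at the cost of one false positive; with the high threshold there are no false positives, but two of the three real positives are missed.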

These are not just theoretical concerns; they matter a lot when building machine learning systems. Because of this, many ML projects are human-in-the-loop, meaning the model doesn't decide by itself what to do. It merely makes a suggestion, which a human then confirms. In many cases that is valuable enough, because it makes the human much more efficient. For example, a security guard doesn't have to watch 20 screens at once, but can look only at the footage that was flagged as anomalous.
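A human-in-the-loop setup like the security guard example can be sketched as a simple filter. This assumes a hypothetical model that gives each camera feed an anomaly score between 0 and 1. The function name, the scores, and the 0.5 threshold are all illustrative choices, not part of any particular system.

```python
def flag_for_review(scores, threshold):
    """Return the indices of feeds whose anomaly score reaches the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


# Hypothetical anomaly scores for six camera feeds
camera_scores = [0.02, 0.10, 0.91, 0.05, 0.67, 0.01]

# The model only suggests; the guard makes the final call on these feeds
to_review = flag_for_review(camera_scores, threshold=0.5)

print(to_review)
```

Here the guard only needs to check two of the six feeds, and the threshold controls how much work is passed on to the human.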

Tomorrow we'll look at the next step: gathering the data.
