Why I like working in ML evaluation after my QA experience

A lot of people ask how I feel in my new role as an ML evaluation engineer after 20+ years in QA.

Here is why I really like it:

1. Building complex architectures (and yes, the refactoring!)

I’ve traded standard test automation for complex Python environments. When your “auxiliary tasks” involve writing custom methods to manipulate remote Git repositories for dataset versioning or managing system processes for heavy model weights, the infrastructure becomes a massive engineering puzzle. I spend so much time on design patterns and architecture now. Clean, modular code and deep refactoring have always been what gets me excited.

2. A whole new universe of dimensions to play with

What really keeps me on my toes is that quality here isn’t just about code correctness. I’m suddenly factoring in completely new, fascinating dimensions. I’m dealing with token economics (optimizing costs), analyzing execution latency, running continuous A/B tests between models, and exploring the embedding space. It’s a dynamic puzzle where the variables are constantly shifting, and navigating this complexity is incredibly rewarding.

3. Prompt engineering

Designing the “perfect” prompt is essentially the ultimate adversarial test. It requires a unique mix of logic, a “breaker” mindset, and a deep understanding of linguistics to find exactly where the model’s reasoning starts to crumble. My linguistics background has finally found its ultimate playground here, and seeing it click makes me incredibly happy.

4. Math

I’m finally putting my math background to active use. Calculating cosine similarity, standard deviations, and statistical significance has added a layer of scientific rigor that feels deeply satisfying. To push myself further, I’m even taking a Coursera Math for ML Engineers course right now to refresh the foundations. Diving into matrix operations and statistics after work? Turns out, it adds pure pleasure to my everyday activities.
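For readers who haven’t met these measures yet, here is a minimal sketch of the first two, using only the Python standard library. The function names and the sample numbers are made up for illustration and are not taken from a real evaluation run.

```python
import math
import statistics


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (e.g. embeddings)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def summarize_scores(scores):
    """Return (mean, sample standard deviation) for a list of scores."""
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical embeddings: the second vector points in the same direction
# as the first, so the similarity should be very close to 1.
similarity = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])

# Hypothetical scores from repeated runs of the same evaluation.
mean_score, spread = summarize_scores([0.8, 0.9, 1.0])
```

The standard deviation matters as much as the mean here: two models with the same average score can differ a lot in how stable they are from run to run.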

5. Financial stability

We all know the situation in the market today and how quickly skills lose their value, even (or especially) among experienced specialists. Having transitioned into the ML space, I can be sure of the relevance of my skills for at least the next several years.

Stepping out of a 20-year comfort zone was scary, but ML evaluation proved to be a specialized, beautiful discipline at the intersection of systems programming, data science, and linguistics.

If you ever feel like you’ve hit a ceiling or lost that initial spark – don’t be afraid to pivot. The learning curve is steep, but the view from here is absolutely worth it.
