For Beginners

Testing AI Applications June 1, 2026

Why I Enjoy Working as an ML Evaluation Engineer After 20 Years of Experience in QA

A lot of people ask how I feel in my new role as an ML evaluation engineer after 20 years in QA. Here is why I really like it:

Testing AI Applications May 9, 2026

Why It’s Too Late to Learn Automation (and What to Do Next)

The idea for this post came to me during a regular meeting with my fellow mentors, SDETs from several international companies. We were discussing the future of the QA Automation market and reached some rather interesting conclusions.

Testing AI Applications April 6, 2026

How to Test AI Applications: The Grader’s Ruler for LLM-as-a-judge

When your LLM-as-a-Judge pipeline uses prompts like "rate this response as good, okay, or bad," you're essentially delegating your quality bar to whatever distribution dominated the judge's training data. A model trained on polite-but-unhelpful customer service text will happily score polite-but-unhelpful bot responses as "good." Consider a concrete failure mode:

Testing AI Applications March 28, 2026

How to Test AI Applications: The Gold Standard for LLM-as-a-judge

Using LLM-as-a-judge without a gold standard is like asking a reviewer to grade an exam without the answer key: they'll fall back on their own memory, and in niche or professional domains, that memory hallucinates more than you'd expect. Consider a refund scenario:

Testing AI Applications March 21, 2026

How to Test AI Applications: Determinism vs. Probability

Traditional QA, even when armed with AI tools, operates on a deterministic contract: if A + B is expected to be C, and the system returns anything other than C, it's a defect.
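To make the contrast concrete, here is a minimal sketch in Python. The deterministic half mirrors the "A + B is expected to be C" contract above. The probabilistic half is only an assumption about what the alternative might look like: `flaky_model` is a hypothetical stand-in for a non-deterministic system (not a real model call), and the 0.9 success probability and 0.8 threshold are made-up numbers chosen for illustration.

```python
import random


def add(a: int, b: int) -> int:
    """Deterministic system under test: same inputs, same output."""
    return a + b


def deterministic_check() -> bool:
    # Traditional contract: anything other than the expected value is a defect.
    return add(2, 3) == 5


def flaky_model(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a probabilistic system.

    Returns an acceptable answer about 90% of the time (assumed figure).
    """
    return "correct" if rng.random() < 0.9 else "wrong"


def pass_rate(n_runs: int, seed: int = 0) -> float:
    """Run the same prompt n_runs times and return the share of acceptable answers."""
    rng = random.Random(seed)  # seeded so the evaluation itself is repeatable
    passed = sum(flaky_model("same prompt", rng) == "correct" for _ in range(n_runs))
    return passed / n_runs


if __name__ == "__main__":
    print("deterministic check passed:", deterministic_check())
    rate = pass_rate(1000)
    # A single "wrong" answer is not automatically a defect here;
    # the question is whether the rate clears an agreed threshold.
    print("pass rate:", rate, "meets 0.8 threshold:", rate >= 0.8)
```

The design point is that the second check judges a sample of runs against a threshold instead of judging one output against one expected value.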

Testing AI Applications March 14, 2026

A take-home assignment for an AI QA role

I was asked a while ago to cover this topic, so here’s a look at a skills assessment assignment for an ML Evaluation Engineer role:

Testing AI Applications March 7, 2026

LLM-as-a-Judge in QA terminology

Traditional AQA assertions fail catastrophically when applied to LLM output. Two architectural reasons make this inevitable:

Testing AI Applications February 28, 2026

A Working Day of an AI QA Engineer

09:30 – 10:30: The Architectural Shift. Started the day with a sync on our agentic AI workflow. The development team is introducing a new agent.

Testing AI Applications February 15, 2026

What’s the difference: QA Engineer with AI tools, AI QA Engineer and ML Evaluation Engineer

There is a growing terminology crisis in the QA market, and it is bleeding into hiring decisions and team structures. Three distinct roles are being lumped under the same "AI Testing" umbrella, despite having almost nothing in common at the technical level.

Testing AI Applications January 12, 2026

How to Become an AI Application Tester

The shift from automating deterministic systems to evaluating probabilistic models is not a lateral career move; it's a fundamental change in how you define "correctness." A test that asserts `expected == actual` becomes meaningless when the system under test produces non-deterministic outputs, so traditional pass/fail assertions give way to statistical thresholds, distribution analysis, and metric-based evaluation. Here's a condensed breakdown of what the transition actually required:
