What’s the difference: QA Engineer with AI tools, AI QA Engineer, and ML Evaluation Engineer

Stop calling it “QA AI Testing” if you’re just using ChatGPT to write scripts. 🛑

There’s massive confusion in the market between three completely different roles. If you’re hiring or pivoting, you need to know the difference:

1. QA Engineer with AI tools

Goal: Efficiency.
The Reality: You’re testing a traditional, deterministic product. You just use ChatGPT to generate test cases, or use Cursor/Claude Code for test automation. It’s “vibe-coding” for old-school tasks.

2. AI QA Engineer

Goal: Integration health checks.
The Reality: Testing how an LLM chat looks inside a CRM. You check if it’s polite and doesn’t break the UI. You’re still using asserts, just with a bit more “flavor.”
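A minimal sketch of what that looks like in practice. The LLM call is stubbed out here (the real client and CRM widget are assumptions); the point is that the checks themselves are still plain, deterministic asserts:

```python
def get_chat_reply(prompt: str) -> str:
    # Stand-in for a real LLM call inside the CRM chat widget
    # (a hypothetical integration -- swap in your actual client).
    return "Hello! How can I help you today?"

def check_reply(reply: str) -> None:
    # Classic deterministic asserts, just with LLM "flavor":
    assert reply, "reply must not be empty"
    assert len(reply) < 2000, "reply must fit the chat widget"
    assert "<script>" not in reply.lower(), "reply must not inject markup"

check_reply(get_chat_reply("Where is my order?"))
```

Useful, but notice that nothing here handles the model answering differently on the next run.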

3. ML Evaluation Engineer

Goal: Managing the inherent chaos of non-deterministic models.
The Reality: You don’t use asserts; you use statistical metrics.
The Tools: Evaluation harnesses (like EleutherAI’s lm-evaluation-harness), Python-driven metric modules.
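The shift from asserts to statistics can be sketched like this. The scorer below is a toy word-overlap metric (real harnesses use exact match, BLEU, or model-graded rubrics), and the threshold values are illustrative assumptions:

```python
def score(output: str, reference: str) -> float:
    # Toy quality metric: word overlap with the reference answer.
    # Real evaluation harnesses plug in exact match, BLEU,
    # embedding similarity, or LLM-as-a-judge rubrics here.
    out, ref = set(output.split()), set(reference.split())
    return len(out & ref) / len(ref) if ref else 0.0

def pass_rate(outputs, references, threshold=0.5):
    # The gate is aggregate: what fraction of samples clears the bar?
    scores = [score(o, r) for o, r in zip(outputs, references)]
    return sum(s >= threshold for s in scores) / len(scores)

# The "test" is a statistical gate over many samples,
# not a single exact assert on one output:
rate = pass_rate(
    ["paris is the capital of france", "berlin"],
    ["paris is the capital of france", "the capital of germany is berlin"],
)
```

You then decide whether a pass rate of, say, 0.5 is acceptable for your use case, rather than expecting every run to produce identical output.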

Why the 3rd one is a different beast:

  • Probability > Determinism: You’re not checking if 2+2=4. You’re checking if a metric score of 0.87 is acceptable for your specific use case.
  • Cost is a Metric: In ML Eval, token spend is as important as latency. If your agent is “smart” but costs $2 per request, you’ve failed the test.
  • Speed is everything: In the current market, a fast 7B model often beats a slow 70B model. Performance testing isn’t an “extra”—it’s the core.
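The three bullets above can be folded into one release gate. Everything here is an assumption for illustration: the price per 1k tokens, the 0.85 quality bar, the $0.05 cost cap, and the 2-second latency budget are made-up thresholds you would tune per use case:

```python
def evaluate_run(quality: float, tokens: int, latency_s: float,
                 usd_per_1k_tokens: float = 0.01) -> dict:
    # Cost and latency are first-class metrics next to quality.
    cost = tokens / 1000 * usd_per_1k_tokens
    return {
        "quality_ok": quality >= 0.85,   # is a score like 0.87 acceptable?
        "cost_ok": cost <= 0.05,         # "smart but $2/request" fails here
        "latency_ok": latency_s <= 2.0,  # where a fast 7B beats a slow 70B
        "cost_usd": round(cost, 4),
    }

result = evaluate_run(quality=0.87, tokens=1200, latency_s=1.4)
```

A run ships only if all three gates pass; a higher-quality model that blows the cost or latency budget fails the test.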

Conclusion

Traditional QA = Finding bugs.
ML Evaluation = Measuring uncertainty.

