What’s the difference: QA Engineer with AI tools, AI QA Engineer, and ML Evaluation Engineer

Stop calling it “QA AI Testing” if you’re just using ChatGPT to write scripts. 🛑
There’s massive confusion in the market between three completely different roles. If you’re hiring or pivoting, you need to know the difference:
1. QA Engineer with AI tools
Goal: Efficiency.
The Reality: You’re testing a traditional, deterministic product. You use ChatGPT to generate test cases, or Cursor/Claude Code to write test automation. It’s “vibe-coding” for old-school tasks.
2. AI QA Engineer
Goal: Integration health check.
The Reality: Testing how an LLM chat looks inside a CRM. You check if it’s polite and doesn’t break the UI. You’re still using asserts, just with a bit more “flavor.”
3. ML Evaluation Engineer
Goal: Managing the inherent chaos of non-deterministic models.
The Reality: You don’t use asserts; you use statistical metrics.
The Tools: Evaluation harnesses (like EleutherAI’s lm-evaluation-harness) and Python-driven metric modules.
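What does “statistical metrics instead of asserts” look like in practice? Here’s a minimal sketch, not tied to any real framework — `relevance_score`, `evaluate`, and the 0.8 threshold are all illustrative names and values:

```python
# Sketch: score a batch of model responses and gate on an aggregate
# statistic, instead of asserting exact outputs per test case.
import statistics


def relevance_score(response: str, reference: str) -> float:
    """Toy metric: fraction of reference tokens present in the response."""
    resp_tokens = set(response.lower().split())
    ref_tokens = set(reference.lower().split())
    if not ref_tokens:
        return 0.0
    return len(resp_tokens & ref_tokens) / len(ref_tokens)


def evaluate(samples: list[tuple[str, str]], threshold: float = 0.8) -> bool:
    """Pass/fail on the MEAN score across the batch — one flaky output
    doesn't fail the suite, a statistically bad batch does."""
    scores = [relevance_score(resp, ref) for resp, ref in samples]
    return statistics.mean(scores) >= threshold
```

Real harnesses swap the toy metric for things like exact-match, BLEU, or LLM-as-judge scores, but the shape is the same: a distribution of scores against a threshold, not a boolean per case.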
Why the 3rd one is a different beast:
- Probability > Determinism: You’re not checking if 2+2=4. You’re checking if a metric score of 0.87 is acceptable for your specific use case.
- Cost is a Metric: In ML Eval, token spend is as important as latency. If your agent is “smart” but costs $2 per request, you’ve failed the test.
- Speed is everything: In the current market, a fast 7B model often beats a slow 70B model. Performance testing isn’t an “extra”—it’s the core.
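The three bullets above can be sketched as one eval gate where quality, cost, and latency are all first-class metrics. Every name and threshold here is illustrative, not from any specific tool:

```python
# Sketch: a release gate for a model/agent. A "smart" model that is
# too expensive or too slow still fails — cost and speed are metrics,
# not afterthoughts.
from dataclasses import dataclass


@dataclass
class EvalResult:
    quality: float            # e.g. mean metric score across the eval set
    cost_per_request: float   # dollars, derived from token spend
    p95_latency_ms: float     # tail latency matters more than the mean


def passes_gate(r: EvalResult,
                min_quality: float = 0.85,
                max_cost: float = 0.05,
                max_latency_ms: float = 1500.0) -> bool:
    return (r.quality >= min_quality
            and r.cost_per_request <= max_cost
            and r.p95_latency_ms <= max_latency_ms)
```

So a model scoring 0.95 at $2 per request fails the same gate that a 0.87 model at $0.01 passes — which is exactly the trade-off the bullets describe.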
Conclusion
Traditional QA = Finding bugs.
ML Evaluation = Measuring uncertainty.
