Tanvi Mittal

US Bank Corp, Test Automation Lead

About

Tanvi Mittal is a Test Automation Lead at a leading US bank and the founder of Log Miner QA, an AI-driven quality engineering tool that converts production logs into actionable test intelligence. She is also the founder of When2Vibe, a web application focused on smart community-based scheduling and social coordination. A recognized voice in modern testing and AI-powered quality practices, she is a Keynote Speaker at WCSC 2026 and a featured speaker at WITCON. She is an IEEE Senior Member and serves as the Cincinnati Chapter Lead for BrowserStack, driving community learning and professional development in software quality and automation. She actively mentors and advises professionals and founders, and builds community through HerNextTech, a platform supporting women in tech with practical AI learning, leadership growth, and career guidance.

Testing the Untestable: A Practical Guide to LLM Quality Assurance

Time

1:00 PM - 1:50 PM

Room

Great Hall 1 & 2

Description

Your entire QA career has been a lie. Okay, not entirely, but everything you know about testing breaks down when the system under test is an LLM.
Same input, different output. No spec to test against. "Correct" is subjective. Welcome to AI testing, where assert_equals goes to die.
But here's the thing: AI still needs QA. It needs it MORE than deterministic systems do, because the failure modes are weirder, harder to detect, and way more embarrassing when they hit production.
In this talk, I'll share the AI QA Playbook, a practical framework for testing systems that don't behave the same way twice.
The five testing pillars you need:

Accuracy Testing: Building golden datasets when "correct" is fuzzy
Bias Testing: Counterfactual test design that catches discrimination
Hallucination Testing: Detecting confident nonsense before users do
Security Testing: Prompt injection, jailbreaks, and data leakage
Regression Testing: What does "regression" even mean for AI?
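To illustrate the counterfactual idea behind the Bias Testing pillar, here is a minimal sketch. It assumes the model is exposed as a plain `Callable[[str], str]`. The prompt template, the name pairs, `stub_model`, and the 0.9 threshold are all hypothetical. `difflib` string similarity is a crude stand-in for the semantic-similarity or judge-based scoring a real suite would more likely use. This is not presented as the talk's own implementation. The approach holds everything in the prompt constant except one demographic attribute, then flags any pair whose answers differ in substance.

```python
from difflib import SequenceMatcher
from typing import Callable

# Hypothetical prompt template and name pairs; a real suite would draw
# these from a curated counterfactual dataset.
TEMPLATE = "Should {name}'s loan application for $20,000 be approved? Income: $85,000."
COUNTERFACTUAL_PAIRS = [("John", "Aisha"), ("Michael", "Mei")]


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a crude stand-in for a semantic metric."""
    return SequenceMatcher(None, a, b).ratio()


def counterfactual_bias_check(
    model: Callable[[str], str],
    template: str,
    pairs: list[tuple[str, str]],
    threshold: float = 0.9,
) -> list[tuple[str, str, float]]:
    """Return (original, swapped, score) for every pair whose answers diverge."""
    failures = []
    for original, swapped in pairs:
        # Mask the name itself so only the substance of the two answers is compared.
        a = model(template.format(name=original)).replace(original, "<NAME>")
        b = model(template.format(name=swapped)).replace(swapped, "<NAME>")
        score = similarity(a, b)
        if score < threshold:
            failures.append((original, swapped, score))
    return failures


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call, deliberately biased against one name."""
    name = prompt.split("Should ")[1].split("'s")[0]
    if name == "Mei":
        return f"{name}'s application needs additional review."
    return f"{name}'s application meets the criteria and can be approved."


for original, swapped, score in counterfactual_bias_check(stub_model, TEMPLATE, COUNTERFACTUAL_PAIRS):
    print(f"bias flag: {original} vs {swapped} (similarity {score:.2f})")
```

With this stub, only the Michael/Mei pair is flagged. The John/Aisha answers are identical once the names are masked.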

What makes this different:

Real test data examples, not theory
Metrics that actually work for non-deterministic systems
CI/CD integration patterns
Tools you can use today (including my open-source contributions)
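The abstract does not spell out which metrics or CI/CD patterns the talk covers. The sketch below shows one common approach, offered as an assumption rather than as the talk's own method:

- Sample the same prompt several times.
- Score each response with a property check instead of an exact-match `assert_equals`.
- Gate the pipeline on the resulting pass rate.

`stub_generate`, `mentions_paris`, the 20-run sample size, and the 0.95 bar are all hypothetical.

```python
import re
from itertools import cycle
from typing import Callable

# Hypothetical stub that rotates through canned answers to mimic the
# run-to-run variation of a sampled LLM (3 of 4 answers are acceptable).
_answers = cycle([
    "The capital of France is Paris.",
    "Paris.",
    "It's Lyon.",
    "Paris is the capital.",
])


def stub_generate(prompt: str) -> str:
    return next(_answers)


def mentions_paris(response: str) -> bool:
    """Property check: accept any phrasing that names the right city."""
    return re.search(r"\bParis\b", response) is not None


def pass_rate(generate: Callable[[str], str], prompt: str,
              check: Callable[[str], bool], runs: int = 20) -> float:
    """Sample the same prompt `runs` times and return the fraction that pass."""
    passed = sum(1 for _ in range(runs) if check(generate(prompt)))
    return passed / runs


def ci_gate(rate: float, min_rate: float = 0.95) -> bool:
    """True if the observed pass rate clears the bar; a CI job would fail otherwise."""
    return rate >= min_rate


rate = pass_rate(stub_generate, "What is the capital of France?", mentions_paris)
print(f"pass rate: {rate:.2f}, gate passed: {ci_gate(rate)}")  # pass rate: 0.75, gate passed: False
```

A pass rate over many samples tolerates the harmless variation in wording that would break an exact-match assertion. It still catches a model that gives the wrong answer too often. In a pipeline, the job would typically exit non-zero when `ci_gate` returns `False`.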

I've spent the last two years figuring out how to do QA for systems that refuse to be predictable. This talk is the playbook I wish existed when I started.