Conference Schedule

Full day of sessions, workshops, and networking opportunities.

7:00 AM · 1h · East Ballroom

Breakfast & Opening Remarks

8:00 AM · 50 min · East Ballroom · Keynote

Ensuring Software Quality in the world of AI Developers

Like it or not, AI agents can now turn a loosely written paragraph of requirements into a pull request that looks production-ready in minutes. That’s impressive — and horrifying. When code is being generated faster than humans can fully internalize it, QA becomes the last line of defense between “seems fine” and a 2 a.m. incident caused by a misunderstood requirement or a bad database migration. In this session, we’ll explore how quality practices must evolve in a world where teams treat AI agents like new junior developers. We’ll talk about strengthening test plans so they validate intent instead of just implementation, expanding automated coverage to catch AI-specific failure modes, and partnering closely with developers whose familiarity with the generated code may be thinner than in years past. We’ll look at redefining code and feature review processes, improving requirement clarity to reduce ambiguity before it becomes defects, documenting our new vibe coded enterprise systems, and adding guardrails so AI-authored changes can’t slip past quality gates unchecked. By the end, you’ll have a clear understanding of the new risks AI introduces — and practical strategies to help your team move fast without letting AI-generated pull requests quietly YOLO their way into prod.

Matthew-Hope Eland


8:50 AM · 10 min · East Ballroom

Coffee Break

9:00 AM · 50 min · Great Hall 1 & 2

Taming Autonomous Testing and Behavior Models – Mario, Will You Behave?

How do you test a system with millions of possible states? Traditional test automation follows a simple pattern: write tests that execute predefined steps and verify expected outcomes. But this approach breaks down when the system under test has vast state spaces and complex behaviors that can't be fully covered by manually written tests. In this session, I'll show how I tamed two powerful techniques, autonomous testing and behavior models, using Super Mario Bros. as my test subject. I'll demonstrate:
1. Autonomous State Space Exploration: How I used mutation-based input generation and fitness functions to discover paths through complex systems without human guidance. Watch as Mario autonomously navigates and completes levels.
2. Behavior Models for Validation: How I defined causal, safety, and bounded liveness properties that validate system correctness frame-by-frame, building models that act as universal correctness oracles.
3. Combining Both Techniques: The real power emerges when autonomous exploration meets behavior validation, systematically discovering edge cases while ensuring correctness at every step.
By the end of this session, you'll see:
• The difference between traditional test automation and autonomous testing
• How fitness functions guide exploration through complex state spaces
• How to define behavior properties (causal, safety, liveness)
• How to integrate behavior models into autonomous exploration loops
• When these techniques make sense for your own projects
This talk is based on a published technical series with open-source code:
• Part 1: https://testflows.com/blog/testing-super-mario-using-a-behavior-model-autonomously-part1/
• Part 2: https://testflows.com/blog/testing-super-mario-using-a-behavior-model-autonomously-part2/

Vitaliy Zakaznikov


9:00 AM · 50 min · Great Hall 3

The Quality Horizon: Modern Best Practices and the Art of Constant Adaptation

When I started my career twenty years ago, software quality basically meant "no bugs," and testing meant executing a finite set of cases. Since then, I’ve watched the concept evolve alongside significant shifts in technology and industry practices. Because these changes have been largely additive, we face an ever-expanding horizon of what "good" software looks like. In this talk, I offer a definition of modern software quality that incorporates the many expectations accumulated from technological and process trends such as Agile, automated testing, cloud hosting, DevOps, and AI. Drawing on my own painful experiences with neglecting or misunderstanding the evolving dimensions of quality, I’ll share examples of what effective practices can look like. To help you assess how you and your team are doing with respect to this laundry list of things to be responsible for, we will break them down into four key pillars:
- User Value: Building the Right Thing (strategic alignment, risk assessment, measurable impact)
- Product Health: Testing the Things We Can Predict (enabling and executing solid testing strategies)
- Operational Health: Dealing with the Unexpected (observability, recovery, non-deterministic behaviour)
- Sustainability: Holistic Stewardship (security, cost, performance, accessibility, maintainability)
We’ll cover a series of questions to help you explore and audit each area, followed by a set of prompts to help you determine where the next quality evolution is likely to come from in your context. While the field may have been simpler when I began, the constant transformation is what has kept it exciting. Fortunately, today it’s easier than ever to learn a new skill that can bring more value to your users and your business. As a final thought, I hope to leave you with the realization that the most important software quality strategy is the willingness to adapt, evolve, and stay curious in an ever-changing landscape.

Tina Fletcher


9:00 AM · 50 min · West Ballroom

Awesome Web Testing with Playwright and AI

Everybody gets frustrated when web apps are broken, but testing them thoroughly doesn't need to be a chore. [Playwright](https://playwright.dev/) together with AI makes testing web apps fun! Playwright offers a slew of nifty features like automatic waiting, mobile emulation, and network interception. Plus, with isolated browser contexts, Playwright tests can set up *much* faster than traditional Web UI tests. In this tutorial, we will automate concise yet robust web app tests for "BuggyBoard", a bug tracking web app with Playwright in TypeScript. And we will expedite our test development with the superpowers of AI. Specifically, we will cover:
1. How to add Playwright to a project
2. How to explore an app with AI assistance and Playwright's MCP
3. How to generate Playwright test code
4. How to context engineer scalable automation patterns
5. How Playwright compares to other browser automation tools like Selenium and Cypress
By the end of this session, you'll be empowered to test modern web apps with modern web test tools. You'll also have an example project to be the foundation for your future tests.
---
This year, I am revamping my Playwright content to include AI assisted coding techniques, like using context engineering, agentic analysis, and Playwright's MCP servers.

Andrew Knight


9:00 AM · 50 min · Student-Alumni Room

What is Your Working Genius?

The Working Genius model is a productivity model developed by Patrick Lencioni with one simple goal: bringing more joy and fulfillment to work! When you and your team understand where your geniuses are and how to (and when not to) use them, it can improve meetings, reduce burnout, and dramatically reduce turbulence in getting projects done. In this session we will review the six types of working genius and how they bring projects from ideation to implementation. We will discover the hidden cause of burnout and how to keep meetings, including our agile ceremonies, more focused and more productive as a whole, all with the goal of improving your life and team culture, both in and outside of work. (That's right… ALL projects!)

Kyle Jenkins


9:00 AM · 50 min · Interfaith Room

The Human Advantage: Making Better QA Decisions with AI in the Loop

As AI becomes a key player in our tools and workflows, testing and testers need to evolve their strategies, since quality still depends on human judgment. This session focuses on the tester’s mindset and the strategies, in both personal and professional life, that help QA engineers make better decisions, communicate risk clearly, and build trust in what they deliver. This is a practical, human-centered talk on how to become a stronger Quality Assurance professional in an AI-accelerated environment without becoming overly dependent on tools. Focusing on the human side of QA, we’ll explore the mental models and habits that separate “someone who runs tests” from “someone who owns quality.” Key takeaways include:
- Using AI as a test helper: expanding ideas, helping to debug and write reports, and prompt patterns that generate better scenarios
- QA mindset shifts for stronger testing: risk-based thinking, systems thinking, and skepticism without becoming adversarial
- Communication strategies that make QAs more human
- Personal growth strategies that anyone can follow to build confidence and resilience
- A set of repeatable daily/weekly habits (checklists, journaling prompts, review routines) to build stronger intuition and consistency
This session will benefit Quality Assurance professionals at all levels. If you are wondering how QA roles evolve with AI in the mix, this session is for you.

Krishna Bandarupalli


9:00 AM · 50 min · Cartoon Room

Decision Records: Understanding Why Those Decisions Were Made!

Ever stared at a complex system and thought, “Wait, why was that decision made?!” We’ve all been there – lost in a maze of logic, struggling to track down the root cause of a problem. Decision Records offer a brilliant way to finally unlock that understanding. They’re like having a detailed, searchable log of every decision in your applications – from architectural style to authentication to service discovery and containerization. By capturing these decisions, you build more maintainable, auditable, and, frankly, less frustrating systems. In this session, you will learn how decision records solve problems and will be given template ideas that you can implement in your projects today! Let’s ditch the guesswork and start documenting the why – because truly understanding your systems is the key to unlocking their full potential!

Sarah Dutkiewicz


9:00 AM · 2h · Senate Chamber · Workshop

The Gumshoe Protocol

It was a dark and stormy night, the kind of night you expect a P0 defect, when the Teams call interrupted my dinner of Cup of Noodles. It was the VP of Customer Success: “We have a customer-facing problem. We need you and the Gumshoe Team on the case.” Root cause analysis (RCA) is a critical skill for everyone; however, most professionals have never had the opportunity to identify the root cause of a defect before needing to do so in a critical situation. Effective RCA requires all stakeholders to think critically and use their best judgement on the often limited information available. In this workshop, participants will role-play through a real-life scenario and interact with logs, users, and other stakeholders to figure out the root cause before coming together to brainstorm on how we could have prevented the incident. Join the Gumshoe Team of developers, QAs, product professionals, customer support, and project managers to crack the case.

Jenna Charlton


Jenny Bramble


9:00 AM · 2h · Sphinx Centennial Leadership Suite · Workshop

Breaking GPT & Claude: The AI Red-Team Workshop That Gets You Hired

Meta AI. OpenAI. Anthropic. They're not just hiring prompt engineers anymore; they're hunting for AI Evaluation Engineers: people who can find where AI breaks before attackers do. It's one of the fastest-growing, highest-paying roles in tech, and almost nobody knows how to get there yet. This 2-hour hands-on workshop is your unfair head start. You'll go behind the curtain of how production AI assistants are actually built (spoiler: it's one model plus a paragraph of rules) and learn exactly how those rules fail. Working against two real-world bots (a healthcare triage assistant and a brokerage chatbot), you'll run live attacks across six frontier models from Meta, OpenAI, and beyond, watching the same technique succeed or fail in real time. What you'll actually do:
* Jailbreak AI chatbots using role-play overrides, authority framing, encoded injections, and prompt extraction, then see why some guardrails hold and others crumble
* Run automated red-team evals using the same open-source tooling (promptfoo) used by professional AI safety teams
* Learn why "technique transfers": master one attack pattern and it works across GPT, Claude, Llama, and whatever ships next year
* Understand why cost and latency are safety metrics, not just engineering concerns
Bring your laptop – everything else runs in the cloud. No prior AI experience needed. In this workshop, you'll use Claude Code – Anthropic's AI assistant, which must be run directly in your computer's terminal. We ask everyone attending to install it beforehand.
Workshop Setup Instructions: https://drive.google.com/file/d/1MVOMWtzM7jQd5dGL1nfpPsnk_TJtfNzB/view?usp=drive_link
Whether you're a QA engineer looking to pivot, a developer curious about AI safety, or simply someone who wants to understand what's really going on inside the tools everyone's using, this workshop hands you the skills companies are paying top dollar to find. Come break something. Leave knowing how to make it unbreakable.

Igor Dorovskikh


Vladimir Tanev


9:50 AM · 10 min · East Ballroom

Break & Refreshments - 2nd floor, Traditions Room

10:00 AM · 50 min · Great Hall 1 & 2

If AI is Writing the Code, Who’s Guarding the Quality?

AI is helping teams rapidly transform the software development lifecycle, from requirements and design to coding and testing. While AI tools offer faster delivery and greater productivity, they also introduce new and often overlooked quality risks. Without the right processes and culture in place, AI can amplify existing challenges, leading to increased rework, wasted resources, team burnout, and a loss of trust in delivery outcomes. In this interactive session, Jeff Van Fleet and Scott Boyd break down how AI is reshaping the SDLC, where it can introduce hidden risks, and how QA teams can mitigate those risks. You’ll assess where your team sits on an AI maturity model, see what the data show about how teams like yours are performing at each stage, and leave with concrete next steps (not theory, but something you can act on tomorrow):
- How AI is impacting each phase of the SDLC, and where it creates the most risk
- The most common quality issues introduced by AI-generated code
- How to build reciprocal feedback loops where your team’s domain expertise ensures outputs are repeatable, high-quality, and continuously improved upon by both humans and AI

Jeff Van Fleet


Scott Boyd


10:00 AM · 50 min · Great Hall 3

Playwright+MCP Server+Claude: a powerful trio

We’ve all heard the hype about Playwright. And the hype about MCP Servers. And even more hype about Claude Code. Now you get to see what all the hype is about. During this session we are going to wire up Claude Code to a Playwright MCP Server in a Playwright test automation project. Then use Claude Code to travel around the web, mapping out pages as it goes. We might even be able to get Claude to create some tests. Will Claude hallucinate? Probably. Will Claude create a bunch of duplicate tests? Probably. Will Claude create a production-ready test automation framework in the short amount of time we have for this session? We’ll find out.

Matthew Eakin


10:00 AM · 50 min · West Ballroom

Being Nimble - The next step in Agile Testing Optimization

We’ll examine how playbooks drive collaboration by ensuring the right stakeholders engage at the right moments throughout the SDLC by turning “who’s in the room” from a variable into a strategic advantage. You’ll learn practical approaches to building playbooks that support nimble pivots without sacrificing quality. We’ll also address the reality that every Agile framework has tradeoffs. Rather than debating methodologies, this session focuses on quick diagnostic techniques to identify friction points and practical adjustments that help teams operate more efficiently within their chosen approach. Attendees will leave with actionable strategies to strengthen team collaboration, unlock efficiency gains and critical capabilities when adaptability matters most.

Melissa Tondi


10:00 AM · 50 min · Student-Alumni Room

Layoff to Launch

Losing a job can feel like a dead end, but what if it's the start of your biggest chapter ever? This session covers strategies for turning layoffs and reorgs into career comebacks: leveraging your network, upskilling, and pivoting into new opportunities. If you are navigating a layoff or looking to future-proof your career, this session will equip you with practical tools to turn roadblocks into launchpads.

Ram Gadde


10:00 AM · 50 min · Interfaith Room

Rolling the Dice

One of the hardest challenges in Quality Assurance is deciding when a release is ready. All software has bugs, but was that strange behavior you observed in testing just a fluke, or a sign of something catastrophic? This talk introduces a practical risk assessment framework built around “rolling the dice” on quality. Each situation is modeled as a die, and each potential outcome as one of its faces. This mental model helps attendees visualize uncertainty, understand the range of possibilities, and evaluate the stability of their releases. Attendees will learn to identify sources of chaos in their products and focus their testing efforts on high-impact risks, removing negative outcomes from their dice and increasing the odds of a successful release.

Paul Turchinetz


10:00 AM · 50 min · Cartoon Room

Taming the Beast: Testing Non-Deterministic AI Systems with Confidence

For decades, software testing has relied on a comforting assumption: given the same input, systems should produce the same output. AI-enabled systems break that assumption entirely. Large language models and other AI components generate responses that can vary in structure, tone, and content while still appearing “correct”. In this session, we’ll explore why traditional testing strategies struggle with non-deterministic AI behavior and where they quietly fail. Using real-world examples such as AI chatbots and resume-screening systems, we’ll walk through practical techniques for validating AI outputs without relying on brittle, deterministic assertions. Topics include input variation strategies, semantic similarity analysis, bias detection, and using LLMs responsibly as automated evaluators (aka “LLM-as-a-Judge”). Attendees will leave with a clear mental model for testing AI-based systems, concrete patterns they can apply immediately, and guidance on balancing automation, human judgment, and risk. If you’re responsible for the quality of AI-driven features, this talk will help you move from uncertainty to confidence!

Lee Barnes


10:50 AM · 10 min · East Ballroom

Break

11:00 AM · 50 min · Great Hall 1 & 2

Business Challenges of Testing B.U.C.K.E.Y.E.D. – A Self-Driving AI Bus System

Updated for 2026! AI is rapidly reshaping Software Quality Assurance and Testing—creating new expectations, risks, and business challenges, especially as organizations begin confronting the complexity of testing strong AI systems. In this dynamic, interactive session, attendees will examine how AI is impacting QA and software testing from the perspective of developers, test engineers, managers, executives, and business owners. Using a realistic but fictitious scenario, participants will serve as members of the Checkpoint Technologies Executive IT Team after being awarded a contract by Imaginary Corporation to test B.U.C.K.E.Y.E.D. — Bus Using Cognitive Knowledge, Engineered to Yield Efficient Driving — a fully autonomous, AI-powered self-driving bus system developed for the state of Ohio. Because the system uses strong AI, with advanced autonomous reasoning, learning, and adaptability, the testing effort presents uniquely complex and high-stakes challenges. Led by Bob Crews, CEO of Checkpoint Technologies, participants will identify ten primary business testing challenges through the eyes of an IT Executive Board, with the goal of ensuring project success, reducing risk, preventing negligence, and protecting both Checkpoint Technologies and Imaginary Corporation.

Bob Crews


11:00 AM · 50 min · Great Hall 3

Agentic Testing & The Future of Trust

AI agents are generating more code than ever, but how do you ensure its reliability? Learn why traditional testing fails and how Agentic Testing guarantees trust and scalability in AI-driven systems.

Mohit Juneja


11:00 AM · 50 min · West Ballroom

Integrated AI: Scaling Quality with Unified Test Stack & AI Agents

Ever wondered how other QE teams actually use AI in their day-to-day work? We spoke with over 200 organizations to find the answer. While 94% of teams already use AI in software testing, their use cases and ROI vary widely. What seems to matter most is how well they integrate AI within existing workflows. In this session, we will share our findings and showcase how companies like Wells Fargo, Nike, General Motors, and Spotify use Unified Test Stack & AI Agents to natively integrate AI and deliver measurable, practical value. We will demonstrate high-impact use cases (such as automated test generation, low-code automation, self-healing execution, accessibility remediation, and auto-generated test failure analysis) that help customers achieve a 10x boost in productivity, a 50% jump in coverage, and a 40% reduction in build failures. If you’re a QA leader, developer, or tester looking to deliver quality at scale, this session will provide a clear, practical view of what’s working, and how to stay ahead as testing enters the AI era.

Akshay Kumar


Arpit Agrawal


11:00 AM · 50 min · Student-Alumni Room

From Go-Live Risk to Confidence: Continuous Validation for D365 F&O with AI-Powered Automation

As organizations roll out Microsoft Dynamics 365 Finance & Operations, ensuring system stability and performance at go-live has never been more critical. Yet many teams still rely on fragile, manual, or script-heavy testing approaches that slow progress and introduce risk. In this session, we’ll explore how leading enterprises are adopting a continuous validation approach to support D365 F&O implementations from testing through go-live and beyond. As a strategic Microsoft partner for test automation, Leapwork enables teams to validate complex, end-to-end business processes across ERP and connected systems without the burden of code. You’ll learn how to:
- Reduce go-live risk with scalable, reliable test automation
- Ensure performance readiness across critical workflows
- Accelerate testing cycles with a no-code, enterprise-ready platform
- Extend validation into the future with AI-powered capabilities
We’ll also introduce Leapwork AI Studio and how it is pushing the boundaries of efficiency and speed for QA teams, helping organizations move faster while maintaining confidence in every release. Whether you’re preparing for a D365 F&O implementation or optimizing an existing environment, this session will provide practical insights to help you move forward with confidence.

Charles Cedrone


11:00 AM · 50 min · Interfaith Room

Making Migrations Safer and Cheaper with AI-powered Testing

Migrating legacy systems, such as COBOL-based applications, to modern technologies is more feasible today thanks to development copilots that accelerate work and reduce costs (GitHub Copilot, Cursor, etc.). However, testing remains a significant hurdle, especially when documentation is missing or outdated (the most common scenario), making it hard to verify that the new system behaves the same as the old one. This talk presents a two-track approach designed to help QA teams tackle this challenge in a more sustainable way: first, a static understanding of the system through code-derived diagrams, and second, a dynamic understanding via observability (being able to ask the system, in natural language, what’s happening in the backend), acting as a testing copilot. A core contribution of the session is the integration of open-source tools we developed for AI agents that assist testers (https://github.com/abstracta/tero), along with a set of examples shown in practical demos. To elaborate, the static track uses code-derived diagrams (state machines, flow diagrams, and sequence diagrams) generated directly from the codebase to illuminate system behavior without relying on outdated documentation. In addition, the dynamic track introduces observability as a copilot for testers, enabling real-time visibility into backend behavior in production-like conditions and helping testers validate that changes preserve intended behavior even in the absence of perfect docs. The talk emphasizes AI as a friendly ally for testers across roles, not just developers, and avoids overpromising metrics. While formal metrics are still in progress, early signals suggest that this approach can make migration projects more feasible by optimizing the testing part. As part of the process, we need a validation stage that combines expert reviews of generated artifacts (diagrams and observability outputs) with cross-checks by peers to ensure coherence and usefulness.
Contribution to the audience: This talk offers a practical, replicable view on how to address legacy migrations from a QA lens powered by AI, without relying on complete or up-to-date documentation. It promotes an “AI as ally” narrative that fosters collaboration among testers, developers, and stakeholders, and it establishes a clear pathway aiming to reduce costs and delivery times while safeguarding quality and behavioral verification.
Target audience: QA managers, test leads, engineering managers, VPs and DevOps leaders involved in migrations.

Federico Toledo


11:00 AM · 50 min · Cartoon Room

When Regression Testing Holds You Hostage

Is your release cadence for long-lived software bogging down? It’s rarely the result of bad decisions; it’s the cost of success. As systems evolve, they accumulate features, behaviors, and user expectations that make change increasingly complex. In these environments, regressions are not anomalies but a structural reality. Regression testing emerges as the system’s immune response, preserving trust as complexity grows. Over time, however, ever-expanding regression suites slow feedback cycles and unintentionally anchor release velocity. This talk explores test impact analysis (TIA) as a governance model for scaled QA automation. By correlating tests to actual code coverage and using change data to determine impact, TIA introduces precision and policy into regression execution. Instead of relying on blanket “run everything” strategies, teams can make informed, repeatable decisions about what to test, when, and why. Attendees will see how this approach transforms regression testing from a blunt instrument into a sustainable, scalable quality strategy for complex systems. You'll learn how to:
* Recognize why complexity, regressions, and expanding regression suites are inevitable outcomes of successful, long-lived software, and why they demand intentional governance.
* Understand regression gravity as a systemic force that constrains release frequency, feedback loops, and automation scalability.
* Apply test impact analysis principles as a governance mechanism to control regression growth, optimize execution, and maintain confidence at scale.

David Vano


Wilhelm Haaker


11:00 AM · 1h · Senate Chamber · Workshop

Pull Over: AI Agent Testing Has Entered the Highway

AI agents are writing code, making decisions, and shipping features, but who's testing the agents? In this session, Deep Barot, CEO of ContextQA, tackles one of the most pressing challenges in modern software quality: how do you test something that doesn't behave the same way twice? Deep will walk through a live demo focused specifically on evaluating AI agents, from generating meaningful test scenarios to running multi-LLM judge evaluations that give your team real confidence in what's shipping. You'll see how ContextQA approaches evals not as a checkbox, but as a rigorous, repeatable process, using multiple LLM judges to score outputs, surface disagreements, and produce confidence levels your team can actually act on. If your organization is building with AI agents and wondering how to trust what they do, this session gives you a practical, live look at how to bring quality engineering discipline to agentic systems. Key Takeaways:
• How to design test scenarios purpose-built for AI agent behavior
• Why single-model evaluation falls short and how multi-LLM judging fills the gap
• How to interpret confidence levels across judges to make informed release decisions
• A live look at ContextQA's eval framework in action

Deep Barot


11:00 AM · 1h · Sphinx Centennial Leadership Suite · Workshop

Your AI Agent Just Went Rogue: A Live Security Audit of Autonomous AI

AI agents aren't chatbots anymore. They send emails, edit CRM records, run shell commands, browse the web, and remember everything, all autonomously, with no human in the loop. That's not a product pitch. That's an attack surface. In this 50-minute session, we crack open a real autonomous sales agent (one with 33 tools, root runtime privilege, and production access to Gmail and HubSpot) and run a live security audit against it using the same methodology Fortune 500 companies use to evaluate AI risk. You'll watch six guardrails break in real time.
What we cover: We start with two cautionary tales: Air Canada, whose AI chatbot created direct legal liability after giving customers wrong refund advice, and Zillow, whose AI pricing model quietly accumulated $500M+ in losses before anyone noticed. These aren't edge cases; they're the playbook for what happens when AI systems ship without proper evaluation. Then we go hands-on. Using AIVSS (the AI Vulnerability Scoring System), we walk through ten risk categories every autonomous agent should be tested against, focusing on the three that matter most: Tool Use, Memory, and Autonomy of Action. You'll see exactly how an agent gets tricked into running a shell command it was never supposed to touch, how it can be convinced to draw its own attack map for a bad actor, and how a simple role-swap prompt sends it completely off the rails. Every failure is scored, every vulnerability is traceable, and every fix is actionable.
You'll leave with an understanding of why the gap between what an agent can do and what it should do is your biggest security problem right now. The question isn't whether your agent can go rogue. It's whether you'll find out before your users do.

Igor Dorovskikh


Vladimir Tanev


11:50 AM · 1h 10min · East Ballroom

Lunch

12:50 PM · 1h · Senate Chamber · Workshop

How to create a QA or the Highway talk

This session is aimed at the person who is interested in presenting at a conference like QA or the Highway but needs practical help building their first talk. Using examples, suggestions and group feedback, the participant will leave with a step-by-step playbook for what they need to do in order to submit their proposed talk next year.

David Leslie


12:50 PM · 2h · Sphinx Centennial Leadership Suite · Workshop

Life after Tech: Rebuilding your Career in the Post-Tech Economy

An interactive workshop led by a veteran tech industry employee and certified career coach. In this session, we'll: - Identify your strongest tech skills and how they are transferable to other industries and roles - Identify what you hope to get out of your relationship with your work - Identify your target industries and roles and how you might capture them Attendees will leave with a plan for what to do now, next, and later as they embark on a journey toward their personal "Career Nirvana." This workshop can be run in either 1 hour or 2 hours; the 2-hour version is more in-depth and hands-on. Speaker profile: https://docs.google.com/document/d/1cW-RW9_0qRlaDh8MREHUOSmdLPqvnJRoSusQ1lZ1NAM/edit?usp=sharing

Nicole Derr

1:00 PM · 50 min · Great Hall 1 & 2

Testing the Untestable: A Practical Guide to LLM Quality Assurance

Your entire QA career has been a lie. Okay, not entirely, but everything you know about testing breaks down when the system under test is an LLM. Same input, different output. No spec to test against. "Correct" is subjective. Welcome to AI testing, where assert_equals goes to die. But here's the thing: AI still needs QA. It needs it MORE than deterministic systems because the failure modes are weirder, harder to detect, and way more embarrassing when they hit production. In this talk, I'll share the AI QA Playbook, a practical framework for testing systems that don't behave the same way twice. The five testing pillars you need: Accuracy Testing: Building golden datasets when "correct" is fuzzy; Bias Testing: Counterfactual test design that catches discrimination; Hallucination Testing: Detecting confident nonsense before users do; Security Testing: Prompt injection, jailbreaks, and data leakage; Regression Testing: What does "regression" even mean for AI? What makes this different: Real test data examples, not theory; metrics that actually work for non-deterministic systems; CI/CD integration patterns; tools you can use today (including my open-source contributions). I've spent the last two years figuring out how to do QA for systems that refuse to be predictable. This talk is the playbook I wish existed when I started.

Tanvi Mittal

1:00 PM · 50 min · Great Hall 3

From Quality Metrics to Quality Mindset: Building Teams That Own Outcomes

Software teams often rely on metrics, dashboards, and defined processes to guide quality efforts. While these tools provide important visibility, they don’t automatically create ownership, accountability, or better outcomes. In many cases, teams learn to optimize for the numbers rather than the intent behind them. This session explores how quality outcomes are shaped less by tools and processes and more by leadership behaviors and team mindset. Drawing from experience in software testing and people leadership, the talk examines how well-meaning management practices can unintentionally reinforce “check-the-box” behavior, and what leaders can do differently to build teams that truly own quality. Participants will explore: Why quality challenges are often rooted in leadership and communication gaps. How metrics can shift from helpful signals to counterproductive targets. Common ways leaders unintentionally discourage ownership. Practical leadership behaviors that promote clarity, accountability, and proactive thinking. Rather than introducing new frameworks or methodologies, this session focuses on small, intentional leadership shifts that can have an outsized impact on team behavior. Attendees will leave with actionable ideas they can apply immediately to move teams from compliance-driven execution to shared ownership of outcomes.

Barbara Deaton

1:00 PM · 50 min · West Ballroom

Deploying and Testing Ephemeral Environments

Merging code that hasn't been fully tested is one of the biggest reasons teams experience missed release dates and flaky test suites. Why? Merging code often means other developers start their new work based on the newly merged code, changes are queued up for the next release, and the code quickly becomes coupled to other changes. The result is often code freezes, failing test runs, late-night release "parties", and painful go / no-go meetings where you get pinned between postponing the release or shipping with bugs. Deploying to and testing ephemeral environments gives your team the ability to know for certain that the new features are implemented correctly *before committing those features to the release* and *before other developers start depending on the new code*. This approach is more important than ever when working with AI-generated code. In this session, we will cover: - What ephemeral environments are and why they are so important - Branching strategies for proper test code management - Configuring your CI/CD pipeline to automatically deploy feature branches - How to run end-to-end tests on ephemeral environments - Strategies for managing databases and test data in ephemeral environments - Leveraging ephemeral environments to protect your company from AI risks Key Takeaways: - Understand the architecture and tooling options for ephemeral environment workflows - Configure your CI/CD pipeline to deploy and test feature branches in isolation - Establish quality gates to protect your team from starting with broken code - Apply these practices to validate AI-generated code before it impacts your team

Chris Harbert

1:00 PM · 50 min · Student-Alumni Room

Turbocharge Your Playwright: Capabilities You're Probably Not Using

Most teams use Playwright to open a browser and click through flows. But Playwright can do so much more: mock APIs, persist authentication, inject custom headers for distributed tracing, and more. In this talk, I go beyond the basics and explore the capabilities that separate a good Playwright suite from a great one. You'll leave with techniques you can add to your suite the same week. This is not a "getting started with Playwright" talk. It's for teams that are already running Playwright tests and want to unlock capabilities they didn't know they had. I'll cover things like persisting authentication state across tests, mocking API responses to isolate your UI layer, injecting traceparent headers for distributed tracing, and intercepting network traffic, all features built into the library that most teams never touch. The goal is to get the audience thinking like testers when they read Playwright's API, spotting features that solve real testing problems rather than just following tutorials.

Kevin Roe

1:00 PM · 50 min · Interfaith Room

The Judgment Gap: Why AI Adoption Without Verification Is Worse Than No AI At All

Your organization adopted AI. Congratulations — so did everyone else. But here's what the data actually shows: in a pre-registered experiment with 758 consultants, Harvard and BCG found that AI made good work 40% better and bad work 19 percentage points *worse* than working without AI at all. The tool that amplifies expertise also amplifies poor judgment — and the boundary between the two is invisible without training. This isn't a theoretical risk. In our own survey of 571 professionals, 93.9% use AI frequently — but 69.4% spend zero time on advanced capabilities. Microsoft's 300,000-person Copilot rollout found the same pattern: broad adoption clustered at the simplest features, with a measurable productivity dip from weeks 3 through 10 as initial excitement collided with real-world complexity. Adoption isn't the hard part anymore. Judgment is. In this session, you'll learn: - How to identify whether a task falls inside or outside AI's reliability boundary — and why getting this wrong is catastrophic - A practical verification framework that catches AI failures *before* they reach your clients or production systems - Why the "adoption valley" kills most AI initiatives, and the specific practices that get teams through it - How to build team-level habits that make AI output trustworthy by default, not by accident You'll leave with concrete processes you can implement Monday morning to close the gap between "we use AI" and "we use AI well."

Tim Rayburn

1:00 PM · 50 min · Cartoon Room

AI Testing Isn't One Thing (And Treating It Like It Is Will Bite You)

Your team shipped an AI feature. Congrats. Now someone asks: how do we test this? You write a test. The output changes. You run it again. Different output. You consider a career in farming. Here's the thing nobody tells you upfront: testing AI-powered software isn't one discipline, it's two. And the moment you try to apply one strategy to both, you're in trouble. This talk breaks down the Two-Track Testing Model that every QA engineer building on AI needs to understand. There's the deterministic side, your traditional test pyramid covering infrastructure, routing, logic, and guardrails, and there's the AI evaluation side, where outputs are non-deterministic, pass/fail doesn't exist, and you need a completely different mental model to even know what "quality" means. We'll walk through how these two tracks diverge, when they converge, and what it takes to get quality signals from both. You'll leave with a practical framework: the Three Pillars of AI Evaluation (human eval, deterministic checks, and LLM-as-judge), a benchmark-first approach to designing your eval strategy, and a clear picture of how maturity stage changes what you should be testing and how. The fundamentals of our craft haven't changed. The pesticide paradox still applies. Risk-based thinking still applies. You still can't test everything. But the tools, vocabulary, and decision-making are genuinely new, and it's worth getting oriented before you're neck-deep in a chatbot that nobody can evaluate with any confidence. This is the talk I wish existed when I started.

Joel Wilson

1:50 PM · 10 min · East Ballroom

Break

1:50 PM · 1h · Senate Chamber Workshop

Collaborate on Your LEGO(R) Vision

LEGO(R) sets are fun to build, but who has ever attempted to build a complete set without looking at the instructions? In this workshop, attendees will form teams and attempt to build a LEGO(R) set without instructions. Only one person from each team will be able to view the finished product before the team starts building. That person must share their vision with the team, who will attempt to build the LEGO(R) set as close to the instructions as possible without peeking. Each group will learn different approaches to collaborating on product development while building a set according to a customer's needs. The activity highlights the two Quality Gaps of product development: (1) the gap between what we set out to build and the finished product; and (2) the gap between what customers expect and the finished product. Our goal is to close the two Quality Gaps so we deliver a product on time and on budget that customers will love.

Thomas Haver

2:00 PM · 50 min · Great Hall 1 & 2

Verify, Then Trust: Human Judgement in the Age of Generative AI

Generative AI is rapidly becoming part of how modern organizations write, evaluate, and make decisions — from drafting content and summarizing data to supporting technical workflows. As these systems become more fluent and convincing, the real challenge is no longer whether AI can produce outputs quickly, but whether humans maintain the skills of verification, discernment, and accountability alongside it. Verify, Then Trust offers a grounded, human-centered framework for navigating AI without panic, blind faith, or outsourcing our thinking. This session explores why trust must be earned through validation, especially in environments where accuracy, clarity, and responsibility matter. Rather than positioning AI as a threat or a replacement, this talk reframes it as a powerful tool that still requires human oversight. Attendees will learn practical habits for staying mentally present, asking better questions, and building norms that keep human judgment meaningfully in the loop. This session is designed for technology professionals, leaders, and teams adapting to AI-enabled work who want to move forward with confidence — without losing critical thinking or responsibility in the process. Key themes include: * Speed vs. correctness in AI-generated outputs * Why verification is becoming a core professional skill * The risk of automation complacency * Practical ways to keep humans accountable and engaged

Ashley René Casey

2:00 PM · 50 min · Great Hall 3

Accessibility testing with Cypress

The European Accessibility Act began enforcement on June 28, 2025, and the U.S. DOJ's ADA Title II web-app rule has hard compliance dates in 2026/2027. In this talk, I'll share real examples from projects where catching WCAG issues early saved a lot of time and money compared to fixing them late. I'll provide examples of testing accessibility using lint rules, component-level accessibility checks (including Storybook), and UI-level validation. I'll also show how I test the real keyboard behavior e2e: TAB order, focus states, common navigation paths.

Vitaly Skadorva

2:00 PM · 50 min · West Ballroom

LGTM is not a Strategy

Pull requests are where quality is won or lost, and too many teams still treat reviews with a rushed "LGTM". In this talk, you'll learn a practical, repeatable approach to high-quality PR reviews that balances speed with risk management - covering how to triage changes quickly, review in layers, and write comments that lead to better outcomes without friction. We'll also show how QA and developers bring complementary lenses to the same review, turning acceptance criteria, scenarios, and observability into shared responsibility instead of production surprises. You'll leave with a lightweight framework and concrete habits for authors and reviewers that make reviews faster, kinder, and more effective.

Todd Nussbaum

2:00 PM · 50 min · Student-Alumni Room

Testing without Ai - How I learned to stop worrying and love QA

Ai has gone from a curiosity to a "must have" technology in less than 4 years and (I grudgingly admit) has gotten significantly better over that time. We better understand its capabilities and limitations. But there is still a lot of hype, anxiety, and misconception about Ai. I will talk about how I went from a technical leadership role developing test automation to a 10x engineer with Ai, to becoming obsolete, redundant, unemployed. I'll talk about how the hardships and stress affected me, the cold hard realities of Ai, and why I believe that, whatever the end state of Ai adoption, we're going to need more testers and better QA, not fewer. This is a difficult transition period, but I see cause for optimism. I'll talk about what Ai can and cannot do well, what Ai should and should not do, how to help non-technical leadership know the difference, and how you can improve the quality of life and the quality of software with or without Ai. P.S. I'm currently working in a position where Ai is forbidden and QA is very important.

Aaron Evans

2:00 PM · 50 min · Interfaith Room

The 3-Legged Stool: AI Won’t Fix Your Agile Problems

Agile frameworks promise adaptability and speed, but even in the AI and digital age, many teams struggle with unclear roles, brittle dependencies, and communication breakdowns that slow delivery and burn people out. The 3‑legged stool is more than a metaphor: it’s a simple systems model for understanding why agile teams succeed or fail. When roles, collaboration, and communication are balanced, teams are stable and productive. When one “leg” is overloaded, ignored, or missing, the system wobbles, no matter how good your AI is or how good the tooling or framework looks on paper. In this session, we’ll explore how the 3‑legged stool analogy can be used as a practical diagnostic and alignment tool across modern agile environments. Drawing on real-world experience working with teams navigating today’s increased complexity (remote work, hybrid structures, and constant delivery pressure), you’ll see how this model helps surface hidden constraints, clarify ownership, and manage dependencies more effectively. Rather than prescribing yet another framework, this talk focuses on how to create shared understanding between engineers, product, and leadership, so teams can make better decisions inside the frameworks they already use. Whether you’re building software, coaching teams, or leading transformation efforts, you’ll leave with a lightweight model you can immediately apply to improve team health, delivery flow, and long-term sustainability.

Aubrey Wade

2:00 PM · 50 min · Cartoon Room

From API Contracts to UI Confidence: AI-Driven Quality in CI/CD

In modern distributed architectures, the most disruptive defects are often the ones that live in the gaps between services. Contract violations—schema drift, breaking API changes, and consumer-provider mismatches—frequently bypass traditional test suites, only to cause catastrophic failures in the UI or downstream services after deployment. This session provides a technical blueprint for bridging the gap between API reliability and UI confidence. We will walk through a practical implementation of containerized CI/CD pipelines that utilize oasdiff and Docker to detect breaking changes before they hit production. Key technical takeaways include: Automating the "Contract-to-UI" Link: How to ensure UI automation remains stable by catching underlying API shifts early. AI-Driven Testing with Schemathesis: Using AI to derive edge cases and boundary tests directly from OpenAPI specs to increase coverage without manual script bloating. Intelligent Triage: Implementing AI-assisted failure analysis to interpret pipeline logs and provide plain-language explanations for complex integration failures. Securing the Pipeline: A critical look at security-conscious AI adoption, focusing on data residency and sandboxed execution using enterprise-bounded platforms like Azure OpenAI or AWS Bedrock.

Mohini Agarwal

Rachana Menon

2:50 PM · 10 min · East Ballroom

Break & Refreshments - 2nd floor, Traditions Room

3:00 PM · 50 min · East Ballroom Keynote

The Day Testing Died — And Quality Evolved

Twenty years ago, my job was to break software. I was trained to think in edge cases, failure paths, and regression suites. If something slipped into production, it meant I missed something. Back then, quality meant testing. Then the world changed. Agile arrived. DevOps arrived. Continuous delivery arrived. And I realized something uncomfortable: quality was never about the test cases. It was about the system behind them. And now AI has changed the ground again. Today, code is written by machines. Tests are generated by copilots. Reviews are assisted by algorithms. So if machines can write the tests… Who is responsible for trust? That question is why we must evolve — from testers, to quality champions, to AI validation designers.

Tatyana Arbouzova

3:50 PM · 10 min · East Ballroom

Raffle & Closing Remarks