In this deep-dive technical session, I'll share how Uber revamped its mobile testing approach with large language models (LLMs), creating DragonCrawl, a system that tests mobile applications with human-like intuition. I'll walk through how we transformed mobile testing from a maintenance-heavy, script-based approach to an intelligent, adaptive system that handles UI changes automatically across multiple languages and cities.
The session will cover:
- The challenges of traditional mobile testing at scale (3,000+ simultaneous experiments)
- Architecture and implementation of DragonCrawl using MPNet and embedding techniques (a sketch of the core matching idea follows this list)
- Real-world examples of DragonCrawl's adaptive behavior and problem-solving capabilities
- Practical strategies for handling LLM challenges like hallucinations and adversarial cases
- Results and metrics from production deployment
- Live demonstration of DragonCrawl in action
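
To give a flavor of the MPNet and embedding bullet above, here is a minimal sketch of the core idea: score on-screen UI elements against the current test goal by embedding similarity, so matching survives copy changes and translations instead of breaking like a hard-coded script. This is an illustration, not Uber's actual code; the `all-mpnet-base-v2` checkpoint, the `pick_element` helper, and the sample labels are all assumptions.

```python
# A minimal sketch of embedding-based UI matching, assuming the
# sentence-transformers library and the public "all-mpnet-base-v2"
# checkpoint; the helper name and sample labels are illustrative.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-mpnet-base-v2")  # MPNet-based encoder

def pick_element(goal: str, on_screen_labels: list[str]) -> tuple[str, float]:
    """Return the on-screen label semantically closest to the test goal."""
    goal_vec = model.encode(goal, convert_to_tensor=True)
    label_vecs = model.encode(on_screen_labels, convert_to_tensor=True)
    scores = util.cos_sim(goal_vec, label_vecs)[0]  # cosine similarity per label
    best = int(scores.argmax())
    return on_screen_labels[best], float(scores[best])

# Because matching is semantic rather than string-equality, the same
# goal can resolve against renamed or translated button text.
label, score = pick_element(
    "request a ride to the airport",
    ["Confirm UberX", "Add payment method", "Schedule for later"],
)
print(label, round(score, 3))
```

The same mechanism is what makes multi-language operation cheap: swapping in a multilingual encoder lets one goal description match localized UI text without per-locale scripts.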
Key Takeaways:
- Understanding how to leverage LLMs for automated testing, including model selection criteria, architecture decisions, and implementation strategies that enable human-like testing behavior
- Practical techniques for handling LLM challenges in production systems, including specific approaches to manage hallucinations, adversarial cases, and edge scenarios (one such guardrail is sketched after this list)
- Real-world insights into scaling automated testing across multiple languages and locations without maintaining separate test scripts, including specific metrics and benchmarks from Uber's production environment
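
As one concrete flavor of the hallucination handling mentioned in the takeaways, a common guardrail is to refuse any model-proposed action whose target cannot be matched to a real on-screen element above a similarity floor. The sketch below is a hedged illustration under that assumption, not Uber's published approach; the `validate_action` helper and the 0.6 threshold are invented for the example.

```python
# A hedged sketch of one hallucination guardrail: before executing a
# proposed tap, require the target to match a real on-screen element
# above a similarity floor. The helper name and the 0.6 cutoff are
# illustrative assumptions, not Uber's published values.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-mpnet-base-v2")
SIMILARITY_FLOOR = 0.6  # assumed cutoff; tune against real test traces

def validate_action(proposed_target: str, on_screen_labels: list[str]) -> str | None:
    """Map a proposed target to a real element, or None if it looks hallucinated."""
    if not on_screen_labels:
        return None
    scores = util.cos_sim(
        model.encode(proposed_target, convert_to_tensor=True),
        model.encode(on_screen_labels, convert_to_tensor=True),
    )[0]
    best = int(scores.argmax())
    # Below the floor the model is likely naming an element that does not
    # exist on this screen, so reject rather than tapping blindly.
    return on_screen_labels[best] if float(scores[best]) >= SIMILARITY_FLOOR else None
```

Rejected actions can then trigger a retry or an alternative exploration step instead of surfacing as a flaky test failure.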