A New Layer of Confidence: Our Journey with AI-driven E2E Testing at Freenow by Lyft
- Christian Schiller
- 16 hours ago
- 5 min read
Introduction: The Hunt for a True E2E Solution
For several years leading up to 2024, the mobile engineering division at Freenow had operated on a developer-led quality model. With no dedicated manual QA engineers and a handful of SDETs focused on improving testing capabilities, the responsibility for ensuring quality fell directly to our development teams. At its peak, this involved nearly 90 engineers across 11 squads, all pushing to deliver value in a fast-paced weekly release cycle.
While this model fostered a strong sense of ownership, it left us with a significant challenge. We were proficient at testing our code in isolation, but we were missing a crucial piece of the puzzle. For a long time, we had been searching for—and frankly, dreaming of—a true end-to-end (E2E) testing solution that could validate the entire user experience from start to finish.
The Problem: A World of Testing Silos
Our developers were incredibly diligent, having grown our unit test suite to several thousand tests. The codebase also included component-level UI tests (using Espresso and XCUITest), though these were often fragile, flaky, and notoriously hard to write in the first place. However, all these tests, by design, operated in a bubble: to ensure stability and focus, they ran against mocked network responses. Our backend engineers, similarly, tested their services in isolation.
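To illustrate the kind of test we mean, here is a minimal sketch of a component-level Android test running against a local mock server. The activity, view IDs, and response body are hypothetical, not our actual code, and the app is assumed to be configurable to point its base URL at the MockWebServer:

```kotlin
import androidx.test.espresso.Espresso.onView
import androidx.test.espresso.action.ViewActions.click
import androidx.test.espresso.action.ViewActions.closeSoftKeyboard
import androidx.test.espresso.action.ViewActions.typeText
import androidx.test.espresso.assertion.ViewAssertions.matches
import androidx.test.espresso.matcher.ViewMatchers.isDisplayed
import androidx.test.espresso.matcher.ViewMatchers.withId
import androidx.test.ext.junit.rules.ActivityScenarioRule
import okhttp3.mockwebserver.MockResponse
import okhttp3.mockwebserver.MockWebServer
import org.junit.After
import org.junit.Before
import org.junit.Rule
import org.junit.Test

class LoginScreenTest {

    private val server = MockWebServer()

    // LoginActivity and the R.id.* references below are hypothetical.
    @get:Rule
    val activityRule = ActivityScenarioRule(LoginActivity::class.java)

    @Before
    fun setUp() {
        // Every network call the screen makes is answered locally;
        // no real backend service is ever exercised.
        server.enqueue(MockResponse().setResponseCode(200).setBody("""{"token":"fake"}"""))
        server.start()
    }

    @Test
    fun login_showsHome_onMockedSuccess() {
        onView(withId(R.id.email)).perform(typeText("user@example.com"), closeSoftKeyboard())
        onView(withId(R.id.password)).perform(typeText("secret"), closeSoftKeyboard())
        onView(withId(R.id.login_button)).perform(click())
        // A pass proves the screen handles the mocked response correctly,
        // not that a real user can log in against the real backend.
        onView(withId(R.id.home_container)).check(matches(isDisplayed()))
    }

    @After
    fun tearDown() {
        server.shutdown()
    }
}
```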
This created a world of testing silos. While individual components and services were verified, we had no automated way to ensure they all worked together in a real-world scenario. We had previously attempted to bridge this gap with contract testing, but it proved difficult to maintain at our scale.
The result was a critical blind spot: we had little confidence in our end-to-end user journeys. A developer could verify that the login screen's UI worked, but we couldn't automatically check if a real user could actually log in, book a taxi, and pay for it. This gap wasn't theoretical; it meant significant bugs could, and did, slip into production. For instance, a subtle bug was creating a silent barrier for new users in some of our key markets, an issue our previous testing methods were not designed to catch.
Our First Encounter with AI: A Surprising Discovery
In July 2024, we began exploring potential solutions and decided to pilot Mobileboost's GPT-Driver. The initial experience was a genuine surprise. As one of our engineers put it, "We were surprised how far it could reach with just a simple prompt, right out of the box."
We gave it a basic, plain-English instruction for a complex user flow, like logging in with email and password (bypassing OTP for that test user), and it just... worked. It navigated the screens, accepted permissions, dismissed popups, typed the credentials, and verified the login. This was our "aha!" moment. We realized this technology could be the key to finally breaking down our testing silos and automating the true user experience, something that had previously only been possible through time-consuming manual testing by developers.
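For a flavor of what such an instruction looks like, the prompt was roughly of this shape (illustrative wording, not our exact test):

```
Log in with the email and password of the test user (OTP is bypassed
for this account). Accept any system permission dialogs and dismiss
any popups along the way. Verify that the logged-in home screen is shown.
```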
The Journey: From a Simple Pilot to a CI Safety Net
Our adoption was a gradual, deliberate process that unfolded in four key steps:
1. The Pilot: We moved from simple experimental prompts to creating our first real tests for the most critical user flows: registration, email login, and booking a taxi. These were the journeys we knew were essential but had no automated coverage for.
2. CI Integration: The Nightly Signal: The true power of automation is consistency, so we immediately integrated these pilot tests into our GitLab CI/CD pipeline. This was a critical step: it created a reliable, nightly signal on the health of our most important flows, giving us a level of confidence we hadn't had before (a sketch of such a nightly job follows this list).
3. Calculating the Value: With a stable signal in place, the impact became easy to quantify. A full manual E2E test of a critical flow could take a developer around 30 minutes. By automating just a few of these, we were saving multiple developer-days per month; to pick illustrative numbers, ten 30-minute checks per weekly release already amount to roughly 20 hours, or over two developer-days, each month. The AI tests represented a clear return on investment, freeing up our developers to focus on building features, not just testing them.
4. Expansion and Collaboration: Armed with a successful pilot and clear ROI, we expanded access to other teams. The simplicity of writing tests in natural language was a major factor in its adoption. For the first time, Android, iOS, and backend engineers had a common ground to debug integration issues, making collaboration significantly easier. This accessibility also meant that writing a test no longer required deep knowledge of platform-specific frameworks like Espresso or XCUITest. With just a few easy-to-learn commands, our Product Managers and Designers could understand and even contribute to tests, further boosting cross-functional collaboration.
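As mentioned in step 2, the CI hook itself can stay small. Here is a minimal sketch of what a scheduled nightly job in .gitlab-ci.yml can look like; the job name, stage, wrapper script, and artifact paths are illustrative assumptions, not our actual pipeline configuration:

```yaml
e2e-nightly:
  stage: test
  rules:
    # Run only when triggered by a scheduled (nightly) pipeline.
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
  script:
    # Hypothetical wrapper that kicks off the E2E test suite
    # and waits for the results.
    - ./scripts/run_e2e_suite.sh
  artifacts:
    when: always          # keep reports even when the job fails
    paths:
      - e2e-reports/
```

Paired with a nightly pipeline schedule in GitLab, this is what turned the pilot tests into a consistent health signal rather than an on-demand tool.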
The Results: Shipping with True Confidence
The new E2E layer had a transformative impact on our confidence. While we, like any large-scale app, never formally "block" a release due to the constant stream of improvements and fixes, the critical difference was the type of issues that no longer slipped through our alpha and beta testing phases. Our new safety net began catching critical bugs before they could ever become a problem for our customers. For a period of over six months, we saw a dramatic reduction in post-release critical incidents. Here are some of the major issues it caught:
Critical Onboarding Blockers: The AI tests immediately identified a previously invisible issue in our registration flow that was preventing new users from getting started.
Silent Experience-Killers: It caught app freezes, API malfunctions, and a frustrating bug where the homescreen would get stuck in a permanent loading state.
Platform-Specific Gremlins: It identified tricky login issues that were unique to iOS and had previously been difficult to consistently reproduce.
The true ROI wasn't just about cost savings; it was the newfound confidence to ship.
Our New Multi-Layered Approach to Quality
We haven't abandoned our component-level UI tests; they remain a crucial tool. We've simply evolved our strategy to a multi-layered approach:
Layer 1 (Unit & UI Component Tests): Continues to provide fast, focused feedback on isolated logic and UI components against mocked data.
Layer 2 (AI-driven E2E Tests): Validates the complete, integrated user experience, ensuring all the pieces of the puzzle—mobile app, backend services, and infrastructure—work together as intended.
What's Next
Our journey isn't over. We're now focused on making this layer even more robust: developing best practices for test creation, exploring deeper integrations, expanding our test suite, separating test case data from test logic, chaining test scenarios, speeding up execution, extending coverage across a wider range of supported OS versions, adding automated accessibility (A11y) checks to our flows, and eventually reporting bugs directly to Jira.
By embracing a tool that tests the real user experience, we didn't just get faster or more efficient—we got safer. We now have a robust, multi-layered approach to quality that gives our developers the confidence to innovate at the pace our business demands.