How to Restart and Recover Apps During Mobile Test Automation Runs
- Christian Schiller
- Sep 3, 2025
- 13 min read
Problem: Flaky Tests After Crashes and Relaunches
Restarting a mobile app in the middle of a test run – for example, to validate that a mini-player still works after a crash – is possible but historically very tricky. Such crash recovery scenarios are a notorious source of flaky tests. The core problem is that traditional automation frameworks (Appium, Espresso, XCUITest) aren’t built to seamlessly handle an app process dying and coming back. Tests often lose the app’s state or even break the test session when an app is restarted unexpectedly. In CI pipelines, a hard reset or relaunch can cause the run to hang or fail, especially if the framework isn’t explicitly instructed how to recover. The result: validating features like media playback mini-players after a crash becomes unreliable and brittle in staging or device cloud environments.
Why Crash/Relaunch Scenarios Cause Flakiness
Several factors make crash recovery tests unstable:
Lost Application State: By default, restarting an app often resets it. Unless you configure “no reset” behavior, a relaunch might wipe in-memory state or even user data. For example, one tester found that using Appium’s driver.closeApp() then driver.launchApp() put the app back in a clean state (user logged out, settings cleared). This defeats the purpose of checking post-crash behavior, since the app isn’t in the same state it was before crashing.
Broken Test Sessions: Many frameworks treat an app crash as a fatal error. In Espresso or XCUITest, if the app under test crashes, the instrumentation run typically aborts immediately. The test can’t resume because the framework lost its connection to the app. In Appium (which runs outside the app), the session can survive a bit longer, but any commands will fail until you manually start a new app instance. Without special handling, the test will simply error out. In fact, Android test engineers commonly use the Android Test Orchestrator to mitigate this – it runs each test in a separate instrumentation instance so that “occasional crashes [don’t] stop the entire test suite.” Orchestrator can isolate a crash to one test case, but it doesn’t magically continue the same test after a crash – you still need a plan to re-initialize the app state.
Fragile Re-initialization: Even if you attempt to catch and recover from a crash, doing so is fragile. You might script the app to relaunch and navigate back to where it was, but timing issues and unexpected dialogs can cause flakiness. For instance, a crash on Android often triggers an OS dialog (“App has stopped” or a prompt to send feedback). If your script isn’t prepared to dismiss that, the relaunch may stall. Similarly, after relaunch, the app might show a splash screen or login prompt due to lost session – requiring extra steps to get back to the mini-player screen. All these moving parts create opportunities for tests to fail intermittently.
Environmental Variability: In device cloud CI environments, crashes can be even more disruptive. An emulator or device might slow down or disconnect after an app crash, or the automation server might treat the crash as an end-of-session. Without robust error handling, the test may hang (common symptom: the test “freezes” after a crash). Network conditions or backend state in staging can also affect whether the app can seamlessly log back in after a crash, making outcomes inconsistent.
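In practice, you can at least turn the “test freezes after a crash” symptom into a fast, explicit failure. With Appium, for example, you can ask whether the app process is still in the foreground before issuing the next command, and either abort or branch into a recovery routine. Below is a minimal sketch in Kotlin using the Appium Java client; the package name is a placeholder:

```kotlin
import io.appium.java_client.android.AndroidDriver
import io.appium.java_client.appmanagement.ApplicationState

// Fail fast (or hand off to recovery) if the app under test is no longer in the
// foreground, instead of letting the next UI command hang until a timeout.
fun assertAppStillInForeground(driver: AndroidDriver, appId: String = "com.example.musicapp") {
    val state = driver.queryAppState(appId)
    check(state == ApplicationState.RUNNING_IN_FOREGROUND) {
        "App $appId is in state $state – it probably crashed; recover or abort explicitly."
    }
}
```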
Common Approaches to App Restart Recovery (Pros & Cons)
Engineering teams have tried various workarounds to handle app restarts in tests. Each has benefits and drawbacks:
Scripted Relaunch in the Test: In frameworks like Appium, you can manually code the restart logic. For example, you might call driver.terminateApp() (or an ADB force-stop command) to simulate a crash, then driver.activateApp() (or the legacy driver.launchApp() in older Appium clients) to reopen it. This can be done within a single test script. Pro: Keeps the flow in one test, and Appium can continue if done correctly (especially with noReset=true to preserve app data). Con: It’s easy to get wrong – as noted, a naive approach might wipe state. You must ensure the session is still valid or create a new Appium session on the fly, which complicates test code. Timing is critical: you often need to insert waits for the app to fully restart. Even then, any surprise (like a permissions popup or crash report dialog) can break the script.
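As a rough illustration of this scripted approach, here is what the crash-and-relaunch portion might look like in Kotlin with the Appium Java client – a sketch only, assuming an existing session with noReset=true and a placeholder package name:

```kotlin
import io.appium.java_client.android.AndroidDriver
import io.appium.java_client.appmanagement.ApplicationState
import org.openqa.selenium.support.ui.WebDriverWait
import java.time.Duration

fun crashAndRelaunch(driver: AndroidDriver, appId: String = "com.example.musicapp") {
    // Simulate the crash by force-stopping the process; the Appium session itself stays alive.
    driver.terminateApp(appId)

    // Bring the app back without reinstalling or clearing data
    // (activateApp replaces the deprecated launchApp in newer Appium clients).
    driver.activateApp(appId)

    // Wait explicitly for the relaunch rather than relying on fixed sleeps.
    WebDriverWait(driver, Duration.ofSeconds(30)).until {
        driver.queryAppState(appId) == ApplicationState.RUNNING_IN_FOREGROUND
    }
}
```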
Using Checkpoints or Multiple Tests: Another approach is to split the scenario into two phases: pre-crash and post-crash. For example, one test case brings the app to the desired pre-crash state (e.g. mini-player showing), then intentionally crashes the app. The next test case (with the help of a test orchestrator or test framework configuration) starts the app fresh and validates the mini-player recovery state. Pro: This isolates crashes so they don’t abort the entire suite. Android’s Test Orchestrator, for instance, will continue with the next test even if the previous one crashed. Con: You lose the continuity – the second test is essentially launching a new instance, which may not precisely mimic an in vivo crash recovery. Any transient state (like an ongoing audio stream) may be gone. You might need to implement persistent storage of state or use deep links to simulate “resume where we left off,” which adds complexity and still isn’t truly the same as a single flow. On iOS, there’s no direct equivalent of orchestrator; you’d rely on the test suite continuing after a failure (XCTest can continue after failures for separate tests, but not within one test method).
Always Clean Restart (Avoiding the Scenario): Some teams simply avoid mid-test restarts by structuring tests to always start from a clean app launch. They might simulate a crash by terminating the app at test end, then begin a new test to verify something. Pro: Simpler test logic – each test starts fresh, reducing inter-dependencies. Con: This doesn’t truly answer the original question of validating recovery during a run. It also misses bugs that occur when an app crashes and resumes within a user session. Essentially, it sidesteps the hard problem, which may be acceptable for certain pipelines but won’t help catch crash-recovery issues.
Custom In-App State Preservation: In some cases, developers add features to help testers with this (e.g., writing the app’s state to disk periodically or on crash, then auto-restoring it on next launch). If such features exist (like a “restore last session” prompt), testers can leverage them. Pro: The app itself helps maintain continuity, which a test can then verify (for example, a media app might remember the last song and position after a crash). Con: Few apps have this, and implementing it just for tests may not be feasible. Even with it, the test still needs to handle the relaunch and any UI flows to confirm the state was restored.
In summary, traditional methods either require a lot of manual scripting (which is brittle) or breaking the test into pieces (which doesn’t fully replicate a seamless crash recovery flow). This is where new AI-driven approaches are making a difference.
GPT Driver’s AI-Based Approach to Reliable Recovery
GPT Driver is an example of an AI-enhanced test automation solution (with a no-code studio and low-code SDK) that improves reliability in crash/relaunch scenarios. It introduces a few capabilities that directly tackle the flakiness:
Natural Language “Restart” Command: Instead of writing low-level code to force-stop and relaunch the app, you can simply use a high-level command. For instance, in a GPT Driver test script you might have a step that says: restart – which “closes and reopens the app, resetting it to its launch state.” This is a deterministic step that the platform provides. Under the hood it ensures the app is relaunched cleanly (similar to a user tapping the app icon again) without wiping the user data. This drastically simplifies test authoring: any team member could insert a restart step (since it’s in plain English).
Resume from Checkpoints: GPT Driver allows defining checkpoints or recovery steps in the test flow. For example, you could instruct: “If the app crashes at this point, restart and continue.” These can be set up as conditional steps or as part of the test’s logic. The platform’s AI keeps track of the app state so it can return to the right context after a relaunch. In practice, this means you don’t have to manually preserve session IDs or element references – the next steps will re-query the app’s UI fresh. Combined with the no-code approach, this provides a deterministic recovery flow (you decide where to resume) without the usual fragility of hand-coded re-inits.
Adaptive Self-Healing: Beyond explicit commands, GPT Driver’s AI agent is continuously monitoring the app’s UI and device state. If something unexpected happens – say the app suddenly disappears (crash to home screen) or a system dialog pops up – the AI can react. It will automatically handle many “unexpected situations like pop-ups or minor changes in copy and layout”, reducing flakiness. In a crash scenario, this might mean the AI detects the crash dialog and closes it, then issues the restart command on its own to keep the test going. Essentially, the test doesn’t have to explicitly anticipate every failure mode; the AI provides a safety net to keep the workflow on track.
Integration with Device Clouds: GPT Driver’s system is built to work across different environments (it supports Appium, Espresso, and XCUITest under the hood via an SDK). This means whether you run on a local emulator or a cloud device farm, it can perform the same recovery steps. The adaptive nature helps in cloud runs where latency or device performance might vary – for example, if a device is slow to reboot the app, GPT Driver will wait appropriately. Traditional scripts often have fixed timeouts that might be too short or too long on different devices, whereas an AI-driven approach can adjust based on actual app responses. The outcome is more consistent test passes on services like BrowserStack or Sauce Labs, where flaky crash recovery tests would otherwise break the CI run.
Reduced Maintenance: By handling the messy parts (state tracking, waiting, dismissing pop-ups), an AI solution significantly cuts down on test maintenance. If the app’s crash behavior changes (for instance, a new “restore session?” prompt gets added in an update), GPT Driver’s vision and language understanding can adapt to it without you rewriting the test. The platform was designed to “reduce test flakiness” and maintenance effort by self-healing around minor UI changes, and this extends to crash recovery flows as well. In contrast, a purely coded approach might break until you update the script to handle the new prompt.
In short, GPT Driver provides a higher-level abstraction for crash and relaunch scenarios. You describe what you want (e.g. “restart the app and confirm the mini-player is showing the last played track”), and the tool handles how to get there reliably. It blends deterministic steps with AI adaptability – executing direct commands instantly and only invoking AI logic when needed. This hybrid ensures tests run fast when everything is normal, but also survive aberrations like crashes.
Best Practices for Crash Recovery Testing in CI
Whether using AI tools or not, here are some practical recommendations to improve reliability of crash/relaunch tests in your continuous integration:
Deliberately Test Crash Scenarios: Don’t shy away from testing crashes just because they’re hard. Instead, design specific test cases for them. For example, if you have a mini-player feature, write a test that purposely kills the app mid-playback and then relaunches it. Using a controlled crash (like an intentional abort() or an automation command to terminate the app) can help you consistently reproduce the scenario.
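One way to get such a controlled crash on Android is the am crash shell command (Android 8+), reachable from Appium through the mobile: shell extension. This is a sketch under assumptions: the Appium server must be started with the adb_shell insecure feature enabled, and the package name is a placeholder:

```kotlin
import io.appium.java_client.android.AndroidDriver

// Asks the OS to crash the app's main process, so real crash handlers and crash
// dialogs fire – closer to production behaviour than a plain force-stop.
// Assumes the Appium server was started with `--allow-insecure adb_shell`.
fun triggerControlledCrash(driver: AndroidDriver, appId: String = "com.example.musicapp") {
    driver.executeScript(
        "mobile: shell",
        mapOf("command" to "am", "args" to listOf("crash", appId))
    )
}
```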
Use Framework Features to Preserve State: If using Appium, leverage capabilities like noReset=true so that a relaunch doesn’t wipe app data (cookies, caches, logged-in status). On Android, you can also use startActivity to bring an app back without resetting it. Ensure that your automation session remains open if possible, or if you must create a new session, design your code to handle that seamlessly (for example, encapsulate the login routine so it can be re-run after a restart if needed).
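In capability terms, a minimal sketch with the Appium Java client (driven from Kotlin; the APK path and server URL are placeholders):

```kotlin
import io.appium.java_client.android.AndroidDriver
import io.appium.java_client.android.options.UiAutomator2Options
import java.net.URL

fun createSessionKeepingState(): AndroidDriver {
    val options = UiAutomator2Options()
        .setApp("/path/to/app-under-test.apk") // placeholder path
        .setNoReset(true)                      // keep app data (login, caches) across relaunches
    return AndroidDriver(URL("http://127.0.0.1:4723"), options)
}
```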
Employ Orchestrators or Recovery Hooks: For Espresso tests, enable Android Test Orchestrator in your Gradle config – it will spawn a new Instrumentation for each test, so one crash doesn’t halt the whole suite. Similarly, on iOS, run tests as separate logical cases where applicable (XCTest will continue to the next test function even if the previous one crashed, as long as the runner itself stays alive). While this doesn’t save a crashing test, it at least confines the damage. If your framework or CI allows post-failure hooks, use them to reset the app or device and then proceed with remaining tests.
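Enabling the Orchestrator is a small Gradle change; here is a minimal sketch in the Kotlin DSL (artifact versions are illustrative – use the current releases):

```kotlin
// build.gradle.kts (app module) – minimal Android Test Orchestrator setup
android {
    defaultConfig {
        testInstrumentationRunner = "androidx.test.runner.AndroidJUnitRunner"
        // Optional: also clear package data between tests for stronger isolation.
        testInstrumentationRunnerArguments["clearPackageData"] = "true"
    }
    testOptions {
        execution = "ANDROIDX_TEST_ORCHESTRATOR"
    }
}

dependencies {
    androidTestImplementation("androidx.test:runner:1.6.2")   // version illustrative
    androidTestUtil("androidx.test:orchestrator:1.5.1")       // version illustrative
}
```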
Incorporate Conditional Checks: Add logic to detect crash symptoms. For instance, after a relaunch, verify that the app is on the expected screen – if not, your test can log a warning or try an alternate path (like navigating from the home screen). In code, this might be an if check for a known element that should be present post-crash. In GPT Driver’s no-code setup, you could use a conditional step to say “if playback did not resume, tap play”. This makes tests more resilient to unpredictable post-crash states.
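A minimal version of such a conditional check in Kotlin with Appium – the resource ID and accessibility label are placeholders for your app’s real identifiers:

```kotlin
import io.appium.java_client.AppiumBy
import io.appium.java_client.android.AndroidDriver

fun ensureMiniPlayerAfterRelaunch(driver: AndroidDriver) {
    val miniPlayer = AppiumBy.id("com.example.musicapp:id/mini_player")   // placeholder ID
    if (driver.findElements(miniPlayer).isEmpty()) {
        // Playback didn't resume on its own – take the recovery path instead of failing outright.
        driver.findElement(AppiumBy.accessibilityId("Play")).click()      // placeholder label
    }
    check(driver.findElements(miniPlayer).isNotEmpty()) {
        "Mini-player did not reappear after the relaunch"
    }
}
```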
Manage External Dependencies: A lot of flakiness comes from external factors like network conditions, backend data, or device performance. In staging environments, ensure the test data is prepared so that a crash/relaunch doesn’t coincidentally log the user out due to session expiration on the server. Similarly, in device clouds, configure timeouts generously – a device might take longer to reopen an app after a crash, so give it a few extra seconds before failing the test. Some cloud providers also allow setting a “relaunch on crash” capability; use those if available so that the automation doesn’t fully stop on an app crash.
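On the timeout side, Appium session options are one concrete knob you can relax for cloud devices; the values below are purely illustrative and should be tuned to your provider:

```kotlin
import io.appium.java_client.android.options.UiAutomator2Options
import java.time.Duration

// Illustrative values only – tune them to your devices and cloud provider.
val cloudFriendlyOptions: UiAutomator2Options = UiAutomator2Options()
    .setNewCommandTimeout(Duration.ofMinutes(5))                  // keep the session alive during slow recoveries
    .setUiautomator2ServerLaunchTimeout(Duration.ofSeconds(120))  // shared cloud devices start up slowly
    .setAdbExecTimeout(Duration.ofSeconds(60))                    // adb calls (e.g. force-stop) can lag
```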
Use Robust Tools for Critical Flows: For particularly flaky flows (like media playback), consider using an AI-driven tool or a reliable wrapper library. For example, GPT Driver’s SDK can wrap around your existing Appium tests to catch failures and retry with AI assistance. Even if you cannot adopt it for all tests, using it for the high-value but flaky crash cases can save a lot of headache. The goal is to make sure your CI pipeline goes green when the app actually works – flakiness should not mask the real quality signal.
By following these practices, you’ll mitigate the randomness in crash recovery tests and make your pipeline more robust.
Example: Validating a Mini-Player After an App Crash
Let’s walk through a concrete example – the original question: Can we restart the app during a test to validate mini-player functionality after a crash and relaunch? Suppose we have a music streaming app with a mini-player (a small playback bar that persists at the bottom of the UI).
Traditional Approach (Scripted):
Setup: Launch the app and log in (if not already). Navigate to a song or video and start playback. Confirm that the mini-player UI is visible (e.g. showing the song title and play/pause controls).
Crash the App: Programmatically trigger a crash or force-kill the app process. In Appium, this could be done with driver.terminateApp(appId) or an adb command. In Espresso, you might not have a direct way, but you could simulate a crash by forcing an exception in the app (not straightforward). The app is now not running.
Relaunch: Still in the same test, instruct the app to relaunch. With Appium, driver.activateApp(appId) can be used (or the legacy driver.launchApp() in older clients), ensuring noReset so the user is still logged in and the app doesn’t start from scratch. In a pure Espresso scenario, this step is not possible in the same test – the crash would have already stopped the test; you’d have to rely on a new test launching the app.
Post-crash Validation: Once the app UI is back up, navigate to whatever screen would now show the mini-player. It might automatically show on the home screen if the app resumes playback, or you might need to go to a “Now Playing” screen. Then verify that the mini-player is functioning – e.g., the play/pause button is in the correct state (if the app resumed the song, it might be playing; if not, the mini-player might show a paused state). You’d likely assert that the last played song’s title is displayed in the mini-player component.
In this traditional path, there are many points of failure. The test could hang at step 2 if the framework doesn’t know the app was killed. Or the relaunch (step 3) might actually start a fresh session without the user logged in, meaning the mini-player won’t appear at all because there’s no playback. The script would then need logic to log back in or restart playback – essentially duplicating what the app’s own crash-recovery would do. It’s doable with a lot of scripting, but very brittle.
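Stitched together, that traditional scripted flow might look roughly like the Kotlin/Appium sketch below. Everything app-specific – package name, locators, the exact playback steps – is a placeholder, and a production version would also need login handling and crash-dialog dismissal:

```kotlin
import io.appium.java_client.AppiumBy
import io.appium.java_client.android.AndroidDriver
import io.appium.java_client.appmanagement.ApplicationState
import org.openqa.selenium.support.ui.ExpectedConditions
import org.openqa.selenium.support.ui.WebDriverWait
import java.time.Duration

fun miniPlayerSurvivesCrash(driver: AndroidDriver, appId: String = "com.example.musicapp") {
    val wait = WebDriverWait(driver, Duration.ofSeconds(30))
    val miniPlayer = AppiumBy.id("$appId:id/mini_player")           // placeholder locator

    // 1. Setup: start playback and confirm the mini-player is shown.
    driver.findElement(AppiumBy.accessibilityId("Play")).click()    // placeholder label
    wait.until(ExpectedConditions.presenceOfElementLocated(miniPlayer))

    // 2. Crash: force-stop the process to simulate the crash.
    driver.terminateApp(appId)

    // 3. Relaunch without clearing data (session stays alive; noReset=true assumed).
    driver.activateApp(appId)
    wait.until { driver.queryAppState(appId) == ApplicationState.RUNNING_IN_FOREGROUND }

    // 4. Post-crash validation: the mini-player should reappear with the last played track.
    val restored = wait.until(ExpectedConditions.presenceOfElementLocated(miniPlayer))
    check(restored.text.isNotBlank()) { "Mini-player did not restore the last played track" }
}
```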
AI-Driven Approach (GPT Driver):
Setup via Natural Language: The test author writes steps in GPT Driver’s studio like: “Open the app and play a song.” The AI takes care of finding the song and tapping play (or you can specify the exact song if needed). The mini-player appears once music plays.
Crash and Restart Step: The next step in plain English might be, “restart the app.” Under the hood, this triggers the GPT Driver’s restart command. The platform ensures the app is cleanly relaunched. Crucially, because GPT Driver knows this is a restart (and not a brand-new test), it can preserve context such as the fact that a song was playing. (For example, if the app itself saves the playback state, GPT Driver will simply wait for the UI to reflect it on relaunch. If the app doesn’t save state, you could instruct GPT Driver to “resume playback” after restart as an extra step.)
Adaptive Continuation: Suppose the app, when restarted, pops a dialog “Sorry, we had to restart.” GPT Driver’s AI vision would recognize this unexpected dialog and could automatically dismiss it (this falls under handling “unexpected pop-ups” that the AI is trained for). A traditional script would have missed this unless the engineer explicitly added code for it.
Post-crash Validation: Finally, the test might say: “Check that the mini-player is visible and shows the current track.” GPT Driver’s AI will look at the screen; if the mini-player UI is present, it can read the text (using OCR or direct view hierarchy access) to verify the track name. If the mini-player wasn’t there, the AI could even infer something went wrong and attempt a recovery (for instance, maybe it would try to navigate to the player screen or press play). In a deterministic mode, you could also explicitly tell it: “If not playing, tap play on the track again.”
The key difference is that the AI-driven flow abstracts the error handling. As a test writer, you don’t have to script every contingency. You declare the intent (restart and check mini-player), and the underlying system handles the nuances like relaunching the app, waiting for it to load, dealing with any pop-ups, and re-verifying the UI. This results in a much more reliable test. If the app properly supports crash recovery, GPT Driver will consistently validate the mini-player state without random failures. If the app has a bug (say the mini-player doesn’t come back after a crash when it should), the test will correctly catch that – rather than leaving you guessing if it was the test flaking out or a real bug.
Key Takeaways
Testing app restarts and crash recovery is challenging with traditional automation tools, but it is possible – and nowadays can even be made reliable. The flakiness that plagues such tests usually stems from lost app state and fragile test logic when an app is killed mid-run. Industry workarounds (like writing custom scripts or splitting tests with orchestrators) can partially address the issues but at the cost of increased complexity and maintenance.
AI-enhanced solutions like GPT Driver offer a promising path to make crash/relaunch tests as stable as any other test. By using high-level commands and adaptive intelligence, they eliminate much of the brittle glue code and timing issues that human engineers used to struggle with. For QA leads and senior engineers, the lesson is to embrace tools and practices that manage state and unexpected events robustly. Ensure your test suites can handle the “unhappy paths” like crashes: whether that’s by leveraging an AI-driven test agent or by diligently coding in recovery steps, don’t let those scenarios remain untested.
In direct answer to the question: Yes, it’s possible to restart an app during a test run to validate functionality (like a mini-player after a crash) – and with the right approach, it can be done reliably. The combination of careful state management (or an AI that does it for you) and strategic test design will turn flaky crash tests into solid, trustworthy checks. This means fewer false failures in CI and more confidence that your app will recover gracefully from crashes in the wild, where it matters most.