With the rise of Large Language Models (LLMs), software testing is moving towards ever greater automation of testing activities. But what about AI-driven test automation? Current test automation frameworks automate test execution, yet the future points towards LLM agents supervising these frameworks. As discussed in a previous article, one of our long-term goals with Gravity is to “automate automation.”
Is your Test Automation Framework ready for the AI Era?
Test automation frameworks are increasingly analyzed, configured, or even driven by autonomous algorithms. We recently took part in a research project that raised concerns about the major frameworks’ ability to adapt to these novel AI-powered testing practices.
Our attempt to optimize a continuous integration process with Machine Learning
In a Continuous Integration (CI) pipeline, developers’ changes are merged into the main code base daily, and tests are executed at different levels as soon as a change is submitted. For CI to be effective, the feedback loop between a code commit and the test execution report must be fast. Test case prioritization aims to inform developers of any errors in the code as soon as possible, so they can fix them before making further changes. Reducing total test execution time also matters: when certain tests fail often, running them first accelerates the feedback.
Most test case prioritization methods fall into one of these categories:
- Prioritize based on test case diversity: Ensuring a varied mix of test cases can speed up fault detection. This is achieved by executing the test case “most different” from those already executed. The main principle is to quickly cover the diverse ways the system is used.
- Prioritize using execution history: The goal is to predict each test case’s verdict based on its execution history. The assumption is that a test case that has frequently identified faults in the past is more likely to deliver an unfavorable verdict in the future.
- Prioritize according to code changes and code coverage: The test cases that cover the most code are more likely to detect a fault. If code coverage is combined with code changes, it is also possible to execute first the test cases that cover the most recently changed code. However, these approaches are not always practical and cannot be applied to all test types (e.g., system/e2e tests).
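The diversity-based strategy above can be sketched as a greedy selection: repeatedly pick the remaining test whose covered features are “most different” from the tests already scheduled. This is a minimal illustration with hypothetical data, using Jaccard distance as one possible notion of difference:

```python
# Sketch: diversity-based test prioritization (hypothetical data).
# Greedily pick the test "most different" from those already selected,
# measured by Jaccard distance between the feature sets each test covers.

def jaccard_distance(a: set, b: set) -> float:
    """1 - |intersection| / |union|; 1.0 means completely different."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def prioritize_by_diversity(tests: dict) -> list:
    """tests maps a test name to the set of features/screens it exercises."""
    remaining = dict(tests)
    # Start with the test covering the most features.
    first = max(remaining, key=lambda t: len(remaining[t]))
    ordered = [first]
    del remaining[first]
    while remaining:
        # Next, take the test maximizing its minimal distance
        # to everything already scheduled.
        nxt = max(
            remaining,
            key=lambda t: min(jaccard_distance(remaining[t], tests[s]) for s in ordered),
        )
        ordered.append(nxt)
        del remaining[nxt]
    return ordered

tests = {
    "login":       {"auth", "session"},
    "login_again": {"auth", "session", "cookies"},
    "checkout":    {"cart", "payment"},
}
print(prioritize_by_diversity(tests))  # → ['login_again', 'checkout', 'login']
```

In practice, the “features” could be covered code, visited screens, or called API endpoints, depending on what the test infrastructure can observe.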
Most of the methods in the above categories rely on a single type of information. A further category, machine learning-based methods, can merge all of these data sources (diversity, history, coverage) and combine their benefits.
Test case prioritization by machine learning
The use of machine learning for test case prioritization allows for integrating extensive and diverse data sources to identify intricate relationships and forecast the outcome of each test. At Smartesting, we developed a test case prioritization service fueled by machine learning. After each continuous integration cycle, it analyzes multiple data sources:
- test execution history of each test,
- content of the latest commits,
- text proximity between test cases and changed code,
… and returns a sorted list of test cases. The service detects failures much faster than the default, unprioritized ordering.
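To make the idea concrete, here is an illustrative sketch (not Smartesting’s actual model) of how the three data sources listed above can be folded into one feature vector per test and ranked. The weights are hand-picked stand-ins for what a trained model would learn:

```python
# Illustrative sketch: combining execution history, commit content, and
# text proximity into a single prioritization score. In a real service,
# the weights below would come from a trained machine learning model.
import math

def features(test: dict) -> dict:
    history = test["history"]  # recent verdicts, 1 = failed, 0 = passed
    return {
        "fail_rate": sum(history) / len(history),
        "failed_last_run": 1.0 if history and history[-1] else 0.0,
        # e.g. TF-IDF cosine similarity between the test and the changed code
        "text_similarity_to_diff": test["similarity"],
    }

def score(test: dict) -> float:
    f = features(test)
    # Hand-picked weights for illustration; a model would learn these.
    z = 2.0 * f["fail_rate"] + 1.0 * f["failed_last_run"] + 1.5 * f["text_similarity_to_diff"]
    return 1 / (1 + math.exp(-z))  # squash to a probability-like score

tests = [
    {"name": "test_payment", "history": [0, 1, 1], "similarity": 0.8},
    {"name": "test_search",  "history": [0, 0, 0], "similarity": 0.1},
]
ranked = sorted(tests, key=score, reverse=True)
print([t["name"] for t in ranked])  # most failure-prone first
```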
Reluctance to change among test automation framework developers
Regardless of the prioritization method used, the main focus when creating a test prioritization service is to ensure data is gathered at each continuous integration cycle. Two kinds of data must be captured and assembled for prioritization:
- data that is collected before running tests, such as code changes,
- data that is collected following test execution, such as test results or execution duration.
The first kind of data feeds the prioritization of test cases for the current cycle; the second is used in future cycles (for instance, to build the result history). To integrate test prioritization into our CI pipeline, we developed a prioritization service exposing an HTTP API that stores the collected data and returns a test ranking computed by a trained machine learning model.
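From the CI pipeline’s point of view, talking to such a service boils down to sending the cycle’s data and reading back a ranking. The endpoint name and payload shape below are assumptions for illustration only, not the actual Smartesting API:

```python
# Hypothetical client-side helpers for a prioritization service over HTTP.
# The endpoint, field names, and response shape are illustrative assumptions.
import json

def build_cycle_payload(commit_sha: str, changed_files: list, previous_results: dict) -> dict:
    """Data sent at the start of a CI cycle: code changes + last verdicts."""
    return {
        "commit": commit_sha,
        "changed_files": changed_files,
        "previous_results": previous_results,  # {test_name: "passed" | "failed"}
    }

def parse_ranking(response_body: str) -> list:
    """Assumes the service answers with {"ranking": [test names, best first]}."""
    return json.loads(response_body)["ranking"]

payload = build_cycle_payload("a1b2c3d", ["src/cart.py"], {"test_cart": "failed"})
# In the real pipeline this payload would be POSTed to the service, e.g.:
#   requests.post("https://prioritizer.example.internal/cycles", json=payload)
ranking = parse_ranking('{"ranking": ["test_cart", "test_login"]}')
print(ranking)  # → ['test_cart', 'test_login']
```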
With enough gathered data, the last step is to execute the tests in the order received from the prioritization service. This is easy with manual tests (the tester simply follows the prescribed order), but it becomes much more challenging with automated test suites. Indeed, test automation frameworks usually don’t let users choose the order in which tests are executed: their developers enforce this to ensure each test remains independent of the others.
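Some frameworks do expose extension points that make reordering possible. For example, pytest lets a plugin reorder the collected tests through its `pytest_collection_modifyitems` hook; a minimal sketch of a `conftest.py` applying an externally supplied ranking might look like this (the `PRIORITY` list is a hypothetical ranking that would come from the prioritization service):

```python
# Sketch of a conftest.py that forces a pytest execution order.
# pytest has no built-in ordering option, but the collection hook below
# lets a plugin rearrange the collected test items before they run.

PRIORITY = ["test_checkout", "test_login", "test_search"]  # hypothetical ranking

def pytest_collection_modifyitems(session, config, items):
    rank = {name: i for i, name in enumerate(PRIORITY)}
    # Tests absent from the ranking keep a stable position at the end.
    items.sort(key=lambda item: rank.get(item.name, len(PRIORITY)))
```

Because `list.sort` is stable, unranked tests stay in their original relative order after the prioritized ones.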
There are ways to work around the problem in some instances, although they are mostly dirty fixes: a Playwright developer, for example, advises renaming test files to execute them in the needed order. With Cypress, reordering tests (only at the file level) is possible, but it requires diving into the Cypress configuration file.
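The renaming workaround relies on runners picking up spec files in alphabetical order, so a numeric prefix imposes the prioritized order. A throwaway helper script for this trick could look like the following (purely illustrative; file names and ranking are made up):

```python
# Sketch of the "rename files" workaround: prefix each spec file with its
# rank so that alphabetical file discovery yields the prioritized order.
from pathlib import Path

def ordered_name(rank: int, filename: str) -> str:
    """e.g. rank 0 + 'checkout.spec.ts' -> '000_checkout.spec.ts'."""
    return f"{rank:03d}_{filename}"

def rename_for_priority(directory: str, ranking: list) -> None:
    """Rename each listed spec file in place, best-ranked first."""
    d = Path(directory)
    for rank, filename in enumerate(ranking):
        src = d / filename
        if src.exists():
            src.rename(d / ordered_name(rank, filename))

print(ordered_name(0, "checkout.spec.ts"))  # → 000_checkout.spec.ts
```

The obvious downside, and the reason this counts as a dirty fix, is that it rewrites the repository’s file names on every cycle instead of configuring the runner.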
Conclusion: What did we learn about test automation and AI?
Our experience creating a test prioritization service showed us that the rise of machine learning in testing presents a challenge for test automation framework developers. They seem hesitant, if not reluctant, to adapt to these new methods. Things will only get worse as LLM-based techniques spread, with autonomous LLM agents trying to drive reluctant test automation frameworks.
Test automation frameworks are not ready for the big changes that Generative AI is about to bring to software testing. Given the reluctance of tool developers to adapt to these new challenges, one can wonder whether these tools will keep up, or whether new, more easily automated tools will eventually emerge.
If you’d like to learn more about the contribution of artificial intelligence to software testing, you can book a Gravity demo to discover what we’re working on. Gravity is currently in beta, and we value your feedback.