In a project as large as ours, the code is covered by a very large number of tests and regression suites. They are necessary to ensure correctness and compatibility across several platforms, but every additional test increases the time a PR waits before it can be merged. To shorten that wait, we developed several optimizations for test selection and execution.
The solution - a highly optimized testing framework
We created a system that runs tests in a distributed fashion across multiple lightweight containers and aggregates the results at the end. Because tests no longer have to run sequentially, we utilize our machines better. Test execution is now several times faster, saving 1-2 hours on every PR.
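As an illustration of the idea, here is a minimal sketch of splitting a test suite into balanced shards and merging the per-shard results. It is not our production code: the function names, the duration estimates, and the use of threads as a stand-in for containers are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def shard(tests, num_shards):
    """Split tests into shards of roughly equal total duration.

    `tests` maps a test name to its estimated duration in seconds
    (hypothetical input). Longest tests are placed first, each into
    the shard that currently has the smallest load.
    """
    shards = [[] for _ in range(num_shards)]
    loads = [0.0] * num_shards
    for name, seconds in sorted(tests.items(), key=lambda kv: (-kv[1], kv[0])):
        lightest = loads.index(min(loads))
        shards[lightest].append(name)
        loads[lightest] += seconds
    return shards


def run_shard(shard_tests, run_test):
    """Run one shard and return {test name: passed?}."""
    return {name: run_test(name) for name in shard_tests}


def run_distributed(tests, num_shards, run_test):
    """Run all shards in parallel and aggregate the results at the end.

    Threads stand in for containers here; a real setup would dispatch
    each shard to a separate worker machine or container.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        for partial in pool.map(lambda s: run_shard(s, run_test), shard(tests, num_shards)):
            results.update(partial)
    return results
```

Placing the longest tests first keeps the shards close to the same length, so no single worker holds up the aggregated result.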
We implemented fine-grained test caching that tracks changes in individual classes and their impact on individual tests. This lets us run only the relevant tests, which in many cases reduces the number of tests run by up to 70%.
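One way to picture this kind of caching is to fingerprint each test by the contents of the classes it depends on and to skip the test when the fingerprint matches the last passing run. The sketch below assumes a precomputed test-to-class dependency map and an in-memory cache; both, along with all names, are hypothetical.

```python
import hashlib


def fingerprint(test, deps, sources):
    """Hash the names and contents of every class a test depends on.

    `deps` maps a test to the set of classes it uses, and `sources`
    maps a class to its source text (both are assumed inputs).
    """
    digest = hashlib.sha256()
    for cls in sorted(deps[test]):
        digest.update(cls.encode())
        digest.update(b"\0")
        digest.update(sources[cls].encode())
        digest.update(b"\0")
    return digest.hexdigest()


def select_tests(deps, sources, cache):
    """Return the tests whose fingerprint differs from the cached one."""
    return [t for t in sorted(deps) if cache.get(t) != fingerprint(t, deps, sources)]


def record_passed(tests, deps, sources, cache):
    """Store fingerprints of tests that passed so they can be skipped next time."""
    for t in tests:
        cache[t] = fingerprint(t, deps, sources)
```

Only passing runs are recorded, so a failing test keeps being selected until it passes against the current sources.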
We built a framework for tracking and rerunning flaky tests. It reduces the time we spend investigating flakiness while collecting detailed information about the places where extra care is required. Besides improving quality, it saves us a few hours per day that used to go into investigating flaky tests.
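The core of such a framework can be sketched as a retry loop that separates tests failing consistently from tests that pass only on a rerun, and keeps a count of the latter. This is a simplified illustration with hypothetical names, not the actual implementation.

```python
from collections import Counter


def run_with_retries(name, run_test, max_attempts=3):
    """Run a test up to `max_attempts` times.

    Returns (status, attempts) where status is "passed" (first try),
    "flaky" (failed first, passed on a rerun) or "failed" (never passed).
    """
    attempts = 0
    for attempts in range(1, max_attempts + 1):
        if run_test(name):
            return ("passed" if attempts == 1 else "flaky"), attempts
    return "failed", attempts


def run_suite(names, run_test, flaky_counts, max_attempts=3):
    """Run every test with retries and record which ones were flaky."""
    report = {}
    for name in names:
        status, _ = run_with_retries(name, run_test, max_attempts)
        report[name] = status
        if status == "flaky":
            flaky_counts[name] += 1  # accumulated across runs for later triage
    return report
```

Keeping `flaky_counts` as a `Counter` that outlives a single run makes it easy to rank tests by how often they needed a rerun.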
We switched to fully programmatic definitions of all our jobs, so we can change them quickly and with much higher confidence.
We introduced scoped tests, which let us run only the tests relevant to the projects affected by a change. This often means that 50% fewer tests need to be run.
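Determining the affected projects can be sketched as a walk over the reversed project dependency graph: a change to one project affects that project and everything that depends on it, directly or transitively. The graph shape and the function name below are assumptions made for illustration.

```python
from collections import deque


def affected_projects(changed, depends_on):
    """Return the changed projects plus all projects that depend on them.

    `depends_on` maps each project to the set of projects it depends on
    (an assumed input, e.g. derived from the build configuration).
    """
    # Invert the graph: for each project, who depends on it?
    dependents = {}
    for project, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(project)

    # Breadth-first walk from the changed projects through their dependents.
    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen
```

Tests are then run only for the projects in the returned set; a change to a leaf project touches few tests, while a change to a widely used core project still triggers most of the suite.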
We developed a dedicated workflow for flagging and fixing flaky tests and for quickly discovering problems that affect all pull requests. With it, every pull-request author can quickly identify and address the problems affecting their pull request, which significantly reduces the time required to merge it.
The final result
We completely rebuilt our testing infrastructure, making it an order of magnitude more robust. We can now run more relevant tests faster and with fewer resources. This increased developer productivity and code quality while reducing costs and time to market.
We also created efficient workflows and support processes around the testing pipeline to make sure it keeps performing at its peak.