One of the major problems the client was facing was slow and cumbersome deployment and scaling procedures. Our task was to migrate the system to the Microsoft Azure cloud. Using tools such as Kubernetes, Terraform, and Helm, we:
Managed to cut deployment times from weeks to hours and scale the system within minutes.
Reduced the risk of failure for the existing deployment process by introducing a continuous delivery pipeline that can be run for each Pull Request.
Scaling in the cloud
Moving to the cloud enables the client to cut costs by leveraging low-priority VMs and auto-scaling. A large part of our work was replacing the client’s custom solutions with widely adopted software such as JFrog Artifactory and HashiCorp Vault. This paves the way for open-sourcing the project, which is our goal for the future.
Automated testing pipelines
In such an extensive system, testing is crucial but also complicated. Our client had a set of end-to-end (e2e) test cases that were performed manually during each release cycle. We were responsible not only for automating those tests, but also for creating a test framework that would lay the foundations for further testing.
Over fifty implemented test cases are run multiple times a day as a part of our Jenkins pipeline.
This wasn’t feasible before, because the client’s QA team performed the test steps manually for hours on a single dedicated environment. We integrated our framework with the client’s private cloud so that anyone can compose an environment with all the necessary components, such as MongoDB replica sets, brokers, and caches, for each test. As a result, the tests can run in parallel and independently of each other, which makes the results much more predictable.
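The idea of one isolated environment per test can be sketched roughly as follows. This is only an illustration, not the framework itself: the names Environment, compose_environment, run_test, and run_all are hypothetical, and the provisioning step is replaced by a stand-in, since the real integration with the private cloud is not described here.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import List


@dataclass
class Environment:
    """Hypothetical handle to an isolated test environment."""
    name: str
    components: List[str] = field(default_factory=list)


@contextmanager
def compose_environment(components):
    # Assumption: a real framework would provision these components in the
    # private cloud here. This sketch only models the lifecycle.
    env = Environment(
        name="test-env-" + uuid.uuid4().hex[:8],
        components=list(components),
    )
    try:
        yield env
    finally:
        # Stand-in for tearing the environment down after the test.
        env.components.clear()


def run_test(test_name):
    # Each test gets its own environment, so tests share no state.
    with compose_environment(["mongodb-replica-set", "broker", "cache"]) as env:
        passed = len(env.components) == 3  # placeholder for real test steps
        return test_name, env.name, passed


def run_all(test_names, workers=4):
    # Because environments are independent, tests can run in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_test, test_names))
```

Calling run_all(["login", "replication", "cache-miss"]) would return one (test name, environment name, passed) tuple per test, each with a different environment name. The point of the design is that no test depends on state left behind by another.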
We can now test our releases much faster (saving up to a few days on each of them) while achieving better test coverage.
The final result: an easily scalable system in the cloud
We helped evolve a distributed system that consists of many components, including read, write, and replication brokers, several cache layers, databases with terabytes of data, and a couple of thousand computing nodes. We greatly improved test coverage, deployment times, and scaling, and introduced state-of-the-art technical solutions that will enable further improvements to the whole system.