The challenge
The client’s Data Science Team is responsible for gathering insights from complex data sets. The team was working within the limitations of a computing cluster, which required numerous manual processes at the beginning of each project. Implementing changes in the cluster was costly and time-consuming, which constrained the team’s flexibility and left them with limited control over their resources. As a result, team members had to teach themselves how to use their preferred tools and build a considerable amount of automation to replicate any process. This had a negative impact on several areas of the company:
- The product team, in particular, faced high costs because each procedure took so long. In addition, managing different solutions and requiring individuals to acquire competencies for each tool posed significant risks and made it extremely challenging for team members to switch teams or projects.
- The management team lacked control over the technology the team was using, which made scaling difficult.
- The Data Scientists spent time on these processes that they wanted to dedicate to research, which greatly limited their availability and raised costs.
Overall, there was no clear community among developers, because knowledge was scattered and held solely by the developers working on a particular project.
The solution
VirtusLab’s Data Team automated the onboarding process for each project in the cloud. The priority was a solution that could replicate cloud infrastructure across different projects in the organisation. The first step was an analysis comparing the resources of each data scientist on the team. This gave us a better understanding of the team’s competencies and needs, which differed from one person to the next.
VirtusLab developed comprehensive automation that the DS team can use without difficulty, while also ensuring that adding new modules remains a simple process. Our Cloud and Data department built a core foundation that consists primarily of Terraform and Terragrunt. We provided the client’s team with tools they were already familiar with to ensure a smooth deployment. The cooperation with the client continues, as becoming data-driven is an ongoing process.
The results
The solution is a foundation that is flexible enough to work with any technology or component added on top of it. This allows it to fit into any project and meet the team’s needs while keeping a unified core, which eliminates the challenges the client faced and increases the Data Team’s efficiency.
- Starting a new project became dramatically faster. On average, the time needed went from 2 weeks to 1 sprint for projects in an existing domain and from 1 month to 1 sprint for projects in a new domain.
- Moving Data Scientists between projects is simple, since the same tools are used department-wide.
- Sustainable growth is possible because costs are easier to estimate with a known tool.
- Stability is greatly increased because the solution relies on trusted technology that has been tested and used in previous projects.
- Fixing bugs is easier since everyone on the team is familiar with the technology and can work together on solving any problem that arises. This type of cooperation also contributes to building a community among the developers.