The challenge
The forecasting framework had proven highly usable within the client’s organization. Building on that success, the retailer decided to roll the platform out to other organizations within the group. The migration aimed to support day-to-day forecasting and research on mobile products, including phones, subscriptions, and accessories.
The scalable framework itself was a large project, written in PySpark on a legacy Hadoop platform. It encompassed:
- A PySpark project with dozens of data processing pipelines.
- A set of machine learning models featuring custom configuration and hyperparameter tuning code, running in parallel through Spark User Defined Functions (UDFs).
- Jenkins CI/CD pipelines implemented in Groovy for automation.
- Management of the Python dependency environment using Conda.
- Jupyter notebooks provided for research purposes, leveraging the forecasting framework code.
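The per-model parallelism mentioned above can be sketched as follows. This is a minimal illustration, not the client’s actual code: the model is a stand-in linear trend, and the table and column names (`sales_history`, `product_id`, `sales`) are hypothetical.

```python
import numpy as np
import pandas as pd

def fit_and_forecast(history: pd.DataFrame) -> pd.DataFrame:
    """Fit a simple model on one product's history and forecast the next step.

    A linear trend stands in for the framework's bespoke ML/statistical models.
    """
    x = np.arange(len(history))
    slope, intercept = np.polyfit(x, history["sales"].to_numpy(), deg=1)
    next_value = slope * len(history) + intercept
    return pd.DataFrame({
        "product_id": [history["product_id"].iloc[0]],
        "forecast": [next_value],
    })

# In a Spark-based framework, this function is fanned out over product groups,
# roughly like (hypothetical table/column names):
#
#   forecasts = (spark.table("sales_history")
#                     .groupBy("product_id")
#                     .applyInPandas(fit_and_forecast,
#                                    schema="product_id string, forecast double"))

# Local demonstration on a toy history:
history = pd.DataFrame({"product_id": ["phone-x"] * 4,
                        "sales": [10.0, 12.0, 14.0, 16.0]})
print(fit_and_forecast(history))
```

Wrapping the per-group logic in a plain pandas function keeps it unit-testable outside Spark, while Spark's grouped UDF machinery handles distributing it across the cluster.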
Our client aimed to migrate this large-scale project to a completely new Azure Databricks environment. The endeavor required us to:
- Revamp the framework for use in a different environment, moving from Hadoop to Azure Databricks.
- Make the framework domain and company agnostic.
- Move selected data pipelines and all libraries to Azure Databricks.
- Enable merging central Hadoop cluster datasets with new custom datasets from external sources.
- Migrate Jenkins CI/CD pipelines to Azure DevOps.
- Empower Data Scientists to perform research using notebooks in the new environment, accessing the code as a library.
- Ensure strong security measures to protect access to code, data, and artifacts in the new environment.
Relying on trust as the foundation of our partnership, our client reached out to VirtusLab for assistance.
The solution
VirtusLab refactored the framework to suit new organizations and added the option to integrate additional data sources. The work proceeded in two major steps: refactoring the code for migration and preparing the infrastructure for seamless framework execution.
Code preparation
We enhanced the versatility of the forecasting framework by:
- Removing domain-dependent code elements such as column and table names, as well as domain-specific configuration settings
- Removing scheduler-dependent code
- Revamping the cluster definitions to make them applicable in any environment
This made the framework adaptable for implementation in various organizations. We also extracted Hadoop-specific components from the code, enabling execution in Azure Databricks and other environments. For instance, we extracted the Oozie workflow generation used for Hadoop deployments.
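Making the framework domain-agnostic boils down to injecting names through configuration instead of hard-coding them. A minimal sketch of this idea follows; all table and column names here are illustrative, not the client’s.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DomainConfig:
    """Domain-specific names injected into otherwise generic pipelines."""
    sales_table: str
    product_col: str
    date_col: str
    target_col: str

# Each organization supplies its own mapping instead of hard-coded names
# (hypothetical values for a mobile-subscriptions domain):
mobile = DomainConfig(
    sales_table="mobile.subscription_sales",
    product_col="subscription_id",
    date_col="billing_date",
    target_col="net_adds",
)

def build_history_query(cfg: DomainConfig) -> str:
    """A generic pipeline step that refers to names only through the config."""
    return (f"SELECT {cfg.product_col}, {cfg.date_col}, {cfg.target_col} "
            f"FROM {cfg.sales_table}")

print(build_history_query(mobile))
```

With this pattern, onboarding a new organization means writing a new `DomainConfig` rather than forking the pipelines.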
Infrastructure preparation
We helped to set up the infrastructure on Azure Databricks to enable the smooth execution of the framework. This involved:
- Creating new Databricks Dev and Prod compute clusters with preinstalled environments
- Automating updates for the forecasting framework and Python dependencies through Conda
- Migrating all CI/CD pipelines to Azure DevOps while hosting the framework as Azure Artifacts
- Integrating the preinstalled Forecasting Framework into Azure Data Factory for new projects
- Enabling the use of the framework for research via notebooks
- Implementing regular data exports from the Hadoop cluster to the new Azure Cloud
- Employing Azure Key Vault for secure secrets management
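On Databricks, a Key Vault-backed secret scope is read through `dbutils.secrets.get`. A small helper like the following keeps the same code path usable in local tests, where `dbutils` does not exist; the scope and key names are illustrative.

```python
import os

def get_secret(scope: str, key: str) -> str:
    """Fetch a secret from a Key Vault-backed Databricks secret scope.

    On Databricks, `dbutils.secrets.get` reads from the scope; outside
    Databricks (e.g. local tests) we fall back to an environment variable.
    """
    try:
        return dbutils.secrets.get(scope=scope, key=key)  # noqa: F821 - defined on Databricks
    except NameError:
        return os.environ[f"{scope}_{key}".upper().replace("-", "_")]

# Local demonstration with the environment-variable fallback
# (hypothetical scope/key names):
os.environ["FORECASTING_KV_STORAGE_ACCOUNT_KEY"] = "dummy-value"
print(get_secret("forecasting-kv", "storage-account-key"))
```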
The results
VirtusLab deployed the generalized forecasting framework in both Hadoop and Azure Databricks. Our client’s subsidiaries adopted the framework within six months of its restructuring and implementation in the new Azure Databricks environment. They also gained:
- Customized Forecast Generation – Regularly generated forecasts utilizing bespoke ML and statistical models tailored for the new domains, incorporating their individual models.
- Migration of Engineering Best Practices – Successfully transitioned all best practices such as CI/CD, tests, code reviews, and the creation of separate DEV and PROD environments to the new ecosystems and teams.
- Immediate Project Implementation – Promptly implemented five distinct new projects in the updated environment using the migrated framework.
The tech-stack
Cloud environment: Data Factory, Blob Storage, Artifacts, Key Vault, Azure DevOps, Databricks
Languages: Python
Package management: Conda
Frameworks: Spark