Let's connect

Automation of the end-to-end Machine Learning pipeline


Our client, a global retailer, aimed to automate their Machine Learning (ML) pipeline to leverage user behavior data and gain a competitive edge by predicting customer needs more accurately. However, the retailer lacked the in-house expertise in Big Data, software development, and data transformation needed to do so, and turned to VirtusLab. VirtusLab created a new end-to-end ML process in Python, using the TensorFlow and PySpark frameworks. This new system helped our client deliver a more personalized experience on their website while saving time and resources.


The challenge

Our client aimed to automate their Machine Learning pipeline. Doing so would allow them to take full advantage of existing customer behavior data and deliver a more personalized experience on their website. The client's Data Science team was already developing ML models but lacked the engineering expertise and manpower necessary to scale them up to production.

To keep to the project timeline for delivering new solutions to end users, the existing pipeline had to be fully automated. This called for a holistic approach across multiple areas:

  • Data ingestion
  • Feature generation
  • Model training 
  • Model deployment and serving

Having previously collaborated with the client, VirtusLab assembled a dedicated team proficient in Big Data, software development, and data transformation. 

The solution

VirtusLab built a fully automated end-to-end Machine Learning process that delivers new models on demand. Using a decoupled component methodology, they created small, manageable code pipelines, designed so that individual data transformations, features, or models are easier to understand. They used the PySpark and TensorFlow frameworks only where necessary.
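To make the idea of small, decoupled pipeline components more concrete, here is a minimal sketch. It is not the client's code: the step names (`drop_incomplete`, `add_conversion_rate`), the field names, and the `compose` helper are all hypothetical, and plain Python lists of dicts stand in for what would be PySpark DataFrames in a real system.

```python
from functools import reduce
from typing import Callable, Dict, List

# A record is a plain dict here; in a real system this role would likely be
# played by a PySpark DataFrame (assumption for illustration only).
Record = Dict[str, float]
Step = Callable[[List[Record]], List[Record]]


def drop_incomplete(rows: List[Record]) -> List[Record]:
    """Cleanup step: keep only rows that have every required field."""
    required = {"views", "purchases"}
    return [row for row in rows if required.issubset(row)]


def add_conversion_rate(rows: List[Record]) -> List[Record]:
    """Feature step: purchases per view, guarding against zero views."""
    return [
        {**row, "conversion_rate": row["purchases"] / row["views"] if row["views"] else 0.0}
        for row in rows
    ]


def compose(*steps: Step) -> Step:
    """Chain independent steps into one pipeline, applied left to right."""
    return lambda rows: reduce(lambda acc, step: step(acc), steps, rows)


# Each step knows nothing about the others, so it can be tested,
# replaced, or reused on its own.
feature_pipeline = compose(drop_incomplete, add_conversion_rate)
```

Because every step has the same signature, a pipeline is just an ordered list of steps, which is one way to keep individual transformations easy to read in isolation.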

VirtusLab introduced a composable configuration for ease of production and scalability. They also developed common building blocks to extract complex logic and ensure consistency across modules.
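One possible shape for a composable configuration is sketched below. The case study does not describe the actual format, so everything here is an assumption for illustration: the `TrainingConfig` fields, the override fragments, the `compose_config` helper, and the example S3 path are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical settings shared by every model pipeline."""
    input_path: str
    learning_rate: float = 0.01
    epochs: int = 10
    executors: int = 4


# A base fragment that every environment and model builds on.
# The bucket name is a placeholder, not a real location.
BASE = TrainingConfig(input_path="s3://example-bucket/features/")

# Small override fragments that can be layered in any combination.
PRODUCTION = {"executors": 32}
RECOMMENDER = {"learning_rate": 0.001, "epochs": 20}


def compose_config(base: TrainingConfig, *overrides: dict) -> TrainingConfig:
    """Apply override fragments left to right; later fragments win."""
    merged: dict = {}
    for fragment in overrides:
        merged.update(fragment)
    # dataclasses.replace raises TypeError on unknown field names,
    # so a typo in a fragment fails fast instead of being silently ignored.
    return replace(base, **merged)


config = compose_config(BASE, PRODUCTION, RECOMMENDER)
```

Keeping the base immutable and layering small fragments on top means a new model or environment only has to state what differs from the defaults.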

Working in collaboration with the client's Data Science team, VirtusLab integrated rigorous engineering practices, including:

  • Unit and acceptance testing 
  • Static type checking
  • Linting 
  • Code reviews 
  • Continuous integration and deployment (CI/CD)
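As a small illustration of how the first two practices fit together, the sketch below pairs a type-annotated function with unit tests. The function `mean_basket_value` is a hypothetical example, not part of the client's codebase; the annotations are the kind a static checker such as mypy can verify.

```python
import unittest
from typing import List, Optional


def mean_basket_value(order_totals: List[float]) -> Optional[float]:
    """Return the average order total, or None when there are no orders."""
    if not order_totals:
        return None
    return sum(order_totals) / len(order_totals)


class MeanBasketValueTest(unittest.TestCase):
    def test_average_of_orders(self) -> None:
        self.assertEqual(mean_basket_value([10.0, 20.0, 30.0]), 20.0)

    def test_no_orders_returns_none(self) -> None:
        # The Optional return type forces callers to handle this case,
        # which a static type checker can enforce before the code runs.
        self.assertIsNone(mean_basket_value([]))
```

The tests can be run with `python -m unittest`, and a CI job (Jenkins in this tech stack) would typically run them alongside the type checker and linter on every change.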

The results

VirtusLab’s solution helped the client's Data Science team successfully productionise 6 Machine Learning models and over 30 pipelines. 

Automated ML pipelines enabled our client to better personalize the online shopping experience. Our client introduced tailored product recommendations, customized promotions, and dynamic pricing that increased customer satisfaction and loyalty.

VirtusLab's small code pipelines enhanced the client's ability to manage complex data pipelines and Machine Learning processes. Using a decoupled component methodology ensured that the system is:

  • Robust 
  • Adaptable 
  • Capable of handling the demands of large-scale data processing and model deployment

VirtusLab also streamlined the process of monitoring the performance of ML models, making it easier to address issues as they arise. For example, the client is now effectively counteracting problems like data drift, an unexpected change in input data that can degrade the quality of the model's results.
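The case study does not say how drift is detected. As one common illustration, the sketch below computes the Population Stability Index (PSI), which compares how a feature's values are distributed across bins in a reference sample and in current data. The function names, the bin edges, and the clamping of out-of-range values are choices made for this example only.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def bin_shares(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Share of values in each bin; out-of-range values go to the nearest bin."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        index = bisect_right(edges, value) - 1
        index = min(max(index, 0), len(counts) - 1)  # clamp to first/last bin
        counts[index] += 1
    # A small floor avoids log(0) and division by zero for empty bins.
    return [max(count / len(values), 1e-6) for count in counts]


def population_stability_index(
    reference: Sequence[float], current: Sequence[float], edges: Sequence[float]
) -> float:
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    expected = bin_shares(reference, edges)
    actual = bin_shares(current, edges)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


edges = [0, 2, 4, 6, 8]
reference = [1, 2, 3, 4, 5, 6, 7, 8]   # feature values seen at training time
current = [6, 7, 7, 8, 8, 8, 5, 7]     # recent values, shifted upward

psi = population_stability_index(reference, current, edges)
```

A PSI of zero means the two distributions match bin for bin, and larger values mean a larger shift. A threshold of around 0.2 is an often-quoted rule of thumb for flagging drift, but the right cut-off depends on the feature and the model.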

Tech stack

Programming language: Python

Libraries: Pandas 

Big Data Processing Framework: PySpark, Hadoop, Hive

Machine Learning Framework: TensorFlow

Cloud storage: S3

Other tools: Jenkins, Conda, mypy, Jinja, Oozie

