End-to-end ML pipeline: from ingestion and transformation to machine learning and model deployment

Leveraging the capabilities of the cloud for machine learning is another step towards developing and delivering complex models faster.

Industry: Retail
Technology: PySpark, TensorFlow
Scope: Data Ingestion, Feature generation, Model Training, Model Deployment and Serving
Turning terabytes of raw data into personalized recommendations served at scale for a global retailer was the challenge we took on.
The need for full automation

The client’s Data Science department wanted to introduce a fully automated, end-to-end solution for delivering recommendations to their website. Automation was required in several areas:

  • Data Ingestion
  • Feature generation
  • Model Training
  • Model Deployment and Serving

The existing solution relied on semi-automatic processes that slowed down the delivery of the Data Science department’s new solutions to their end clients.

 


The solution – Fully automated end-to-end process
1. To achieve improvements, we approached the problem with a decoupled-components methodology. Small code pipelines, each representing a single data transformation, feature, or model, were easy to understand, test, and extend. We could use libraries like PySpark for big-data processing and TensorFlow for machine learning only where applicable, as sketched below.
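To illustrate the idea, here is a minimal sketch of what one such decoupled step might look like, assuming a Spark-based stack; the function name, column names, and storage paths are hypothetical.

```python
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F


def build_user_event_counts(raw_events: DataFrame) -> DataFrame:
    """One small, testable transformation: aggregate raw click events per user."""
    return raw_events.groupBy("user_id").agg(
        F.count("*").alias("event_count"),
        F.max("event_ts").alias("last_event_ts"),
    )


if __name__ == "__main__":
    spark = SparkSession.builder.appName("user-event-counts").getOrCreate()
    # Hypothetical input and output locations.
    events = spark.read.parquet("s3://bucket/raw/events/")
    build_user_event_counts(events).write.mode("overwrite").parquet(
        "s3://bucket/features/user_event_counts/"
    )
```

Because each step is a plain function over DataFrames, it can be tested in isolation and composed with other steps by an orchestrator.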

2. To keep control over the growing number of pipelines, we proposed composable configuration. This let us share common configuration across environments, which makes scaling and productionization easier.
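As an illustration, composable configuration can be as simple as a shared base overlaid with per-environment overrides. The following sketch uses hypothetical keys and a plain-Python deep merge rather than the client’s actual configuration tooling.

```python
from copy import deepcopy

# Shared defaults used by every environment (keys are hypothetical).
BASE = {
    "spark": {"shuffle_partitions": 200},
    "paths": {"features": "s3://bucket/features/"},
}

# Per-environment overrides layered on top of the shared defaults.
OVERRIDES = {
    "dev": {
        "spark": {"shuffle_partitions": 8},
        "paths": {"features": "s3://bucket-dev/features/"},
    },
    "prod": {},  # production keeps the shared defaults
}


def merge(base: dict, override: dict) -> dict:
    """Recursively overlay environment-specific values on the shared config."""
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


config = merge(BASE, OVERRIDES["dev"])
print(config["spark"]["shuffle_partitions"])  # 8 in dev, 200 in prod
```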

3. We built a number of common building blocks that extract complex logic out of the Data Science code and ensure consistent behavior across the decoupled modules. A prominent example is a data-validation mechanism that no longer obstructs the business logic.
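A minimal sketch of such a building block, assuming PySpark DataFrames, could be a decorator that validates a step’s output columns so the data-science code contains only business logic; the names here are hypothetical.

```python
from functools import wraps
from typing import Callable, Sequence

from pyspark.sql import DataFrame


def validate_output(required_columns: Sequence[str]) -> Callable:
    """Fail fast if a pipeline step produces a DataFrame missing required columns."""
    def decorator(step: Callable[..., DataFrame]) -> Callable[..., DataFrame]:
        @wraps(step)
        def wrapper(*args, **kwargs) -> DataFrame:
            result = step(*args, **kwargs)
            missing = set(required_columns) - set(result.columns)
            if missing:
                raise ValueError(f"{step.__name__} is missing columns: {missing}")
            return result
        return wrapper
    return decorator


@validate_output(["user_id", "event_count"])
def build_user_event_counts(raw_events: DataFrame) -> DataFrame:
    # Business logic only; validation is handled by the shared decorator.
    return (
        raw_events.groupBy("user_id")
        .count()
        .withColumnRenamed("count", "event_count")
    )
```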

4. To achieve the best results, we cooperated closely with the client’s Data Science team to build a solution that integrates well with their ecosystem. We continuously support the adoption of best practices and build the solution with top engineering quality in mind: unit and acceptance tests, static type checking, linting, code reviews, and continuous integration.
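For example, a unit test for the pipeline step sketched earlier could look like the following, assuming pytest and a local SparkSession; the module path is hypothetical.

```python
import pytest
from pyspark.sql import SparkSession

# Hypothetical module path for the step sketched earlier.
from pipelines.user_events import build_user_event_counts


@pytest.fixture(scope="session")
def spark():
    # A small local session is enough for fast, isolated tests.
    return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()


def test_event_counts_are_aggregated_per_user(spark):
    events = spark.createDataFrame(
        [("u1", "2024-01-01"), ("u1", "2024-01-02"), ("u2", "2024-01-01")],
        ["user_id", "event_ts"],
    )
    counts = {
        row["user_id"]: row["event_count"]
        for row in build_user_event_counts(events).collect()
    }
    assert counts == {"u1": 2, "u2": 1}
```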

The results
  • 30+ productionized machine learning pipelines
  • 5+ models delivered
  • A fully automated end-to-end process: a new model can be delivered as often as every day and served on demand, saving the team time and resources while providing the most accurate recommendations available.
What the future holds

Leveraging the capabilities of the cloud for machine learning is another step towards developing and delivering complex models faster. Such solutions, however, come with complexity that must be tamed with a well-built architecture, a set of guidelines, and a common understanding between engineers and data scientists.


What can we do for you?
  • We can help you deliver end-to-end machine learning pipelines: from raw data to models served to your clients
  • We deliver state-of-the-art engineering solutions for machine learning, including expertise in Spark and machine learning engineering
  • We can bring your Python- and Hadoop-based solution to scale
  • We can continuously assist your team in improving the efficiency of existing solutions and processes
We will provide you with the best ML solutions
Contact our experts!

"*" indicates required fields

If you click the “Send” button you agree to the privacy policy. Your personal data given in the contact form above will be processed for purposes of answering your inquiry and for any further correspondence regarding this inquiry. The controller of your personal data is VirtusLab Sp. z o.o. For more information, see our Privacy Policy