End-to-end ML pipeline: from data ingestion through transformation and model training to deployment

Leveraging the capabilities of the cloud for Machine Learning is yet another step to develop and deliver complex models faster.

PySpark, TensorFlow
Data Ingestion, Feature generation, Model Training, Model Deployment and Serving
Delivering personalized recommendations for a global retailer, from terabytes of raw data to scalable services, was the challenge we took on.

The need for full automation

The client’s Data Science department wanted a fully automated end-to-end solution for delivering recommendations to their website. Automation was needed in several areas:

  • Data Ingestion
  • Feature generation
  • Model Training
  • Model Deployment and Serving

The existing solution relied on semi-automatic processes that slowed the Data Science department’s delivery of new solutions to its end clients.


The solution - Fully automated end-to-end process


To achieve these improvements, we approached the problem with a decoupled-components methodology. Small code pipelines, each representing a single data transformation, feature, or model, were easy to understand, test, and extend. This also let us use libraries such as PySpark for Big Data processing and TensorFlow for machine learning only where applicable.
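As an illustration (a hypothetical sketch, not the client's actual code), a decoupled pipeline can be built from small, pure functions, each with a single responsibility, so every step is testable in isolation:

```python
# Hypothetical sketch of decoupled pipeline steps. Each step is a small,
# pure function that can be unit-tested without running the whole pipeline.
# Field names ("price", "log_price") are illustrative assumptions.
import math
from typing import Callable, Dict, List

Record = Dict[str, object]
Step = Callable[[List[Record]], List[Record]]

def drop_missing(records: List[Record]) -> List[Record]:
    """Cleaning step: remove records with a missing 'price' field."""
    return [r for r in records if r.get("price") is not None]

def add_log_price(records: List[Record]) -> List[Record]:
    """Feature step: derive a log-scaled price from the raw price."""
    return [{**r, "log_price": math.log(r["price"])} for r in records]

def run_pipeline(records: List[Record], steps: List[Step]) -> List[Record]:
    """Compose independent steps into one end-to-end pipeline."""
    for step in steps:
        records = step(records)
    return records

result = run_pipeline(
    [{"price": 10.0}, {"price": None}],
    [drop_missing, add_log_price],
)
```

The same composition idea carries over to PySpark DataFrames or TensorFlow preprocessing, with each step remaining individually testable.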


To maintain control over the growing number of pipelines, we proposed a composable configuration. This let us share common configuration across environments, which makes scaling and productionization easier.
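One common way to realize composable configuration (a hedged sketch, not the client's implementation) is to keep a shared base config and recursively layer environment-specific overrides on top of it:

```python
# Hypothetical sketch of composable configuration: a shared base config is
# merged with per-environment overrides, so common settings are defined once.
# Keys and values below are illustrative assumptions.
from typing import Any, Dict

def merge_config(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge `override` into `base`; override values win."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged

BASE = {"spark": {"shuffle_partitions": 200}, "model": {"epochs": 10}}
PROD = {"spark": {"shuffle_partitions": 2000}}  # production-only override

prod_config = merge_config(BASE, PROD)
```

Only the delta between environments needs to be written down, which keeps dev, staging, and production configurations from drifting apart.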


We built a number of common building blocks that extract complex logic out of the Data Science code and ensure consistent behavior across decoupled modules. A prominent example is a data-validation mechanism that no longer obstructs the business logic.
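To show the idea (a hypothetical sketch under assumed field names, not the actual building block), validation can be attached declaratively, e.g. as a decorator, so the body of a pipeline step contains only business logic:

```python
# Hypothetical sketch of a shared validation building block: required-field
# checks are declared once and applied as a decorator, keeping validation
# boilerplate out of the business logic.
from functools import wraps
from typing import Callable, Dict, List

def validate_fields(*required: str) -> Callable:
    """Decorator that checks required fields on every input record."""
    def decorator(step: Callable[[List[Dict]], List[Dict]]) -> Callable:
        @wraps(step)
        def wrapper(records: List[Dict]) -> List[Dict]:
            for record in records:
                missing = [f for f in required if f not in record]
                if missing:
                    raise ValueError(f"missing fields: {missing}")
            return step(records)
        return wrapper
    return decorator

@validate_fields("user_id", "item_id")
def count_interactions(records: List[Dict]) -> List[Dict]:
    """Business logic only: count interactions per user."""
    counts: Dict[str, int] = {}
    for r in records:
        counts[r["user_id"]] = counts.get(r["user_id"], 0) + 1
    return [{"user_id": u, "n": n} for u, n in counts.items()]
```

Because the check lives in one shared decorator, every decoupled module validates its inputs the same way without repeating the code.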


To achieve the best results, we cooperated closely with the client’s Data Science team to build a solution that integrates well with their ecosystem. We continuously support the adoption of best practices and build the solution with top engineering quality in mind: unit and acceptance tests, static type checking, linting, code reviews, and continuous integration.

The results

  • Productionized machine learning pipelines
  • Models delivered
  • A fully automated end-to-end process: a new model can be delivered as often as every day and served on demand, saving the team’s time and resources while providing the most accurate recommendations available.

What the future holds

Leveraging the capabilities of the cloud for machine learning is the next step toward developing and delivering complex models faster. Such solutions, however, come with complexity that must be tamed with a well-built architecture, a set of guidelines, and a common understanding between engineers and data scientists.


What can we do for you?

  • We can help you deliver end-to-end machine learning pipelines: from raw data to models served to your clients
  • We deliver state-of-the-art engineering solutions for machine learning, including expertise in Spark and machine learning engineering
  • We can bring your solution based on Python and Hadoop to scale
  • We can continuously deliver assistance to your team to improve the efficiency of existing solutions and processes

We will provide you with the best ML solutions

Contact our experts!