The need for full automation
The client’s Data Science department wanted to introduce a fully automated end-to-end solution for delivering recommendations to their website. Automation was needed in four areas:
- Data Ingestion
- Feature generation
- Model Training
- Model Deployment and Serving
The existing solution relied on semi-automatic processes, which slowed down the Data Science department's delivery of new solutions to their end client.
The solution - Fully automated end-to-end process
To achieve this, we split the problem into decoupled components. Small code pipelines, each representing a single data transformation, feature, or model, were easy to understand, test, and extend. This also let us use libraries such as PySpark for big data processing and TensorFlow for machine learning only where they were applicable.
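The decoupled approach described above can be illustrated with a minimal sketch in plain Python. It is not the client's code: the step names (`drop_incomplete`, `add_click_rate`), the `run_pipeline` helper, and the record fields are hypothetical, chosen only to show how small, single-purpose steps can be composed and tested in isolation.

```python
from functools import reduce
from typing import Callable, Dict, List, Optional

# A record is a plain dictionary here; a real pipeline might use a PySpark DataFrame.
Record = Dict[str, Optional[float]]
Step = Callable[[List[Record]], List[Record]]


def drop_incomplete(records: List[Record]) -> List[Record]:
    """Single data transformation: keep only records with both counters present."""
    return [
        r for r in records
        if r.get("views") is not None and r.get("clicks") is not None
    ]


def add_click_rate(records: List[Record]) -> List[Record]:
    """Single feature: clicks divided by views (0.0 when there were no views)."""
    return [
        {**r, "click_rate": r["clicks"] / r["views"] if r["views"] else 0.0}
        for r in records
    ]


def run_pipeline(steps: List[Step], records: List[Record]) -> List[Record]:
    """Apply each step in order, feeding the output of one into the next."""
    return reduce(lambda data, step: step(data), steps, records)


raw = [{"views": 10, "clicks": 2}, {"views": None, "clicks": 1}]
features = run_pipeline([drop_incomplete, add_click_rate], raw)
```

Because each step only depends on its input, it can be unit-tested on its own and reordered or replaced without touching the others.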
To keep control over the growing number of pipelines, we proposed a composable configuration. It let us share common configuration across multiple environments, which makes scaling and productionization easier.
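One common way to make configuration composable is to layer environment-specific overrides on top of a shared base. The sketch below assumes that approach; the `merge_config` helper and the configuration keys are hypothetical and only illustrate the idea, not the mechanism used in the project.

```python
from copy import deepcopy
from typing import Any, Dict

Config = Dict[str, Any]


def merge_config(base: Config, override: Config) -> Config:
    """Return a new config in which values from `override` win over `base`.

    Nested dictionaries are merged recursively; neither input is modified.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Shared settings (hypothetical keys and values).
BASE = {"spark": {"executors": 2, "memory": "4g"}, "model": {"epochs": 5}}

# The production environment only states what differs from the base.
PRODUCTION = {"spark": {"executors": 50}}

production_config = merge_config(BASE, PRODUCTION)
```

Each environment file stays short because it lists only its differences, and a change to the shared base reaches every environment at once.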
We built a set of common building blocks that extract complex logic from the Data Science code and ensure consistent behavior across decoupled modules. A prominent example is the data validation mechanism, which no longer clutters the business logic.
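As an illustration of keeping validation out of the business logic, the sketch below uses a decorator that checks input records before the decorated function runs. This is an assumed design, not the client's actual building block; `requires_fields`, `ValidationError`, and `count_interactions` are hypothetical names.

```python
from functools import wraps
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


class ValidationError(ValueError):
    """Raised when an input record does not meet the declared requirements."""


def requires_fields(*fields: str) -> Callable:
    """Reusable building block: reject input records that lack any of `fields`."""
    def decorator(func: Callable[[List[Record]], Any]) -> Callable[[List[Record]], Any]:
        @wraps(func)
        def wrapper(records: List[Record]) -> Any:
            for index, record in enumerate(records):
                missing = [name for name in fields if name not in record]
                if missing:
                    raise ValidationError(
                        f"record {index} is missing fields: {missing}"
                    )
            return func(records)
        return wrapper
    return decorator


@requires_fields("user_id", "item_id")
def count_interactions(records: List[Record]) -> Dict[str, int]:
    """Business logic only: number of interactions per user."""
    counts: Dict[str, int] = {}
    for record in records:
        counts[record["user_id"]] = counts.get(record["user_id"], 0) + 1
    return counts
```

The function body contains no validation code, and the same decorator can be reused by every module that consumes the same kind of input.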
To achieve the best results, we cooperated closely with the client’s Data Science team to build a solution that integrates well with their ecosystem. We continuously support the adoption of best practices and build the solution with engineering quality in mind, using unit and acceptance tests, static type checking, linting, code reviews, and continuous integration.
What the future holds
Leveraging cloud capabilities for machine learning is another step toward developing and delivering complex models faster. Such solutions, however, come with complexity that must be tamed with a well-built architecture, a set of guidelines, and a common understanding between engineers and data scientists.
What can we do for you?
- We can help you deliver end-to-end machine learning pipelines: from the raw data to the models served to your clients
- We deliver state-of-the-art engineering solutions for machine learning, including expertise in Spark and machine learning engineering
- We can bring your solution based on Python and Hadoop to scale
- We can continuously deliver assistance to your team to improve the efficiency of existing solutions and processes