We provide price optimisation solutions in close collaboration with a major UK retailer's data science team. Together, we build projects that enable quick exploration and productization of ML models and the associated optimisation algorithms in a hybrid-cloud environment. The end goal is to provide APIs for the optimiser to solve classes of pricing problems across multiple business domains.
Use pySpark on Hadoop to load and transform Big Data into smaller, high-signal features used in modelling and analytics.
Provide the hybrid-cloud environment to support model development from initial exploration through to full productization.
Independently select the architectural patterns best suited to the business problem.
Build robust code in the cloud and on-prem to bring models to production quickly and reliably.
Establish a clear DevOps culture and reusable solutions that create reproducible patterns across multiple business domains.
ETL: pySpark, Oozie & Airflow
Platform: Hadoop & Azure (incl. Azure ML)
CI/CD: Jenkins & Azure DevOps
Git @ Github
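The ETL step above, distilling raw Big Data into a small set of meaningful features, can be sketched in plain Python for illustration (the real jobs run in PySpark on the Hadoop cluster; the row layout and feature names here are purely hypothetical):

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Illustrative raw transaction rows: (product_id, store_id, units_sold, price)
RawRow = Tuple[str, str, int, float]

def build_price_features(rows: List[RawRow]) -> Dict[str, Dict[str, float]]:
    """Reduce raw transactions to compact per-product features --
    a stand-in for the PySpark groupBy/agg step on the cluster."""
    totals: Dict[str, List[float]] = defaultdict(lambda: [0.0, 0.0])  # units, revenue
    for product_id, _store_id, units, price in rows:
        totals[product_id][0] += units
        totals[product_id][1] += units * price
    return {
        pid: {
            "total_units": units,
            "avg_price": revenue / units if units else 0.0,
        }
        for pid, (units, revenue) in totals.items()
    }

rows: List[RawRow] = [
    ("sku-1", "s1", 10, 2.0),
    ("sku-1", "s2", 5, 3.0),
    ("sku-2", "s1", 4, 1.5),
]
features = build_price_features(rows)
```

In the production pipeline the same shape of aggregation runs distributed over the full dataset, and only the resulting feature tables move onward to modelling.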
There are many pricing problems in the global retail world, and many of them share similar structures and constraints. Because of this, building robust solutions that are reusable across challenges, leveraging both the on-prem cluster and cloud environments, and ensuring top quality while iterating quickly is our bread and butter.
The core engineering team consists of 4-5 people in Poland. We collaborate closely with the client's product management and data science teams, as well as with engineers employed directly by the client.
In collaboration with a prominent UK-based retailer's data science team, we create solutions for personalisation challenges, such as web-page product recommendations. The main objective is to deliver well-engineered components that enable modelling and experimentation work in a hybrid-cloud environment. The ultimate goal is an end-to-end experience stretching from raw data, through training different models, to APIs that deliver self-improving models.
Work on components to support Big Data processing on an on-prem Hadoop cluster.
Ensure that high-quality code is delivered.
Choose the best architecture and tools for the business requirement.
Push the data to the cloud and select the best way to store and process it.
Build robust code and architecture to allow easy productization of data scientists' models with minimal time and effort.
Enhance monitoring capabilities and reliability.
Python (3.7+) with complete typing
PySpark: base for our ETL
Azure (incl. Azure ML): model training and serving
Tensorflow, pandas, scikit-learn, scipy, nltk
Jenkins, Azure DevOps, Terraform, Git @ Github
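The serving end of the flow described above, exposing a trained personalisation model behind an API, might look like this minimal, hypothetical ranking function (a sketch only; in the real system the model artefact is trained and served via Azure ML, and the embeddings and SKU names below are invented):

```python
from typing import Dict, List

# Hypothetical model artefact: learned product embeddings (in the real
# service these would be loaded from Azure ML / blob storage).
PRODUCT_EMBEDDINGS: Dict[str, List[float]] = {
    "sku-1": [0.9, 0.1],
    "sku-2": [0.2, 0.8],
    "sku-3": [0.6, 0.6],
}

def recommend(user_vector: List[float], top_k: int = 2) -> List[str]:
    """Rank products by dot-product affinity with the user vector --
    the kind of endpoint a personalisation API would expose."""
    def score(embedding: List[float]) -> float:
        return sum(u * e for u, e in zip(user_vector, embedding))

    ranked = sorted(
        PRODUCT_EMBEDDINGS,
        key=lambda pid: score(PRODUCT_EMBEDDINGS[pid]),
        reverse=True,
    )
    return ranked[:top_k]

recs = recommend([1.0, 0.0])
```

Keeping the scoring function fully typed and free of framework dependencies is what makes it easy to unit-test locally and then productize with minimal time and effort.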
Personalisation models use tens of terabytes of input data. We leverage the on-prem Hadoop cluster to extract significant business features and transfer them to the cloud. We adopted the hybrid-cloud model to iterate faster on the business use cases given to us by data scientists.
The team consists of a technical lead and 3-4 engineers in Poland. These people collaborate closely with the client's UK-based data science unit.
What & How
We value a solid grasp of coding best practices and, at the same time, a friendly and open atmosphere at work. You'll be working in a cross-functional team for a major global retailer, shaping solutions for hybrid-cloud infrastructure (Hadoop, Apache Spark, Azure), automated CI/CD pipelines, and Infrastructure as Code. We value rapid delivery and use either a Scrum or a Kanban approach. We peer-review 100% of our code and yes, we test the code thoroughly. Last, but not least, we cooperate with each other and value teamwork. We believe that good work-life balance is important for your development and satisfaction, and we highly value your time, passion, and dedication.
What we expect
We are looking for a team player who:
- has proven experience in Python (knowledge of JVM languages is a plus),
- has at least basic knowledge of (py)Spark and Hadoop stack,
- has experience with Linux environments,
- maintains high code quality and is able to manage software complexity by good design choices and proper testing,
- understands best-practice principles and has knowledge of fundamental data structures and algorithms,
- is fluent in English, as seamless communication is one of the most important aspects of software projects,
- last, but not least, is a team player (happy to learn, help, share responsibilities, and contribute to the team's success).