Developing a generic Big Data ingestion and transformation framework for a new, highly funded, UK-based insurance company. We work across two teams: one responsible for developing the framework and extending the platform's capabilities, and another focused on using the framework to ingest hundreds of datasets from various data sources.
There are lots of greenfield areas and opportunities to create something new that has a real impact on the whole company.
Scala, Spark (mostly Core, some SQL), AWS (EMR, EC2, S3, CodeBuild)
Creating a generic Big Data framework on top of Spark able to ingest and transform hundreds of datasets from various sources.
Making Apache Spark reliable and conformant to a Functional Programming style.
Automating any possible part of the development process - the less manual work the better.
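To illustrate what an FP-style ingestion framework can look like, here is a minimal sketch in plain Scala. All names (`SourceConfig`, `Reader`, `CsvLikeReader`, `ingestAndCount`) are hypothetical, not the framework's real API; the point is that reads and transformations compose as pure functions, with errors modelled as `Either` values instead of exceptions:

```scala
// Hypothetical sketch of a generic, FP-style ingestion step.
// The names and stub data below are illustrative only.
final case class SourceConfig(name: String, format: String)

sealed trait IngestError
final case class UnsupportedFormat(format: String) extends IngestError

// A reader is a pure function from a source config to either an error
// or a list of rows; real implementations would wrap Spark readers.
trait Reader[Row] {
  def read(cfg: SourceConfig): Either[IngestError, List[Row]]
}

object CsvLikeReader extends Reader[Map[String, String]] {
  def read(cfg: SourceConfig): Either[IngestError, List[Map[String, String]]] =
    if (cfg.format == "csv")
      Right(List(Map("id" -> "1"), Map("id" -> "2"))) // stub rows
    else
      Left(UnsupportedFormat(cfg.format))
}

// Composing ingestion with a transformation stays pure: an error on the
// read side short-circuits the rest of the pipeline.
def ingestAndCount(cfg: SourceConfig): Either[IngestError, Int] =
  CsvLikeReader.read(cfg).map(_.size)
```

Because every step returns a value rather than throwing, pipelines built this way are easy to test without a cluster, which is one practical meaning of "making Spark conform to Functional Programming style".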
Around 20 engineers (both Scala/Spark and Snowflake) in the team, about half of whom come from well-known consultancies.
Most of the engineers have a Scala background and put a lot of focus on Functional Programming; some of the consultants have been in the Scala community for more than 10 years.
We are developing an analytical platform (e-commerce industry), operating on a cluster of hundreds of machines with hundreds of terabytes of RAM and thousands of cores. We integrate, process, and analyze data. We need all of this to run and test the applications we develop, which use state-of-the-art technologies from the Big Data world.
We are also optimizing complex machine learning applications (ML pipelines), improving the generation of huge analytical views (joining tables with millions of records in a few minutes is our speciality!). We have an enormous cluster available to test all the developed apps.
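One technique behind fast joins of a large table against a much smaller one is the broadcast (map-side) join, which Spark applies to avoid shuffling the large side. The sketch below shows the idea in plain Scala collections; the case classes and sample data are illustrative, not production code:

```scala
// Illustrative sketch of a broadcast (map-side) join: build a lookup
// map from the small side, then stream the large side through it,
// avoiding any shuffle of the large table.
final case class Order(id: Int, customerId: Int)
final case class Customer(id: Int, country: String)

def broadcastJoin(orders: List[Order],
                  customers: List[Customer]): List[(Int, String)] = {
  // "Broadcast": materialize the small side as an in-memory map.
  val byId: Map[Int, String] = customers.map(c => c.id -> c.country).toMap
  // Inner join: orders without a matching customer are dropped.
  orders.flatMap(o => byId.get(o.customerId).map(country => o.id -> country))
}
```

In Spark itself the same shape appears as a broadcast hash join, where the driver ships the small table to every executor so each partition of the large table can be joined locally.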
Apache Spark (Core, SQL, PySpark, Streaming, MLlib), Scala, Kafka, Hive, HBase, Hadoop, Teradata, Azure, Jenkins, Ansible, SBT, Git.
Over thirty developers experienced in building Big Data solutions, divided into teams of 4-6 people. We have real influence over the choice of tools and the possibility of making architectural changes.
What we expect in general
- Able and eager to lead by example
- Hands-on experience in designing and developing scalable, distributed, highly available solutions
- Interest in solving challenging data engineering problems using state-of-the-art techniques
- Grounded knowledge and understanding of data structures, algorithms, and distributed computing
- Fluent with Scala and/or Spark
- A Functional Programming mindset (nice to have)
- Able to develop and maintain high-quality code
- Can communicate with the business using good English (both written and verbal)
- Experienced in using big data and cloud solutions (e.g. Hadoop, AWS, Azure)
- Has a deep understanding of data-intensive distributed systems