The challenge
The insurer had to handle many data sources, each with its own format and integration method. Every department downloaded and cleaned data separately to produce its reports, which left room for errors. Departments often duplicated work on the same data, which made it hard to maintain high data quality.
During the cleansing process, conflicting or problematic data was dropped, leading to inconsistencies. A data engineer and a business analyst had to review the data set together, because neither could find the errors alone. This process consumed time and resources, and the reliance on costly engineering time increased operational costs.
Managing intricate data transformations and cleaning procedures was the insurer’s foremost concern. Our client also wanted to avoid vendor lock-in and to establish a custom, reliable single source of truth to streamline its operations. At this point, the insurer turned to VirtusLab to tackle its data quality challenges efficiently.
The solution
VirtusLab helped introduce a medallion architecture as a central hub to collect and refine data. This cloud-based data lake solution was designed with three key layers:
- The Raw Layer houses data “as-is” from diverse sources in cloud-based storage
- The Curated Layer ensures that the data is cleaned and undergoes simple transformations to enhance its quality
- The Business Aggregates Layer stores combined and transformed data, offering end users a unified view and a single source of truth
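The flow between the three layers can be illustrated with a minimal sketch. It uses plain Scala collections as a stand-in for Spark datasets, and every record shape and field name here (`RawPolicy`, `premiumCents`, and so on) is a hypothetical example, not the insurer's actual schema.

```scala
// Hypothetical record shapes, one per layer; names are illustrative only.
final case class RawPolicy(policyId: String, region: String, premiumCents: Long)
final case class CuratedPolicy(policyId: String, region: String, premiumCents: Long)
final case class RegionPremium(region: String, totalPremiumCents: Long)

object MedallionSketch {
  // Raw -> Curated: simple clean-up only (trim, normalise case, remove exact duplicates).
  def curate(raw: Seq[RawPolicy]): Seq[CuratedPolicy] =
    raw
      .map(r => CuratedPolicy(r.policyId.trim, r.region.trim.toUpperCase, r.premiumCents))
      .distinct

  // Curated -> Business Aggregates: one combined row per region, sorted by region name.
  def aggregate(curated: Seq[CuratedPolicy]): Seq[RegionPremium] =
    curated
      .groupBy(_.region)
      .map { case (region, rows) => RegionPremium(region, rows.map(_.premiumCents).sum) }
      .toSeq
      .sortBy(_.region)
}
```

Each function reads only from the layer directly below it, so the raw data stays untouched and any aggregate can be rebuilt from the curated layer.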
The solution, built on Scala and Apache Spark, followed a domain-driven design approach: each domain received its own distinct aggregates, so users could easily find the data they needed. The new solution also reduced data quality issues and let business analysts spend less time investigating them. The time savings came in two ways:
- Firstly, troublesome data was stored separately and presented in an easy-to-consume form, removing the need for analysts to write complex queries;
- Secondly, because the process was self-service, analysts no longer had to wait for data engineers to participate.
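One way to store troublesome data separately, rather than dropping it, is to split each batch into accepted records and rejected records that carry a human-readable reason. The sketch below assumes hypothetical `RawClaim` records and validation rules chosen purely for illustration; it is not the platform's actual implementation.

```scala
import scala.util.Try

// Hypothetical shapes: a raw record, its validated form, and a rejection with a reason.
final case class RawClaim(claimId: String, amount: String)
final case class Claim(claimId: String, amount: BigDecimal)
final case class Rejected(record: RawClaim, reason: String)

object QuarantineSketch {
  // Illustrative rules: the id must be present and the amount a non-negative number.
  def validate(r: RawClaim): Either[Rejected, Claim] =
    if (r.claimId.trim.isEmpty) Left(Rejected(r, "missing claimId"))
    else
      Try(BigDecimal(r.amount.trim)).toOption match {
        case Some(a) if a.signum >= 0 => Right(Claim(r.claimId.trim, a))
        case Some(_)                  => Left(Rejected(r, "negative amount"))
        case None                     => Left(Rejected(r, "amount is not a number"))
      }

  // Nothing is dropped: every input row ends up in exactly one of the two outputs.
  def split(rows: Seq[RawClaim]): (Seq[Rejected], Seq[Claim]) =
    rows.map(validate).partitionMap(identity)
}
```

Because each rejected row keeps the original record together with its reason, an analyst can inspect the rejected output directly instead of reconstructing the problem with ad hoc queries.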
By keeping the platform simple and free from external dependencies, VirtusLab avoided unnecessary complexity. This approach avoids vendor lock-in and makes it easier to add new components, which supports scalability. VirtusLab also relied on the Scala type system to improve code correctness and minimise maintenance costs.
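As a small illustration of how a type system can catch mistakes before a job runs, the sketch below uses distinct wrapper types and a sealed hierarchy. The names (`PolicyId`, `ClaimId`, `Layer`, the path prefixes) are assumptions made for this example and do not come from the actual code base.

```scala
// Distinct wrapper types: a ClaimId cannot be passed where a PolicyId is expected.
final case class PolicyId(value: String)
final case class ClaimId(value: String)

// A sealed hierarchy: all layers are known to the compiler.
sealed trait Layer
case object RawLayer extends Layer
case object CuratedLayer extends Layer
case object AggregatesLayer extends Layer

object TypeSafetySketch {
  // The match is checked for exhaustiveness: adding a new Layer without
  // handling it here produces a compiler warning.
  def storagePrefix(layer: Layer): String = layer match {
    case RawLayer        => "raw/"
    case CuratedLayer    => "curated/"
    case AggregatesLayer => "aggregates/"
  }

  // Accepts only a PolicyId; policyPath(RawLayer, ClaimId("C1")) does not compile.
  def policyPath(layer: Layer, id: PolicyId): String =
    storagePrefix(layer) + "policy/" + id.value
}
```

Mixing up identifiers or forgetting a layer then shows up at compile time instead of as bad data in a report.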
The results
Implementing the custom analytical data platform resulted in a unified, error-free, and scalable solution that met the insurer’s goal of becoming a modern, data-driven company. The insurer:
- Processed thousands of tables daily and provided data to various business users and executives
- Reduced time and operational costs by eliminating duplicate coding and fixing data quality issues promptly
- Avoided vendor lock-in and established a future-proof, maintainable data platform
- Improved data quality and accuracy, leading to better business insights and decision-making
- Obtained a solution that is widely sought after in the industry but seldom achieved