An automated single source of truth

The challenge

The insurer had to handle many data sources, each with its own format and integration method. Every department downloaded and cleaned the data separately to produce its reports, which left room for errors. Departments often duplicated work on the same data, making it difficult to maintain high data quality.

During the cleansing process, conflicting or problematic data was dropped, which led to inconsistencies. A data engineer and a business analyst had to review the data set together, because neither could find the errors alone. This process consumed time and resources, and the need for costly engineering time drove up operational costs.

Managing complex data transformations and cleaning procedures was the insurer's primary concern. The client also wanted to avoid vendor lock-in and to build a custom, reliable single source of truth that would streamline its operations. At this point, the insurer turned to VirtusLab to tackle its data quality challenges efficiently.

The solution

VirtusLab helped the insurer adopt a medallion architecture as a central hub for collecting and refining data. This cloud-based data lake was designed with three key layers:

The Raw Layer stores data "as-is" from diverse sources in cloud-based storage
The Curated Layer holds data that has been cleaned and has undergone simple transformations to improve its quality
The Business Aggregates Layer stores combined and transformed data, giving end users a unified view and a single source of truth
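The flow through the three layers can be illustrated with a minimal sketch in plain Scala. It assumes a hypothetical `Policy` record and comma-separated raw input, and it uses in-memory collections instead of Apache Spark and cloud storage so that it runs on its own; it is not the project's actual code.

```scala
import scala.util.Try

// Hypothetical curated record; the field names are illustrative, not from the project.
final case class Policy(policyId: String, region: String, premium: BigDecimal)

// Raw Layer: records kept "as-is", here plain comma-separated lines.
val rawLayer: Seq[String] = Seq(
  "P-001, north , 1200.50",
  "P-002,south,800",
  "P-001, north , 1200.50", // the same record delivered twice
  "P-003,NORTH,400"
)

// Curated Layer: trim fields, normalise casing, parse types, drop exact duplicates.
def curate(raw: Seq[String]): Seq[Policy] =
  raw
    .map(_.split(",").map(_.trim))
    .flatMap {
      case Array(id, region, premium) =>
        Try(BigDecimal(premium)).toOption.map(p => Policy(id, region.toLowerCase, p))
      case _ => None // this simplified sketch silently skips malformed rows
    }
    .distinct

// Business Aggregates Layer: one domain-specific view, total premium per region.
def premiumByRegion(curated: Seq[Policy]): Map[String, BigDecimal] =
  curated.groupMapReduce(_.region)(_.premium)(_ + _)

println(premiumByRegion(curate(rawLayer)))
```

In a Spark job, each of these steps would typically become a DataFrame or Dataset transformation, with every layer persisted to its own storage location.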

The solution, built on Scala and Apache Spark, followed a domain-driven design approach. Because distinct aggregates were created for each domain, users could easily find the data they needed. The new solution also reduced data quality issues and let business analysts spend less time investigating them. The time savings came from two sources:

First, problematic records were stored separately and presented in an easy-to-consume form, so analysts no longer had to write complex queries to find them;
Second, because the process was self-service, analysts no longer had to wait for data engineers to get involved.
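Storing troublesome data separately is often implemented as a quarantine step during curation. The following minimal sketch assumes hypothetical `Claim` and `Rejected` types and simple validation rules; it shows one way valid rows can move on while rejected rows are kept, together with a readable reason, in a table analysts can inspect directly.

```scala
import scala.util.Try

// Hypothetical curated record and quarantine entry; names are illustrative only.
final case class Claim(claimId: String, policyId: String, amount: BigDecimal)
final case class Rejected(rawLine: String, reason: String)

// Validate one raw line: either a typed Claim or the reason it was rejected.
def validate(line: String): Either[Rejected, Claim] =
  line.split(",").map(_.trim) match {
    case Array(claimId, policyId, amount) =>
      Try(BigDecimal(amount)).toOption match {
        case None                        => Left(Rejected(line, s"amount is not a number: '$amount'"))
        case Some(a) if a.signum < 0     => Left(Rejected(line, "amount is negative"))
        case Some(_) if policyId.isEmpty => Left(Rejected(line, "missing policy id"))
        case Some(a)                     => Right(Claim(claimId, policyId, a))
      }
    case fields => Left(Rejected(line, s"expected 3 fields, got ${fields.length}"))
  }

// Route clean records onwards and keep rejected ones in a separate, queryable collection.
def curateWithQuarantine(lines: Seq[String]): (Seq[Claim], Seq[Rejected]) = {
  val (rejected, clean) = lines.map(validate).partitionMap(identity)
  (clean, rejected)
}

val sampleBatch = Seq("C-1,P-001,250.00", "C-2,,100", "C-3,P-002,abc", "C-4,P-003,-5", "C-5,P-004")
val (clean, quarantine) = curateWithQuarantine(sampleBatch)
quarantine.foreach(r => println(s"${r.reason}: ${r.rawLine}"))
```

Keeping the rejection reason next to the original line is what makes the quarantine self-service: an analyst can read it without writing a query or asking an engineer.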

By keeping the platform simple and free of external dependencies, VirtusLab avoided unnecessary complexity. This approach prevents vendor lock-in and makes it easier to add new components as the platform scales. VirtusLab also used Scala's type system to improve code correctness and minimise maintenance costs.
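As an illustration of how a type system can catch data errors early, here is a minimal sketch with hypothetical wrapper types `PolicyId` and `CustomerId` and a validated `Premium`. It is not taken from the project; in production code, value classes (`extends AnyVal`) or Scala 3 opaque types could serve the same purpose without the wrapper allocation.

```scala
// Thin wrapper types (hypothetical): a CustomerId cannot be passed where a PolicyId is expected.
final case class PolicyId(value: String)
final case class CustomerId(value: String)

// A premium that can only be created through a validating factory.
final class Premium private (val amount: BigDecimal) {
  override def toString: String = s"Premium($amount)"
}

object Premium {
  def from(amount: BigDecimal): Either[String, Premium] =
    if (amount.signum >= 0) Right(new Premium(amount))
    else Left(s"premium must be non-negative, got $amount")
}

final case class PolicyHolder(policy: PolicyId, customer: CustomerId, premium: Premium)

// Swapping the two id arguments is a compile-time error rather than a silent data bug.
def assign(policy: PolicyId, customer: CustomerId, amount: BigDecimal): Either[String, PolicyHolder] =
  Premium.from(amount).map(p => PolicyHolder(policy, customer, p))

println(assign(PolicyId("P-001"), CustomerId("C-042"), BigDecimal("1200.50")))
// assign(CustomerId("C-042"), PolicyId("P-001"), BigDecimal(1)) // does not compile
```

Because an invalid `Premium` cannot be constructed, downstream aggregation code does not need to re-check the value, which is one way stricter types can reduce maintenance effort.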

The results

Implementing the custom analytical data platform produced a unified, error-free, and scalable solution that met the insurer's goal of becoming a modern, data-driven company. The insurer:

Processed thousands of tables daily and provided data to various business users and executives
Reduced time and operational costs by eliminating duplicate coding and fixing data quality issues promptly
Avoided vendor lock-in and established a future-proof, maintainable data platform
Improved data quality and accuracy, leading to better business insights and decision-making
Obtained a solution that is highly sought after in the industry but seldom achieved
