Data in the cloud with Apache Spark

The client has been collecting data for 30+ years. The volume of data grew beyond what standard SQL storage could cope with, even a specialized, clustered Teradata setup.

Yet the business was afraid that moving the data to a Hadoop cluster would degrade the performance of its daily queries.

VirtusLab’s Sherlock team used its expertise in the Big Data technology stack to build multiple query-engine prototypes and benchmark them on openly available cloud services, using representative anonymized data. The benchmarks showed that an out-of-the-box Apache Spark-based approach can bring actual query times down without any impact on the client’s computing resources.

The result

‘The client took a well-informed business decision on the migration, thereby minimising the risk and costs whilst maximising the benefits.’

Grzegorz, Head of the Team