The challenge
Our client's data analysis was hindered by the inability to isolate price changes. To work around this, they had to recalculate the entire dataset, which produced billions of rows in the price table. While the data was being recalculated, it was unavailable to other users, some of whom waited up to 3 hours. Employees across departments could not work efficiently, which hurt our client's overall productivity. Because our client operates globally and its employees work across many time zones, rescheduling the data transformations was not a viable solution. Clearly, a new approach to data processing was needed to maintain a competitive edge. This was when our client reached out to VirtusLab.
The solution
Working within our client's tight schedule, VirtusLab (VL) proposed an interim solution that improved their existing setup, drawing on our extensive experience with Spark and big data. By default, Spark overwrites the entire table; VL improved on this by manipulating the underlying files directly. Our team combined several techniques to save data in the background and move files faster within the dedicated data storage file system. As a result, we were able to:
- Save recalculated data files separately from the table
- Replace the original table files by moving the recalculated data files into place
- Repair the table's metadata to ensure data quality and completeness
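The three steps above can be sketched in Scala. This is a minimal illustration, not the code used in the project: it uses `java.nio.file` on a local file system as a stand-in for HDFS, and the names `TableSwap`, `writeStaged`, `swapIn`, and `repairStatement` are hypothetical. The use of Hive's `MSCK REPAIR TABLE` statement for the metadata repair is also an assumption, since the case study does not name the command that was used.

```scala
import java.nio.file.{Files, Path, StandardCopyOption}
import scala.jdk.CollectionConverters._ // requires Scala 2.13 or later

// Hypothetical helper: local-file-system stand-in for the staged-write-and-swap idea.
object TableSwap {

  // Step 1: save recalculated data in a staging directory, separate from the table.
  def writeStaged(dir: Path, fileName: String, rows: Seq[String]): Path = {
    Files.createDirectories(dir)
    Files.write(dir.resolve(fileName), rows.asJava)
  }

  // Step 2: replace the table's files by renaming directories instead of rewriting data.
  // Both paths are assumed to live on the same file system so the moves are plain renames.
  def swapIn(stagingDir: Path, tableDir: Path): Unit = {
    val retired = tableDir.resolveSibling(tableDir.getFileName.toString + ".old")
    deleteRecursively(retired) // clear leftovers from an earlier interrupted run
    if (Files.exists(tableDir))
      Files.move(tableDir, retired, StandardCopyOption.ATOMIC_MOVE)
    Files.move(stagingDir, tableDir, StandardCopyOption.ATOMIC_MOVE)
    deleteRecursively(retired)
  }

  // Step 3: build the statement that asks Hive to resynchronize the table's metadata
  // with the files on disk (assumed command; in Spark it could be passed to spark.sql).
  def repairStatement(table: String): String = s"MSCK REPAIR TABLE $table"

  // Deletes a directory tree, children before parents; does nothing if it is absent.
  private def deleteRecursively(root: Path): Unit =
    if (Files.exists(root)) {
      val stream = Files.walk(root)
      try stream.iterator().asScala.toList.reverse.foreach(p => Files.delete(p))
      finally stream.close()
    }
}
```

In this sketch the slow part, writing the recalculated files, happens while the original table is still readable. Readers are affected only during the short interval between the two renames.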
Our solution significantly reduces the time during which data is unavailable, allowing our client's employees to work more efficiently. Moreover, it leverages the latest industry practices, positioning our client as a competitive and forward-thinking organization.
The results
Overall, our consultancy services enabled our global retail client to optimize their data processing and achieve measurable improvements in efficiency and productivity.
The tech stack
Languages: Scala, SQL, HiveQL
Database: Hive
Eventing platform: Kafka
Infrastructure: Hortonworks Data Platform / Spark, Hive, HDFS, YARN, Oozie, Sqoop, Ranger