The challenge
Our client's data analysis was hindered by the inability to isolate price changes. Every update meant recalculating the entire dataset and rewriting billions of rows in the price table. While this happened, the data was unavailable to other users, some of whom had to wait up to 3 hours. As a result, employees across departments struggled to work efficiently, which hurt our client's overall productivity. Because the client operates globally and its employees work in many time zones, simply rescheduling the data transformations was not an option. A new approach to data processing was needed to maintain a competitive edge, and that is when our client reached out to VirtusLab.
The solution
Working within our client's tight schedule, VirtusLab (VL) proposed an interim solution that improved their existing setup, drawing on our extensive experience with Spark and Big Data. Instead of relying on Spark's default behavior of overwriting the entire table in place, VL worked directly with the table's underlying files. The recalculated data was saved in the background, and the finished files were then moved quickly within the cluster's dedicated storage file system (HDFS). As a result, we were able to:
- Save recalculated data files separately from the table
- Replace the original table files with the moved data files
- Repair the table's metadata to ensure data quality and completeness
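The three steps above follow a general "write aside, then swap" pattern. The sketch below illustrates it on a local file system using only the JDK, under stated assumptions: the names (`TableSwap`, `swap`, the `.bak` backup directory, the `refreshMetadata` hook) are hypothetical and not taken from the client's code. On a Hadoop cluster, the moves would presumably go through Hadoop's `FileSystem.rename`, and the metadata step would be something like Hive's `MSCK REPAIR TABLE`. Here, the metadata step is left as a caller-supplied function.

```scala
import java.nio.file.{Files, Path, StandardCopyOption}
import java.util.Comparator

// Hypothetical local-filesystem sketch of the "write aside, then swap" pattern.
// On HDFS the moves would be FileSystem.rename calls (assumption, not the client's code).
object TableSwap {

  // Recursively delete a directory tree, deepest entries first.
  def deleteRecursively(dir: Path): Unit =
    if (Files.exists(dir)) {
      val stream = Files.walk(dir)
      try {
        val it = stream.sorted(Comparator.reverseOrder[Path]()).iterator()
        while (it.hasNext) Files.delete(it.next())
      } finally stream.close()
    }

  /** Replace `tableDir` with the already-written `stagingDir`.
    *
    * Step 1 (writing stagingDir) happens before this call, while readers
    * keep using the old tableDir. Step 2 swaps directories with renames,
    * so the table is missing only briefly. Step 3 calls `refreshMetadata`,
    * a hypothetical hook standing in for e.g. MSCK REPAIR TABLE.
    */
  def swap(tableDir: Path, stagingDir: Path, refreshMetadata: () => Unit): Unit = {
    require(Files.isDirectory(stagingDir), s"staging data missing: $stagingDir")
    val backupDir = tableDir.resolveSibling(tableDir.getFileName.toString + ".bak")
    deleteRecursively(backupDir)

    // Move the live data out of the way (a rename, not a copy).
    if (Files.exists(tableDir))
      Files.move(tableDir, backupDir, StandardCopyOption.ATOMIC_MOVE)

    try Files.move(stagingDir, tableDir, StandardCopyOption.ATOMIC_MOVE)
    catch {
      case e: Exception =>
        // Roll back so readers never end up without a table.
        if (Files.exists(backupDir))
          Files.move(backupDir, tableDir, StandardCopyOption.ATOMIC_MOVE)
        throw e
    }

    refreshMetadata()          // step 3: bring the catalog in line with the new files
    deleteRecursively(backupDir) // old data is dropped only after the swap succeeded
  }
}
```

Keeping the old files as a backup until the swap and metadata refresh have finished is a design choice in this sketch: it makes a rollback possible and keeps the unavailable window down to two renames instead of a full rewrite.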
Our proposed solution significantly reduces the time during which data is unavailable, allowing our client's employees to work more efficiently. Moreover, it follows current industry practices, positioning our client as a competitive and forward-thinking organization.
The results
Overall, our consultancy services have enabled our client to optimize their data processing and achieve measurable improvements in efficiency and productivity. Our services delivered significant results for our global retail client, including:
- Shorter waiting time: reduced from several hours to a matter of seconds.
- Increased availability of the table, allowing users to perform their daily work activities more efficiently.
- Adoption of state-of-the-art data processing methods, resulting in proper data handling and management.
- A universal, easily adaptable approach that our client can reuse for future processing tasks.
The tech stack
Languages: Scala, SQL, HiveQL
Database: Hive
Eventing platform: Kafka
Infrastructure: Hortonworks Data Platform / Spark, Hive, HDFS, YARN, Oozie, Sqoop, Ranger