
VirtusLab's Articles

Data Engineering | Mar 23, 2023

Is Hadoop still relevant: Is it our future, or does it belong to the past?

As business needs and market trends change, Hadoop and cloud data platforms will evolve together. Will Hadoop remain a data solution that companies rely on?

Data Engineering | Oct 21, 2021

Scala 3 and Spark?

Spark 3.2.0’s support for Scala 2.13 technically makes it possible to write Spark jobs in Scala 3, but it remains “an uphill path” that requires workarounds for encoders and data shapes. Libraries such as Iskra smooth the path, yet the approach is still experimental rather than production-ready.

Data Engineering | Aug 26, 2021

Pandas-stubs — how we enhanced pandas with type annotations

VirtusLab created the pandas‑stubs library to add type information to pandas, enabling stronger type safety in projects that depend on it. The library grew out of challenges in integrating pandas with PySpark, where missing stubs led to API conflicts and unchecked code.

Data Engineering | Aug 20, 2021

Table schemas in data pipelines Spark: How to handle large, nested & growing ones

In this post, we describe how we built a Spark pipeline for incoming data with large, nested, and growing table schemas, and how we eventually arrived at a solution that works well.
