7 August 2017 / Jan Paw

Diving in the data lake

The rapid growth of unstructured data is a serious business challenge for organizations. Data repositories known as data lakes can play an important role in extracting valuable business information from enormous amounts of data. Storing and processing data on such a scale is a very complex and demanding task. Existing RDBMS-based systems […]

Read more

31 August 2017 / Bartłomiej Tomala

Navigating data lakes using Atlas

Nowadays almost every company wants its own Big Data system to analyse client behaviour and optimise operating costs. One of the most popular solutions for implementing such systems is a Data Lake based on the Hadoop ecosystem. If you don’t know exactly what a Data Lake is, you can read about it in […]

Read more

19 September 2017 / Jan Paw

Hadoop legacy

In the previous blog post I explained the basic concepts of data lakes, defined some core problems that can occur in them, and gave some hints for avoiding those problems. Most of these pitfalls are caused by the traits of data lakes. Unfortunately, current Hadoop distributions can’t resolve them entirely. Additionally, the architecture […]

Read more