Building risk detection engine with Kafka and stream processing for top global retailer

Every organisation exposing online services needs to have security mechanisms that prevent a variety of different attacks and frauds attempted by malicious users. Due to the cloud revolution, access to computing power distributed across the globe is easier than ever. Encouraged by that fact, attackers build massive botnets, imitating real customers, in order to steal value, PII or any other goods from their accounts. Recognizing account takeovers in real-time at scale is a key responsibility of risk engine platform, that we have built.
Context statistics calculation
Authentication attempt categorization
Taking reactive and proactive targeted actions to prevent different types of attacks
Recognizing login attempts from unknown and untrusted devices for given user
Recognizing login attempts from new locations for a given user
Recognizing login attempts from botnet agents
Recognizing brute-force attacks
The final solution offers several different types of analysis. Analyses are performed in parallel, using specific data pipelines. Each pipeline’s execution is triggered by every new login event, which makes the platform event-driven. An event is just a statement of the fact – something that has happened in the real world. Events of certain type are grouped together in an event stream. Building a reliable, scalable and fault-tolerant event streaming platform is a very complex task on its own, so we have decided to use an industry-standard in the area – Apache Kafka.
Apache Kafka provides all the necessary abstractions to build a platform based on stream processing paradigm. Moreover, it offers multiple approaches for building stream processor applications. Starting from the most basic services using Customer/Producer API, through applications utilizing Kafka Streams, finishing on SQL interface provided by ksqlDB engine.
Most of our stream processors are based on Kafka Streams, which is a library that can be used with any application built on a JVM stack. Kafka Streams based applications do not have any specific requirements about the deployment platform, thus our infrastructure is built on top of Kubernetes, which allows scaling up and down according to the traffic volume. Additionally, there are also specific stream processors that use ksqlDB, together with multiple integrations with third-party systems through Kafka Connect.
“The final system has been built based on an event-at-a-time processing model that operates with millisecond latency with Kafka in its core. Kafka Streams framework supplies features that help engineers to focus on delivering real business value. The solution illustrates how the stream processing paradigm could be applied to build software that accommodates a requirement to provide results in real-time at scale.” – says Daniel Jagielski, TechLead at VirtusLab
The given results are 1-day statistics:
- ~1000 unique IPs blocked
- ~30 000 total amount of IP blocked
- ~500 user accounts locked

Read full article about reference architecture here.