On the 1st part of this multi-part series on Apache Flink CEP library, I briefly covered the case for a dedicated CEP framework among the toolsets of open-source stream processing frameworks. Quick recap on the use case
For a customer, an ATM Withdrawal Txn >= 10,000 made more than ‘3 times’ in a location > 50 mile radius from the next txn location within ‘2 hours’ should be flagged as a possible fraudulent transaction and an alert should be triggered.
Business users need a simple yet intuitive interface to define such complex events and alerts. But the focus of this post is how well the Flink CEP library lends itself to implement typical CEP uses-cases. Here is a early draft version of a code. (Note: for testing purposes I have kept the Txn count 2 instead of 3 and Txn window to 20 seconds, you could always change it)
Observations so far..
- I have used Kafka as the event source, the code is couple of lines. (Matching Kafka connector version with Flink version wasn’t straightforward)
- Creating Patterns with the pattern API for the CEP is very expressive. Comes with the classic “next” and “followedby” semantics built-in.
- Java 8 lambda support makes things readable and easy but the IDE /JDT limitations is rather painful. I use Eclipse Luna 4.4.2 SR2 and JDT generate doesn’t work.
Need to explore below pointers in this order.
- For the duration window in which if events happened, I have used ingest time as event time for simplicity. Flink provides options to extract event time from event packet itself. This needs to be explored, changed and tested for performance at scale.
- Expand the filter function placeholder with logic to do the GeoSpatial filter between ATM location and customer homesite using Elastic search integration.
- Verify how CEP states are maintained between failures and check recovery.
- Create an alert like a mobile push notification or email to make it more real.
- Combine different events to form a Complex event.