What is Observability?

This is a 2 min read. There has been a lot of movement around observability (more precisely, system observability) in the DevOps world, which is nothing but control-systems terminology for monitoring production systems. Here is a much-needed rant about it. So what is observability? Observability is a feature a system offers that users can use to understand how it is…

Continuous streaming integration with streaming data platforms

This is a 4 min read. Someone asked me on Quora, “Should I use Gobblin or Spark Streaming to ingest data from Kafka to HDFS?” Here is what I wrote: This introduces a new architecture pattern called continuous streaming integration (CSI) with streaming data platforms (SDP) for solving app and data integration challenges. Short answer: If your data sink is…

Modernising data architecture for enterprises

This is a 1 min read. Prelude: Before getting into the topic of focus, i.e. how to modernise data architecture for large enterprises (which typically come with a lot of legacy baggage and organisational memory), I would like to set the context by clearing the air around one thing related to this subject. The first step in Big Data solutions consulting, like any…

Experiences with Kafka and exactly-once processing in IoT apps

This is a 5 min read. Some context on message brokers and delivery guarantees: (If you have a fair amount of experience with message processing and delivery guarantees, please skip to the next part of this post.) Message delivery guarantees are one of the canonical requirements for message brokers, and they are relevant for all types of brokers: the ones based on queue semantics and the ones…

TAPO for Airports – A Streaming use case

This is a 1 min read. Airports, especially the busy ones, face an interesting challenge when it comes to serving commuters: they need a smoother way to handle passengers in queues without long, frustrating waits, and thereby elevate the overall experience. No one likes to stand in long queues. But airports, unfortunately, have lots of queues: one for check-in, baggage…

Apache Flink CEP and ATM Fraud use case – Part 2

This is a 2 min read. In the first part of this multi-part series on the Apache Flink CEP library, I briefly covered the case for a dedicated CEP framework among the toolsets of open-source stream processing frameworks. Quick recap of the use case: For a customer, an ATM Withdrawal Txn >= 10,000 made more than ‘3 times’ in a location > 50 mile radius…

Apache Flink CEP Library – Part 1

This is a 4 min read. I am presuming you know the whats and whys of Apache Flink, touted as one of the best data processing frameworks that can do both batch and stream processing. Recently Flink announced a cool new CEP library. Just hang on with me; before going any further, let me give the reason for this post: some time back a lot…