As the title suggests, the scope of this multipart post is to evaluate how exactly-once processing is proposed in the Google Cloud Dataflow paper (link shared below) and hence implemented in the Dataflow service (which is the basis for Apache Beam). Although the titles are different, these posts should be considered precursors to this post (here… Continue reading
Author Archives → Prithiviraj Damodaran
Experiences with Kafka and exactly-once processing in IoT apps
Some context on message brokers and delivery guarantees (If you have a fair amount of experience with message processing and delivery guarantees, please skip to the next part of this post.) Message delivery guarantees are one of the canonical requirements for message brokers, and they are relevant for all types of brokers: the ones based on queue semantics and the ones… Continue reading
TAPO for Airports – A Streaming usecase
Airports, especially the busy ones, face an interesting challenge when it comes to serving commuters: they need a smoother way to handle passengers in queues without long, frustrating waits, and thereby elevate the overall experience. No one likes to wait in long queues. But airports, unfortunately, have lots of queues: one for check-in, baggage… Continue reading
Apache Flink CEP and ATM Fraud usecase- Part 2
In the first part of this multi-part series on the Apache Flink CEP library, I briefly covered the case for a dedicated CEP framework among the toolsets of open-source stream processing frameworks. Quick recap of the use case: for a customer, an ATM withdrawal transaction >= 10,000 made more than ‘3 times’ in a location > 50 mile radius… Continue reading
Apache Flink CEP Library – Part 1
I am presuming you know the whats and whys of Apache Flink, touted as one of the best data processing frameworks that can do both batch and stream processing. Recently, Flink announced a cool new CEP library. Bear with me: before going any further, let me explain the reason for this post. Some time back a lot… Continue reading
Wide row data modelling with Apache Cassandra
I have always been intrigued by the performance claims of Apache Cassandra. So I wanted to put the whole “wide rows” idea, and the performance edge that the wide-row data model is said to offer, to the test. Rumour has it that Facebook hired ex-Amazon engineers who wrote Dynamo to build Cassandra. Anyway, a sound starting point is to… Continue reading
Why Zookeeper is always configured with odd number of nodes ?
Someone on Quora asked me, “Why is ZooKeeper always configured with an odd number of nodes?” Well, that’s a great question, but the sad part is that not many practitioners, even those who use ZooKeeper in production, can explain it simply. I will try to keep this really simple, I promise. ZooKeeper (ZK) is a highly available, highly reliable and… Continue reading
When to MapReduce ?
Someone asked me what problems MapReduce is not good for; I am flipping the question and answering what problems it works best for. Say someone asked when to use recursive logic vs. iterative logic: there is a bit of a grey area there, even though some problems clearly lend themselves to recursion, like graph traversal,… Continue reading
Terminology confusion: Column Stores and Column oriented databases
This is my attempt to clear the air on the subjects of column stores and column-oriented databases (at both the terminology and the understanding level). I will be talking a bit about how terrible the idea of grouping column-oriented databases as a flavour of NoSQL data stores is. What is a column store, really? There is no scope… Continue reading
How does the Log-Structured-Merge-Tree work?
If you are wondering why you should care about LSM-Trees: in one of my previous posts, Art of choosing a datastore, I briefly touched upon LSM-Trees. But this writeup is the best out there if you want to learn the inner workings of an LSM-Tree. How does the Log-Structured-Merge-Tree work? This was a Quora answer by David Jeske…. Continue reading