Natural language searches: Lessons in spellcheck and autocorrect

This is a 6 min read. Spellcheck and autocorrect are two of the coolest features of search engines. They are not only cool, they are also key to a smooth search experience. For the context of this post, I am going to stick with English language searches. Peter Norvig, a Google scientist, has broken this subject down beautifully here to show the… Continue reading

What is so sexy about Unikernels?

This is a 2 min read. Unikernels (sounds almost like unicorns) are the newest advancement, or the latest buzzword, in the infrastructure virtualisation space, to say the least. Unikernel.org and Wikipedia offer great definitions for unikernels, but I felt stacking them against other virtualisation techniques would be a good addition to those definitions. So, here is a quick comparison and a… Continue reading

Continuous streaming integration with streaming data platforms

This is a 4 min read. Someone asked me on Quora, “Should I use Gobblin or Spark Streaming to ingest data from Kafka to HDFS?” Here is what I wrote: This introduces a new architecture pattern called continuous streaming integration (CSI) with streaming data platforms (SDP) for solving app and data integration challenges. Short answer: If your data sink is… Continue reading

Modernising data architecture for enterprises

This is a 1 min read. Prelude: Before getting into the topic of focus, i.e. how to modernise data architecture for large enterprises (which typically come with a lot of legacy baggage and organisational memory), I would like to set the context by clearing the air around one thing that is related to this subject. The first step in Big Data solutions consulting, like any… Continue reading

Doing “exactly-once” in stream processing, a Google Cloud Dataflow perspective

This is a 5 min read. As the title suggests, the scope of this multipart post is to evaluate how exactly-once processing is proposed in the Google Cloud Dataflow paper (link shared below) and hence implemented in the Dataflow service (which is the basis for Apache Beam). Although the titles are different, these posts should be considered precursors to this post (here… Continue reading

Experiences with Kafka and exactly-once processing in IoT apps

This is a 5 min read. Some context on message brokers and delivery guarantees: (If you have a fair amount of experience with message processing and delivery guarantees, please skip to the next part of this post.) Message delivery guarantees are one of the canonical requirements for message brokers, and they are very relevant for all types of brokers: the ones based on queue semantics and the ones… Continue reading

TAPO for Airports – A Streaming use case

This is a 1 min read. Airports, especially the busy ones, face an interesting challenge when it comes to serving commuters: they need a smoother way to handle passengers in queues, without long, frustrating waits, and thereby elevate the overall experience. No one likes to wait in long queues. But airports, unfortunately, have lots of queues: one for check-in, baggage… Continue reading