This is a 2 min read. There has been a lot of movement around observability (more precisely, system observability) in the DevOps world of late. The term is nothing but control-systems terminology for monitoring production systems. Here is a much-needed rant about it. So what is observability? Observability is an attribute of a system that lets the users understand how “it… Continue reading
Post Category → Big Data
The μ-services rant
This is a 2 min read. The economy is powered by bytes today, and in the byte economy the cost of storage, network and compute power is rapidly declining. This trend has made quite a few feats viable in recent years. Some of them aren’t completely new. “Big data”, “Fast data”, “IoT” and “Deep learning” are a few to mention. Yet another trend the byte economy created… Continue reading
A Deep Learning primer for all
This is a 10 min read. IMHO there are a ton of resources on deep learning, but none of them, and not even practitioners, are able to articulate what deep learning is to a layperson, hence this post! We are living in an awesome time. One of the reasons is, it is not normal for any field to be dormant for… Continue reading
Data engineering competency sheet
This is a 1 min read. After explaining to many people who came to me asking “how should I go about evaluating a data engineer or big data developer profile”, I created a competency sheet which candidates can fill in. Hopefully, this should give some idea. What I observed was interesting: people who haven’t actually done any work didn’t get back with… Continue reading
Bigdata 2.0 talk at Microsoft
This is a 1 min read. I gave a talk at M$FT about streaming data platforms in big data architectures. As promised, here is the presentation: here you go
What is so sexy about Unikernels?
This is a 2 min read. Unikernels (sounds almost like unicorns) are the newest advancement or the latest buzzword in the infrastructure virtualisation space, to say the least. Unikernel.org and Wikipedia offer great definitions for unikernels, but I felt stacking them against other virtualisation techniques would be a good addition to those definitions. So, here is a quick comparison and a… Continue reading
NoSQL datastore for IoT
This is a 4 min read. I recently answered “Which NoSQL DB is more advantageous for IoT data?” on Quora. “Let me answer the question in 2 different aspects: 1. Design and 2. Choosing. 1. Design: The question: what kind of data store and upstream system is warranted to back a large IoT (or even an Industrial IoT)… Continue reading
Continuous streaming integration with streaming data platforms
This is a 4 min read. Someone asked me on Quora “Should I use Gobblin or Spark Streaming to ingest data from Kafka to HDFS?” Here is what I wrote: This introduces a new architecture pattern called continuous streaming integration (CSI) with streaming data platforms (SDP) for solving the app and data integration challenges. Short answer: If your data sink is… Continue reading
Modernising data architecture for enterprises
This is a 1 min read. Prelude: Before getting into the topic of focus, i.e. how to modernise data architecture for large enterprises (which typically come with a lot of legacy baggage and organisational memory), I would like to set the context by clearing the air around one thing that is related to this subject. First step in Big data solutions consulting like any… Continue reading
Doing “exactly-once” in stream processing, a Google cloud data flow perspective
This is a 5 min read. As the title suggests, the scope of this multipart post is to evaluate how exactly-once processing is proposed in the Google Cloud Dataflow paper (link shared below) and hence implemented in the Dataflow service (which is the basis for Apache Beam). Although the titles are different, these posts should be considered precursors to this post (here… Continue reading