There has been a lot of movement around observability (more precisely, system observability) in the DevOps world of late; the term is nothing but control-systems terminology for monitoring production systems. Here is a much-needed rant about it. So what is observability? Observability is an attribute of a system that lets the users understand how “it… Continue reading
Post Category → Big Data
The μ-services rant
The economy today is powered by bytes, and in the byte economy the cost of storage, network and compute power is rapidly declining. This trend has made quite a few feats viable in recent years. Some of them aren’t completely new. “Big data”, “Fast data”, “IoT” and “Deep learning” are a few to mention. Yet another trend the byte economy created… Continue reading
A Deep Learning primer for all
IMHO there are a ton of resources on deep learning, but neither they nor even practitioners are able to articulate what deep learning is to a layperson, hence this post! We are living in an awesome time. One of the reasons is, it is not normal for any field to be dormant for… Continue reading
Data engineering competency sheet
After explaining to many people who came to me asking “how should I go about evaluating a data engineer or big data developer profile”, I created a competency sheet which the candidates can fill in. Hopefully, this should give some idea. What I observed was interesting: people who haven’t actually done any work didn’t get back with… Continue reading
Bigdata 2.0 talk at Microsoft
I gave a talk at M$FT about streaming data platforms in big data architectures. As promised, here is the presentation.
What is so sexy about Unikernels?
Unikernels (sounds almost like unicorns) are the newest advancement or the latest buzzword in the infrastructure virtualisation space, to say the least. Unikernel.org and Wikipedia offer great definitions for unikernels, but I felt stacking them against other virtualisation techniques would be a good addition to those definitions. So, here is a quick comparison and a… Continue reading
NoSQL datastore for IoT
I recently answered “Which NoSQL DB is more advantageous for IoT data?” on Quora. “Let me answer the question in two different aspects: 1. Design and 2. Choosing. 1. Design: The question: what kind of data store and upstream system is warranted to back a large IoT (or even an Industrial IoT)… Continue reading
Continuous streaming integration with streaming data platforms
Someone asked me on Quora “Should I use Gobblin or Spark Streaming to ingest data from Kafka to HDFS?” Here is what I wrote: This introduces a new architecture pattern called continuous streaming integration (CSI) with streaming data platforms (SDP) for solving the app and data integration challenges. Short answer: If your data sink is… Continue reading
Modernising data architecture for enterprises
Prelude: Before getting into the topic of focus, i.e. how to modernise data architecture for large enterprises (which typically come with a lot of legacy baggage and organisational memory), I would like to set the context by clearing the air around one thing that is related to this subject. The first step in big data solutions consulting, like any… Continue reading
Doing “exactly-once” in stream processing, a Google Cloud Dataflow perspective
As the title suggests, the scope of this multipart post is to evaluate how exactly-once processing is proposed in the Google Cloud Dataflow paper (link shared below) and hence implemented in the Dataflow service (which is the basis for Apache Beam). Although the titles are different, these posts should be considered precursors to this post (here… Continue reading