When I first saw the Twitter video message in which Christiana Figueres (then Executive Secretary of the UNFCCC) invited global citizens to become climate neutral by offsetting unavoidable emissions through proportionate funding of carbon-neutralizing projects, I instantly felt it was a great idea. I still think it is, and I have great respect for all the efforts from UNFCCC and… Continue reading
The μ-services rant
Today's economy is powered by bytes, and in the byte economy the cost of storage, network and compute power is rapidly declining. This trend has made quite a few feats viable in recent years, though some of them aren’t completely new: “Big data”, “Fast data”, “IoT” and “Deep learning”, to mention a few. Yet another trend the byte economy created… Continue reading
A Deep Learning primer for all
IMHO there are a ton of resources on deep learning, but none of them, and not even practitioners, are able to articulate what deep learning is to a layperson; hence this post! We are living in an awesome time. One of the reasons is, it is not normal for any field to be dormant for… Continue reading
Natural language searches: Lessons in spellcheck and autocorrect
Spellcheck and autocorrect are two of the coolest features of search engines. They are not only cool, they are also key to a smooth search experience. For the purposes of this post, I am going to stick with English-language searches. Peter Norvig, a Google scientist, has broken this subject down beautifully here to show the… Continue reading
Data engineering competency sheet
After explaining this to many people who came to me asking “how should I go about evaluating a data engineer or big data developer profile”, I created a competency sheet which candidates can fill in. Hopefully, this should give some idea. What I observed was interesting: people who haven’t actually done any work didn’t get back with… Continue reading
Bigdata 2.0 talk at Microsoft
I gave a talk at M$FT about streaming data platforms in big data architectures. As promised, here is the presentation.
What is so sexy about Unikernels?
Unikernels (sounds almost like unicorns) are the newest advancement, or the latest buzzword, in the infrastructure virtualisation space, to say the least. Unikernel.org and Wikipedia offer great definitions for unikernels, but I felt stacking them up against other virtualisation techniques would be a good addition to those definitions. So, here is a quick comparison and a… Continue reading
NoSQL datastore for IoT
I recently answered “Which NoSQL DB is more advantageous for IoT data?” on Quora: “Let me answer the question in two different aspects: 1. Design and 2. Choosing. 1. Design: The question: what kind of data store and upstream system is warranted to back a large IoT (or even an Industrial IoT)… Continue reading
Continuous streaming integration with streaming data platforms
Someone asked me on Quora, “Should I use Gobblin or Spark Streaming to ingest data from Kafka to HDFS?” Here is what I wrote: This introduces a new architecture pattern called continuous streaming integration (CSI) with streaming data platforms (SDP) for solving app and data integration challenges. Short answer: If your data sink is… Continue reading
Modernising data architecture for enterprises
Prelude: Before getting into the topic of focus, i.e. how to modernise data architecture for large enterprises (which typically come with a lot of legacy baggage and organisational memory), I would like to set the context by clearing the air around one thing that is related to this subject. The first step in big data solutions consulting, like any… Continue reading