Update, 11/Nov/2016: This post was originally titled “Please don’t call Kafka as a messaging system”. I had to change it because some people asked, “What else would you call it?” The Kafka tagline used to be something like “Message processing rethought”, but it looks like they later changed it to “A Distributed streaming platform”. So I am changing the title of my post back ;-). I feel validated, yet again!!
Original post:
There are quite a few questions on Q&A forums like Quora and Reddit along the following lines:
- “What’s the difference between Kafka and RabbitMQ?”
- “What’s the difference between Kafka and other queue-based messaging systems like ActiveMQ?”
- “What are some of the use cases for Kafka?”
- “What kind of systems are a good fit for Kafka?”
There is nothing wrong with these questions, but they do suggest that a good subset of people aren’t clear about the purpose of Kafka. Most of them use terms like “Kafka queues” and compare it with traditional queue-based, message-oriented middleware. So this post serves to clear the air. Before getting into any controversy, let me say this: Kafka is a messaging system, but with a twist; as they say, “messaging rethought”.
Some context...
Kafka is a distributed, partitioned, replicated, high-performance commit log. For simpler use cases it can be used for plain publish/subscribe messaging, but you are only tempted to use it that way because of the underlying mechanism, and there is so much more to it. I feel that treating Kafka as just another message queue undermines the purpose for which it was built. Consider my argument a logical extension of what Jay Kreps, Confluent CEO, has explained in his slides, videos and blog posts. Check out this video, where “What problem does Kafka solve?” is simply explained. Calling Kafka a message queue is ridiculous. I am not unhappy or making fun, just concerned about the level of awareness.
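To make the queue versus commit log distinction concrete, here is a minimal in-memory sketch. The `CommitLog` class and its `append`/`poll`/`seek` methods are hypothetical names made up for illustration; this is not Kafka’s API or storage format, and it ignores partitioning and replication entirely. It only shows the core idea: a queue hands each message to one reader and forgets it, while a log keeps records and lets every consumer track its own read position (offset), so several systems can read the same stream independently and even replay it.

```python
from collections import deque


class CommitLog:
    """Toy append-only log (hypothetical; not Kafka's API).

    Records are never removed when read. Each consumer keeps its own
    offset, i.e. the position of the next record it will read.
    """

    def __init__(self):
        self._records = []  # ordered, append-only storage
        self._offsets = {}  # consumer name -> next offset to read

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset assigned to the new record

    def poll(self, consumer, max_records=10):
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)  # advance only this consumer
        return batch

    def seek(self, consumer, offset):
        # Move a consumer's position, e.g. back to 0 to replay history.
        self._offsets[consumer] = offset


# A traditional queue: each message is delivered once and then it is gone.
queue = deque(["signup", "click", "purchase"])
first_reader = [queue.popleft() for _ in range(len(queue))]
second_reader = list(queue)  # empty: the first reader consumed everything

# A commit log: every consumer sees the full stream at its own pace.
log = CommitLog()
for event in ["signup", "click", "purchase"]:
    log.append(event)

search_indexer = log.poll("search")
analytics_job = log.poll("analytics")  # independent read of the same records
log.seek("analytics", 0)
replayed = log.poll("analytics")       # the history is still there

print(first_reader, second_reader)
print(search_indexer, analytics_job, replayed)
```

The design point is that reading is non-destructive and position tracking belongs to the consumer, which is what lets one stream feed many downstream systems.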
Before moving on, I would like to talk briefly about EAI (Enterprise Application Integration). For newbies, EAI is a discipline that advocates a set of patterns and models for integrating disparate enterprise application technologies and services. There is a lot more to know about EAI; feel free to explore it.
But the key takeaway for us is that connecting enterprise applications with custom-built, point-to-point pipelines is planning for a disaster.
(via packtpub.com)
Instead, EAI advocates a couple of topologies for applications to communicate: one is hub and spoke, as shown below, and the other is bus-based integration. Bus-based integration later grew into something of its own called ESB (Enterprise Service Bus). ESB was believed to be the next big thing alongside SOA. But for this post we are interested in the hub-and-spoke style.
Simply put, EAI just tells you how to be a better plumber.
(via packtpub.com)
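A quick back-of-the-envelope calculation shows why point-to-point pipelines become a disaster while hub and spoke stays manageable. The sketch below assumes the worst case, where every source system needs to feed every destination system; the function names and the system counts are hypothetical, chosen only to illustrate the growth rates.

```python
def point_to_point_pipelines(sources: int, destinations: int) -> int:
    # Worst case: one custom pipeline from every source to every destination.
    return sources * destinations


def hub_and_spoke_pipelines(sources: int, destinations: int) -> int:
    # Each system connects exactly once, to the hub.
    return sources + destinations


# Hypothetical sizes: n sources and n destinations.
for n in (2, 5, 10, 20):
    p2p = point_to_point_pipelines(n, n)
    hub = hub_and_spoke_pipelines(n, n)
    print(f"{n:>2} sources, {n:>2} destinations: point-to-point={p2p:>3}, hub={hub:>2}")
```

Point-to-point grows multiplicatively (10 × 10 = 100 pipelines), while hub and spoke grows additively (10 + 10 = 20 connections). Real systems rarely hit the full worst case, but the trend is the point.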
Adding more complexity to the integration problem, consider how the data we care about, and the systems that store it, have changed recently.
Types of data stored
- Not just database data, a.k.a. business transactions
- Events (user activity stream)
- Metrics
- Logs
Types of data requests served
- Low-latency request/response
- Search
- Complex relations (graphs)
- Real-time alerting and monitoring
- Batch analytics and reporting
Types of data systems and polyglot persistence
- We no longer use general-purpose data stores; we have the luxury of choosing data stores/models to suit the type of data. All data stores, and the data requests they serve, are special purpose.
Yes, Kafka comes to the rescue and solves data and app integration for your enterprise. Kafka is the hub in your hub and spoke (why do you think the Kafka icon was designed the way it looks? OK, it looks like the letter “K”, but it also looks like a hub and spoke). Although Kafka uses different terminology (producers, consumers and brokers), as far as integration goes, Kafka is just an abstraction of the hub-and-spoke architecture.
Bad app and data integration
(via linkedin engineering blog)
Good data and app integration
(via linkedin engineering blog)
And you know what? Just because your data stores are special purpose doesn’t mean your integration has to be special purpose ;-). Go general purpose. Your sources and destinations shouldn’t even know that the other exists.
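Here is a minimal sketch of that decoupling, again purely in memory. The `Hub` class, its `publish`/`fetch` methods, and the `orders_service` function are hypothetical names for illustration, not a Kafka client; real Kafka adds partitions, replication, brokers and consumer groups on top of this idea. What it shows: the source knows only the hub and a topic name, and new destinations can be added later, each reading the full retained history at its own offset, without any change to the source.

```python
from collections import defaultdict


class Hub:
    """Toy hub (hypothetical; not a Kafka client).

    Each topic is an append-only list, and every subscriber tracks its
    own offset per topic, so sources never need to know who reads.
    """

    def __init__(self):
        self._topics = defaultdict(list)  # topic name -> list of records
        self._offsets = {}                # (topic, subscriber) -> next offset

    def publish(self, topic, record):
        self._topics[topic].append(record)

    def fetch(self, topic, subscriber):
        key = (topic, subscriber)
        start = self._offsets.get(key, 0)
        records = self._topics[topic][start:]
        self._offsets[key] = start + len(records)  # remember progress
        return records


def orders_service(hub):
    # The source only knows the hub and a topic name, not its consumers.
    hub.publish("orders", {"id": 1, "amount": 30})
    hub.publish("orders", {"id": 2, "amount": 45})


hub = Hub()
orders_service(hub)

# Destinations plug in later without touching orders_service.
search_index = {o["id"]: o for o in hub.fetch("orders", "search")}
revenue = sum(o["amount"] for o in hub.fetch("orders", "warehouse"))
print(sorted(search_index), revenue)  # [1, 2] 75
```

Adding a third destination is just another `fetch` call with a new subscriber name; neither `orders_service` nor the existing destinations change.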
Go for Kafka.