Pub/sub messaging: Apache Kafka versus Apache Pulsar
Apache Kafka set the bar for large-scale distributed messaging, but Apache Pulsar has some neat tricks of its own
These days, massively scalable pub/sub messaging is virtually synonymous with Apache Kafka. Apache Kafka continues to be the rock-solid, open-source, go-to choice for distributed streaming applications, whether you’re adding something like Apache Storm or Apache Spark for processing or using the processing tools provided by Apache Kafka itself. But it isn’t the only game in town.
Developed by Yahoo and now an Apache Software Foundation project, Apache Pulsar is going for the crown of messaging that Apache Kafka has worn for many years. Apache Pulsar offers the potential of faster throughput and lower latency, along with a compatible API that allows developers to switch from one to another with relative ease.
How should one choose between the venerable stalwart Apache Kafka and the upstart Apache Pulsar? Let’s look at their core open source offerings and what the core maintainers’ enterprise editions bring to the table.
Developed by LinkedIn and released as open source back in 2011, Apache Kafka has spread far and wide, pretty much becoming the default choice for many when thinking about adding a service bus or pub/sub system to an architecture. Since it’s debut, the ecosystem has grown considerably, adding the Scheme Registry to enforce schemas in messaging, Kafka Connect for easy streaming from other data sources such as databases to Kafka, Kafka Streams for distributed stream processing, and most recently KSQL for performing SQL-like querying over Kafka topics. (A topic in Kafka is the name for a particular channel.)
The standard use-case for many real-time pipelines built over the past few years has been to push data into Apache Kafka and then use a stream processor such as Apache Storm or Apache Spark to pull in data, perform and processing, and then publish output to another topic for downstream consumption. With Kafka Streams and KSQL, all of your data pipeline needs can be handled without having to leave the Apache Kafka project at any time, though of course, you can still use an external service to process your data if required.
This article originally appeared on InfoWorld.com. To read the full article, click here.
Nastel Technologies uses machine learning to detect anomalies, behavior and sentiment, accelerate decisions, satisfy customers, innovate continuously. To answer business-centric questions and provide actionable guidance for decision-makers, Nastel’s AutoPilot® for Analytics fuses:
- Advanced predictive anomaly detection, Bayesian Classification and other machine learning algorithms
- Raw information handling and analytics speed
- End-to-end business transaction tracking that spans technologies, tiers, and organizations
- Intuitive, easy-to-use data visualizations and dashboards
If you would like to learn more, click here