Expanding Visibility With Apache Kafka

What would you do if you had terabytes of operational data being generated in production each day, and hundreds of engineering teams wanting to use that data to improve their services … but no way to connect the two?

A Broader Vision

This got us thinking bigger. Salesforce’s systems are made up of a vast number of components that generate massive amounts of data, and we need a robust way to collect, transport, process, and transform all of these streams of information. In broad strokes, the architecture we’re evolving towards looks like this:

[Figure: overview of the target architecture (image not reproduced)]

The key simplifying point in this picture for us is Apache Kafka. It allows us to use a unified, near-real-time transport for a wide variety of data types that we’re ingesting, including system metrics and state information, system logs, network flow data, and application logs.

As we delivered the initial implementations of our streaming systems, we realized that the concept extended well beyond operational visibility data. We started conversations with other teams who wanted to use a similar architecture: decoupling the systems that produce and consume event data, where the pub/sub model is a natural fit. We began to see that this was actually a platform, rather than a solution to one specific problem.
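To make the producer/consumer decoupling concrete, here is a minimal Python sketch of topic-based pub/sub. It is a hypothetical in-memory toy, not Kafka's client API: the class `ToyBroker` and the topic names (`system-metrics`, `app-logs`, loosely modeled on the data types listed earlier) are illustrative assumptions. The point it shows is that producers and consumers share only a topic name, so a new consuming team can be added without changing any producer.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# Hypothetical in-memory stand-in for a pub/sub transport (NOT the Kafka API).
# Producers and consumers share only a topic name, never a reference to each other.
Handler = Callable[[dict], None]


class ToyBroker:
    def __init__(self) -> None:
        # topic name -> callbacks registered by consumers of that topic
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a consumer; existing producers need no changes."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> int:
        """Push an event to every current subscriber and return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


broker = ToyBroker()
alerts: List[dict] = []
dashboard: List[dict] = []

# Two teams consume the same metrics stream independently.
broker.subscribe("system-metrics", alerts.append)
broker.subscribe("system-metrics", dashboard.append)

# The producer names a topic only; it knows nothing about either consumer.
delivered = broker.publish("system-metrics", {"host": "app-01", "cpu": 0.93})
print(delivered)  # 2

# No subscribers yet: the event is simply not delivered anywhere.
print(broker.publish("app-logs", {"msg": "service started"}))  # 0
```

One limitation of this push-style toy is that an event published before anyone subscribes is lost. Kafka avoids this by retaining records in a log, so consumers can join later and read from an earlier offset.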



This article originally appeared on Salesforce Engineering.


Nastel Comments:

Apache Kafka is a unified, high-throughput, low-latency platform for handling real-time data feeds. Its storage layer is a massively scalable pub/sub message queue designed as a distributed transaction log. Nastel clients often use it as a transport mechanism for streaming data, interconnected with other messaging and processing systems. AutoPilot for Apache Kafka provides operational and transactional monitoring for Apache Kafka.
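The idea of a log as the storage layer can be sketched in a few lines of Python. This is a hypothetical model under simplifying assumptions (single process, no replication, no retention limit, and offsets advanced on every read, similar to auto-commit). `ToyTopic`, `append`, and `poll` are illustrative names rather than Kafka's API. The sketch shows three properties: records with the same key land in the same partition, each record gets a sequential offset within its partition, and reading does not remove records, so independent consumer groups each track their own position.

```python
import zlib
from typing import Dict, List, Tuple


class ToyTopic:
    """Hypothetical model of a partitioned, append-only log (NOT the Kafka API).

    Simplifications: one process, no replication, no retention limit, and
    poll() advances the group's offset immediately (similar to auto-commit).
    """

    def __init__(self, num_partitions: int) -> None:
        # Each partition is an ordered list; a record's index is its offset.
        self.partitions: List[List[Tuple[str, str]]] = [[] for _ in range(num_partitions)]
        # (consumer group, partition) -> next offset that group will read
        self.offsets: Dict[Tuple[str, int], int] = {}

    def append(self, key: str, value: str) -> Tuple[int, int]:
        """Append a record and return (partition, offset); equal keys share a partition."""
        # crc32 is a stand-in hash; Kafka's Java client defaults to murmur2 for keys.
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1

    def poll(self, group: str, partition: int, max_records: int = 10) -> List[str]:
        """Return values from the group's current offset onward; nothing is deleted."""
        start = self.offsets.get((group, partition), 0)
        batch = self.partitions[partition][start:start + max_records]
        self.offsets[(group, partition)] = start + len(batch)
        return [value for _, value in batch]


topic = ToyTopic(num_partitions=3)
p, first_offset = topic.append("host-17", "cpu=0.41")
topic.append("host-17", "cpu=0.58")  # same key, so same partition, next offset

print(topic.poll("alerting", p))  # ['cpu=0.41', 'cpu=0.58']
print(topic.poll("archiver", p))  # ['cpu=0.41', 'cpu=0.58'] (independent group)
print(topic.poll("alerting", p))  # [] (alerting has caught up)
```

Because reads only move a group's offset, a group that starts later (for example, a batch analytics job) can still begin at offset 0 and replay the stream. This is what lets one stream serve many consuming teams.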