A day with Kafka

Many of us have heard the story of the elephant and the group of blind men. None of the men had come across an elephant before. One blind man approaches the leg and declares, “It’s like a tree.” Another approaches the tail and declares, “It’s like a rope.” A third approaches the trunk and declares, “It’s like a snake.” Each blind man senses the elephant from his own point of view and comes to a subtly different conclusion about what an elephant is. Of course, the elephant isn’t any one of these things, but all of them at the same time: it’s an elephant! Similarly, when we speak about Kafka, people often see it from different viewpoints:

  • Kafka Is Like REST but Asynchronous? 

Kafka provides an asynchronous protocol for connecting programs together, but it is undoubtedly different from, say, TCP (Transmission Control Protocol), HTTP, or an RPC protocol. The difference is the presence of a broker: a separate piece of infrastructure that broadcasts messages to any programs that are interested in them and stores them for as long as they are needed. This makes it a good fit for streaming and fire-and-forget messaging. 
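To make the broker idea concrete, here is a deliberately simplified in-memory sketch, not the real Kafka API: the broker retains every message per topic (the log) and broadcasts to subscribers, and a late subscriber can replay the retained log, much as a new Kafka consumer can read a topic from the earliest offset. The `ToyBroker` class and topic name are invented for illustration.

```python
from collections import defaultdict

class ToyBroker:
    """A minimal in-memory stand-in for a Kafka broker: it stores every
    message per topic (the log) and broadcasts to live subscribers."""

    def __init__(self):
        self.log = defaultdict(list)          # topic -> ordered messages
        self.subscribers = defaultdict(list)  # topic -> callbacks

    def subscribe(self, topic, callback, from_beginning=False):
        # A late subscriber can replay the retained log first.
        if from_beginning:
            for msg in self.log[topic]:
                callback(msg)
        self.subscribers[topic].append(callback)

    def publish(self, topic, msg):
        self.log[topic].append(msg)           # retained, not deleted on read
        for callback in self.subscribers[topic]:
            callback(msg)

broker = ToyBroker()
seen_a, seen_b = [], []
broker.subscribe("orders", seen_a.append)
broker.publish("orders", "order-1")
broker.publish("orders", "order-2")
# A consumer arriving later still sees the full history:
broker.subscribe("orders", seen_b.append, from_beginning=True)
```

The key contrast with an RPC call is visible here: the publisher never knows who, if anyone, is listening, and the messages outlive their delivery.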

A REST gateway provides an efficient request-response bridge to Kafka. This is in some ways a logical extension of the REST Proxy, wrapping the concepts of both a request and a response. 

What problem does it solve? 

  • Allows you to contact a service and get a response back, for example: 
  • to display the contents of the user’s shopping basket; 
  • to validate and create a new order. 
  • Access many different services, with their implementation abstracted behind a topic name. 
  • A simple RESTful interface removes the need for asynchronous programming on the front side of the gateway. 
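The bridge pattern above can be sketched in a few lines. This is a toy model under stated assumptions: topics are modeled as in-memory queues, and the names `basket_service` and `gateway_call` are hypothetical. The essential mechanism is the correlation ID, which ties each synchronous call to its asynchronous reply.

```python
import itertools
import queue
import threading

# Topics modeled as simple queues (stand-ins for Kafka topics).
request_topic = queue.Queue()
response_topic = queue.Queue()
_ids = itertools.count()

def basket_service():
    """Backend service: consumes requests, publishes correlated replies."""
    while True:
        corr_id, user = request_topic.get()
        if user is None:          # shutdown signal for this toy example
            break
        response_topic.put((corr_id, f"basket-for-{user}"))

def gateway_call(user):
    """Synchronous facade: publish a request, block until the reply
    carrying the same correlation ID arrives on the response topic."""
    corr_id = next(_ids)
    request_topic.put((corr_id, user))
    while True:
        reply_id, payload = response_topic.get()
        if reply_id == corr_id:
            return payload
        response_topic.put((reply_id, payload))  # not ours; requeue

worker = threading.Thread(target=basket_service, daemon=True)
worker.start()
result = gateway_call("alice")   # looks like a plain REST call to the caller
request_topic.put((None, None))  # stop the toy service
```

From the client's side this is an ordinary blocking call; the asynchrony, and the topic indirection that hides the service implementation, live entirely behind the gateway.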
  • Kafka Is Like a Service Bus? 

Apache Kafka and the Enterprise Service Bus (ESB) are complementary, not competitive! 

If we consider Kafka as a messaging system—with its Connect interface (Kafka Connect), which pulls data from and pushes data to a wide range of interfaces and datastores, and streaming APIs that can manipulate data in flight—it does look a little like an ESB. 

The difference is that an ESB requires the integration logic to be implemented in the central ESB infrastructure. Kafka, in contrast, decouples the systems, and the integration logic can be built into each domain, application, or service. The pictorial view below contrasts the two approaches. 

[Figure: Enterprise Service Bus (ESB) vs. Apache Kafka] 

Should we replace our existing MQ and ESB deployments? 

Younger companies like Netflix, LinkedIn, and Zalando built their whole infrastructure on Kafka. Older companies are not so fortunate: they have plenty of mainframes, monoliths, and legacy technology. A big-bang replacement, however, is not the right way to be successful. It’s a lot like transforming your home: although it might make sense in theory to rebuild it from the ground up, it is often more practical to extend the house, change certain rooms, or redecorate. 

That said, sometimes it is more cost-effective to replace a legacy architecture, just as it sometimes makes sense to remodel a house from the ground up. This was the case with Sberbank, the biggest bank in Russia, which built its complete core banking system around Kafka as the central nervous system. 

  • Kafka Is Like a Database? 

Some people like to compare Kafka to a database. It certainly comes with similar features. It provides storage; production topics with hundreds of terabytes are not uncommon. It has a SQL interface that lets users define queries and execute them over the data held in the log. These can be piped into views that users can query directly. It also supports transactions. These are all things that sound quite “databasey” in nature! 

Many of the elements of a traditional database are there, but if anything, Kafka is a database inside out: a tool for storing data, processing it in real time, and creating views. The idea is that a database has a few core components—a commit log, a query engine, indexes, and caching—and rather than conflating these concerns inside a single black-box technology as a database does, we can split them into separate parts using stream processing tools, and these parts can exist in different places, joined together by the log. Kafka plays the role of the commit log. A stream processor like Kafka Streams is used to create indexes or views, and these views behave like a form of continuously updated cache, living inside or close to your application. 

As an example, we might consider this pattern in the context of a simple GUI application that lets users browse order, payment, and customer information in a scrollable grid. Because the user can scroll the grid quickly up and down, the data would likely need to be cached locally. But in a streaming model, rather than periodically polling the database and then caching the result, we would define a view that represents the exact dataset needed in the scrollable grid, and the stream processor would take care of materializing it for us. So rather than querying data in a database, then layering caching over the top, we explicitly push data to where it is needed and process it there (i.e., it’s inside the GUI, right next to our code). 
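The "database inside out" split can be sketched in a few lines of plain Python. This is a conceptual toy, not a Kafka Streams program: the commit log is an ordered list of events, the `materialize` function plays the role of the stream processor, and the resulting dict is the view the GUI would read locally instead of polling a database. The event shapes are invented for illustration.

```python
# The "commit log": an ordered, append-only sequence of events.
commit_log = [
    {"type": "order_created",    "order_id": 1, "status": "NEW"},
    {"type": "payment_received", "order_id": 1},
    {"type": "order_created",    "order_id": 2, "status": "NEW"},
]

def materialize(log):
    """Fold the log into a view keyed by order ID; rerunning it over a
    longer log keeps the view continuously up to date."""
    view = {}
    for event in log:
        if event["type"] == "order_created":
            view[event["order_id"]] = {"status": event["status"], "paid": False}
        elif event["type"] == "payment_received":
            view[event["order_id"]]["paid"] = True
    return view

orders_view = materialize(commit_log)
# The GUI queries orders_view directly, right next to the application code,
# rather than querying a database and layering a cache on top.
```

Note the inversion: instead of the data sitting in a remote database behind a query engine, the log is the source of truth and the derived view lives wherever it is needed.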

But while we call “turning the database inside out” a pattern, it would probably be more accurate to call it an analogy: a different way of explaining what stream processing is. It is a powerful one. One reason that it seems to resonate with people is that we have a deep-seated notion that pushing business logic into a database is a bad idea. But the reverse—pushing data into your code—opens up a wealth of opportunities for blending our data and our code together. Stream processing flips the traditional approach to data and code on its head, encouraging us to bring data into the application layer—to create tables, views, and indexes exactly where we need them. 

  • What Is Kafka Really? A Streaming Platform 

Kafka is a streaming platform. At its core sits a cluster of Kafka brokers. You can interact with the cluster through a wide range of client APIs in Go, Scala, Python, REST, and more. There are two APIs for stream processing: Kafka Streams and KSQL. These are database engines for data in flight, allowing users to filter streams, join them together, aggregate, store state, and run arbitrary functions over the evolving dataflow. These APIs can be stateful, which means they can hold data tables much like a regular database. 
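The stateless/stateful distinction can be illustrated with a short toy in plain Python (not the actual Kafka Streams or KSQL API; the event list and threshold are invented). Filtering is stateless—each record is judged on its own—while the running count is stateful, the moral equivalent of a Kafka Streams aggregation backed by a state store.

```python
from collections import Counter

# A toy event stream of payments.
events = [
    {"user": "ann", "amount": 30},
    {"user": "bob", "amount": 5},
    {"user": "ann", "amount": 70},
    {"user": "bob", "amount": 120},
]

# Stateless step: filter the stream, record by record.
large_payments = (e for e in events if e["amount"] >= 50)

# Stateful step: a running per-user count, analogous to a
# groupByKey-then-count aggregation held in a state store.
counts = Counter()
for event in large_payments:
    counts[event["user"]] += 1
```

In a real deployment the same two steps would run continuously over an unbounded stream, with the state store surviving restarts; here the bounded list simply makes the semantics visible.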

The third API is Connect. This has a whole ecosystem of connectors that interface with different types of databases and other endpoints, both to pull data from them and to push data to Kafka. Finally, there is a suite of utilities, such as Replicator and MirrorMaker, which tie disparate clusters together, and the Schema Registry, which validates and manages the schemas applied to messages passing through Kafka. 

 

Summary 

Apache Kafka is a distributed streaming platform capable of handling trillions of events a day. Kafka provides low-latency, high-throughput, fault-tolerant publish-and-subscribe pipelines and is able to process streams of events. 

We went over the basic question of what Kafka actually is, learned how it can be used in multiple ways, and were introduced to its ever-growing streaming abilities.

Kafka has seen wide adoption at thousands of companies worldwide, including a third of the Fortune 500. I hope this introduction helped familiarize you with what Apache Kafka is and with its potential. 

Resources  

Kafka documentation: extensive and high-quality official documentation 

Confluent blog: a wealth of information regarding Apache Kafka 

Thank you for taking the time to read this. Clap, share, comment, give me feedback. I’d love to hear your thoughts!  
Happy Learning! 
