Introducing the core concepts of Kafka

Published: 2023-11-09 21:01:08 Author: 伯安知心

Introduction

I have been learning Kafka for five years, and I believe I have learned something along the way. It is also time to improve my English, so I decided to pick up my blog again and write about Kafka concepts, both to consolidate what I remember and to practice my English. This will be a series, and I will explain the topics one by one.

Today I want to introduce some basic information about Kafka, including the differences between Kafka and other messaging systems, and some of Kafka's core concepts.

What is a messaging system

This is a popular question, and I do not want to define a messaging system too technically. Put simply, it is a server that handles messages. So how do we define a message? Almost anything can be a message, but here a message represents a piece of information that everyone in the business can understand.

That point is important. You may say it is no big deal, but technically speaking it matters. If you are the person in charge of technology, you probably spend a lot of time designing database models. We have to put the unordered data of the real world in order before storing it in a relational database, which means we not only collect the data but also have to work out which data is important, or which data the business people actually want. With Kafka this problem largely goes away: Kafka can store all kinds of messages, so we do not need to know in advance exactly what the business will ask for.

To recap, a messaging system can store many kinds of information, which makes it convenient for satisfying business requirements, including future ones, because all the messages are already stored in the messaging system. This is one reason why we often use Kafka to collect business information, as a front-end collector. It is not the only reason: Kafka has many other useful features, and I will do my best to cover its concepts. Now let us take a look.

Now that we have a working definition of a messaging system, I will highlight the features and characteristics of Apache Kafka that help explain how it is better than a traditional message server. I will compare it with two traditional message servers: RabbitMQ and ActiveMQ.

RabbitMQ

RabbitMQ and Kafka are both message queue systems that you can use in stream processing, and both allow producers to send messages to consumers. Producers are applications that publish information, while consumers are applications that subscribe to and process information. However, producers and consumers interact differently in RabbitMQ and Kafka. In RabbitMQ, the producer sends a message and monitors whether it reaches the intended consumer. Kafka producers, on the other hand, publish messages to a topic regardless of whether consumers have retrieved them.

A good metaphor helps here. You can think of RabbitMQ as a post office that receives mail and delivers it to the intended recipients. Kafka is more like a library: it organizes the messages that producers publish on shelves by group, and consumers read from the relevant shelves and remember what they have already read.
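To make the library metaphor concrete, here is a minimal sketch in Python. It is only a toy model, not Kafka's real API: the class name `TopicLog` and its methods are my own assumptions for illustration. It shows that publishing only appends to the shelf, and that reading does not remove anything.

```python
class TopicLog:
    """Toy stand-in for one shelf in the library: an append-only list of messages."""

    def __init__(self):
        self.messages = []

    def publish(self, message):
        # The producer only appends; it never checks who has read what.
        self.messages.append(message)
        return len(self.messages) - 1  # position of the new message on the shelf

    def read_from(self, position):
        # Reading does not remove anything from the shelf.
        return self.messages[position:]


shelf = TopicLog()
shelf.publish("order-created")
shelf.publish("order-paid")

# Two independent readers look at the same shelf at different times.
early_reader = shelf.read_from(0)
shelf.publish("order-shipped")
late_reader = shelf.read_from(0)
```

The early reader sees two messages and the late reader sees all three, and neither read changes what is stored.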

In addition, RabbitMQ has a routing key, a message attribute used to route a message from an exchange to a specific queue. When a producer sends a message to an exchange, it includes a routing key as part of the message, and the exchange uses that key to decide which queue the message should be delivered to. Kafka is a little simpler. A Kafka producer can assign a message key to each message, the key determines which partition of the topic the message goes to, and the broker that leads that partition appends the message to it. None of this has anything to do with consumers or delivery.
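The two routing styles can be sketched side by side. This is a simplified illustration under my own assumptions: the binding table, the queue names, and both function names are hypothetical, and CRC32 is only a stand-in hash (Kafka's default partitioner uses a different hash function).

```python
import zlib

# RabbitMQ-style direct routing: a binding table maps routing key -> queue name.
bindings = {"invoice": "billing_queue", "signup": "welcome_queue"}


def route_to_queue(routing_key):
    # Returns None when no queue is bound for this routing key.
    return bindings.get(routing_key)


def choose_partition(message_key, num_partitions):
    # Kafka-style: hash the message key and take it modulo the partition count.
    # CRC32 is a stand-in here, not the hash Kafka actually uses.
    return zlib.crc32(message_key.encode("utf-8")) % num_partitions


queue = route_to_queue("invoice")
partition_a = choose_partition("user-42", 3)
partition_b = choose_partition("user-42", 3)
```

The point of the second function is that the same key always lands in the same partition, which is what keeps messages for one key in order.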

RabbitMQ and Kafka also differ in how messages are consumed, because they are designed for different use cases. In RabbitMQ, the consumer takes a passive role and waits for the broker to push messages to it, so a consumer can miss data if messages are sent while it is inactive and nothing holds them for it. This does not happen in Kafka, because Kafka consumers are more proactive in reading and tracking information: they pull data from a topic whenever they are ready. A Kafka consumer keeps track of the last message it has read and updates its offset tracker accordingly. An offset tracker is a counter that increments after each message is read. With Kafka, the producer is not aware of whether consumers have retrieved a message.
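The offset tracker is easy to picture in code. The following is a minimal sketch, assuming a plain Python list stands in for one partition; `OffsetTrackingConsumer` and `poll` are hypothetical names, not the real Kafka client API.

```python
class OffsetTrackingConsumer:
    """Toy consumer that remembers the next position to read in a log."""

    def __init__(self, log):
        self.log = log      # a plain list standing in for one partition
        self.offset = 0     # index of the next unread message

    def poll(self, max_messages=10):
        batch = self.log[self.offset:self.offset + max_messages]
        self.offset += len(batch)  # advance only by what was actually read
        return batch


log = ["m0", "m1"]
consumer = OffsetTrackingConsumer(log)
first = consumer.poll()     # reads m0 and m1
log.append("m2")            # produced while the consumer was busy elsewhere
second = consumer.poll()    # picks up where it left off
```

Because the position lives with the consumer, the producer never needs to know who has read what.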

Next, message priority. RabbitMQ brokers allow producers to escalate certain messages by using a priority queue, and the broker processes higher-priority messages ahead of normal ones. For example, a retail application might queue sales transactions every hour, but if the system administrator issues a priority "back up the database" message, the broker sends it immediately. Kafka does not have priority queues; it treats all messages as equal when distributing them to their partitions.

Ordering is related. RabbitMQ sends and queues messages in a specific order: unless a higher-priority message is queued, consumers receive messages in the order they were sent. Kafka producers send messages to a specific topic and partition. Because Kafka does not support direct producer-consumer exchanges, consumers pull messages from partitions themselves, so order is preserved within a single partition, but messages from different partitions may be read in a different order from the one in which they were produced.
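A priority queue like the one in the retail example can be sketched with Python's `heapq` module. This is a toy model under my own assumptions (the class name `PriorityMailbox` and the priority value 9 are made up), not RabbitMQ's implementation.

```python
import heapq
import itertools


class PriorityMailbox:
    """Toy priority queue: higher priority first, arrival order among equals."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()

    def publish(self, message, priority=0):
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        # The arrival counter keeps equal-priority messages in the order sent.
        heapq.heappush(self._heap, (-priority, next(self._arrival), message))

    def deliver(self):
        return heapq.heappop(self._heap)[2]


mailbox = PriorityMailbox()
mailbox.publish("sale-1")
mailbox.publish("sale-2")
mailbox.publish("backup-database", priority=9)
delivered = [mailbox.deliver() for _ in range(3)]
```

The backup message jumps the queue, while the two sales messages keep their original order.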

Finally, message deletion differs. A RabbitMQ broker routes the message to the destination queue. Once the consumer has read it, the consumer sends an acknowledgement reply to the broker, which then deletes the message from the queue. Kafka, by contrast, appends the message to a log file, where it remains until its retention period expires. That way, consumers can reprocess streamed data at any time within the retention period.
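The two deletion models can be compared with a small sketch. Both classes are hypothetical toys, and timestamps are passed in by hand so the example stays deterministic; real brokers handle acknowledgements and retention internally.

```python
class AckQueue:
    """Toy queue where a message disappears once it is consumed and acknowledged."""

    def __init__(self):
        self.pending = []

    def publish(self, message):
        self.pending.append(message)

    def consume_and_ack(self):
        # Acknowledging removes the message for good.
        return self.pending.pop(0)


class RetentionLog:
    """Toy log where messages stay readable until they are older than the retention period."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.entries = []  # list of (timestamp, message)

    def append(self, message, now):
        self.entries.append((now, message))

    def expire(self, now):
        # Keep only the entries that are still inside the retention window.
        self.entries = [(t, m) for t, m in self.entries
                        if now - t < self.retention_seconds]

    def read_all(self):
        return [m for _, m in self.entries]


queue = AckQueue()
queue.publish("a")
taken = queue.consume_and_ack()

log = RetentionLog(retention_seconds=60)
log.append("a", now=0)
log.append("b", now=50)
first_read = log.read_all()
second_read = log.read_all()   # reading again returns the same data
log.expire(now=70)             # "a" is now 70 seconds old and is dropped
after_expiry = log.read_all()
```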

That covers the main differences between RabbitMQ and Kafka. Overall, RabbitMQ is a traditional message queue system, while Kafka is built for higher throughput. Now let us look at ActiveMQ.

ActiveMQ

ActiveMQ and Kafka are two of the most popular open source messaging systems. In short, ActiveMQ is a message broker and Kafka is an event streaming platform. Why do I say that? Both a message broker and an event streaming platform can be used to build asynchronous, scalable applications, but their use cases differ.

What is a message broker? A message broker is a software application that translates messages between formal messaging protocols. ActiveMQ, for example, supports a wide range of messaging protocols, including JMS, AMQP, and MQTT. Put another way, message brokers enable applications and services to exchange data effectively, even if they are written in different languages or run on different platforms, by providing a standardized flow of data.

And what does it mean to say Kafka is an event streaming platform? In a nutshell, event stream processing is a programming technique that analyzes continuous data. An event is any change in state tracked by a business system, which can be anything from a transaction to a user navigating a website. Event streams are naturally the ordered sequences of these business events. Event stream processing manages and stores many related events together, not just one event at a time. Unlike message brokers, which often delete data after it has been received, an event streaming platform processes and stores the data, allowing new consumers to replay events.
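Replay is the key idea here, so a short sketch may help. The event shapes and function names below are assumptions made up for illustration. Two different consumers derive different results from the same stored events, and neither one consumes the stream.

```python
# A stored, ordered stream of business events (hypothetical shape).
events = [
    {"type": "deposit", "amount": 100},
    {"type": "withdrawal", "amount": 30},
    {"type": "deposit", "amount": 5},
]


def replay_balance(event_stream):
    # First consumer: rebuild the current balance from the full history.
    balance = 0
    for event in event_stream:
        if event["type"] == "deposit":
            balance += event["amount"]
        elif event["type"] == "withdrawal":
            balance -= event["amount"]
    return balance


def replay_deposit_count(event_stream):
    # A consumer added later can replay the same events for a new purpose.
    return sum(1 for event in event_stream if event["type"] == "deposit")


balance = replay_balance(events)
deposits = replay_deposit_count(events)
```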

One of the biggest differences between Kafka and ActiveMQ is how they handle messages. Kafka not only transfers messages but can also store them permanently for multiple applications. While permanent storage is possible, the retention time for a given topic can be set to whatever the use case dictates, even down to the millisecond. To avoid retaining data unnecessarily, the prevailing best practice is to set the retention time as short as the use case allows. Kafka can either preserve or ignore message order, depending on whether a partition key is specified and which partitioning method is used. In some cases the order of messages will not be maintained, which can be the preferable configuration depending on the use case.

ActiveMQ is a push-type platform in which providers push messages to consumers. Unlike Kafka, ActiveMQ can filter messages so that consumers only receive the ones they are interested in. To guarantee that messages are received, it is the responsibility of the producers to ensure that each message is delivered.
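Broker-side filtering in a push model can be sketched as follows. This is a toy illustration only: `FilteringBroker` and the message fields are hypothetical, and a Python predicate stands in for whatever filter expression the real broker accepts.

```python
class FilteringBroker:
    """Toy push broker: each subscriber registers a filter and gets its own inbox."""

    def __init__(self):
        self.subscribers = []  # list of (predicate, inbox)

    def subscribe(self, predicate):
        inbox = []
        self.subscribers.append((predicate, inbox))
        return inbox

    def push(self, message):
        # The broker decides who receives the message, not the consumer.
        for predicate, inbox in self.subscribers:
            if predicate(message):
                inbox.append(message)


broker = FilteringBroker()
eu_inbox = broker.subscribe(lambda m: m["region"] == "EU")
all_inbox = broker.subscribe(lambda m: True)

broker.push({"id": 1, "region": "EU"})
broker.push({"id": 2, "region": "US"})

eu_ids = [m["id"] for m in eu_inbox]
all_ids = [m["id"] for m in all_inbox]
```

In Kafka the equivalent filtering would normally happen in the consumer after it has pulled the messages.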

It is also important to note that ActiveMQ cannot ensure that messages are received in the same order they were sent. In the event of a failure, a message can be duplicated, but it will always be received. ActiveMQ is not designed for long-term data storage: once consumed, a message is temporarily retained in virtual memory and then deleted. ActiveMQ can also be used to implement once-only message delivery easily.

Kafka is a distributed system, which allows it to process massive amounts of data, and it does not slow down as new consumers are added. Thanks to the replication of partitions, Kafka scales easily and offers higher availability. But speed is not the only thing to consider. A Kafka producer can be configured not to wait for delivery acknowledgements from brokers (the acks setting). Brokers can then write messages at a high rate, which gives higher throughput, but data can potentially be lost. ActiveMQ is known for speed when delivering small amounts of data to numerous consumers, and it is often picked for systems that require lower message throughput.
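The trade-off between throughput and possible data loss can be shown with a very small simulation. Everything here is a hypothetical toy: the function `produce`, its parameters, and the assumption that a single resend always succeeds are mine. In real Kafka this behaviour is controlled by the producer's acks setting (0, 1, or all) together with retries.

```python
def produce(messages, wait_for_ack, lost_on_first_try):
    """Return the messages that end up stored on the (simulated) broker."""
    stored = []
    for message in messages:
        arrived = message not in lost_on_first_try
        if not arrived and wait_for_ack:
            # The missing acknowledgement is noticed and the message is resent.
            # The resend is assumed to succeed in this toy model.
            arrived = True
        if arrived:
            stored.append(message)
    return stored


fire_and_forget = produce(["a", "b", "c"], wait_for_ack=False, lost_on_first_try={"b"})
acknowledged = produce(["a", "b", "c"], wait_for_ack=True, lost_on_first_try={"b"})
```

Without acknowledgements the producer never learns that "b" was lost, which is the price paid for not waiting.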

Conclusion

I think we now know the differences between Kafka and RabbitMQ, and between Kafka and ActiveMQ. To sum up, we can compare them on the following aspects:

Scalability: Kafka is designed for high throughput, low latency, and high scalability. It uses a publish-subscribe model and is built on a distributed architecture, which allows it to handle large amounts of data and high levels of concurrency. RabbitMQ excels in single-broker deployments and is typically used for simpler scenarios. ActiveMQ, on the other hand, is designed for more traditional messaging scenarios and may not be as well suited to extremely high scalability.

Durability: Kafka's messages are written to disk and replicated across multiple nodes, providing a high level of durability in case of node failures. In RabbitMQ, messages are acknowledgement-based, meaning they are deleted as soon as a consumer has successfully acknowledged them. For multiple consumers to get the same message, multiple queues have to be created. ActiveMQ also provides a high level of durability, but it may not be as robust as Kafka in certain scenarios.

Performance: Kafka is designed for very high performance and can handle millions of messages per second. RabbitMQ, I have to say, is not designed for that level of throughput. It follows a smart broker, dumb consumer model in which all routing and decisions are made in the broker, so it is not hard to understand why RabbitMQ does not reach the same performance. ActiveMQ also provides good performance, but it may not be as fast as Kafka in certain scenarios.

Latency: Kafka has lower latency than ActiveMQ, as it uses a zero-copy design and memory-mapped files. RabbitMQ has low latency with small amounts of data, but Kafka does better with large amounts. ActiveMQ's latency is higher because messages have to be copied between different layers of the system.

Use case: Kafka is often used in big data and streaming scenarios, such as real-time data pipelines and event-driven architectures. RabbitMQ's message queuing capabilities make it an excellent choice for building decoupled, asynchronous systems, because it enables loose coupling between components and helps with fault tolerance and scalability. Applications that require reliable message delivery, like order processing systems, email notifications, and task scheduling systems, can use RabbitMQ to ensure message persistence and guaranteed delivery. It is also a fit for real-time chat systems, where users communicate with each other with little delay. ActiveMQ is more commonly used in traditional messaging scenarios, such as enterprise application integration and message-oriented middleware.

Partitioning: Kafka supports partitioning of messages across multiple servers, which allows it to scale horizontally. RabbitMQ and ActiveMQ do not partition a queue across servers in the same built-in way, and comparable scaling usually needs plugins or extra configuration.
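Horizontal scaling through partitioning can be pictured with a simple round-robin placement. This is a simplified sketch under my own assumptions: `spread_partitions` and the broker names are hypothetical, and real Kafka placement also has to account for replicas.

```python
def spread_partitions(num_partitions, brokers):
    """Assign partitions to brokers in round-robin order."""
    placement = {broker: [] for broker in brokers}
    for partition in range(num_partitions):
        owner = brokers[partition % len(brokers)]
        placement[owner].append(partition)
    return placement


two_brokers = spread_partitions(6, ["broker-1", "broker-2"])
three_brokers = spread_partitions(6, ["broker-1", "broker-2", "broker-3"])
```

Adding a third broker reduces the share of each one from three partitions to two, which is the basic reason partitioning lets the load spread as servers are added.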

I hope this post helps you learn something about these messaging systems. See you!