In a distributed system, the components need to exchange information between them. This can be done in a synchronous way, where a component sends a message, waits for the response and then continue doing stuff; or it can be done in an asynchronous way where the component sends the message and continues doing stuff without waiting for the response.
The book has a nice example, where if a person wants to return a Amazon package, you go to Fedex give them the package, and leave the store. You do not stay in the store until Amazon gets your package and give you a refund. You operate in an asynchronous way.
Distributed systems do the same, clients (known as producers) can give their messages to a messaging service and trust that it will be handled, so the producers can continue to the next step on its logic.
There is a lot of choice on messaging platforms. In the book they use rabbitmq but the basics are the same. So knowledge is transferable.
A messaging system if made up of the following:
Message queues: where the messages will be stored
Producers: send the messages to the queues
Consumers: get/retrieve the messages from the queues
Message Broker: manages one or more queues
message broker
publisher --> [][][][][] <-- consumer
queue1
The message broker will manage the queues. A producer will send messages to a named queue on the broker, multiple producers can send messages to the same queue. The producer will wait until the broker says it got the message to consider the operation complete.
Many consumers can take messages from the same queue. Each message is consumed by exactly one consumer. There are two ways for the consumer to get the message, pull and push.
For pull (also known as polling), the consumer will request from the broker new messages, if it does not have the consumer must ask until the next message arrive.
For push, the consumer tells the broker that it want a message from the queue, the consumer provides a callback function that should be invoked when a message is available. The consumer can then continue doing other stuff, and when the message is available via the callback function, process it. This method is recommended when available.
Finally, there are two type of ways to acknowledge a message had been consumed, automatic and manual. This is needed because the broker usually removes the message form the queue once it has been consumed.
Automatic is as soon as the consumer get the message it is marked as consumed, manual is when the consumer wants to process the message makes sure everything is okay and then it tells the broker that it can go ahead and mark it as consumed.
Usually message brokers have their queues in memory this making faster to read and write from it. But if the server crash, we would lose all the data from the queue.
To prevent this (data safety) queues can configured to be stored in disk too, this makes things a bit slower, but you are a bit more safe against restarts.
You can also set it up to use both, store in disk, but also use memory, so you get the best of both worlds.
Message queues deliver each message to exactly one consumer. But what if we want many consumers to read from one queue. This pattern is known as "publish-subscribe" architecture. In "publish-subscribe" systems, queues are known as topics.
Publishers and consumers are completely decoupled. As a consumer you can just subscribe to the topic you need and get all the messages. This makes the architecture highly extensible as a new subscriber can be added without any changes to the system.
This can also makes processing messages faster since it can be done in parallel.
The drawback is that this of course will create more of a burden on the broker since it has to acknowledge all the consumers receiving the message. The book says that the "push" method described above is better for this.
The broker is a single point of failure, if it dies, you are left without any messages, which is hardly ideal.
Usually message brokers enable queues and topics to be replicated in different hardware. This works by one node being the leader. Publishers will send messages to it, and consumers will consumer from it. But the broker will also have the task to replicate its state to the other nodes. If it fails, consumers can simply continue working with any other node.
Seems like it is a hard to implement algorithm, and the author suggest using existing software.