Asynchronous messaging has become ubiquitous in software systems. "Publish an event" and "send a message" are phrases heard in almost every design discussion. Less clear is the difference between words like event, message, and notification, and that vagueness grows into large-scale ambiguity when we start using messaging patterns to compose larger architectural patterns such as workflow orchestration or event sourcing.
In this article I'm going to lay out some of these frequently used terms, what exactly they mean, and where and how they should be used. None of this is new, and more famous people than me have said it repeatedly, so hopefully my repetition will help cement the industry jargon and bring deeper understanding and context to the discussion.
The infra side of the world
Let's first consider the infrastructure side of the messaging world, which physically enables messaging.
A record is basically anything that we put on the message broker. From the broker's perspective, it is only a blob of bytes which must be delivered to one or more consumers with a certain delivery guarantee. Beyond this, a record carries no semantic meaning: most brokers never look inside a record beyond some broker-specific metadata (aka headers) that defines how the broker should treat it. This typically includes things like TTL, persistence properties, and shard keys for distributed brokers (e.g. the partition key in Kafka).
The idea of a dumb "record" is important because it helps us keep in mind that there are perspectives from which the intent of messaging means nothing. This is the infra platform's neutral view of the system: things in a Kafka topic are not intrinsically events or messages or commands, they are just network traffic.
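To make the broker's neutral view concrete, here is a minimal sketch of a record as the broker might see it. The `Record` class, the `to_record` helper, and the header names are hypothetical and do not come from any particular broker's API.

```python
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """Hypothetical broker-side view: opaque bytes plus broker-facing headers."""
    payload: bytes
    headers: dict = field(default_factory=dict)


def to_record(body: dict, partition_key: str, ttl_ms: int) -> Record:
    # The application serializes its own meaning into bytes; the broker never parses them.
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    # Header names here are illustrative assumptions, not a real broker's schema.
    return Record(payload=payload, headers={"partition_key": partition_key, "ttl_ms": ttl_ms})


# An "event" and a "command" look identical to the broker: bytes plus headers.
event_record = to_record({"type": "OrderPlaced", "order_id": "o-1"}, "o-1", 60_000)
command_record = to_record({"type": "ShipOrder", "order_id": "o-1"}, "o-1", 60_000)
```

Whether the bytes describe something that happened or something that should happen is decided entirely by the applications at either end.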
Message Broker/Message Bus/Event Bus
By these or any other name, a message broker is the infrastructure that encapsulates the "queue" part of the messaging architecture. It may be an in-memory system (for example, a wrapper around ArrayBlockingQueue in Java), a persistent, distributed, commit-log-based system like Kafka, or something in between like RabbitMQ or ActiveMQ. The broker may give some guarantees around how records will be ordered. Regardless of the implementation, the message broker houses the data sent out by the publishers and makes sure it is delivered to its consumers.
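The simplest end of that spectrum can be sketched in a few lines. This is a toy in-memory broker built on Python's standard `queue.Queue` (a rough analogue of a bounded blocking queue); the `InMemoryBroker` class and its method names are assumptions made for illustration.

```python
import queue


class InMemoryBroker:
    """Hypothetical toy broker: one bounded FIFO queue per name, no persistence."""

    def __init__(self, capacity=100):
        self._capacity = capacity
        self._queues = {}

    def _queue(self, name):
        if name not in self._queues:
            self._queues[name] = queue.Queue(maxsize=self._capacity)
        return self._queues[name]

    def publish(self, name, record: bytes) -> None:
        # Blocks the publisher when the queue is full (simple back-pressure).
        self._queue(name).put(record)

    def consume(self, name, timeout=1.0) -> bytes:
        # Raises queue.Empty if nothing arrives within the timeout.
        return self._queue(name).get(timeout=timeout)


broker = InMemoryBroker()
broker.publish("orders", b"first")
broker.publish("orders", b"second")
received = [broker.consume("orders"), broker.consume("orders")]
```

Everything here lives in process memory, so records vanish on restart; that is exactly the trade-off a persistent broker removes.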
The broker also gives certain "delivery guarantees" about how a record is taken from publisher to consumer. The three commonly cited guarantees are at-most-once (often described as best-effort), at-least-once, and exactly-once. Each of these is a subject of considerable theoretical investigation and must be carefully evaluated by the users of the broker to fit their use-cases.
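One way to see the difference between at-most-once and at-least-once is to look at when a record is acknowledged relative to when it is processed. The sketch below is a simplified simulation, not any broker's actual protocol; `deliver`, `make_handler`, and `max_deliveries` are hypothetical names.

```python
from collections import deque


def deliver(records, handler, ack_before_processing, max_deliveries=3):
    """Toy delivery loop: an unacknowledged record is put back for redelivery."""
    pending = deque((record, 1) for record in records)
    while pending:
        record, delivery = pending.popleft()
        try:
            handler(record)
        except RuntimeError:
            # If the ack was sent before processing, the broker has already
            # forgotten the record (at-most-once). Otherwise it redelivers
            # (at-least-once), up to an assumed retry limit.
            if not ack_before_processing and delivery < max_deliveries:
                pending.append((record, delivery + 1))


def make_handler(processed, fail_once):
    """Consumer that crashes the first time it sees any record in fail_once."""
    already_failed = set()

    def handler(record):
        if record in fail_once and record not in already_failed:
            already_failed.add(record)
            raise RuntimeError("consumer crashed")
        processed.append(record)

    return handler


at_most_once, at_least_once = [], []
deliver(["a", "b", "c"], make_handler(at_most_once, {"b"}), ack_before_processing=True)
deliver(["a", "b", "c"], make_handler(at_least_once, {"b"}), ack_before_processing=False)
```

In this run the at-most-once consumer never sees "b" again, while the at-least-once consumer gets it redelivered, out of its original order. Had the crash happened after the side effect rather than before it, the redelivery would have produced a duplicate, which is why at-least-once consumers usually need to be idempotent.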
A queue represents a single logical "pipe" of records between the publisher of asynchronous data and its consumer. Sometimes a single physical pipe in the broker can represent multiple logical queues. Each broker has a different way of defining a queue. RabbitMQ maps an exchange plus a routing key to physical queues, and every queue bound to an exchange and routing key combination gets its own physical copy of every record. Kafka, on the other hand, keeps a single copy of each record in a "topic", which multiple consumer groups can subscribe to and consume independently of one another.
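The two models can be contrasted with a pair of toy classes. These are deliberately simplified sketches of the ideas described above, not the RabbitMQ or Kafka client APIs; all class and method names are hypothetical.

```python
from collections import defaultdict, deque


class CopyPerQueueBroker:
    """Toy exchange + routing key model: every bound queue gets its own copy."""

    def __init__(self):
        self.bindings = defaultdict(list)   # (exchange, routing_key) -> queue names
        self.queues = defaultdict(deque)

    def bind(self, exchange, routing_key, queue_name):
        self.bindings[(exchange, routing_key)].append(queue_name)

    def publish(self, exchange, routing_key, record):
        for queue_name in self.bindings[(exchange, routing_key)]:
            self.queues[queue_name].append(record)

    def consume(self, queue_name):
        return self.queues[queue_name].popleft()   # consuming removes that copy


class SharedLogBroker:
    """Toy topic model: one stored copy, one read offset per consumer group."""

    def __init__(self):
        self.log = []
        self.offsets = defaultdict(int)

    def publish(self, record):
        self.log.append(record)

    def poll(self, group):
        offset = self.offsets[group]
        if offset >= len(self.log):
            return None
        self.offsets[group] = offset + 1
        return self.log[offset]            # reading does not remove the record


rabbit_like = CopyPerQueueBroker()
rabbit_like.bind("orders", "placed", "billing-queue")
rabbit_like.bind("orders", "placed", "audit-queue")
rabbit_like.publish("orders", "placed", "r1")

kafka_like = SharedLogBroker()
kafka_like.publish("r1")
kafka_like.publish("r2")
billing_reads = [kafka_like.poll("billing"), kafka_like.poll("billing")]
audit_reads = [kafka_like.poll("audit")]
```

In the first model fan-out costs storage per queue; in the second, fan-out costs only an extra offset, and a slow group simply lags behind without affecting the others.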
The application side of the world
Moving on to the fun stuff now! Let's talk about the words application developers use when using asynchronous messaging.
As I've written before on this blog, an event is a broadcast, triggered by a system when something happens in its business domain that it wants to tell the outside world. The language of the broadcast is the business domain language of the publisher. Other systems that are interested in learning about these events have to subscribe to the broadcast and interpret the publisher's domain language to act on it. However, even though the publisher is not explicitly aware of who is consuming its events, the mere fact that there is an externally accessible event stream makes it a part of the publisher's public interface, and any change to event structure or meaning should follow the same change management protocols as changes to synchronous APIs.
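A minimal sketch of the event idea, assuming a hypothetical `OrderPlaced` event and an in-process `EventStream` standing in for a real broker. The subscribers are called synchronously here purely to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderPlaced:
    """Hypothetical event, phrased in the publisher's domain language."""
    order_id: str
    total_cents: int


class EventStream:
    """Stand-in for a broker topic; delivery is in-process for illustration."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, event):
        # The publisher does not know who is listening or what they will do.
        for callback in self._subscribers:
            callback(event)


stream = EventStream()
invoices, loyalty_points = [], {}

# Each subscriber interprets the publisher's event for its own purpose.
stream.subscribe(lambda e: invoices.append(f"invoice for {e.order_id}"))
stream.subscribe(lambda e: loyalty_points.update({e.order_id: e.total_cents // 100}))

stream.publish(OrderPlaced(order_id="o-1", total_cents=2599))
```

Note that renaming `total_cents` would break both subscribers without the publisher noticing, which is why the event shape deserves the same change management as a synchronous API.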
A message represents directed asynchronous communication from one system to another and is typically expressed as a command (a "do-this" message), which is why "message" and "command" are often used interchangeably. Since the communication is peer to peer, this is essentially the asynchronous version of a request-response interaction, and similar failure handling semantics can be employed (retrying after some time by re-sending the message, idempotency in the consumer, etc.).
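The failure handling mentioned above (re-sending plus an idempotent consumer) might look roughly like this. `ShipOrder`, `ShippingService`, `send_with_retry`, and the lossy channel are all hypothetical, and the channel is a plain function call standing in for a queue.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ShipOrder:
    """Hypothetical command addressed to one specific consumer."""
    order_id: str
    command_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class ShippingService:
    def __init__(self):
        self.shipped = []
        self._handled = set()

    def handle(self, command):
        # Idempotency: a command id that was already handled is not executed again.
        if command.command_id in self._handled:
            return "duplicate-ignored"
        self._handled.add(command.command_id)
        self.shipped.append(command.order_id)
        return "shipped"


def send_with_retry(send, command, attempts=3):
    # Re-send the very same command (same id) when no acknowledgement arrives.
    for attempt in range(1, attempts + 1):
        try:
            return send(command)
        except TimeoutError:
            if attempt == attempts:
                raise


service = ShippingService()
sends = {"count": 0}


def lossy_channel(command):
    sends["count"] += 1
    result = service.handle(command)
    if sends["count"] == 1:
        # The consumer did the work, but the sender never heard back.
        raise TimeoutError("acknowledgement lost")
    return result


outcome = send_with_retry(lossy_channel, ShipOrder(order_id="o-1"))
```

The order ships once even though the command was sent twice, because the retry reuses the original `command_id`.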
A query is a less commonly used pattern in messaging: one where a system requests (queries) data from a remote system by sending a query over a queue. I have only encountered this pattern in orchestration scenarios, where workflow orchestration tools like Camunda or jBPM are used to implement workflows that stitch together data and actions across multiple services.
We may have a use-case to get data from one service and then invoke another service for each element of the retrieved data. The second piece is clearly a candidate for a message/command style interaction, but what about the first? Usually we would invoke a synchronous API on the first service to get the data, but this suffers from the well-known brittleness of temporal coupling. If the workflow system were stateful, however, we could issue an asynchronous query (tagged with a unique id) to the first service and pause the workflow. The service would then respond (also asynchronously) with the outcome of the query, carrying the original id. The workflow system can now match the incoming response to the open request. This is a message style system ("do-search" request/response) which can scale significantly better.
Another flavour of this is encountered when building actor model based systems. There this pattern is even more explicit because actors can only interact via messaging, and they are necessarily stateful. A workflow composed of actors is therefore stateful and asynchronous by definition. What if one actor wants another actor to give it some data? Ideally, we would not have this dependency among independent actors, but let's say it can't be removed.
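The correlation-id mechanics can be sketched as follows. Two in-memory deques stand in for the request and reply queues, and `Workflow`, `order_service_step`, and the message fields are hypothetical names rather than the API of any workflow engine.

```python
import uuid
from collections import deque

request_queue, reply_queue = deque(), deque()


class Workflow:
    """Hypothetical stateful workflow that parks itself until a reply arrives."""

    def __init__(self):
        self.waiting = {}    # correlation id -> what to do when the reply arrives
        self.commands = []

    def query_orders(self, customer_id):
        correlation_id = str(uuid.uuid4())
        # Remember the open request, then send the query and return immediately.
        self.waiting[correlation_id] = self._ship_each
        request_queue.append(
            {"correlation_id": correlation_id, "query": "orders-for", "customer_id": customer_id}
        )

    def on_reply(self, reply):
        # Match the incoming response to the open request by its id.
        continuation = self.waiting.pop(reply["correlation_id"])
        continuation(reply["data"])

    def _ship_each(self, order_ids):
        # Second step: one command per element of the retrieved data.
        self.commands.extend(f"ship {order_id}" for order_id in order_ids)


def order_service_step():
    """The remote service answers later, echoing the original correlation id."""
    request = request_queue.popleft()
    reply_queue.append({"correlation_id": request["correlation_id"], "data": ["o-1", "o-2"]})


workflow = Workflow()
workflow.query_orders("c-1")
paused_with_open_requests = len(workflow.waiting)   # workflow is parked here
order_service_step()                                 # could happen much later
workflow.on_reply(reply_queue.popleft())
```

Between `query_orders` and `on_reply` nothing is blocked: the only thing held is a small entry in `waiting`, which a real workflow engine would persist.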
In Akka, we would use the "ask" pattern for this, which is nothing but an asynchronous message sent to the second actor, with the first actor waiting for a response. The second actor responds with a direct message to the first actor containing the required data.
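The shape of an ask-style interaction can be approximated in Python with `asyncio`. This is not Akka's API: the actor is a coroutine reading from an `asyncio.Queue` mailbox, a future plays the role of the reply address, and `price_actor`, `ask`, and the message format are assumptions made for illustration.

```python
import asyncio


async def price_actor(mailbox):
    """Hypothetical actor: processes one message at a time from its mailbox."""
    prices = {"apple": 120}
    while True:
        message, reply_to = await mailbox.get()
        # Reply directly to whoever asked; the future stands in for a reply address.
        reply_to.set_result(prices.get(message["sku"]))


async def ask(mailbox, message, timeout=1.0):
    """Send a message and wait (without blocking the event loop) for the reply."""
    reply_to = asyncio.get_running_loop().create_future()
    await mailbox.put((message, reply_to))
    return await asyncio.wait_for(reply_to, timeout)


async def main():
    mailbox = asyncio.Queue()
    actor = asyncio.create_task(price_actor(mailbox))
    try:
        return await ask(mailbox, {"sku": "apple"})
    finally:
        actor.cancel()   # stop the toy actor once we have our answer


price = asyncio.run(main())
```

The timeout matters: an ask without one turns a missing reply into a caller that waits forever.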
That covers the core concepts of messaging architectures. These words have specific meanings, work in specific ways, and result in very different architectural outcomes. Having this specificity in our heads helps in making clearer decisions about what an emitted "record" is expected to achieve. I've seen teams emit generic business events five times with slightly different syntax to onboard five different use-cases when in fact a single event would have sufficed. I've worked in teams that refused to build P2P messages because they had an event stream and everything had to be done via that. A thing often becomes what you keep calling it, and I believe having an explicit understanding of messaging paradigms helps us build better systems with clearer responsibilities.