
Choosing Between Kafka, RabbitMQ, and REST
Series: Kafka Mastery, part 1 of 3. A series about using Kafka in real JVM systems, with each article anchored on a concrete failure mode or design decision.
Choosing between Kafka, RabbitMQ, and REST is a design decision, not a stack preference. This article gives a decision rule that holds up before a line of code is written.
Most Kafka pain starts before any Kafka configuration is written. It starts with a tool decision that was never really made.
Before the rest of this series is useful, the choice to use Kafka has to survive a second look.
The Wrong Starting Point
Most Kafka tutorials start with a producer, a consumer, and a topic. That order assumes the decision to use Kafka has already been made. In practice, the decision is usually inherited from a meeting where someone said "we should decouple this" and nobody pushed back.
The result is a system that uses Kafka for problems Kafka does not solve, while quietly accepting all of Kafka's operational cost: consumer lag, rebalancing, offset management, schema breakage, the dual-write problem, observability gaps. Each of those is a real engineering problem. None of them are problems a REST call has.
The senior-level question is not "how do I use Kafka?" It is "what is the smallest tool that solves this communication problem, and would Kafka actually pay for itself here?"
Three Tools, Three Different Jobs
REST, RabbitMQ, and Kafka are often grouped under "ways services talk to each other." That grouping hides the differences that matter.
REST: Synchronous Request and Response
REST is a request and a response. The caller waits. The connection is open while the work happens. The caller learns the outcome immediately, including failures.
Client -> POST /orders -> Server -> 201 Created

REST is the right answer when:
- The caller needs the result before continuing
- The work is short enough to fit inside an HTTP timeout
- One service is asking another for an answer
REST has no replay, no buffering, and no fan-out. If the receiver is down, the call fails.
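The synchronous contract is visible in the client API itself. A minimal Java sketch, using only the standard library (the endpoint and payload are invented for illustration; the point is that the send call blocks until the server answers or the timeout fires):

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

public class OrderClient {
    // Build a synchronous order-creation request. The caller waits for
    // the outcome; a timeout bounds how long it is willing to wait.
    public static HttpRequest buildCreateOrder(String baseUrl, String json) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/orders"))
                .timeout(Duration.ofSeconds(2))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = buildCreateOrder("http://localhost:8080", "{\"sku\":\"A-1\"}");
        // The actual call would be:
        //   HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString());
        // send() blocks the calling thread until a response or a timeout arrives.
        System.out.println(req.method() + " " + req.uri());
    }
}
```

If the server is down, `send` throws; the caller learns about the failure immediately, which is exactly the property the bullets above describe.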
RabbitMQ (Classic Queue): Task Distribution
A classic RabbitMQ queue is a work queue. A producer enqueues a task. One consumer in a pool picks it up, processes it, and acknowledges it. The message is then removed.
Producer -> [ task ] [ task ] [ task ] -> one of N workers

RabbitMQ is the right answer when:
- Work needs to happen reliably but not synchronously
- The task has one logical owner: one job, one worker, one outcome
- Backpressure and retry need to be queue-managed
- Throughput is moderate
A classic queue is not an event log. Once a task is consumed, it is gone. Other systems cannot react to the same task later.
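The one-job-one-worker semantics can be sketched in-process with a plain `BlockingQueue`. This is an analogy for the competing-consumers pattern, not RabbitMQ client code; the names are invented:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class WorkQueueDemo {
    // Each task is taken by exactly one worker, then it is gone --
    // the defining property of a classic queue.
    public static Map<String, Integer> run(List<String> tasks, int workers) throws Exception {
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>(tasks);
        ConcurrentMap<String, Integer> processedCount = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                String task;
                while ((task = queue.poll()) != null) {
                    processedCount.merge(task, 1, Integer::sum); // "process" the task
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return processedCount;
    }

    public static void main(String[] args) throws Exception {
        // Three workers compete for four tasks; each task is processed exactly once.
        System.out.println(run(List.of("t1", "t2", "t3", "t4"), 3));
    }
}
```

Note what is missing: there is no way for a second, independent system to see `t1` after a worker has taken it. That is the gap the next section is about.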
Kafka: Event Log With Independent Consumers
Kafka is an append-only log of events partitioned across brokers. Producers write events to topics. Consumers read at their own pace, track their own position with offsets, and can replay history within the retention window.
Producer -> [ event log ] -> consumer group A
                          -> consumer group B
                          -> consumer group C

Kafka is the right answer when:
- Multiple independent systems need to react to the same event
- Events have value beyond the moment they happen (audit, analytics, replay)
- Throughput is high or the event volume is sustained
- Consumers evolve at different rates and need to read history
Kafka is not a better REST. It is not a better RabbitMQ. It is a different tool that happens to share a network protocol with both.
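The offset-per-consumer model is what separates a log from a queue. A deliberately simplified in-process sketch (no partitions, no brokers, no retention; all names are invented for illustration):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LogDemo {
    // An append-only list plus a position per consumer group. Reading does
    // not remove anything; each group advances its own offset independently.
    private final List<String> log = new ArrayList<>();
    private final Map<String, Integer> offsets = new HashMap<>();

    public void publish(String event) { log.add(event); }

    public List<String> poll(String group) {
        int from = offsets.getOrDefault(group, 0);
        List<String> batch = new ArrayList<>(log.subList(from, log.size()));
        offsets.put(group, log.size()); // commit the new position
        return batch;
    }

    public void seekToBeginning(String group) { offsets.put(group, 0); } // replay

    public static void main(String[] args) {
        LogDemo topic = new LogDemo();
        topic.publish("OrderCreated:1");
        topic.publish("OrderCreated:2");
        System.out.println(topic.poll("emails"));    // both events
        topic.publish("OrderCreated:3");
        System.out.println(topic.poll("emails"));    // only the new one
        topic.seekToBeginning("analytics");
        System.out.println(topic.poll("analytics")); // all three, independent of "emails"
    }
}
```

Real Kafka adds partitioning, durability, retention, and consumer-group rebalancing on top of this shape, but the shape itself is the reason replay and independent consumers come for free, and the reason a classic queue cannot offer them.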
A Decision Table
This table is scoped to classic RabbitMQ queues, not RabbitMQ Streams or fanout exchange designs. Those change parts of the picture and are out of scope here.
| Use case | REST | RabbitMQ (classic) | Kafka |
|---|---|---|---|
| Request and response, user is waiting | Yes | Limited | No |
| Background jobs, one worker per task | No | Yes | Limited |
| Event streaming with multiple consumers | No | Limited | Yes |
| Replay past events | No | No | Yes |
| High-throughput sustained event flow | No | Limited | Yes |
| One service asking another for data | Yes | No | No |
Most production communication patterns map cleanly onto exactly one row.
The One-Line Decision Rule
Use Kafka when multiple systems need to react independently to the same event over time. Not when one system just needs a response from another.
If a feature can be described as "A asks B for X," it is a REST call. If it can be described as "A hands off a job to be done once," it is a queue. If it can be described as "A publishes that something happened, and several systems care about it now or later," it is an event log.
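The three descriptions above can be written down as a toy classifier, purely as a mnemonic; the method and parameter names are invented:

```java
public class ToolRule {
    enum Tool { REST, QUEUE, EVENT_LOG }

    // Encodes the one-line rule: answer the three questions in order.
    static Tool choose(boolean callerNeedsAnswerNow,
                       boolean singleOwnerJob,
                       boolean multipleConsumersOverTime) {
        if (callerNeedsAnswerNow) return Tool.REST;           // "A asks B for X"
        if (singleOwnerJob) return Tool.QUEUE;                // "a job to be done once"
        if (multipleConsumersOverTime) return Tool.EVENT_LOG; // "several systems care, now or later"
        return Tool.REST; // default to the smallest tool
    }

    public static void main(String[] args) {
        System.out.println(choose(true, false, false));  // REST
        System.out.println(choose(false, true, false));  // QUEUE
        System.out.println(choose(false, false, true));  // EVENT_LOG
    }
}
```

Nobody should ship this function; the point is that the decision has three inputs, and none of them is "which tool sounds more scalable."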
A Failure Story: Kafka Where REST Was Enough
Consider an order service handling about 50 requests per second over REST. The flow is short: validate the order, write it to PostgreSQL, return 201 Created. End-to-end latency sits comfortably under 100ms. The team wants to "scale" and "decouple."
They introduce Kafka. The order endpoint now writes the order to PostgreSQL and publishes an OrderCreatedEvent. Downstream services consume the event to send confirmation emails and update an internal dashboard. The HTTP response no longer waits for those things.
Six months in, the team is dealing with:
- A 200 to 800ms tail in confirmation email delivery, blamed on consumer lag during peak hours
- A duplicate email incident traced to an offset commit failure after a deploy
- A reconciliation job written to find orders whose OrderCreatedEvent was never published due to a crash between the database commit and the Kafka send
- An internal dashboard that shows stale data after every consumer-group rebalance
- A new on-call rotation specifically for Kafka health
None of those problems existed under REST. The system handled 50 requests per second on a thread-per-request model and was easy to reason about. Kafka did not scale the system. It added every operational cost Kafka brings, in exchange for solving a coupling problem that was never really there.
The honest postmortem is not "we configured Kafka wrong." It is "Kafka was the wrong tool for this problem."
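The crash-between-commit-and-send gap from the story is worth seeing concretely. A minimal in-process simulation (everything here is a stand-in, not real database or Kafka code) of two writes with no shared transaction:

```java
import java.util.ArrayList;
import java.util.List;

public class DualWriteDemo {
    // Stand-ins for PostgreSQL and the Kafka topic.
    static List<String> db = new ArrayList<>();
    static List<String> topic = new ArrayList<>();

    static void createOrder(String id, boolean crashAfterCommit) {
        db.add(id);                      // step 1: database commit succeeds
        if (crashAfterCommit) return;    // process dies before the send
        topic.add("OrderCreated:" + id); // step 2: publish (may never happen)
    }

    public static void main(String[] args) {
        createOrder("o1", false);
        createOrder("o2", true); // crash between commit and send
        // The reconciliation job from the story: orders with no matching event.
        List<String> missing = db.stream()
                .filter(id -> !topic.contains("OrderCreated:" + id))
                .toList();
        System.out.println(missing); // [o2]
    }
}
```

The fix for this (the Outbox Pattern) makes the publish part of the same database transaction, but that is another moving part to build and operate. Under REST, the gap simply does not exist.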
When Kafka Earns Its Complexity
A short checklist worth running through before adopting Kafka:
- Multiple systems need to react to the same event, and that list is expected to grow
- At least one consumer needs to replay events later (audit, debugging, late-arriving systems)
- Sustained event volume is high enough that synchronous coupling becomes a bottleneck
- Producers and consumers evolve on independent schedules
- Eventual consistency is acceptable for the affected behaviors
If most of those are true, Kafka is the right tool and the rest of this series is for you. If only one or two are true, REST or a simple queue is probably the better answer, and Kafka can be added later when the second consumer actually exists.
A Lightweight ADR For This Decision
The decision deserves a paper trail. A short ADR keeps the reasoning visible the next time someone asks "why did we pick Kafka here."
# ADR: Communication model for the order flow
status: accepted
context: |
  The order flow needs to inform multiple systems that an order
  was created: payments, notifications, analytics. Today only
  payments needs it. Notifications is planned this quarter.
  Analytics is a maybe.
decision: |
  Use Kafka for OrderCreatedEvent.
  Reasons:
  - More than one consumer is expected within the next quarter
  - Replay is required for the analytics use case if it ships
  - Event volume is projected to grow with order growth
consequences:
  - We accept the operational cost of running Kafka in production
  - We accept that publishing is asynchronous and best-effort
    until the Outbox Pattern is introduced
  - We accept the need for consumer-side idempotency
alternatives_considered:
  - REST callbacks: rejected, does not support replay or fan-out
  - Classic queue: rejected, does not support multiple
    independent consumers cleanly

A worked rejection looks like this:
# ADR: Communication model for the user-profile read flow
status: accepted
context: |
  The product page needs the user's display name and avatar.
  The data lives in the user service.
decision: |
  Use REST. The product page calls the user service synchronously.
consequences:
  - Latency depends on the user service. Cache where it matters.
alternatives_considered:
  - Kafka: rejected, no fan-out requirement and no replay value.
    Pushing this through Kafka would add operational cost
    for no behavioral benefit.

The point of these ADRs is not formality. It is that "we use Kafka here" stops being a stack convention and becomes a decision with reasons.
What Most People Get Wrong
Three claims worth pushing back on:
- "Kafka is a better REST." It is not. REST gives the caller an answer. Kafka gives the caller a write to a log.
- "Async is inherently scalable." It is not. Asynchronous systems move the bottleneck. They rarely remove it. Consumer lag, retries, and dead-letter queues all need capacity too.
- "Decoupling is free." It is not. Decoupled systems are harder to debug, harder to test, and harder to operate. The price is paid by every engineer who reads the system later.
If a system can be reasoned about as a chain of REST calls, the cost of Kafka is unlikely to be justified.
What Comes Next
The next article builds the smallest Kafka system worth running: a Spring Boot producer, a Spring Boot consumer, a single-broker cluster in KRaft mode, and Kafka UI from the first commit. The argument is not "build your first producer." It is that running Kafka without local visibility tooling is the fastest way to make every later debugging session ten times harder.