Morteza Taghdisi

Writing · 9 min read
[Header illustration: three services communicating through Kafka events]
Software Engineering · April 10, 2026

From REST Calls to Event Flows

Series: Kafka Mastery · Article 3 of 3

A series about using Kafka in real JVM systems, with each article anchored on a concrete failure mode or design decision.


kafka · event-driven · spring-boot · architecture · outbox

Moving a flow from REST calls to Kafka events forces two things into the open that synchronous design papers over: which business state is actually true at any given moment, and the gap between a database commit and a Kafka publish.

This article works through both using an order flow with three services and two topics.

The Synchronous Starting Point

A typical first version of an order flow is a chain of REST calls.

plaintext
POST /orders -> Order Service
                    |-- saves order
                    |-- POST Payment Service /charge
                    |        |-- charges card
                    |        |-- 200 OK
                    |-- POST Notification Service /confirm
                    |        |-- sends email
                    |        |-- 200 OK
                    |-- 201 Created back to client

Two things are appealing about this. The behavior is obvious from reading top to bottom. And everything has already happened by the time the response returns, so the caller learns the final outcome in a single round trip.

Two things break under load. Payment and notification latency are now part of the order endpoint's latency. Any failure downstream surfaces as an HTTP error to the user, even when the right behavior is "we will keep trying."

The decision to move to events is rarely about throughput. It is about decoupling outcomes from request paths and giving multiple systems a consistent way to learn that an order happened.

The Event-Driven Redesign

The same flow rebuilt around events looks like this.

[Diagram: OrderCreatedEvent and PaymentCompletedEvent flowing between the three services across the order.created and payment.completed topics]

Three services, two topics, one direction of flow.

  • The Order Service writes the order and publishes OrderCreatedEvent.
  • The Payment Service consumes OrderCreatedEvent, calls the provider, then publishes PaymentCompletedEvent.
  • The Order Service consumes PaymentCompletedEvent and moves the order to CONFIRMED.
  • The Notification Service consumes PaymentCompletedEvent and sends the email.

Notice that the Notification Service does not consume OrderCreatedEvent. The customer should be told the order is confirmed, not that it was received. Picking the right event for the right side effect is half of event-driven design.
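The two event types themselves live in the shared common-events module described at the end of this article. As a sketch, they can be plain Java records; the field sets below are assumptions chosen to match the listeners shown later, not a canonical schema, and the static from(...) factories used in the service code are omitted.

java
// common-events/OrderCreatedEvent.java (sketch; fields are illustrative)
public record OrderCreatedEvent(
        UUID orderId,
        UUID customerId,
        long amountCents,
        String currency,
        Instant createdAt) {
}

// common-events/PaymentCompletedEvent.java (sketch; fields are illustrative)
public record PaymentCompletedEvent(
        UUID orderId,
        String paymentReference,
        Instant completedAt) {
}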

Why Business State Has To Move With The Flow

In the REST version, "the order is confirmed" was implied by 201 Created. The caller waited until everything succeeded, so the order ending up in the database meant payment had succeeded too.

In the event-driven version, that implication breaks. If the Order Service writes the order as CONFIRMED immediately, two things that have not happened yet are being claimed:

  • The payment has not been charged
  • The notification has not been sent

The right business state on creation is something honest, not something optimistic. PENDING_PAYMENT or ACCEPTED are both fine. The state transitions to CONFIRMED only when PaymentCompletedEvent is consumed.

This is not a Kafka rule. It is a design rule that becomes obvious as soon as the system is asynchronous.

Order Service: Honest States

java
public enum OrderStatus {
    PENDING_PAYMENT,
    CONFIRMED,
    PAYMENT_FAILED,
    CANCELLED
}
java
@Service
public class OrderService {
 
    private final OrderRepository orders;
    private final OrderProducer producer;
 
    public OrderService(OrderRepository orders, OrderProducer producer) {
        this.orders = orders;
        this.producer = producer;
    }
 
    @Transactional
    public UUID create(CreateOrderRequest request) {
        Order order = new Order(
            UUID.randomUUID(),
            request.customerId(),
            request.amountCents(),
            request.currency(),
            OrderStatus.PENDING_PAYMENT,
            Instant.now()
        );
        orders.save(order);
 
        // Best-effort publish. The dual-write window is real
        // and is solved with the Outbox Pattern later in this series.
        producer.publish(OrderCreatedEvent.from(order));
 
        return order.id();
    }
}

The endpoint returns 202 Accepted. The body says PENDING_PAYMENT. The UI tells the user the order is being processed.
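A controller that matches that contract might look like the sketch below. The request path and response body shape are assumptions; the 202 status and the PENDING_PAYMENT state are the point.

java
@RestController
@RequestMapping("/orders")
public class OrderController {

    // Response body shape is an assumption for this sketch.
    public record OrderAccepted(UUID orderId, OrderStatus status) { }

    private final OrderService orderService;

    public OrderController(OrderService orderService) {
        this.orderService = orderService;
    }

    @PostMapping
    public ResponseEntity<OrderAccepted> create(@RequestBody CreateOrderRequest request) {
        UUID orderId = orderService.create(request);
        // 202, not 201: the order is accepted and pending payment, not confirmed.
        return ResponseEntity.accepted()
                .body(new OrderAccepted(orderId, OrderStatus.PENDING_PAYMENT));
    }
}

When the payment eventually completes, the listener below moves the order forward.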

java
@KafkaListener(topics = "payment.completed", groupId = "order-service")
public void onPaymentCompleted(PaymentCompletedEvent event) {
    orders.findById(event.orderId())
        .ifPresent(order -> orders.save(order.confirm(event.completedAt())));
}

CONFIRMED is set by the same code path that knows payment completed. There is no longer a place in the system where confirmation is asserted before it is true.
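The confirm(...) call in that listener deserves a guard of its own. A sketch, assuming Order is an immutable record matching the constructor used in create; a real entity would also persist completedAt, omitted here to keep the shape identical:

java
public record Order(UUID id, UUID customerId, long amountCents,
                    String currency, OrderStatus status, Instant createdAt) {

    public Order confirm(Instant completedAt) {
        // Guard the transition: only PENDING_PAYMENT moves to CONFIRMED.
        // A duplicate or late PaymentCompletedEvent must not overwrite
        // CANCELLED or PAYMENT_FAILED.
        if (status != OrderStatus.PENDING_PAYMENT) {
            return this;
        }
        return new Order(id, customerId, amountCents, currency,
                OrderStatus.CONFIRMED, createdAt);
    }
}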

Payment Service: Consume, Charge, Publish

java
@KafkaListener(topics = "order.created", groupId = "payment-service")
public void onOrderCreated(OrderCreatedEvent event) {
    PaymentResult result = provider.charge(
        event.orderId(),
        event.amountCents(),
        event.currency()
    );
    if (result.succeeded()) {
        producer.publish(PaymentCompletedEvent.from(event, result));
    } else {
        producer.publish(PaymentFailedEvent.from(event, result));
    }
}

Two things in this listener are simplified for the article. Real payment integrations need an idempotency key forwarded to the provider so retries do not double-charge, and they need a deduplication store to handle Kafka's at-least-once delivery. Both are covered later in this series. The listener here keeps the focus on the flow.
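As a preview of the shape of that fix rather than the full treatment: the order id already on the event can serve both as the idempotency key forwarded to the provider and as the key of a local processed-events check. The processedEvents store below is hypothetical, and the ordering of its update relative to the publish is still simplified.

java
@KafkaListener(topics = "order.created", groupId = "payment-service")
public void onOrderCreated(OrderCreatedEvent event) {
    // Hypothetical dedup store, e.g. a small table keyed by orderId.
    // At-least-once delivery means this listener can see the same event twice.
    if (processedEvents.alreadyHandled(event.orderId())) {
        return;
    }

    // Forwarding orderId lets the provider treat a retried charge as the same
    // attempt instead of charging the card a second time.
    PaymentResult result = provider.charge(
        event.orderId(),
        event.amountCents(),
        event.currency()
    );
    processedEvents.markHandled(event.orderId());

    if (result.succeeded()) {
        producer.publish(PaymentCompletedEvent.from(event, result));
    } else {
        producer.publish(PaymentFailedEvent.from(event, result));
    }
}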

Notification Service: One Consumer, One Job

java
@KafkaListener(topics = "payment.completed", groupId = "notification-service")
public void onPaymentCompleted(PaymentCompletedEvent event) {
    emailGateway.sendConfirmation(event.orderId(), event.completedAt());
}

Two consumer groups read payment.completed independently: order-service and notification-service. Each tracks its own offsets. Each can lag, fail, and recover independently. That independence is exactly what Kafka offers and what REST does not.
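That independence is visible in the committed offsets themselves. A small sketch using the Kafka Admin API; the bootstrap address is an assumption for a local setup like the docker-compose file in the module layout below.

java
public static void printCommittedOffsets() throws Exception {
    try (Admin admin = Admin.create(Map.of(
            AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"))) {

        for (String group : List.of("order-service", "notification-service")) {
            // Each group has its own committed position on payment.completed,
            // which is why one can lag or fail without affecting the other.
            Map<TopicPartition, OffsetAndMetadata> committed =
                    admin.listConsumerGroupOffsets(group)
                         .partitionsToOffsetAndMetadata()
                         .get();
            committed.forEach((partition, offset) ->
                    System.out.printf("%s  %s  committed=%d%n",
                            group, partition, offset.offset()));
        }
    }
}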

What Changes For The Developer

Three shifts worth naming explicitly:

  • Publishing returns no business outcome. The producer's success means the event reached Kafka. It does not mean any consumer has acted on it. UIs and APIs need to reflect that.
  • Errors move to the consumer. Failures during charging are not HTTP errors anymore. They are events on a different topic, handled by the Order Service or surfaced through reconciliation.
  • Reasoning becomes per-service, not per-request. "What does this endpoint do" is replaced by "what does this service consume, and what does it produce."

The right mental model is closer to a distributed state machine than a call graph.

The Failure Scenario, In Two Layers

A version of this system shipped with a different design choice. The Order Service writes the order as CONFIRMED on creation, a separate code path hooked to that database write sends the confirmation email, and only then does the service publish OrderCreatedEvent.

The Payment Service consumer is down for 20 minutes during a deploy. While it is down:

  • 312 customers place orders
  • 312 rows land in the orders table with status = CONFIRMED
  • 312 confirmation emails go out
  • 312 payments are never attempted, because the Payment Service is the only thing that calls the provider

Reconciliation, the next day, finds 312 confirmed orders with no matching payment. The team patches the Payment Service, restarts it, and the events that did publish are processed. But there is also an unknown subset of orders for which the publish itself failed (process killed during deploy, transient broker error, dropped acknowledgment). Those will never be charged without a manual sweep of the database against Kafka.

Two distinct bugs, often conflated:

  1. Wrong business invariant. Marking the order CONFIRMED before payment was a design error. The right state was PENDING_PAYMENT. This is free to fix and has to be fixed before anything else.
  2. Dual-write window. Even with the correct state, the Order Service still writes to PostgreSQL and then publishes to Kafka in the same code path. Those are two operations with no atomicity. A crash between the database commit and the Kafka send leaves the database with an order that no event ever announced.

The first bug is fixed in this article. The second bug is real, persistent, and lives in every example through the broker reliability articles. It is solved properly later in this series with the Outbox Pattern.

Every example from here through the broker reliability articles still publishes events from a service that also writes to a database. Even with the corrected invariant, the publish itself is best-effort. Treat it that way until the Outbox Pattern is introduced.
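Until then, "best-effort" should be visible in the code rather than implicit. A sketch of the OrderProducer used in the create path, assuming Spring Kafka 3.x, where KafkaTemplate.send returns a CompletableFuture; the topic name matches the Payment Service listener above, and the logging choice is an assumption.

java
@Component
public class OrderProducer {

    private static final Logger log = LoggerFactory.getLogger(OrderProducer.class);

    private final KafkaTemplate<String, OrderCreatedEvent> kafkaTemplate;

    public OrderProducer(KafkaTemplate<String, OrderCreatedEvent> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void publish(OrderCreatedEvent event) {
        // Keyed by orderId so every event for one order lands on the same partition.
        kafkaTemplate.send("order.created", event.orderId().toString(), event)
                .whenComplete((result, ex) -> {
                    if (ex != null) {
                        // Best-effort: the order row already exists, but no event
                        // announced it. This is the dual-write gap the Outbox
                        // Pattern closes later in the series.
                        log.error("Publish failed for order {}", event.orderId(), ex);
                    }
                });
    }
}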

Tradeoffs The Team Accepts By Going Event-Driven

  • Latency. A confirmation email no longer arrives synchronously. Under healthy operation it lands within a few seconds. Under consumer lag it can take longer.
  • Debuggability. A request no longer has a single trace. It has a producer, an event, one or more consumers, and the choice of correlation IDs is now a real design decision; a sketch of one option follows this list.
  • Operational surface. Consumer-group lag, rebalance cost, schema management, idempotency, and dual-write all become live concerns.
  • Reliability semantics shift. Publishing is best-effort. Consumers are at-least-once by default. Each side effect needs to be designed for that.
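For the correlation-ID point above, one option is to carry the id as a Kafka record header so every service in the flow can log the same value. A sketch on the producer side; the header name and the decision to pass the id explicitly are assumptions, and consumers would read the same header and attach it to their own logs.

java
public void publish(OrderCreatedEvent event, String correlationId) {
    ProducerRecord<String, OrderCreatedEvent> record =
            new ProducerRecord<>("order.created", event.orderId().toString(), event);
    // Headers travel with the record, so downstream consumers can log the same
    // correlation id without it becoming part of the event payload.
    record.headers().add("correlationId",
            correlationId.getBytes(StandardCharsets.UTF_8));
    kafkaTemplate.send(record);
}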

None of these tradeoffs are reasons to avoid Kafka. They are the reasons every later article in this series exists.

Suggested Module Shape

kafka-series - three-service layout

plaintext
kafka-series/
├── order-service/                         # owns order state, publishes OrderCreatedEvent
│   └── src/main/java/com/example/order/
│       ├── OrderController.java           # entry point: POST /orders
│       ├── OrderService.java              # business write + publish
│       ├── OrderRepository.java           # PostgreSQL persistence
│       └── PaymentCompletedListener.java  # moves order to CONFIRMED
├── payment-service/                       # charges provider, publishes PaymentCompletedEvent
│   └── src/main/java/com/example/payment/
│       ├── OrderCreatedListener.java      # entry point
│       └── PaymentProvider.java           # external API client
├── notification-service/                  # sends confirmation on PaymentCompletedEvent
│   └── src/main/java/com/example/notification/
│       ├── PaymentCompletedListener.java  # entry point
│       └── EmailGateway.java
├── common-events/                         # shared event records
│   ├── OrderCreatedEvent.java
│   ├── PaymentCompletedEvent.java
│   └── PaymentFailedEvent.java
└── docker-compose.yml                     # Kafka KRaft + Kafka UI + PostgreSQL (required)

common-events is its own module on purpose. Sharing event types across services through copy-paste is the fastest way to end up with three slightly different definitions and silent serialization bugs.

What Most People Get Wrong

  • Treating Kafka like a synchronous call with extra steps. Publishing an event is not a request. It returns no business outcome. UIs and downstream services have to reflect that.
  • Defining no clear event ownership. Every event has exactly one producer. If two services publish the same event, there is no source of truth for what it means or when it changes.
  • Promoting state before the underlying domain confirmed it. CONFIRMED is a claim. Make it only when payment actually completed.
  • Ignoring the dual-write window. A database commit followed by a Kafka publish is two operations. The window between them is a real bug class. The fix for the invariant is free. The fix for the dual-write comes later in this series. Until then, the publish is best-effort.

What Comes Next

The next article goes underneath the topic abstraction. Partitions, offsets, replication, min.insync.replicas, acks, and unclean.leader.election.enable are not configuration trivia. They are the difference between a Kafka cluster that loses data quietly and one that refuses to lose it and tells operators why. The article also names the specific failure modes, including NotEnoughReplicas and NotEnoughReplicasAfterAppend, so they are recognizable in production logs.