Delivery Semantics and Idempotency in Kafka

Morteza Taghdisi · Software Engineering · April 19, 2026 · 11 min read

Series: Kafka Mastery · Part 6

Kafka transactions give exactly-once for Kafka-internal flows. They do not help with HTTP calls, payment APIs, or external databases. Idempotency is the design that does.

kafka · exactly-once · idempotency · transactions · spring-kafka

This is Part 6 of the Kafka Mastery series. Parts 1 to 5 covered when to use Kafka, how to run it, how to design event flows on top of it, how it stores data, and how consumer groups behave. This part deals with the question that produces the most overclaiming in Kafka conversations: how many times will my events actually be delivered?

The answer has three lanes, and most production systems need a different one than they think.

The article uses Apache Kafka 3.9, Spring Kafka 3.3, and Java 21.

Three Delivery Guarantees

Kafka offers three delivery semantics, and the difference between them is one configuration choice and one design decision.

  • At-most-once. The consumer commits the offset before processing. If processing fails or the JVM dies, the record is lost. Used almost nowhere on purpose.
  • At-least-once. The consumer commits after processing. If processing succeeds and the commit fails, the record is replayed on the next start. This is the practical default and what every Spring Kafka consumer does unless explicitly configured otherwise.
  • Effectively-once. Kafka transactions plus the right consumer configuration give exactly-once for Kafka read-process-write flows. The qualifier is important. The guarantee covers Kafka topic to Kafka topic. It does not extend to anything outside Kafka.

The interesting failure modes live in lane two. At-least-once is safe only when the consumer is idempotent. Without that, "at-least-once" means "at-least-once-and-potentially-many-times."
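The gap between the first two lanes is nothing more than where the offset commit sits relative to processing. A side-by-side sketch with the raw kafka-clients consumer makes it concrete; process() is a stand-in for whatever the application does, and a real poll loop would of course contain only one of the two orderings:

java
// Which side of process() the commit lands on is the entire difference
// between at-most-once and at-least-once.
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(250));

// At-most-once: commit before processing. A crash inside process()
// loses these records; they are never redelivered.
consumer.commitSync();
for (ConsumerRecord<String, String> record : records) {
    process(record);
}

// At-least-once: commit after processing. A crash before commitSync()
// replays these records to whichever consumer picks up the partition.
for (ConsumerRecord<String, String> record : records) {
    process(record);
}
consumer.commitSync();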

Three Idempotency Lanes

Different problems need different lanes. Mixing them up is where overclaiming starts.

  1. Kafka-only EOS. A service reads from one Kafka topic, transforms, and writes to another. Everything stays inside Kafka. Use Kafka transactions plus a transactional listener container plus read_committed consumers downstream.
  2. Database-and-Kafka consistency. A service writes to a database and publishes an event in the same code path. Kafka transactions do not cover the database. Use the Outbox Pattern (Part 10).
  3. External side effects. A consumer takes an event and calls an HTTP API, sends an email, charges a card, updates a third-party system. Kafka transactions do not cover any of those. Use consumer-side idempotency: a stable event ID, a deduplication store, and an idempotency key forwarded to the downstream provider when it supports one.

This article covers lanes 1 and 3 in depth. Lane 2 is Part 10.

What Makes Read-Process-Write EOS Actually Work

The first overclaim is "set transactional.id and you have exactly-once." That is the start, not the end.

A correct Kafka-only EOS configuration needs:

  • Producer idempotence. enable.idempotence=true. This deduplicates producer retries within a single producer session.
  • A transactional ID. transactional.id set on the producer, with a unique prefix per application instance. Same prefix on two instances at the same time will fence one of them with ProducerFencedException.
  • acks=all. Required for transactional producers.
  • Offsets sent to the producer transaction. This is the crucial mechanic. The consumer's offsets must be committed inside the producer transaction, not separately. If they are committed separately, the read step is at-least-once and the entire chain stops being exactly-once.
  • isolation.level=read_committed on downstream consumers. Otherwise consumers see records from in-flight (and aborted) transactions.

In Spring Kafka, the transactional listener container handles the offset mechanic. It uses producer.sendOffsetsToTransaction() so consumed offsets and produced records commit atomically. That is what makes the loop exactly-once for Kafka outputs.
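Under the hood, the mechanic looks like this with raw kafka-clients. This is a sketch of the happy path: producer and consumer construction is elided, the producer is transactional, the consumer has enable.auto.commit=false, and project() is the same transformation as in the listener further down. Fatal exceptions such as ProducerFencedException would require closing the producer rather than aborting.

java
producer.initTransactions();
while (running) {
    ConsumerRecords<String, OrderCreatedEvent> records = consumer.poll(Duration.ofMillis(250));
    if (records.isEmpty()) {
        continue;
    }
    producer.beginTransaction();
    try {
        Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
        for (ConsumerRecord<String, OrderCreatedEvent> record : records) {
            producer.send(new ProducerRecord<>("payment.projection",
                    record.key(), project(record.value())));
            // Track the next offset to consume for each partition.
            offsets.put(new TopicPartition(record.topic(), record.partition()),
                    new OffsetAndMetadata(record.offset() + 1));
        }
        // The crucial call: consumed offsets commit inside the producer
        // transaction, atomically with the produced records.
        producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
        producer.commitTransaction();
    } catch (KafkaException e) {
        producer.abortTransaction(); // aborted records stay invisible to read_committed consumers
    }
}

In Spring Boot, that entire loop reduces to configuration: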

yaml
spring:
  kafka:
    producer:
      transaction-id-prefix: "tx-payment-projector-"
      acks: all
      properties:
        enable.idempotence: true
    consumer:
      isolation-level: read_committed
      enable-auto-commit: false

Setting transaction-id-prefix is the line that triggers Spring Boot auto-configuration of a KafkaTransactionManager. The default ConcurrentKafkaListenerContainerFactory picks it up. A custom container factory has to bind it explicitly.

The container factory binds the transaction manager. The producer factory's value type covers anything this projector publishes, so it is parameterised on Object and the producer-side serializer (Avro in later parts, JSON here) handles the actual class.

java
@Configuration
public class TransactionalConsumerConfig {
 
    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, OrderCreatedEvent>
        transactionalFactory(ConsumerFactory<String, OrderCreatedEvent> cf,
                             KafkaTransactionManager<String, Object> tm) {
 
        ConcurrentKafkaListenerContainerFactory<String, OrderCreatedEvent> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(cf);
        factory.getContainerProperties().setKafkaAwareTransactionManager(tm);
        return factory;
    }
}

java
@Component
public class PaymentProjector {
 
    private final KafkaTemplate<String, Object> kafkaTemplate;
 
    public PaymentProjector(KafkaTemplate<String, Object> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }
 
    @KafkaListener(topics = "order.created",
                   groupId = "payment-projector",
                   containerFactory = "transactionalFactory")
    public void onOrderCreated(OrderCreatedEvent event) {
        PaymentProjection projection = project(event);
        kafkaTemplate.send("payment.projection", event.orderId().toString(), projection);
        // Offsets are committed inside the transaction by the container.
    }
 
    private PaymentProjection project(OrderCreatedEvent event) {
        return new PaymentProjection(event.orderId(), event.amountCents(), event.currency());
    }
}

KafkaTemplate<String, Object> and KafkaTransactionManager<String, Object> are auto-configured by Spring Boot once spring.kafka.producer.transaction-id-prefix is set. The producer factory uses the value serializer from application.yml.

This is exactly-once between two Kafka topics. It is also the only thing Kafka transactions cover.

Local Infrastructure Requirements

Kafka 3.9 defaults transaction.state.log.replication.factor=3 and transaction.state.log.min.isr=2. A single-broker dev box cannot initialize a transactional producer at all under these defaults. The transactional demos for this article run on the three-broker cluster introduced in Part 4.

A single-broker fast path is possible only with explicit local-only overrides:

yaml
KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1

These are non-production by definition. The article ships both compose files and labels them.

What Kafka Transactions Do Not Cover

This is the honest list. Kafka transactions do not extend to:

  • HTTP calls to external APIs
  • Email or SMS gateways
  • Third-party payment providers
  • External databases that are not coordinated through Kafka
  • Filesystem writes
  • Anything that is not a Kafka topic

Once a side effect leaves Kafka, "exactly-once" stops being a Kafka property. The honest framing is "effectively-once," and the work of getting there is consumer-side.

Producer Idempotence Without Transactions

Producer idempotence on its own is useful and limited. Setting enable.idempotence=true deduplicates producer retries within a single producer session. If the broker acknowledges a write and the producer never sees the ack and retries, the broker recognises the duplicate by sequence number and discards it.

What it does not give you:

  • Cross-session deduplication. A new producer process is a new session.
  • Dual-write atomicity. The database write and the Kafka publish are still separate operations.
  • External side-effect protection. The downstream API still gets called twice if the consumer reprocesses the record.

enable.idempotence=true is a free correctness improvement. It is not exactly-once.
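As a sketch, the non-transactional idempotent producer in raw kafka-clients looks like this; the broker address is a local-only assumption:

java
// Idempotent producer without transactions: retries within this producer
// session are deduplicated by the broker via sequence numbers. There is no
// transactional.id, so there is no cross-session or cross-system guarantee.
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // local assumption
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
props.put(ProducerConfig.ACKS_CONFIG, "all"); // required (and implied) by idempotence

try (KafkaProducer<String, String> producer =
         new KafkaProducer<>(props, new StringSerializer(), new StringSerializer())) {
    producer.send(new ProducerRecord<>("order.created", "order-42", orderJson));
}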

The Failure Scenario: Duplicate Payment From At-Least-Once Without Idempotency

A payment consumer reads OrderCreatedEvent from Kafka, calls an external payment API, and commits the offset.

A run goes like this:

  1. The consumer reads event E1.
  2. The consumer calls provider.charge(orderId, amount, currency). The API responds 200 OK. The card is charged.
  3. The consumer attempts to commit offset for E1.
  4. The network between the consumer and the broker blips. The commit fails.
  5. The consumer is restarted by the orchestration platform.
  6. On restart, the consumer reads from the last committed offset, which is before E1.
  7. The consumer calls provider.charge again. The card is charged again.
  8. The customer is charged twice.

Nothing in Kafka was misconfigured. acks=all, idempotent producer, healthy broker. The system did exactly what at-least-once delivery says it does. The bug was the assumption that at-least-once was safe without consumer idempotency.

External-API Idempotency: Concrete Implementation

The fix has three parts:

  • A stable event_id set at publish time by the producer (or sourced from the outbox in Part 10).
  • A processed_events table in the consumer's database, with event_id as primary key.
  • An idempotency key forwarded to the payment provider when the provider supports one.

sql
CREATE TABLE processed_events (
    consumer     TEXT NOT NULL,
    event_id     UUID NOT NULL,
    processed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (consumer, event_id)
);

A composite primary key on (consumer, event_id) keeps the table safe to share across services. A single event_id primary key would force one dedup table per consumer, which is also a valid design but worth choosing on purpose.
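The store itself can be a thin JDBC wrapper over that table. A sketch with Spring's JdbcTemplate and PostgreSQL's ON CONFLICT; the class and method names here match the consumer below:

java
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Repository;

import java.util.UUID;

@Repository
public class ProcessedEventStore {

    private final JdbcTemplate jdbc;

    public ProcessedEventStore(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    public boolean exists(String consumer, UUID eventId) {
        Integer count = jdbc.queryForObject(
            "SELECT COUNT(*) FROM processed_events WHERE consumer = ? AND event_id = ?",
            Integer.class, consumer, eventId);
        return count != null && count > 0;
    }

    public void save(String consumer, UUID eventId) {
        // ON CONFLICT DO NOTHING makes the insert itself idempotent, so a
        // racing redelivery cannot fail on the primary key.
        jdbc.update(
            "INSERT INTO processed_events (consumer, event_id) VALUES (?, ?) " +
            "ON CONFLICT DO NOTHING",
            consumer, eventId);
    }
}

The consumer checks the store before the side effect and records the event after it: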

java
@Service
public class PaymentConsumer {
 
    private final ProcessedEventStore processedEvents;
    private final PaymentProvider provider;
    private final TransactionTemplate tx;
 
    public PaymentConsumer(ProcessedEventStore processedEvents,
                           PaymentProvider provider,
                           TransactionTemplate tx) {
        this.processedEvents = processedEvents;
        this.provider = provider;
        this.tx = tx;
    }
 
    @KafkaListener(topics = "order.created", groupId = "payment-service")
    public void onOrderCreated(OrderCreatedEvent event, Acknowledgment ack) {
        boolean alreadyProcessed = processedEvents.exists("payment-service", event.eventId());
        if (alreadyProcessed) {
            ack.acknowledge();
            return;
        }
 
        // Forward idempotency key to the provider so a retry inside the
        // provider does not double-charge either.
        provider.charge(
            event.orderId(),
            event.amountCents(),
            event.currency(),
            event.eventId().toString() // provider's idempotency key
        );
 
        // Record the event as processed in the same transaction as any
        // local state change. For an external API with no local DB write,
        // record-then-ack is the simplest design with a known recovery
        // story for the small window between provider success and DB write.
        tx.executeWithoutResult(status ->
            processedEvents.save("payment-service", event.eventId())
        );
 
        ack.acknowledge();
    }
}

The listener takes an Acknowledgment, which only works if the container is configured for manual ack mode. The default Spring Boot factory is not, so a small piece of config is required:

java
@Configuration
public class PaymentConsumerConfig {
 
    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, OrderCreatedEvent>
        manualAckFactory(ConsumerFactory<String, OrderCreatedEvent> cf) {
 
        ConcurrentKafkaListenerContainerFactory<String, OrderCreatedEvent> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(cf);
        factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL_IMMEDIATE);
        return factory;
    }
}

The listener then references this factory:

java
@KafkaListener(topics = "order.created",
               groupId = "payment-service",
               containerFactory = "manualAckFactory")
public void onOrderCreated(OrderCreatedEvent event, Acknowledgment ack) { ... }

A few things worth being explicit about:

  • The deduplication check happens before the side effect. A consumer that records first and acts second can crash after the insert and before the charge, and the redelivery will then skip an event that was never acted on. That lost-action window is what the reconciliation job below exists to catch.
  • The idempotency key forwarded to the provider matters. If the consumer succeeds in calling the provider and then crashes before recording the event as processed, the next attempt would call the provider again. A provider with idempotency-key support recognises the duplicate and returns the original outcome. Without that, the consumer needs reconciliation to find the duplicate and the provider needs a refund.
  • If the side effect were a local database write, putting the side effect and the processed_events insert in the same transaction is the cleanest design. Cross-process side effects are different and need the idempotency-key forwarding as the safety net.
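As a sketch of that local-write variant, assuming a Spring-managed transaction and a hypothetical ledger repository standing in for the local state change:

java
// Local-side-effect variant: the business write and the processed_events
// insert commit atomically in one database transaction. `ledger` is a
// hypothetical local repository, not part of the article's code.
@Transactional
public void onOrderCreated(OrderCreatedEvent event) {
    if (processedEvents.exists("payment-service", event.eventId())) {
        return; // duplicate delivery; the ledger write already committed
    }
    ledger.recordCharge(event.orderId(), event.amountCents(), event.currency());
    processedEvents.save("payment-service", event.eventId());
}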

Backpressure And Downstream Protection

A consumer that catches up after a lag spike can fire records at the downstream provider faster than the provider can handle. That is how a Kafka issue turns into a downstream outage.

Two defences worth designing for:

  • Rate limiting on the consumer. A token bucket sized for what the downstream actually supports. Catch-up surges become smooth flow instead of a thundering herd.
  • Circuit breakers around the provider call. If the provider starts failing, the consumer pauses or short-circuits instead of hammering it. Spring Cloud Circuit Breaker, Resilience4j, or hand-rolled all work; a sketch of both defences follows this list.
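A minimal sketch of both defences with Resilience4j (one option among the three above, not a prescription), wrapping the provider call from the consumer earlier in the article:

java
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.ratelimiter.RateLimiter;
import io.github.resilience4j.ratelimiter.RateLimiterConfig;

import java.time.Duration;
import java.util.UUID;

public class GuardedPaymentProvider {

    // Sized for what the downstream actually supports: 50 calls per second,
    // with consumer threads willing to wait up to 5 seconds for a permit.
    private final RateLimiter rateLimiter = RateLimiter.of("payment-provider",
        RateLimiterConfig.custom()
            .limitForPeriod(50)
            .limitRefreshPeriod(Duration.ofSeconds(1))
            .timeoutDuration(Duration.ofSeconds(5))
            .build());

    private final CircuitBreaker circuitBreaker =
        CircuitBreaker.ofDefaults("payment-provider");

    private final PaymentProvider delegate;

    public GuardedPaymentProvider(PaymentProvider delegate) {
        this.delegate = delegate;
    }

    public void charge(UUID orderId, long amountCents, String currency, String idempotencyKey) {
        // Breaker outside, limiter inside: a tripped breaker short-circuits
        // before a rate-limit permit is consumed.
        CircuitBreaker.decorateRunnable(circuitBreaker,
            RateLimiter.decorateRunnable(rateLimiter, () ->
                delegate.charge(orderId, amountCents, currency, idempotencyKey)))
            .run();
    }
}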

A small reconciliation job rounds out the safety net. Reconciliation finds events whose outcome is ambiguous: the provider accepted the call but the consumer crashed before recording it, or the provider's response was lost. Idempotency prevents double charges. Reconciliation finds unfinished ones. In payment-class systems, both are required.

Minimum Observability

Three metrics worth shipping with any consumer using this pattern:

  • A duplicate-detection counter (incremented when processed_events.exists returns true). A spike means upstream replay or producer retries.
  • Deduplication-table growth. Unbounded growth eats database space and slows lookups. Plan a retention or partitioning policy.
  • Provider idempotency-key reuse rate. Spikes here are the same signal from the other side.

Part 12 returns to observability across the system. For this article, the rule is: if the consumer claims to be idempotent, the duplicate-detection counter is the metric that proves it.
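A sketch of that counter with Micrometer; the metric name and tag are assumptions for illustration, not a convention from the series:

java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.stereotype.Component;

@Component
public class DeduplicationMetrics {

    private final Counter duplicates;

    public DeduplicationMetrics(MeterRegistry registry) {
        // Incremented whenever processedEvents.exists(...) returns true.
        this.duplicates = Counter.builder("consumer.duplicate.events")
            .tag("consumer", "payment-service")
            .register(registry);
    }

    public void duplicateDetected() {
        duplicates.increment();
    }
}

The early-return branch in PaymentConsumer is the one place duplicateDetected() gets called.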

Suggested Module Shape

kafka-series/                            # delivery semantics
  payment-service/                       # external-API idempotency lane
    src/main/java/com/example/payment/
      PaymentConsumer.java               # @KafkaListener with dedup
      ProcessedEventStore.java           # processed_events table
      PaymentProvider.java               # forwards idempotency key
  payment-projector/                     # Kafka-only EOS lane
    src/main/java/com/example/projector/
      TransactionalConsumerConfig.java   # transactional listener container
      ProjectionListener.java            # read-process-write inside transaction
    application.yml                      # transaction-id-prefix, read_committed
  reconciliation-job/
    ReconciliationRunner.java            # finds ambiguous outcomes

What Most People Get Wrong

  • "At-least-once is the safe default." Only when the consumer is idempotent. Without that, at-least-once is at-least-once-and-potentially-many-times. Idempotency is the design that makes at-least-once safe.
  • "Kafka transactions give us exactly-once." They give exactly-once for Kafka topic to Kafka topic, with offsets committed inside the producer transaction. They do not extend to HTTP calls, payment providers, or external databases.
  • "transactional.id is enough." It is the start. Without offsets sent to the transaction, the read step is at-least-once and the chain is not exactly-once. The transactional listener container in Spring Kafka is what makes the offset mechanic correct by default.
  • "Exactly-once is a feature you switch on for a payment integration." It is not. Payment integrations are external-side-effect problems. They need consumer-side idempotency, an idempotency key forwarded to the provider, and reconciliation as the safety net.

Where We Go Next

Part 7 deals with the next breakage vector: the schema. JSON has carried us this far on a note that it would not last. Part 7 introduces Avro and Confluent Schema Registry, covers the wire format honestly, walks through a deserialization-loop failure caused by compatibility=NONE, and shows why ErrorHandlingDeserializer is not optional in any Spring Kafka consumer that touches Schema Registry.