
Synchronous vs Asynchronous Communication
Series: System Architecture Field Guide (6 of 12)
A field guide for engineers moving into system ownership, focused on the decisions that make systems safer to change, easier to understand, and less fragile under real product pressure.
Choosing between request/response and async messaging is not a technology preference. It is a decision about coupling, latency, failure behavior, user experience, and operational cost.
Once a system has more than one boundary, the next question appears quickly:
"Should this call be synchronous or asynchronous?"
That question often gets flattened into a tool debate.
REST vs Kafka. Queue vs API. Events vs commands. Request/response vs pub/sub.
Those details matter, but they are not the first decision.
The useful decision starts here:
"Does the caller need an answer now, and what should happen if the work cannot finish now?"
That question is where communication style becomes product behavior.
Communication Style Is Coupling
Synchronous communication couples the caller to the callee in time.
The caller waits. The user may wait. The request budget is shared. If the callee is slow, the caller feels it immediately.
Asynchronous communication decouples time.
The caller records intent, publishes work, or enqueues a message. Another component handles it later. The user may see a pending state. The system needs a way to track progress, retries, and failures.
Neither is automatically better.
They buy different things.
| Style | What It Buys | What It Spends |
|---|---|---|
| Synchronous | Immediate answer, simpler mental model, direct error handling. | Tight latency coupling and dependency availability coupling. |
| Asynchronous | Decoupled processing, buffering, retries, independent consumers. | Eventual consistency, harder debugging, duplicate handling, progress tracking. |
The wrong choice is usually the one that hides the real product behavior.
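To make the time coupling concrete, here is a minimal sketch using only the Python standard library. The names `slow_callee`, `call_sync`, `call_async`, and `drain_queue` are hypothetical, and an in-memory queue stands in for whatever broker a real system would use.

```python
import queue
import time


def slow_callee(payload: str) -> str:
    """Stand-in for a dependency that takes a while to respond."""
    time.sleep(0.05)
    return f"done:{payload}"


def call_sync(payload: str) -> str:
    # The caller blocks until the callee answers, so the callee's
    # latency (and its failures) become the caller's.
    return slow_callee(payload)


work_queue: "queue.Queue[str]" = queue.Queue()


def call_async(payload: str) -> str:
    # The caller only records intent and returns a pending status.
    work_queue.put(payload)
    return "pending"


def drain_queue() -> list[str]:
    # A worker processes queued work later, on its own schedule.
    results = []
    while not work_queue.empty():
        results.append(slow_callee(work_queue.get()))
    return results
```

In the synchronous version the caller gets the answer but pays the callee's latency. In the asynchronous version the caller returns quickly, and something else now has to own the queue, the pending state, and the eventual result.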
Use Synchronous Calls When The User Needs A Decision Now
Some workflows need an immediate answer.
If a user clicks "Pay now," the product may need to know whether the payment was accepted, declined, or requires another step. If a user logs in, the product needs an authentication decision before continuing. If a checkout flow needs current inventory before confirming an order, the system may need a real-time answer.
Synchronous communication is a good fit when:
- the caller cannot continue without the answer
- the answer is part of the user-facing decision
- the dependency is expected to respond within the user experience budget
- failure can be shown or handled immediately
- the workflow is simpler and safer when the result is known now
For example:

    Checkout API -> Payment Authorization API -> Provider

The checkout flow may need to tell the user:
- payment accepted
- payment declined
- payment requires action
- payment status unknown
That does not mean the whole order lifecycle must be synchronous.
It means this decision may belong in the request path.
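One way to picture the four outcomes is a call with an explicit time budget, where running out of budget maps to "status unknown" rather than to "declined". This is a sketch under stated assumptions: `fake_provider`, the 10,000-cent approval rule, and the 0.2 second budget are all invented for illustration, and this toy provider never returns the "requires action" case.

```python
import concurrent.futures
import time
from enum import Enum


class PaymentStatus(Enum):
    ACCEPTED = "accepted"
    DECLINED = "declined"
    REQUIRES_ACTION = "requires_action"
    UNKNOWN = "unknown"


def fake_provider(amount_cents: int, delay_s: float) -> str:
    """Hypothetical provider: approves small amounts after a delay."""
    time.sleep(delay_s)
    return "accepted" if amount_cents <= 10_000 else "declined"


_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def authorize(amount_cents: int, delay_s: float = 0.0,
              budget_s: float = 0.2) -> PaymentStatus:
    future = _pool.submit(fake_provider, amount_cents, delay_s)
    try:
        # Wait only as long as the user experience budget allows.
        return PaymentStatus(future.result(timeout=budget_s))
    except concurrent.futures.TimeoutError:
        # The provider may still complete the authorization later.
        # A timeout means "we do not know yet", not "declined".
        return PaymentStatus.UNKNOWN
```

Treating a timeout as its own outcome keeps the response honest: the checkout flow can show a "we are confirming your payment" state instead of guessing.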
Use Async When The Work Can Continue Later
Some work should not block the user.
Sending a receipt, updating a search index, generating analytics, syncing a CRM, refreshing a recommendation model, or notifying a warehouse usually does not need to finish before the user sees success.
Asynchronous communication is a good fit when:
- the caller can record intent and continue
- the work can be retried safely
- multiple consumers need to react independently
- the system needs buffering during spikes
- user experience can tolerate pending or delayed completion
- the producer should not know every downstream consumer
For example:

    Order Service -> OrderConfirmed event -> Receipt, Analytics, Fulfillment

The order confirmation is the business fact. Receipt delivery, analytics, and fulfillment can react without blocking the user-facing request.
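The fan-out can be sketched with a tiny in-process event bus. `EventBus` and the three handlers are hypothetical names, and delivery here happens inline for simplicity, whereas a real broker would deliver later and out of process. The point of the sketch is only that the producer publishes a fact without knowing who consumes it, and that one failing consumer does not stop the others.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)
        self.failures: list[tuple[str, str]] = []

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        for handler in self._subscribers.get(event_type, []):
            try:
                handler(payload)
            except Exception as exc:
                # Record the failure so it is visible, then keep going.
                self.failures.append((handler.__name__, str(exc)))


sent: list[tuple[str, str]] = []


def send_receipt(event: dict) -> None:
    sent.append(("receipt", event["orderId"]))


def record_analytics(event: dict) -> None:
    raise RuntimeError("analytics store unavailable")


def start_fulfillment(event: dict) -> None:
    sent.append(("fulfillment", event["orderId"]))


bus = EventBus()
bus.subscribe("OrderConfirmed", send_receipt)
bus.subscribe("OrderConfirmed", record_analytics)
bus.subscribe("OrderConfirmed", start_fulfillment)
bus.publish("OrderConfirmed", {"orderId": "ord_123"})
```

After the publish, the receipt and fulfillment handlers have run, and the analytics failure sits in `bus.failures` for an operator to see rather than surfacing in the user's request.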
That is the clean version.
The messy version is when the event says something the business has not actually decided yet.
If the system publishes OrderConfirmed before payment is authorized, every consumer now depends on a lie. The async shape is not the problem. The wrong event is the problem.
Compare the event names:
| Event | What It Implies | Safer When |
|---|---|---|
| OrderCreated | An order record exists. | Downstream systems only need to know the draft or pending order exists. |
| PaymentAuthorized | Money movement is approved. | Fulfillment or receipt logic depends on successful payment. |
| OrderConfirmed | The business is ready to treat the order as accepted. | Payment, inventory, and risk checks required by the product have passed. |
The event name is not cosmetic. It tells consumers what decision they are allowed to make.
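The table can be read as a set of preconditions. The sketch below is one possible encoding, assuming a product where confirmation requires payment, inventory, and risk checks. `OrderFacts` and `publishable_events` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class OrderFacts:
    payment_authorized: bool = False
    inventory_reserved: bool = False
    risk_check_passed: bool = False


def publishable_events(facts: OrderFacts) -> list[str]:
    """Return only the events the business has actually decided."""
    events = ["OrderCreated"]  # the order record exists
    if facts.payment_authorized:
        events.append("PaymentAuthorized")
    if (facts.payment_authorized
            and facts.inventory_reserved
            and facts.risk_check_passed):
        events.append("OrderConfirmed")
    return events
```

With only payment authorized, the function returns `OrderCreated` and `PaymentAuthorized` but not `OrderConfirmed`, so no consumer is told the order is accepted before the remaining checks pass.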
Do Not Use Async To Hide Bad Latency
Async can make a slow workflow feel faster.
It can also hide failure from the user and move complexity into operations.
Suppose checkout calls a fraud provider that is slow. A team might say:
"Let's make fraud async so checkout is faster."
Maybe that is right.
But the product question becomes:
"Can we confirm the order before fraud completes?"
If the answer is no, async does not remove the dependency. It only makes the state machine more complicated. The user now needs a pending state. The system needs cancellation or reversal behavior. Support needs to explain what happened. Operations need to detect stuck checks.
Async is not a latency eraser.
It is a different contract with the user.
For example, the API response may need to become honest about pending work:

    {
      "orderId": "ord_123",
      "status": "pending_review",
      "message": "We are checking this order and will update you shortly.",
      "nextCheckAfterSeconds": 30
    }

That response is not a backend detail. It is the product accepting delayed certainty.
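On the client side, a pending response implies some way to learn the final answer. A minimal polling sketch follows, assuming the client re-fetches the order and honors `nextCheckAfterSeconds`. The function name, the `max_checks` limit, and any status other than `pending_review` are assumptions for illustration.

```python
import time
from typing import Callable


def wait_for_decision(fetch_status: Callable[[], dict],
                      max_checks: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the order leaves pending_review or checks run out."""
    response = fetch_status()
    for _ in range(max_checks - 1):
        if response["status"] != "pending_review":
            return response
        # Wait as long as the server asked before checking again.
        sleep(response.get("nextCheckAfterSeconds", 30))
        response = fetch_status()
    # May still be pending; the caller decides what to show the user.
    return response
```

The function can return a response that is still pending. That is deliberate: the product still has to decide what the user sees when certainty does not arrive in time.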
Do Not Use Sync To Avoid Designing Workflow State
Teams also overuse synchronous calls because they do not want to design state.
A long request path feels simple because all the steps sit in one call stack:

    Checkout -> Payments -> Inventory -> Fraud -> Notifications -> Loyalty

The code reads like a sequence.
Production does not behave like a sequence.
Any dependency can slow down, time out, partially succeed, or fail after doing work. The longer the chain, the more likely the user request is to become hostage to a single dependency.
If the workflow has many steps, ask:
- which step creates the user-visible decision?
- which steps can happen after the decision?
- which steps can be retried?
- which steps need compensation?
- which failures should block the user?
- which failures should create operational work?
Sometimes the answer is still synchronous.
Often the answer is a smaller synchronous core with async follow-up work.
A Practical Decision Table
Use communication style to make system behavior honest.
| Question | Prefer Sync When | Prefer Async When |
|---|---|---|
| Does the caller need the result now? | Yes, the result changes the immediate response. | No, the work can complete after the caller continues. |
| Can the user tolerate pending state? | No, the product must decide now. | Yes, pending or delayed completion is acceptable. |
| Is the dependency reliable inside the latency budget? | Usually yes. | Not reliably, or it has variable latency. |
| Are there multiple independent consumers? | Usually no. | Yes, several consumers should react independently. |
| Is duplicate handling required? | Not by the call style itself, though caller retries still need to be safe. | Yes, because retries and replays happen. |
| Is ordering important? | The request path controls the sequence. | Ordering must be designed explicitly. |
| Is failure user-visible? | Usually immediate. | Often delayed, operational, or shown as status. |
The table does not choose the technology.
It chooses the behavior the technology must support.
The Hybrid Shape Is Common
Many real workflows are mixed.
Checkout might synchronously validate the cart, authorize payment, and create the order. Then it asynchronously sends receipts, starts fulfillment, updates analytics, and publishes lifecycle events.

    User waits for:
      validate cart
      authorize payment
      create order

    System continues with:
      send receipt
      start fulfillment
      update analytics
      notify support tools

This is often the best architecture because it separates the user-facing decision from follow-up work.
The important part is naming the boundary:
"What must be true before we tell the user this succeeded?"
If the team cannot answer that, it is not ready to choose sync or async.
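The hybrid shape can be sketched as a small synchronous core that enqueues follow-up work only after the user-facing decision is made. Everything here is a stand-in: the three step functions, the fixed order ID, and the in-memory `followups` queue are hypothetical placeholders for real services and a real broker.

```python
import queue

followups: "queue.Queue[tuple[str, str]]" = queue.Queue()


def validate_cart(cart: dict) -> bool:
    return bool(cart.get("items"))


def authorize_payment(cart: dict) -> bool:
    return cart.get("card") == "valid"


def create_order(cart: dict) -> str:
    return "ord_123"  # placeholder ID for the sketch


def checkout(cart: dict) -> dict:
    # Synchronous core: the user waits for these three steps.
    if not validate_cart(cart):
        return {"status": "rejected", "reason": "empty_cart"}
    if not authorize_payment(cart):
        return {"status": "rejected", "reason": "payment_declined"}
    order_id = create_order(cart)

    # Async follow-up: recorded now, handled later by workers.
    for task in ("send_receipt", "start_fulfillment",
                 "update_analytics", "notify_support_tools"):
        followups.put((task, order_id))
    return {"status": "confirmed", "orderId": order_id}
```

The line between the two halves of `checkout` is the boundary the question asks about: nothing is enqueued until everything that must be true for success is true.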
Async Needs More Than A Queue
Async systems need operational design.
At minimum, ask:
- how are messages retried?
- how are duplicates handled?
- how is ordering preserved where needed?
- how are poison messages isolated?
- how do operators see stuck work?
- how does the user see pending or failed work?
- how are event contracts versioned?
- how are consumers added without breaking existing behavior?
This is why queues and event streams are not "just infrastructure."
They change the system's failure model.
The Kafka Mastery branch goes into the implementation mechanics. The trunk-level lesson is simpler: do not introduce async communication unless the team is ready to own retries, replays, ordering, contracts, and visibility.
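A few of these questions can be made concrete in a small consumer sketch: duplicate detection by message ID, bounded retries, and a dead-letter list for poison messages. `Consumer`, its field names, and the `id` key on messages are assumptions for illustration. A production version would also need a durable store for processed IDs and a way to make "handle" and "mark processed" atomic.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Consumer:
    handler: Callable[[dict], None]
    max_attempts: int = 3
    processed_ids: set = field(default_factory=set)
    dead_letters: list = field(default_factory=list)

    def handle(self, message: dict) -> str:
        # Duplicates happen with retries and replays; skip known IDs.
        if message["id"] in self.processed_ids:
            return "duplicate"

        last_error = ""
        for _ in range(self.max_attempts):
            try:
                self.handler(message)
            except Exception as exc:
                last_error = str(exc)
            else:
                self.processed_ids.add(message["id"])
                return "processed"

        # Isolate the poison message so it does not block the queue,
        # and keep the error so operators can see why it is stuck.
        self.dead_letters.append({"message": message, "error": last_error})
        return "dead_lettered"
```

Even this toy version has three outcomes per message instead of one, which is the sense in which a queue changes the failure model rather than just the transport.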
Sync Needs More Than An HTTP Call
Synchronous systems need operational design too.
At minimum, ask:
- what is the timeout budget?
- what happens when the dependency is slow?
- can the caller retry safely?
- what error does the user see?
- does the call hold locks or transactions while waiting?
- does the dependency have a fallback?
- does this call create cascading failure risk?
- how is latency observed by caller and callee?
An HTTP call is easy to add.
A reliable dependency is harder to own.
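The timeout budget question can be sketched as a shared deadline that each downstream call draws from, so a chain of calls cannot spend more time than the request as a whole is allowed. `Deadline`, `timeout_for`, and `DeadlineExceeded` are hypothetical names, not part of any particular framework.

```python
import time
from typing import Callable


class DeadlineExceeded(Exception):
    pass


class Deadline:
    """Tracks the time left in one request's overall budget."""

    def __init__(self, budget_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expires_at = clock() + budget_s

    def remaining(self) -> float:
        return max(0.0, self._expires_at - self._clock())

    def timeout_for(self, cap_s: float) -> float:
        """Timeout for the next call: its own cap or what is left."""
        left = self.remaining()
        if left <= 0:
            # Fail fast instead of starting a call that cannot finish.
            raise DeadlineExceeded("request budget exhausted")
        return min(cap_s, left)
```

A caller would create one `Deadline` per incoming request and pass `deadline.timeout_for(cap)` as the timeout of each outgoing call, so late calls in the chain get shorter timeouts instead of silently extending the user's wait.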
Where To Go Deeper
The service boundaries article helps decide which team should own a communication boundary.
The API design article goes deeper into request/response contracts.
The observability article goes deeper into seeing behavior across service and client boundaries.
The timeouts, retries, idempotency, and backpressure article goes deeper into making repeated or delayed work safe.
The Kafka Mastery branch is the deeper path when async communication becomes Kafka-specific and implementation mechanics matter.
Summary
Synchronous communication is not old-fashioned. Asynchronous communication is not automatically more scalable.
They express different product and operational promises.
Use sync when the caller needs a decision now and the dependency can fit inside the latency and failure budget.
Use async when work can continue later, multiple consumers need to react, or buffering and decoupling are worth the complexity.
The architect's job is not to pick REST or events.
It is to make the system's timing, failure, and ownership model honest.