
Java CompletableFuture, Part 4: Production Patterns and Testing
How to handle fire-and-forget failures safely, keep systems stable under high event load, and write async tests that are fast and deterministic.
The Dark Side of Async
Parts 1, 2, and 3 of this series covered the async fundamentals, composition, and error handling. This part covers the patterns that separate production code from tutorial code.
Three areas to address:
- Fire-and-forget: when you do not wait for results, and why that is dangerous
- High pressure: what happens when tasks arrive faster than you can process them
- Testing: writing async tests that are fast and deterministic
Fire-and-Forget: Handle With Care
Fire-and-forget is when you start an async operation but do not wait for it to complete:
// Start the task and move on
CompletableFuture.runAsync(() -> sendAnalyticsEvent(event));
// Execution continues immediatelyWhen fire-and-forget is appropriate
- Non-critical operations (analytics, metrics)
- Notifications where occasional failure is acceptable
- Cache warming and pre-loading
- Audit trails that should not block the main flow
When fire-and-forget is dangerous
- Critical business operations
- Operations where failure must be reported
- When you need to track completion
- When back-pressure is needed
Silent Failures
The most common fire-and-forget mistake:
// Bad: exceptions vanish silently
public void logAnalytics(String event) {
CompletableFuture.runAsync(() -> {
analyticsService.send(event); // If this throws, the exception is lost
});
}If analyticsService.send() throws:
- No stack trace is printed
- No error is logged
- No way to know it failed
- Analytics silently stops working
This looks like it handles errors, but it does not:
// Also bad: catches the error but discards it
CompletableFuture.runAsync(() -> analyticsService.send(event))
.exceptionally(ex -> {
return null; // Error caught and thrown away
});Safe Fire-and-Forget Patterns
Pattern 1: Always log errors
At minimum, always log errors in fire-and-forget operations:
public void safeFireAndForget(Runnable task, String taskName) {
CompletableFuture.runAsync(task, executor)
.whenComplete((result, ex) -> {
if (ex != null) {
log.error("[FIRE-AND-FORGET] Task '{}' failed: {}",
taskName, ex.getMessage(), ex);
metrics.increment("async." + taskName + ".error");
} else {
metrics.increment("async." + taskName + ".success");
}
});
}Pattern 2: Fire-and-forget with retry
For important operations that should not block but need reliability:
public void fireAndForgetWithRetry(Runnable task, String name, int maxRetries) {
executeWithRetry(task, name, maxRetries, 0);
}
private void executeWithRetry(Runnable task, String name, int maxRetries, int attempt) {
CompletableFuture.runAsync(task, executor)
.whenComplete((result, ex) -> {
if (ex != null && attempt < maxRetries) {
log.warn("Task '{}' failed, retrying ({}/{})",
name, attempt + 1, maxRetries);
sleep(100 * (attempt + 1)); // Simple exponential backoff
executeWithRetry(task, name, maxRetries, attempt + 1);
} else if (ex != null) {
log.error("Task '{}' failed after {} retries", name, maxRetries, ex);
metrics.increment("async." + name + ".final_failure");
}
});
}Pattern 3: Fire-and-forget with fallback
When you have a backup plan for failures:
public void fireAndForgetWithFallback(
Runnable primaryTask,
Runnable fallbackTask,
String taskName) {
CompletableFuture.runAsync(primaryTask, executor)
.whenComplete((result, ex) -> {
if (ex != null) {
log.warn("Primary task '{}' failed, executing fallback", taskName);
try {
fallbackTask.run();
} catch (Exception fallbackEx) {
log.error("Both primary and fallback failed for '{}'",
taskName, fallbackEx);
}
}
});
}
// Try to log to database, fall back to local file
fireAndForgetWithFallback(
() -> auditDatabase.log(event),
() -> auditFile.log(event),
"audit-log"
);Pattern 4: Return the future
The most flexible approach is to return the future, giving callers the option to track completion or ignore it:
public CompletableFuture<Void> optionallyTrackedTask(Runnable task, String name) {
return CompletableFuture.runAsync(task, executor)
.whenComplete((result, ex) -> {
if (ex != null) {
log.error("Task '{}' failed", name, ex);
}
});
}
// Option 1: ignore return value (fire-and-forget)
optionallyTrackedTask(() -> sendEmail(), "email");
// Option 2: chain more work
optionallyTrackedTask(() -> sendEmail(), "email")
.thenRun(() -> log.info("Email sent"));
// Option 3: wait for completion
optionallyTrackedTask(() -> sendEmail(), "email").join();High Pressure: When Tasks Overwhelm Your System
The scenario that crashes production systems: high-pressure fire-and-forget.
Consider logging events to a database. Normal load is 100 events per second and the database handles it fine. A traffic spike brings 10,000 events per second. Each event spawns a fire-and-forget task.
What happens:
- Thread pool saturates: all threads are busy
- Queue fills up: tasks pile up waiting for threads
- Memory exhausts: the unbounded queue grows until OOM
- System crashes or becomes unresponsive
// Bad: newFixedThreadPool uses an unbounded LinkedBlockingQueue
ExecutorService executor = Executors.newFixedThreadPool(4);
for (Event event : highVolumeStream) {
CompletableFuture.runAsync(() -> database.write(event), executor);
// Queue: 1, 10, 100, 1000... grows until OutOfMemoryError
}Solution 1: Bounded queues with rejection policies
Bound your queue. When it is full, a rejection policy decides what to do:
public ExecutorService createSafeExecutor(int threads, int queueSize) {
return new ThreadPoolExecutor(
threads, threads,
0L, TimeUnit.MILLISECONDS,
new ArrayBlockingQueue<>(queueSize), // bounded queue
new ThreadPoolExecutor.CallerRunsPolicy() // rejection policy
);
}The four built-in rejection policies:
AbortPolicy: throwsRejectedExecutionExceptionimmediately; use when you need to know right away that the system is overwhelmedCallerRunsPolicy: the caller thread runs the task itself; creates natural back-pressure by slowing the producer downDiscardPolicy: silently drops the task; acceptable for sampling or best-effort scenariosDiscardOldestPolicy: drops the oldest queued task; use when newer data takes priority
CallerRunsPolicy is a good default for fire-and-forget logging: when overwhelmed, the producer automatically slows down.
For visibility, monitor your rejection handler:
ExecutorService executor = new ThreadPoolExecutor(
4, 4,
0L, TimeUnit.MILLISECONDS,
new ArrayBlockingQueue<>(1000),
(runnable, pool) -> {
rejectedTasksCounter.increment();
log.warn("Task rejected - queue full. Total rejected: {}",
rejectedTasksCounter.get());
if (!pool.isShutdown()) {
runnable.run();
}
}
);Solution 2: Batching for database writes
Writing one row per event is inefficient under load. Batch your writes:
public class BatchingEventLogger {
private final BlockingQueue<LogEvent> buffer;
private final int batchSize;
private final long maxDelayMs;
private final ScheduledExecutorService scheduler;
public BatchingEventLogger(int bufferSize, int batchSize, long maxDelayMs) {
this.buffer = new ArrayBlockingQueue<>(bufferSize);
this.batchSize = batchSize;
this.maxDelayMs = maxDelayMs;
this.scheduler = Executors.newSingleThreadScheduledExecutor();
// Flush periodically even if the batch is not full
scheduler.scheduleAtFixedRate(
this::flush, maxDelayMs, maxDelayMs, TimeUnit.MILLISECONDS
);
}
public boolean logEvent(LogEvent event) {
boolean added = buffer.offer(event);
if (!added) {
log.warn("Event buffer full - dropping event");
return false;
}
if (buffer.size() >= batchSize) {
flush();
}
return true;
}
private void flush() {
List<LogEvent> batch = new ArrayList<>(batchSize);
buffer.drainTo(batch, batchSize);
if (!batch.isEmpty()) {
CompletableFuture.runAsync(() -> {
database.batchInsert(batch);
log.debug("Flushed {} events to database", batch.size());
}, executor);
}
}
}Benefits: 100 events become 1 database write instead of 100. The buffer provides natural back-pressure, and the periodic flush ensures events do not wait too long.
Solution 3: Rate limiting with Semaphore
Control concurrent operations regardless of queue size:
public class RateLimitedLogger {
private final Semaphore permits;
private final ExecutorService executor;
public RateLimitedLogger(int maxConcurrent, ExecutorService executor) {
this.permits = new Semaphore(maxConcurrent);
this.executor = executor;
}
public boolean tryLog(LogEvent event) {
if (!permits.tryAcquire()) {
return false; // At capacity - reject immediately
}
CompletableFuture.runAsync(() -> {
try {
database.write(event);
} finally {
permits.release(); // Always release
}
}, executor);
return true;
}
public boolean logWithWait(LogEvent event, long timeoutMs)
throws InterruptedException {
if (!permits.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS)) {
return false;
}
// ... same as above
}
}This guarantees at most N concurrent database operations, regardless of how fast events arrive.
Solution 4: Circuit breaker
When the database is failing, stop sending requests to it:
public class CircuitBreakerLogger {
private volatile boolean circuitOpen = false;
private final AtomicInteger consecutiveFailures = new AtomicInteger(0);
private final int failureThreshold;
private volatile long circuitOpenedAt = 0;
private final long resetTimeoutMs;
public boolean logEvent(LogEvent event) {
if (circuitOpen) {
if (System.currentTimeMillis() - circuitOpenedAt > resetTimeoutMs) {
circuitOpen = false;
log.info("Circuit breaker attempting to close...");
} else {
log.debug("Circuit open - rejecting event");
return false;
}
}
CompletableFuture.runAsync(() -> {
try {
database.write(event);
consecutiveFailures.set(0);
} catch (Exception e) {
int failures = consecutiveFailures.incrementAndGet();
if (failures >= failureThreshold) {
circuitOpen = true;
circuitOpenedAt = System.currentTimeMillis();
log.error("Circuit breaker OPENED after {} failures", failures);
}
}
}, executor);
return true;
}
}When the database fails failureThreshold times in a row:
- The circuit opens, stopping new requests
- After
resetTimeoutMs, one test request goes through (half-open state) - If that request succeeds, the circuit closes; if it fails, it stays open
This prevents cascading failures and gives the downstream system time to recover.
Testing Async Code
Async code is hard to test. Race conditions, timing issues, and flaky tests all trace back to the same root cause: tests that depend on real thread scheduling. The six strategies below avoid it.
Strategy 1: Use a synchronous executor
For unit tests, use an executor that runs tasks immediately on the calling thread:
Executor synchronousExecutor = Runnable::run;Your async code becomes synchronous in tests:
@BeforeEach
void setUp() {
service = new UserService(
mockRepository,
mockClient,
Runnable::run // Synchronous executor
);
}
@Test
void testAsyncOperation() {
when(mockRepository.findById(1L)).thenReturn(expectedUser);
User result = service.findUserAsync(1L).join();
assertEquals("expected_user", result.username());
verify(mockRepository).findById(1L);
}No race conditions. No timing sensitivity. Deterministic by construction.
Strategy 2: Use completedFuture() and failedFuture()
For mocking methods that return CompletableFuture:
// Mock a successful async response
when(asyncClient.fetchData(anyLong()))
.thenReturn(CompletableFuture.completedFuture(expectedData));
// Mock a failed async response
when(asyncClient.fetchData(anyLong()))
.thenReturn(CompletableFuture.failedFuture(
new ServiceException("Connection failed")
));These create pre-completed futures with no actual async behavior to worry about.
Strategy 3: Test error handling paths
Do not only test the happy path:
@Test
void testFallbackOnError() {
when(userRepository.findById(anyLong()))
.thenThrow(new RuntimeException("DB Error"));
User result = service.findUserWithFallback(1L).join();
assertEquals("unknown", result.username());
}
@Test
void testExceptionPropagatesWithoutFallback() {
when(userRepository.findById(anyLong()))
.thenThrow(new RuntimeException("DB Error"));
CompletableFuture<User> future = service.findUserById(1L);
CompletionException thrown = assertThrows(
CompletionException.class,
future::join
);
assertEquals("DB Error", thrown.getCause().getMessage());
}Strategy 4: Test timeouts with real async
For timeout testing, you need actual async behavior:
@Test
void testTimeoutThrowsException() {
Executor asyncExecutor = Executors.newSingleThreadExecutor();
UserService asyncService = new UserService(mockRepo, asyncExecutor);
when(mockRepo.findById(anyLong())).thenAnswer(inv -> {
Thread.sleep(500); // Slower than the timeout
return expectedUser;
});
CompletableFuture<User> future = asyncService
.findUserWithTimeout(1L, 50); // 50ms timeout
CompletionException thrown = assertThrows(
CompletionException.class,
future::join
);
assertInstanceOf(TimeoutException.class, thrown.getCause());
}Strategy 5: Verify parallel execution
Ensure operations actually run in parallel when they should:
@Test
void testOperationsRunInParallel() {
Executor parallelExecutor = Executors.newFixedThreadPool(3);
UserService service = new UserService(mocks..., parallelExecutor);
when(userRepo.findById(1L)).thenAnswer(inv -> { Thread.sleep(100); return user; });
when(prefsClient.get(1L)).thenAnswer(inv -> { Thread.sleep(100); return prefs; });
when(loyaltyClient.get(1L)).thenAnswer(inv -> { Thread.sleep(100); return points; });
long start = System.currentTimeMillis();
UserProfile profile = service.getUserProfile(1L).join();
long duration = System.currentTimeMillis() - start;
// Should complete in ~100ms (parallel), not ~300ms (sequential)
assertTrue(duration < 200,
"Expected parallel execution (~100ms) but took " + duration + "ms");
}Strategy 6: Test pure functions separately
Extract business logic into pure functions and test them without any async:
// In your service
UserProfile buildProfile(User user, List<String> prefs, int points) {
String plan = calculatePlan(user.tier(), points);
return UserProfile.of(user, prefs, LocalDateTime.now(), points, plan);
}
String calculatePlan(String tier, int points) {
if (points > 5000) return "VIP";
if (points > 2000 && "BASIC".equals(tier)) return "PREMIUM";
return tier;
}
// Fast, deterministic tests with no mocking required
@Test
void testPlanCalculation() {
assertEquals("VIP", service.calculatePlan("BASIC", 6000));
assertEquals("PREMIUM", service.calculatePlan("BASIC", 3000));
assertEquals("BASIC", service.calculatePlan("BASIC", 500));
}Testing checklist
Before calling async test coverage complete:
- Success path with a synchronous executor
- Error handling (exceptions and fallbacks)
- Timeout behavior
- Partial failures in parallel operations
- Future state inspection (
isDone(),isCompletedExceptionally()) - Parallel execution timing (when it matters)
- Pure business logic tested separately from async mechanics
Production Checklist
Before deploying async code:
- Custom executors with meaningful thread names
- Bounded queues with an explicit capacity limit
- A defined rejection policy for when the queue is full
- Error logging in every fire-and-forget path
- Monitoring for submitted, completed, rejected, and failed counts
- Graceful shutdown configuration (
setWaitForTasksToCompleteOnShutdown(true)) - A back-pressure strategy for when load exceeds capacity
- A circuit breaker to protect against cascading downstream failures
Key Takeaways
- Fire-and-forget is dangerous. Always log errors and track metrics. Return the future when possible so callers can decide whether to track it.
- Bound your queues. Unbounded queues lead to memory exhaustion under load. Use
ArrayBlockingQueuewith an explicit rejection policy. CallerRunsPolicyis a good default for logging. When overwhelmed, it slows the producer naturally.- Batch writes under load. One database write for 100 events beats 100 separate writes.
- Use a synchronous executor in tests.
Runnable::runmakes async code deterministic. - Test error paths. Happy paths work in demos. Error paths work in production.
- Extract pure functions. Business logic without async mechanics is fast and easy to test in isolation.
Wrapping Up the Series
This series covered the full arc of working with CompletableFuture:
- Part 1: the foundation (supplyAsync, thenApply, the mental model)
- Part 2: composition (thenApply vs thenCompose, parallel execution)
- Part 3: error handling and Spring Boot integration
- Part 4: production patterns (fire-and-forget, back-pressure, testing)
CompletableFuture is a powerful tool. Used well, it makes applications faster and more responsive. Used carelessly, it creates silent failures and hard-to-reproduce bugs.
The difference is in the details: proper error handling, bounded queues, meaningful thread names, and comprehensive tests.