Java CompletableFuture, Part 4: Production Patterns and Testing

The Dark Side of Async

Parts 1, 2, and 3 of this series covered the async fundamentals, composition, and error handling. This part covers the patterns that separate production code from tutorial code.

Three areas to address:

Fire-and-forget: when you do not wait for results, and why that is dangerous
High pressure: what happens when tasks arrive faster than you can process them
Testing: writing async tests that are fast and deterministic

Fire-and-Forget: Handle With Care

Fire-and-forget is when you start an async operation but do not wait for it to complete:

java

// Start the task and move on
CompletableFuture.runAsync(() -> sendAnalyticsEvent(event));
// Execution continues immediately

When fire-and-forget is appropriate

Non-critical operations (analytics, metrics)
Notifications where occasional failure is acceptable
Cache warming and pre-loading
Audit trails that should not block the main flow

When fire-and-forget is dangerous

Critical business operations
Operations where failure must be reported
When you need to track completion
When back-pressure is needed

Silent Failures

The most common fire-and-forget mistake:

java

// Bad: exceptions vanish silently
public void logAnalytics(String event) {
    CompletableFuture.runAsync(() -> {
        analyticsService.send(event);  // If this throws, the exception is lost
    });
}

If analyticsService.send() throws:

No stack trace is printed
No error is logged
No way to know it failed
Analytics silently stops working

This looks like it handles errors, but it does not:

java

// Also bad: catches the error but discards it
CompletableFuture.runAsync(() -> analyticsService.send(event))
    .exceptionally(ex -> {
        return null;  // Error caught and thrown away
    });

Safe Fire-and-Forget Patterns

Pattern 1: Always log errors

At minimum, always log errors in fire-and-forget operations:

java

public void safeFireAndForget(Runnable task, String taskName) {
    CompletableFuture.runAsync(task, executor)
        .whenComplete((result, ex) -> {
            if (ex != null) {
                log.error("[FIRE-AND-FORGET] Task '{}' failed: {}",
                    taskName, ex.getMessage(), ex);
                metrics.increment("async." + taskName + ".error");
            } else {
                metrics.increment("async." + taskName + ".success");
            }
        });
}

Pattern 2: Fire-and-forget with retry

For important operations that should not block but need reliability:

java

public void fireAndForgetWithRetry(Runnable task, String name, int maxRetries) {
    executeWithRetry(task, name, maxRetries, 0);
}
 
private void executeWithRetry(Runnable task, String name, int maxRetries, int attempt) {
    CompletableFuture.runAsync(task, executor)
        .whenComplete((result, ex) -> {
            if (ex != null && attempt < maxRetries) {
                log.warn("Task '{}' failed, retrying ({}/{})",
                    name, attempt + 1, maxRetries);
                sleep(100 * (attempt + 1));  // Simple exponential backoff
                executeWithRetry(task, name, maxRetries, attempt + 1);
            } else if (ex != null) {
                log.error("Task '{}' failed after {} retries", name, maxRetries, ex);
                metrics.increment("async." + name + ".final_failure");
            }
        });
}

Pattern 3: Fire-and-forget with fallback

When you have a backup plan for failures:

java

public void fireAndForgetWithFallback(
        Runnable primaryTask,
        Runnable fallbackTask,
        String taskName) {
 
    CompletableFuture.runAsync(primaryTask, executor)
        .whenComplete((result, ex) -> {
            if (ex != null) {
                log.warn("Primary task '{}' failed, executing fallback", taskName);
                try {
                    fallbackTask.run();
                } catch (Exception fallbackEx) {
                    log.error("Both primary and fallback failed for '{}'",
                        taskName, fallbackEx);
                }
            }
        });
}
 
// Try to log to database, fall back to local file
fireAndForgetWithFallback(
    () -> auditDatabase.log(event),
    () -> auditFile.log(event),
    "audit-log"
);

Pattern 4: Return the future

The most flexible approach is to return the future, giving callers the option to track completion or ignore it:

java

public CompletableFuture<Void> optionallyTrackedTask(Runnable task, String name) {
    return CompletableFuture.runAsync(task, executor)
        .whenComplete((result, ex) -> {
            if (ex != null) {
                log.error("Task '{}' failed", name, ex);
            }
        });
}
 
// Option 1: ignore return value (fire-and-forget)
optionallyTrackedTask(() -> sendEmail(), "email");
 
// Option 2: chain more work
optionallyTrackedTask(() -> sendEmail(), "email")
    .thenRun(() -> log.info("Email sent"));
 
// Option 3: wait for completion
optionallyTrackedTask(() -> sendEmail(), "email").join();

High Pressure: When Tasks Overwhelm Your System

The scenario that crashes production systems: high-pressure fire-and-forget.

Consider logging events to a database. Normal load is 100 events per second and the database handles it fine. A traffic spike brings 10,000 events per second. Each event spawns a fire-and-forget task.

What happens:

Thread pool saturates: all threads are busy
Queue fills up: tasks pile up waiting for threads
Memory exhausts: the unbounded queue grows until OOM
System crashes or becomes unresponsive

java

// Bad: newFixedThreadPool uses an unbounded LinkedBlockingQueue
ExecutorService executor = Executors.newFixedThreadPool(4);
 
for (Event event : highVolumeStream) {
    CompletableFuture.runAsync(() -> database.write(event), executor);
    // Queue: 1, 10, 100, 1000... grows until OutOfMemoryError
}

Solution 1: Bounded queues with rejection policies

Bound your queue. When it is full, a rejection policy decides what to do:

java

public ExecutorService createSafeExecutor(int threads, int queueSize) {
    return new ThreadPoolExecutor(
        threads, threads,
        0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<>(queueSize),       // bounded queue
        new ThreadPoolExecutor.CallerRunsPolicy()  // rejection policy
    );
}

The four built-in rejection policies:

AbortPolicy: throws RejectedExecutionException immediately; use when you need to know right away that the system is overwhelmed
CallerRunsPolicy: the caller thread runs the task itself; creates natural back-pressure by slowing the producer down
DiscardPolicy: silently drops the task; acceptable for sampling or best-effort scenarios
DiscardOldestPolicy: drops the oldest queued task; use when newer data takes priority

CallerRunsPolicy is a good default for fire-and-forget logging: when overwhelmed, the producer automatically slows down.

For visibility, monitor your rejection handler:

java

ExecutorService executor = new ThreadPoolExecutor(
    4, 4,
    0L, TimeUnit.MILLISECONDS,
    new ArrayBlockingQueue<>(1000),
    (runnable, pool) -> {
        rejectedTasksCounter.increment();
        log.warn("Task rejected - queue full. Total rejected: {}",
            rejectedTasksCounter.get());
        if (!pool.isShutdown()) {
            runnable.run();
        }
    }
);

Solution 2: Batching for database writes

Writing one row per event is inefficient under load. Batch your writes:

java

public class BatchingEventLogger {
    private final BlockingQueue<LogEvent> buffer;
    private final int batchSize;
    private final long maxDelayMs;
    private final ScheduledExecutorService scheduler;
 
    public BatchingEventLogger(int bufferSize, int batchSize, long maxDelayMs) {
        this.buffer = new ArrayBlockingQueue<>(bufferSize);
        this.batchSize = batchSize;
        this.maxDelayMs = maxDelayMs;
        this.scheduler = Executors.newSingleThreadScheduledExecutor();
 
        // Flush periodically even if the batch is not full
        scheduler.scheduleAtFixedRate(
            this::flush, maxDelayMs, maxDelayMs, TimeUnit.MILLISECONDS
        );
    }
 
    public boolean logEvent(LogEvent event) {
        boolean added = buffer.offer(event);
 
        if (!added) {
            log.warn("Event buffer full - dropping event");
            return false;
        }
 
        if (buffer.size() >= batchSize) {
            flush();
        }
 
        return true;
    }
 
    private void flush() {
        List<LogEvent> batch = new ArrayList<>(batchSize);
        buffer.drainTo(batch, batchSize);
 
        if (!batch.isEmpty()) {
            CompletableFuture.runAsync(() -> {
                database.batchInsert(batch);
                log.debug("Flushed {} events to database", batch.size());
            }, executor);
        }
    }
}

Benefits: 100 events become 1 database write instead of 100. The buffer provides natural back-pressure, and the periodic flush ensures events do not wait too long.

Solution 3: Rate limiting with Semaphore

Control concurrent operations regardless of queue size:

java

public class RateLimitedLogger {
    private final Semaphore permits;
    private final ExecutorService executor;
 
    public RateLimitedLogger(int maxConcurrent, ExecutorService executor) {
        this.permits = new Semaphore(maxConcurrent);
        this.executor = executor;
    }
 
    public boolean tryLog(LogEvent event) {
        if (!permits.tryAcquire()) {
            return false;  // At capacity - reject immediately
        }
 
        CompletableFuture.runAsync(() -> {
            try {
                database.write(event);
            } finally {
                permits.release();  // Always release
            }
        }, executor);
 
        return true;
    }
 
    public boolean logWithWait(LogEvent event, long timeoutMs)
            throws InterruptedException {
        if (!permits.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS)) {
            return false;
        }
        // ... same as above
    }
}

This guarantees at most N concurrent database operations, regardless of how fast events arrive.

Solution 4: Circuit breaker

When the database is failing, stop sending requests to it:

java

public class CircuitBreakerLogger {
    private volatile boolean circuitOpen = false;
    private final AtomicInteger consecutiveFailures = new AtomicInteger(0);
    private final int failureThreshold;
    private volatile long circuitOpenedAt = 0;
    private final long resetTimeoutMs;
 
    public boolean logEvent(LogEvent event) {
        if (circuitOpen) {
            if (System.currentTimeMillis() - circuitOpenedAt > resetTimeoutMs) {
                circuitOpen = false;
                log.info("Circuit breaker attempting to close...");
            } else {
                log.debug("Circuit open - rejecting event");
                return false;
            }
        }
 
        CompletableFuture.runAsync(() -> {
            try {
                database.write(event);
                consecutiveFailures.set(0);
            } catch (Exception e) {
                int failures = consecutiveFailures.incrementAndGet();
                if (failures >= failureThreshold) {
                    circuitOpen = true;
                    circuitOpenedAt = System.currentTimeMillis();
                    log.error("Circuit breaker OPENED after {} failures", failures);
                }
            }
        }, executor);
 
        return true;
    }
}

When the database fails failureThreshold times in a row:

The circuit opens, stopping new requests
After resetTimeoutMs, one test request goes through (half-open state)
If that request succeeds, the circuit closes; if it fails, it stays open

This prevents cascading failures and gives the downstream system time to recover.

Testing Async Code

Async code is hard to test. Race conditions, timing issues, and flaky tests all trace back to the same root cause: tests that depend on real thread scheduling. The six strategies below avoid it.

Strategy 1: Use a synchronous executor

For unit tests, use an executor that runs tasks immediately on the calling thread:

java

Executor synchronousExecutor = Runnable::run;

Your async code becomes synchronous in tests:

java

@BeforeEach
void setUp() {
    service = new UserService(
        mockRepository,
        mockClient,
        Runnable::run  // Synchronous executor
    );
}
 
@Test
void testAsyncOperation() {
    when(mockRepository.findById(1L)).thenReturn(expectedUser);
 
    User result = service.findUserAsync(1L).join();
 
    assertEquals("expected_user", result.username());
    verify(mockRepository).findById(1L);
}

No race conditions. No timing sensitivity. Deterministic by construction.

Strategy 2: Use completedFuture() and failedFuture()

For mocking methods that return CompletableFuture:

java

// Mock a successful async response
when(asyncClient.fetchData(anyLong()))
    .thenReturn(CompletableFuture.completedFuture(expectedData));
 
// Mock a failed async response
when(asyncClient.fetchData(anyLong()))
    .thenReturn(CompletableFuture.failedFuture(
        new ServiceException("Connection failed")
    ));

These create pre-completed futures with no actual async behavior to worry about.

Strategy 3: Test error handling paths

Do not only test the happy path:

java

@Test
void testFallbackOnError() {
    when(userRepository.findById(anyLong()))
        .thenThrow(new RuntimeException("DB Error"));
 
    User result = service.findUserWithFallback(1L).join();
 
    assertEquals("unknown", result.username());
}
 
@Test
void testExceptionPropagatesWithoutFallback() {
    when(userRepository.findById(anyLong()))
        .thenThrow(new RuntimeException("DB Error"));
 
    CompletableFuture<User> future = service.findUserById(1L);
 
    CompletionException thrown = assertThrows(
        CompletionException.class,
        future::join
    );
    assertEquals("DB Error", thrown.getCause().getMessage());
}

Strategy 4: Test timeouts with real async

For timeout testing, you need actual async behavior:

java

@Test
void testTimeoutThrowsException() {
    Executor asyncExecutor = Executors.newSingleThreadExecutor();
    UserService asyncService = new UserService(mockRepo, asyncExecutor);
 
    when(mockRepo.findById(anyLong())).thenAnswer(inv -> {
        Thread.sleep(500);  // Slower than the timeout
        return expectedUser;
    });
 
    CompletableFuture<User> future = asyncService
        .findUserWithTimeout(1L, 50);  // 50ms timeout
 
    CompletionException thrown = assertThrows(
        CompletionException.class,
        future::join
    );
    assertInstanceOf(TimeoutException.class, thrown.getCause());
}

Strategy 5: Verify parallel execution

Ensure operations actually run in parallel when they should:

java

@Test
void testOperationsRunInParallel() {
    Executor parallelExecutor = Executors.newFixedThreadPool(3);
    UserService service = new UserService(mocks..., parallelExecutor);
 
    when(userRepo.findById(1L)).thenAnswer(inv -> { Thread.sleep(100); return user; });
    when(prefsClient.get(1L)).thenAnswer(inv -> { Thread.sleep(100); return prefs; });
    when(loyaltyClient.get(1L)).thenAnswer(inv -> { Thread.sleep(100); return points; });
 
    long start = System.currentTimeMillis();
    UserProfile profile = service.getUserProfile(1L).join();
    long duration = System.currentTimeMillis() - start;
 
    // Should complete in ~100ms (parallel), not ~300ms (sequential)
    assertTrue(duration < 200,
        "Expected parallel execution (~100ms) but took " + duration + "ms");
}

Strategy 6: Test pure functions separately

Extract business logic into pure functions and test them without any async:

java

// In your service
UserProfile buildProfile(User user, List<String> prefs, int points) {
    String plan = calculatePlan(user.tier(), points);
    return UserProfile.of(user, prefs, LocalDateTime.now(), points, plan);
}
 
String calculatePlan(String tier, int points) {
    if (points > 5000) return "VIP";
    if (points > 2000 && "BASIC".equals(tier)) return "PREMIUM";
    return tier;
}
 
// Fast, deterministic tests with no mocking required
@Test
void testPlanCalculation() {
    assertEquals("VIP", service.calculatePlan("BASIC", 6000));
    assertEquals("PREMIUM", service.calculatePlan("BASIC", 3000));
    assertEquals("BASIC", service.calculatePlan("BASIC", 500));
}

Testing checklist

Before calling async test coverage complete:

Success path with a synchronous executor
Error handling (exceptions and fallbacks)
Timeout behavior
Partial failures in parallel operations
Future state inspection (isDone(), isCompletedExceptionally())
Parallel execution timing (when it matters)
Pure business logic tested separately from async mechanics

Production Checklist

Before deploying async code:

Custom executors with meaningful thread names
Bounded queues with an explicit capacity limit
A defined rejection policy for when the queue is full
Error logging in every fire-and-forget path
Monitoring for submitted, completed, rejected, and failed counts
Graceful shutdown configuration (setWaitForTasksToCompleteOnShutdown(true))
A back-pressure strategy for when load exceeds capacity
A circuit breaker to protect against cascading downstream failures

Key Takeaways

Fire-and-forget is dangerous. Always log errors and track metrics. Return the future when possible so callers can decide whether to track it.
Bound your queues. Unbounded queues lead to memory exhaustion under load. Use ArrayBlockingQueue with an explicit rejection policy.
CallerRunsPolicy is a good default for logging. When overwhelmed, it slows the producer naturally.
Batch writes under load. One database write for 100 events beats 100 separate writes.
Use a synchronous executor in tests. Runnable::run makes async code deterministic.
Test error paths. Happy paths work in demos. Error paths work in production.
Extract pure functions. Business logic without async mechanics is fast and easy to test in isolation.

Wrapping Up the Series

This series covered the full arc of working with CompletableFuture:

Part 1: the foundation (supplyAsync, thenApply, the mental model)
Part 2: composition (thenApply vs thenCompose, parallel execution)
Part 3: error handling and Spring Boot integration
Part 4: production patterns (fire-and-forget, back-pressure, testing)

CompletableFuture is a powerful tool. Used well, it makes applications faster and more responsive. Used carelessly, it creates silent failures and hard-to-reproduce bugs.

The difference is in the details: proper error handling, bounded queues, meaningful thread names, and comprehensive tests.