How to Fix: test: flaky test in `wait`/`notify` implementation
Fixing the flaky wait_notify::atomic_wait_timeout_length test in a wait/notify implementation
This failure is a classic timing-sensitive concurrency bug: the test assumes a timeout will complete within a tight wall-clock window, but real schedulers, CI virtualization, timer granularity, and wakeup races make that assumption unreliable. The result is a test that passes most of the time and still fails often enough to erode trust in the suite.
Table of Contents
Problem overview
The flaky test mentioned in the issue, wait_notify::atomic_wait_timeout_length, is intended to validate that an atomic wait with a timeout blocks for approximately the requested duration. On paper that sounds straightforward. In practice, timeout behavior is influenced by multiple layers:
- OS scheduler latency
- Timer resolution on Linux, macOS, and Windows
- Spurious wakeups or implementation-level retries
- CPU contention in CI runners
- Clock measurement jitter between start and end timestamps
If the test expects too much precision from a timeout-based primitive, it becomes flaky even when the implementation is correct.
Understanding the Root Cause
The root problem is that the test is validating exact elapsed time for a synchronization primitive whose completion time is only guaranteed approximately.
In a typical wait/notify implementation, a thread does something like this:
- Read an atomic value.
- If it matches the expected state, go into a blocking wait.
- Wake up either because of notify, timeout expiration, or a spurious wake.
- Re-check the atomic state and decide whether to continue.
That means the elapsed wait time is not simply “requested timeout equals measured timeout.” Instead, actual time includes:
- Time to enter the kernel wait primitive
- Time spent descheduled before the wait begins
- Timer rounding to the platform’s next tick or deadline bucket
- Delay between timeout expiration and the thread getting CPU again
- Extra loop iterations after spurious wakeups
This is why tests that assert very narrow timing ranges are fragile. A timeout API should usually be tested for lower-bound correctness and eventual completion, not for exact millisecond precision.
Another common issue is using wall-clock assumptions in environments where monotonic timing behaves differently under load. Even with a monotonic clock, the scheduler may not resume the waiting thread immediately after the timeout threshold is reached.
In short, the flake happens because the test likely over-specifies behavior that the runtime, operating system, and hardware do not strictly guarantee.
Step-by-Step Solution
The most reliable fix is to make the test assert semantically correct timeout behavior instead of brittle exact timing.
1. Use a monotonic clock for measurement
If the test is not already using a monotonic source such as Instant, switch to it immediately. This avoids issues caused by wall-clock adjustments.
use std::time::{Duration, Instant};
let start = Instant::now();
// perform timed wait
let elapsed = start.elapsed();
2. Assert a minimum duration, not a narrow exact window
The test should prove that the timeout did not return too early. That is the property that matters most. Avoid asserting that the operation finished within a tiny upper bound.
let timeout = Duration::from_millis(50);
let start = Instant::now();
let timed_out = wait_with_timeout(timeout);
let elapsed = start.elapsed();
assert!(timed_out, "expected wait to time out");
assert!(elapsed >= timeout, "wait returned before timeout elapsed");
This change removes the main source of flakiness: demanding scheduler precision that CI cannot guarantee.
3. If an upper bound is necessary, make it generous
Sometimes you still want to catch pathological hangs. In that case, use a broad tolerance that accounts for loaded CI environments.
let timeout = Duration::from_millis(50);
let max_expected = Duration::from_secs(2);
let start = Instant::now();
let timed_out = wait_with_timeout(timeout);
let elapsed = start.elapsed();
assert!(timed_out, "expected timeout result");
assert!(elapsed >= timeout, "returned too early");
assert!(elapsed < max_expected, "timeout took unreasonably long: {:?}", elapsed);
This protects against deadlocks without reintroducing sensitivity to normal scheduler jitter.
4. Avoid comparing against a single exact duration threshold
A flaky pattern often looks like this:
assert!(elapsed >= Duration::from_millis(50));
assert!(elapsed <= Duration::from_millis(60));
That 10 ms window is far too narrow for many CI systems. Replace it with behavior-based assertions or a much wider envelope.
5. Re-check whether the implementation can spuriously wake
If the underlying wait primitive can wake without notification, the implementation must loop until either:
- The atomic value changes, or
- The timeout deadline is actually reached
A robust implementation pattern is deadline-based rather than duration-based looping.
use std::time::{Duration, Instant};
fn wait_until_changed_with_timeout(expected: u32, timeout: Duration) -> bool {
let deadline = Instant::now() + timeout;
loop {
if ATOMIC.load(Ordering::Acquire) != expected {
return true;
}
let now = Instant::now();
if now >= deadline {
return false;
}
let remaining = deadline.saturating_duration_since(now);
atomic_wait_for(&ATOMIC, expected, remaining);
}
}
This structure prevents timeout drift across retries and handles spurious wakeups correctly.
6. Prefer deadline semantics in the test as well
If the production implementation uses repeated waits internally, the test should not assume one uninterrupted sleep. Validate the final outcome instead.
#[test]
fn atomic_wait_timeout_length() {
let timeout = Duration::from_millis(50);
let start = Instant::now();
let changed = wait_until_changed_with_timeout(0, timeout);
let elapsed = start.elapsed();
assert!(!changed, "value should not have changed");
assert!(elapsed >= timeout, "timed wait completed too early: {:?}", elapsed);
assert!(elapsed < Duration::from_secs(2), "timed wait exceeded reasonable bound: {:?}", elapsed);
}
7. If needed, increase the timeout used by the test
Very short durations like 1 ms, 5 ms, or 10 ms are especially vulnerable to platform timer granularity. If the purpose is correctness rather than micro-benchmarking, a slightly larger test timeout often improves reliability dramatically.
let timeout = Duration::from_millis(100);
This does not make the implementation slower in production; it only makes the test less vulnerable to environmental jitter.
8. Document the reason for the looser timing assertion
Future maintainers may be tempted to tighten the assertion again. Add a short comment explaining that timeout-based synchronization is inherently imprecise in CI and under scheduler contention.
// Timed waits are scheduler- and platform-dependent.
// We assert that the wait does not return early and does not hang,
// but we intentionally avoid a narrow upper timing bound.
That comment can prevent the same flake from returning later.
Common Edge Cases
Spurious wakeups
If the wait primitive can wake spuriously, a single wait call is not enough. The implementation must re-check the atomic state in a loop and recompute the remaining timeout from a fixed deadline.
Coarse timer resolution on some platforms
Some operating systems round sleep and wait intervals more aggressively than expected. A requested 50 ms wait may measure significantly higher under load. This is normal and should not fail the test unless the delay is extreme.
Thread descheduling after timeout expiration
The timeout may expire on time, but the waiting thread may not run immediately afterward. Measured elapsed time therefore includes resumption latency, not just timeout duration.
Clock source mismatch
If part of the implementation uses one timing source and the test uses another, measurements can become inconsistent. Stick to monotonic time consistently.
Repeated relative waits causing timeout drift
If an implementation retries with the full original duration after each wake, it can wait much longer than intended. The correct pattern is to compute a fixed deadline once and use the remaining time for each iteration.
CI-specific noise
Shared runners, virtualization, and CPU throttling can magnify all of the above. A test that is stable locally can still be flaky in automation if the timing assertions are too strict.
FAQ
Why not just retry the flaky test?
Retries hide the problem instead of fixing it. The right approach is to make the test validate the actual contract of the timeout synchronization primitive, not an unrealistically precise runtime schedule.
Should the test assert both lower and upper bounds?
Yes, but only if the upper bound is generous. The lower bound ensures the wait did not return early. The upper bound should only catch obvious hangs or deadlocks, not ordinary scheduler jitter.
Is increasing the timeout in the test a real fix?
By itself, not always. Increasing the timeout helps with coarse timer granularity, but the main fix is changing the assertion strategy to be deadline-aware and tolerant of platform scheduling variability.
The practical resolution for this GitHub issue is simple: treat atomic_wait_timeout_length as a correctness test for no early return and eventual timeout, not as a precision timer benchmark. That aligns the test with how wait/notify primitives actually behave across platforms and removes the flakiness at its source.