How to Fix: polling with a zero timeout (p2) or `waitable-set.poll`ing (p3) can starve the async runtime
Zero-timeout polling can monopolize execution and starve the async runtime
When WASI polling is implemented with a zero timeout, the executor may repeatedly re-enter a ready path without ever yielding fairly. In practice, that means async tasks, timers, and unrelated I/O can get delayed or fully starved, especially when poll or waitable-set.poll is invoked in a tight loop. This is exactly the class of bug behind the wasmtime_wasi issue involving wasi:clocks/monotonic-clock#subscribe_instant and subscribe_duration.
At a high level, the bug appears when a timeout that should behave like a scheduled wake-up is translated into an immediate, non-blocking poll over and over again. Instead of allowing the runtime to park, advance timers, and schedule other futures, the code creates a hot loop that continuously asks, “is it ready now?” and immediately gets control back.
Understanding the Root Cause
The root problem is a mismatch between readiness polling semantics and cooperative async scheduling.
In runtimes like Tokio, fairness depends on tasks eventually yielding. If an implementation uses a zero-duration timeout as a signal to call a poll-like primitive immediately, several bad things can happen:
- A future is repeatedly polled and returns quickly without any real suspension point.
- The surrounding loop keeps re-registering interest and polling again.
- The runtime sees constant activity from one task and has fewer chances to schedule others fairly.
- Timer-driven subscriptions such as subscribe_instant and subscribe_duration stop behaving like true asynchronous waits and instead behave like busy checks.
For WASI monotonic clock subscriptions, this is especially subtle. A clock subscription should usually translate into one of two behaviors:
- If the deadline has already passed, mark it ready once and return control cleanly.
- If the deadline is in the future, register a wake-up with the runtime and suspend until the deadline or another event occurs.
The starvation bug appears when future deadlines are represented through repeated zero-timeout polling rather than a real timer registration. The effect is similar to a busy wait inside an async system.
With waitable-set.poll, the same issue can surface if the implementation repeatedly performs immediate checks on a set of waitables without a blocking or yielding path. Even though each individual call may look cheap, the aggregate scheduling behavior can become pathological under load.
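The difference between repeatedly asking "is it ready now?" and actually parking can be seen without any async runtime. The sketch below is a thread-based analogy using only the standard library, not the wasmtime implementation; `send_after`, `wait_by_spinning`, and `wait_by_blocking` are hypothetical names introduced for illustration.

```rust
use std::sync::mpsc::{self, Receiver, TryRecvError};
use std::thread;
use std::time::Duration;

/// Zero-timeout style: ask "is it ready now?" until the answer is yes.
/// Returns the received value and how many readiness checks were made.
fn wait_by_spinning(rx: &Receiver<u32>) -> (u32, u64) {
    let mut checks = 0u64;
    loop {
        checks += 1;
        match rx.try_recv() {
            Ok(value) => return (value, checks),
            Err(TryRecvError::Empty) => continue, // hot loop: never parks
            Err(TryRecvError::Disconnected) => panic!("sender dropped"),
        }
    }
}

/// Blocking style: park the thread until the value arrives or the timeout
/// fires. Exactly one readiness check is made.
fn wait_by_blocking(rx: &Receiver<u32>, timeout: Duration) -> (u32, u64) {
    let value = rx.recv_timeout(timeout).expect("value should arrive in time");
    (value, 1)
}

/// Hypothetical helper: deliver `value` from another thread after `delay`.
fn send_after(delay: Duration, value: u32) -> Receiver<u32> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        thread::sleep(delay);
        let _ = tx.send(value);
    });
    rx
}
```

Both functions return the same value, but the spinning version keeps a CPU core busy for the whole delay, while the blocking version leaves the thread idle until it is woken. An async executor sees the same contrast between a task that keeps returning immediately and one that suspends.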
Step-by-Step Solution
The fix is to ensure that zero-timeout or immediate-deadline paths do not devolve into unbounded repolling. Instead, convert clock subscriptions into proper runtime-backed timers and guarantee a yield boundary when work cannot complete synchronously in a fair way.
1. Distinguish expired deadlines from future deadlines
First, compute whether the monotonic clock deadline is already due. If it is due, resolve it once. If not, schedule a timer rather than spinning through poll.
    use std::time::{Duration, Instant};

    fn deadline_state(target: Instant, now: Instant) -> Result<Duration, Duration> {
        if target <= now {
            Err(Duration::ZERO)
        } else {
            Ok(target.duration_since(now))
        }
    }
In this pattern:
- Err(Duration::ZERO) means the event is already ready.
- Ok(duration) means you should register a real sleep or timer.
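Using `Err` to mean "already ready" works, but it can read as a failure at call sites. As an alternative sketch, the same decision can be expressed with a dedicated enum. `DeadlineState` and `classify_deadline` are hypothetical names, not part of any WASI or wasmtime API.

```rust
use std::time::{Duration, Instant};

/// Hypothetical replacement for the `Result<Duration, Duration>` return type.
#[derive(Debug, PartialEq, Eq)]
enum DeadlineState {
    /// The deadline is now or in the past: report readiness once.
    Ready,
    /// The deadline is still ahead: register a timer for this long.
    Pending(Duration),
}

fn classify_deadline(target: Instant, now: Instant) -> DeadlineState {
    // `checked_duration_since` returns `None` when `target` is before `now`,
    // so no subtraction can panic or wrap.
    match target.checked_duration_since(now) {
        Some(remaining) if !remaining.is_zero() => DeadlineState::Pending(remaining),
        _ => DeadlineState::Ready,
    }
}
```

A caller can then match on the two variants directly, which makes it harder to treat the ready case as an error to retry.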
2. Replace zero-timeout repolling with a real async sleep
If the deadline is in the future, do not call a non-blocking poll in a loop. Use the runtime’s timer mechanism.
    use tokio::time::{sleep, Duration, Instant};

    async fn wait_until(deadline: Instant) {
        let now = Instant::now();
        if deadline <= now {
            return;
        }
        sleep(deadline.duration_since(now)).await;
    }
This ensures the task is parked efficiently and the runtime can continue executing other tasks.
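To show what "parked" means at the level of `Future::poll`, here is a std-only sketch of a toy timer future and a minimal executor. It is an illustration under simplifying assumptions (one helper thread per timer, one future per executor), not how Tokio or wasmtime implement timers. `TimerFuture`, `ThreadWaker`, and `block_on` are hypothetical names.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};
use std::time::Duration;

/// State shared between the future and the thread that acts as the timer.
#[derive(Default)]
struct TimerState {
    fired: bool,
    waker: Option<Waker>,
}

/// Toy timer future: pending until a helper thread marks it as fired.
struct TimerFuture {
    state: Arc<Mutex<TimerState>>,
    polls: Arc<AtomicUsize>, // counts calls to `poll`, for inspection only
}

impl TimerFuture {
    fn new(delay: Duration, polls: Arc<AtomicUsize>) -> Self {
        let state = Arc::new(Mutex::new(TimerState::default()));
        let timer_state = Arc::clone(&state);
        thread::spawn(move || {
            thread::sleep(delay);
            let mut guard = timer_state.lock().unwrap();
            guard.fired = true;
            // Wake the waiting task only once the deadline is reached.
            if let Some(waker) = guard.waker.take() {
                waker.wake();
            }
        });
        Self { state, polls }
    }
}

impl Future for TimerFuture {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        self.polls.fetch_add(1, Ordering::SeqCst);
        let mut guard = self.state.lock().unwrap();
        if guard.fired {
            Poll::Ready(())
        } else {
            // Store the waker and suspend; do not request an immediate repoll.
            guard.waker = Some(cx.waker().clone());
            Poll::Pending
        }
    }
}

/// Waker that unparks the thread running `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal single-future executor: poll, then park until woken.
fn block_on<F: Future>(future: F) -> F::Output {
    let mut future = Box::pin(future);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            Poll::Pending => thread::park(),
        }
    }
}
```

With this shape the future is typically polled only a couple of times for the whole wait (once to register the waker, once after the wake-up), whereas a zero-timeout loop would poll it continuously for the same duration.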
3. Guard immediate-ready paths so they complete once
A common mistake is to keep reporting the same immediate event in a way that causes callers to loop forever. Track whether the event has already been consumed.
    struct ClockSubscription {
        ready: bool,
    }

    impl ClockSubscription {
        fn new() -> Self {
            Self { ready: false }
        }

        fn mark_ready_once(&mut self) -> bool {
            if self.ready {
                false
            } else {
                self.ready = true;
                true
            }
        }
    }
This matters for APIs that aggregate multiple waitables. A ready clock event should be delivered once and consumed, not left in place as a permanent reason to repoll immediately.
4. Ensure waitable-set polling has a yielding or blocking path
If implementing waitable-set.poll, avoid logic like this:
    // Anti-pattern: the loop has no await point, so the executor never regains control.
    loop {
        if let Some(event) = check_waitables_non_blocking() {
            return Some(event);
        }
    }
That pattern starves the runtime. Prefer one of these approaches:
- Register all waitables with async wakeups and await readiness.
- If the API requires a polling facade, internally back it by a notification primitive.
- If no event is ready yet, explicitly yield before retrying.
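    use tokio::task::yield_now;

    // `Event` and `check_waitables_non_blocking` are assumed to be defined elsewhere.
    async fn fair_poll_waitables() -> Event {
        loop {
            if let Some(event) = check_waitables_non_blocking() {
                return event;
            }
            // Nothing is ready: hand control back to the scheduler before retrying.
            yield_now().await;
        }
    }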
Yielding is a fallback, not the ideal design. A true event-driven wait is better, but yielding is still far safer than a hot zero-timeout loop.
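The "polling facade backed by a notification primitive" option can be sketched with a queue and a condition variable. This is a thread-based, std-only illustration under the assumption that producers push events from other threads. In async code the same role would be played by a waker or an async notification primitive. `Event`, `WaitableSet`, `notify`, and `wait` are hypothetical names.

```rust
use std::collections::VecDeque;
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// Hypothetical event type delivered by a waitable.
#[derive(Debug, PartialEq, Eq)]
struct Event(u32);

/// Hypothetical waitable set backed by a queue plus a condition variable.
#[derive(Default)]
struct WaitableSet {
    queue: Mutex<VecDeque<Event>>,
    ready: Condvar,
}

impl WaitableSet {
    /// Called by whatever produces events (I/O completion, timer expiry, ...).
    fn notify(&self, event: Event) {
        self.queue.lock().unwrap().push_back(event);
        self.ready.notify_one();
    }

    /// Non-blocking check: returns at once, suitable for a single probe only.
    fn poll(&self) -> Option<Event> {
        self.queue.lock().unwrap().pop_front()
    }

    /// Blocking wait: the thread sleeps inside the condvar until an event
    /// arrives or `timeout` elapses, instead of calling `poll` in a loop.
    fn wait(&self, timeout: Duration) -> Option<Event> {
        let guard = self.queue.lock().unwrap();
        let (mut guard, _timed_out) = self
            .ready
            .wait_timeout_while(guard, timeout, |queue| queue.is_empty())
            .unwrap();
        guard.pop_front()
    }
}
```

The non-blocking `poll` is still available for a one-off probe, but any caller that needs to wait goes through `wait`, so there is no code path that retries in a tight loop.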
5. Model monotonic clock subscriptions as timer registrations
For subscribe_instant and subscribe_duration, the robust implementation strategy is:
- Convert the WASI clock request into a concrete host deadline.
- If already expired, emit readiness once.
- If not expired, register a timer future.
- Wake the waiting task only when the timer completes.
    use tokio::time::{sleep_until, Instant};

    async fn subscribe_instant(deadline: Instant) {
        let now = Instant::now();
        if deadline <= now {
            return;
        }
        sleep_until(deadline).await;
    }
This aligns the WASI-facing API with the async runtime’s fairness guarantees.
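On the host side, "registering a timer" usually means the event loop parks for exactly as long as the earliest pending deadline allows, rather than polling with a timeout of zero. A minimal std-only sketch of that bookkeeping follows. `TimerQueue` and its methods are hypothetical and only model the idea; a real runtime's timer driver is more involved.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::time::{Duration, Instant};

/// Hypothetical host-side timer queue: earliest deadline first.
#[derive(Default)]
struct TimerQueue {
    heap: BinaryHeap<Reverse<(Instant, u32)>>,
}

impl TimerQueue {
    /// Register a subscription `id` that becomes ready at `deadline`.
    fn register(&mut self, id: u32, deadline: Instant) {
        self.heap.push(Reverse((deadline, id)));
    }

    /// Remove and return every subscription that is due at `now`.
    /// Each id is returned once because it leaves the heap here.
    fn pop_expired(&mut self, now: Instant) -> Vec<u32> {
        let mut ready = Vec::new();
        while let Some(Reverse((deadline, id))) = self.heap.peek().copied() {
            if deadline > now {
                break;
            }
            self.heap.pop();
            ready.push(id);
        }
        ready
    }

    /// How long the host may park before the next timer is due.
    /// `None` means no timers are registered, so only other events can wake it.
    fn park_duration(&self, now: Instant) -> Option<Duration> {
        self.heap
            .peek()
            .map(|Reverse((deadline, _))| deadline.saturating_duration_since(now))
    }
}
```

A loop built on this would call `pop_expired` first and only then park for `park_duration`, so a zero park time occurs only when a timer is due and about to be consumed.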
6. Add regression tests for starvation
Do not stop at functional correctness. Add tests that verify unrelated tasks still make progress when many immediate or near-immediate subscriptions are active.
    use std::sync::{Arc, atomic::{AtomicUsize, Ordering}};
    use tokio::task;
    use tokio::time::{sleep, Duration};

    #[tokio::test]
    async fn zero_timeout_poll_does_not_starve_runtime() {
        let counter = Arc::new(AtomicUsize::new(0));
        let counter2 = counter.clone();
        // Unrelated work that must keep making progress.
        let worker = task::spawn(async move {
            for _ in 0..100 {
                counter2.fetch_add(1, Ordering::SeqCst);
                sleep(Duration::from_millis(1)).await;
            }
        });
        // Synthetic stand-in for the polling path under test.
        let poller = task::spawn(async move {
            for _ in 0..10_000 {
                task::yield_now().await;
            }
        });
        worker.await.unwrap();
        poller.await.unwrap();
        // The worker has finished, so all 100 of its increments must be visible.
        assert_eq!(counter.load(Ordering::SeqCst), 100);
    }
In a real codebase, replace the synthetic poller with the actual WASI subscription path you fixed.
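A complementary, runtime-independent check is to poll a future once with a counting waker and see whether it asks to be polled again before returning `Pending`. This is only a heuristic sketch: it catches synchronous self-wakes on the first poll and nothing else. `CountingWaker`, `repolls_immediately`, `SpinUntilLater`, and `ParkUntilWoken` are hypothetical names.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that only counts how often it is invoked.
#[derive(Default)]
struct CountingWaker {
    wakes: AtomicUsize,
}

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.wakes.fetch_add(1, Ordering::SeqCst);
    }
}

/// Polls `future` once and reports whether it returned `Pending` while
/// having already requested another poll, which is the busy-poll signature.
fn repolls_immediately<F: Future>(future: F) -> bool {
    let counter = Arc::new(CountingWaker::default());
    let waker = Waker::from(Arc::clone(&counter));
    let mut cx = Context::from_waker(&waker);
    let mut future = Box::pin(future);
    let pending = future.as_mut().poll(&mut cx).is_pending();
    pending && counter.wakes.load(Ordering::SeqCst) > 0
}

/// Busy pattern: asks to be polled again without waiting for anything.
struct SpinUntilLater;

impl Future for SpinUntilLater {
    type Output = ();
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Well-behaved pattern: keeps the waker for a later event and suspends.
#[allow(dead_code)]
struct ParkUntilWoken {
    stored: Option<Waker>,
}

impl Future for ParkUntilWoken {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        self.stored = Some(cx.waker().clone());
        Poll::Pending
    }
}
```

A future returned by a clock subscription with a deadline well in the future should behave like `ParkUntilWoken` under this check. If it behaves like `SpinUntilLater`, the subscription is being driven by repolling rather than by a timer.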
7. Review any code path that treats timeout zero as a special fast path
These bugs often exist in more than one place. Search for:
- Duration::ZERO
- non-blocking poll loops
- try_-style readiness checks inside loops
- manual clock comparisons followed by immediate retries
If a fast path skips suspension entirely, validate that it does not create an unfair scheduling loop.
Common Edge Cases
Deadlines that are already expired
If the requested instant is in the past, it should resolve immediately, but only once. Repeatedly surfacing the same expired timer as a fresh event can recreate starvation at a higher layer.
Very small non-zero durations
A duration of 1ns or 1µs may behave almost like zero in practice depending on clock resolution and runtime granularity. Treat ultra-short sleeps carefully and rely on the runtime timer rather than manual repolling.
Mixed waitable sets
If a waitable set contains both I/O and clock subscriptions, an always-ready or repeatedly-polled clock entry can dominate the set and delay delivery of other events. Ensure event consumption semantics are correct across all waitables.
Host clock conversion bugs
Converting WASI clock values into host Instant or Duration types can overflow, underflow, or lose precision. Clamp values and validate arithmetic when bridging ABI-level time representations.
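A hedged sketch of defensive conversion, assuming the guest passes durations and instants as `u64` nanosecond counts. `MAX_WAIT`, `deadline_from_nanos`, and `remaining_nanos` are hypothetical, and the one-year cap is an arbitrary illustrative choice.

```rust
use std::time::{Duration, Instant};

/// Hypothetical upper bound on any single wait (about one year).
const MAX_WAIT: Duration = Duration::from_secs(365 * 24 * 60 * 60);

/// Convert a relative wait in nanoseconds into a host deadline.
/// The request is clamped to `MAX_WAIT`; `None` is returned if the host
/// `Instant` still cannot represent the result.
fn deadline_from_nanos(now: Instant, nanos: u64) -> Option<Instant> {
    let requested = Duration::from_nanos(nanos).min(MAX_WAIT);
    now.checked_add(requested)
}

/// Convert an absolute guest instant into the remaining wait in nanoseconds.
/// A target in the past saturates at zero instead of wrapping around.
fn remaining_nanos(target_nanos: u64, now_nanos: u64) -> u64 {
    target_nanos.saturating_sub(now_nanos)
}
```

Returning `Option` rather than silently falling back to "now" matters here: a fallback to the current instant would turn an overflow into an immediately ready event, which is the behavior this article is trying to avoid.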
Cancellation and dropped futures
When a waitable or timer is canceled, make sure wake registrations are cleaned up. Otherwise, stale readiness notifications may leak into subsequent polls and create confusing behavior that looks like starvation or spurious wakeups.
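One common way to guarantee cleanup is to tie the registration to a guard value whose `Drop` implementation removes it. The following is a minimal sketch under the assumption that live subscriptions are tracked by id in a shared set. `Registry`, `Registration`, and `is_live` are hypothetical names.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

/// Hypothetical registry of subscription ids that may still deliver a wake-up.
type Registry = Arc<Mutex<HashSet<u32>>>;

/// Guard that represents one live registration.
struct Registration {
    id: u32,
    registry: Registry,
}

impl Registration {
    fn new(id: u32, registry: &Registry) -> Self {
        registry.lock().unwrap().insert(id);
        Self { id, registry: Arc::clone(registry) }
    }
}

impl Drop for Registration {
    /// Dropping the guard (for example when the waiting future is cancelled)
    /// removes the id so no stale readiness is reported later.
    fn drop(&mut self) {
        if let Ok(mut live) = self.registry.lock() {
            live.remove(&self.id);
        }
    }
}

/// Returns true if `id` can still produce a notification.
fn is_live(registry: &Registry, id: u32) -> bool {
    registry.lock().unwrap().contains(&id)
}
```

Because the removal happens in `Drop`, it runs on every exit path, including cancellation, without the caller having to remember an explicit deregistration call.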
Single-threaded executors
This issue is often much more visible on a single-threaded runtime because one hot loop can monopolize the only worker thread. A multi-threaded runtime may mask the symptom temporarily, but the fairness bug still exists.
FAQ
Why is a zero timeout dangerous in async code if it returns immediately?
Because “returns immediately” is not the same as “behaves fairly.” In an async runtime, immediate-return loops can prevent the executor from parking or scheduling other tasks. The problem is not the correctness of any single call; it is the cumulative scheduling behavior.
Is adding yield_now() enough to fix the issue?
It is a useful mitigation, but not the best architectural fix. The preferred solution is to model future clock events as actual timer registrations and wake the task only when the deadline arrives. yield_now() is better than spinning, but it still burns more scheduler attention than a proper event-driven wait.
How do I know whether my WASI polling implementation is still starving the runtime?
Run concurrency-focused tests where unrelated tasks must continue making progress while many clock subscriptions and waitable-set polls are active. If throughput collapses, timers drift badly, or unrelated futures stop advancing, you likely still have a fairness problem.
The key takeaway is simple: do not implement time-based WASI subscriptions as repeated zero-timeout polls. Represent them as real async timer waits, consume immediate-ready events exactly once, and ensure any polling abstraction has a legitimate suspension path. That combination fixes starvation without changing the external behavior of the WASI API.