How to Fix: WASI Preview 2 Async Race Condition

WASI Preview 2 Async Race Condition: Why AsyncWriteStream Gets Dropped Too Early and How to Fix It

This bug is a classic ownership and task-lifecycle race: once streams were moved to async, the AsyncWriteStream can be dropped before the receiver inside the spawned task has fully taken ownership of the underlying channel or stream state. The result is intermittent write failures, missing output, or behavior that only reproduces under certain scheduler timings.

In the affected design, a spawned async task is expected to own the receiving side of a pipeline, but the producer-side wrapper may be dropped before that handoff is guaranteed. In async Rust, that kind of timing bug is easy to introduce when construction, spawning, and ownership transfer are not synchronized.

Understanding the Root Cause

The race appears because stream setup and task startup are decoupled. A common pattern looks roughly like this:

  1. Create a sender/receiver pair.
  2. Wrap the sender in AsyncWriteStream.
  3. Spawn a task that consumes the receiver.
  4. Return the write stream to the caller.

At first glance, this seems fine. But in practice, the executor does not guarantee that the spawned task will begin polling immediately. If the caller drops the returned AsyncWriteStream quickly, the sender side may be closed before the background task has been polled even once, so the task's first view of the channel is one that is already closed.

There are a few variants of this failure mode:

  • The spawned task has not run yet, so the system observes the stream as closed earlier than intended.
  • The receiver depends on initialization performed inside the task, but that initialization never completes before the writer is dropped.
  • The implementation assumes that spawning itself is a sufficient handoff boundary, which is not true in async runtimes.

Technically, the root issue is not just dropping a value too early. It is the absence of a synchronization barrier between:

  • Creating the async stream abstraction, and
  • Confirming that the background task has actually started and taken ownership of the receiver-side work.

In other words, spawn does not equal started. If the lifecycle of the stream depends on work happening inside the spawned task, the constructor must not expose a writer that can immediately be dropped unless the implementation is resilient to that ordering.
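The gap between "spawned" and "started" can be shown without any runtime. The following is a minimal sketch, not the WASI implementation: it boxes a future by hand as a stand-in for spawning, then polls it manually as a stand-in for the executor. The names NoopWake and started_before_and_after_poll are made up for this illustration.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Wake, Waker};

/// Waker that does nothing; enough for polling a future by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Returns whether the task body had run (before the first poll, after it).
fn started_before_and_after_poll() -> (bool, bool) {
    let started = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&started);

    // Stand-in for "spawn": the future exists, but nobody has polled it yet.
    let mut task: Pin<Box<dyn Future<Output = ()>>> = Box::pin(async move {
        flag.store(true, Ordering::SeqCst);
    });
    let before = started.load(Ordering::SeqCst);

    // Stand-in for the executor finally scheduling the task.
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    assert!(task.as_mut().poll(&mut cx).is_ready());

    (before, started.load(Ordering::SeqCst))
}
```

Nothing inside the async block runs until the first poll. A real executor may poll a spawned task almost immediately, but code that needs the task to have started has to wait for a signal from it.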

This is especially relevant in WASI Preview 2 I/O abstractions, where host-managed resources, backpressure, and async bridging can make timing-sensitive bugs much more visible than with purely synchronous code.

Step-by-Step Solution

The fix is to make ownership transfer and task readiness explicit. The safest pattern is to ensure the background task has acknowledged startup before returning or relying on the outer stream object.

There are two reliable approaches:

  • Handshake on task startup using a one-shot channel or readiness signal.
  • Move critical receiver initialization out of the spawned task so dropping the writer early is harmless.

For this issue, the most practical fix is usually a startup handshake.

1. Identify the fragile spawn pattern

A problematic implementation often resembles this:

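// `sink` is assumed to be an async writer created earlier in the constructor.
let (tx, mut rx) = tokio::sync::mpsc::channel(buffer_size);

let stream = AsyncWriteStream::new(tx);

// The task is only queued here; nothing guarantees it has been polled yet.
tokio::spawn(async move {
    while let Some(chunk) = rx.recv().await {
        sink.write_all(&chunk).await?;
    }
    Ok::<_, anyhow::Error>(())
});

// Returned immediately, so the caller may drop it before the task first runs.
stream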

The issue here is that stream is returned immediately, while the spawned task may not have been polled yet. If the stream is dropped right away, the sender closes before the task has run any of its own setup.

2. Add a readiness handshake

Use a oneshot channel to signal that the receiver task has started and taken responsibility for processing.

use tokio::sync::{mpsc, oneshot};

let (tx, rx) = mpsc::channel(buffer_size);
let (ready_tx, ready_rx) = oneshot::channel();

let stream = AsyncWriteStream::new(tx);

tokio::spawn(async move {
    let _ = ready_tx.send(());

    let mut rx = rx;
    while let Some(chunk) = rx.recv().await {
        sink.write_all(&chunk).await?;
    }

    Ok::<_, anyhow::Error>(())
});

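// Wait for the task's signal. The `?` assumes the enclosing async function
// returns anyhow::Result<AsyncWriteStream>.
ready_rx.await?;
Ok(stream)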

This creates a minimal but important barrier: the constructor does not proceed until the task has at least entered execution and signaled readiness.

3. If initialization happens inside the task, signal readiness only after it completes

If the receiver task performs setup such as binding host state, acquiring a resource handle, or wrapping a WASI sink, the handshake must happen after that setup.

let (tx, rx) = mpsc::channel(buffer_size);
let (ready_tx, ready_rx) = oneshot::channel();

let stream = AsyncWriteStream::new(tx);

tokio::spawn(async move {
    let mut output = prepare_wasi_output_sink().await?;
    let _ = ready_tx.send(());

    let mut rx = rx;
    while let Some(chunk) = rx.recv().await {
        output.write_all(&chunk).await?;
    }

    output.flush().await?;
    Ok::<_, anyhow::Error>(())
});

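// If setup failed, `ready_tx` is dropped without sending and this returns an error.
// The `?` assumes the enclosing async function returns anyhow::Result<AsyncWriteStream>.
ready_rx.await?;
Ok(stream)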

This version is much safer because it acknowledges the real dependency: the writer should not be considered fully usable until the sink exists and the receiver loop is genuinely ready.
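One gap in a bare readiness signal is that a failed setup reaches the constructor only as a closed channel, not as the actual error. The sketch below sends the setup result through the handshake instead. It is an illustration under stated assumptions: std threads and std::sync::mpsc stand in for tokio tasks and the oneshot so that it runs without dependencies, and prepare_sink, start_writer, and the fail flag are hypothetical names.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical setup step standing in for `prepare_wasi_output_sink`.
fn prepare_sink(fail: bool) -> Result<Vec<u8>, String> {
    if fail {
        Err("sink unavailable".to_string())
    } else {
        Ok(Vec::new())
    }
}

/// Spawns the worker and returns only after it reports the setup outcome.
fn start_writer(fail: bool) -> Result<(Sender<Vec<u8>>, JoinHandle<Vec<u8>>), String> {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    // Single-use channel playing the role of the oneshot.
    let (ready_tx, ready_rx) = mpsc::channel::<Result<(), String>>();

    let worker = thread::spawn(move || {
        let mut sink = match prepare_sink(fail) {
            Ok(sink) => sink,
            Err(err) => {
                // Report the real cause instead of just dropping `ready_tx`.
                let _ = ready_tx.send(Err(err));
                return Vec::new();
            }
        };
        // Setup is done, so this signal means "operational", not "spawned".
        let _ = ready_tx.send(Ok(()));

        // Runs until every sender is dropped and the queue is drained.
        for chunk in rx {
            sink.extend_from_slice(&chunk);
        }
        sink
    });

    match ready_rx.recv() {
        Ok(Ok(())) => Ok((tx, worker)),
        Ok(Err(err)) => Err(err),
        Err(_) => Err("worker exited before signalling readiness".to_string()),
    }
}
```

With this shape the caller either gets a usable writer or the reason setup failed. The same idea carries over to a tokio oneshot whose payload is a Result.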

4. Consider holding the task handle or shared state when drop semantics matter

If the system needs deterministic shutdown, do not rely solely on channel closure. Store a JoinHandle, shared state, or explicit close signal.

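use bytes::Bytes; // assumes the chunk type comes from the `bytes` crate

pub struct AsyncWriteStream {
    tx: tokio::sync::mpsc::Sender<Bytes>,
    // Kept so that close() can wait for the task and surface its error.
    task: tokio::task::JoinHandle<anyhow::Result<()>>,
}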

impl AsyncWriteStream {
    pub async fn close(self) -> anyhow::Result<()> {
        drop(self.tx);
        self.task.await??;
        Ok(())
    }
}

This is useful when the stream should not just disappear silently. Instead, closing becomes an explicit protocol step.
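To see what an explicit close buys, here is a runnable sketch of the same shape using std threads in place of tokio tasks. It is not the real AsyncWriteStream: WriteStream, its write method, and the max_bytes limit are assumptions made up so the worker can fail on demand.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Thread-based stand-in for a stream that owns its sender and its worker.
struct WriteStream {
    tx: Sender<Vec<u8>>,
    task: JoinHandle<Result<usize, String>>,
}

impl WriteStream {
    /// `max_bytes` is a made-up limit used to make the worker fail on demand.
    fn new(max_bytes: usize) -> Self {
        let (tx, rx) = mpsc::channel::<Vec<u8>>();
        let task = thread::spawn(move || {
            let mut written = 0;
            for chunk in rx {
                written += chunk.len();
                if written > max_bytes {
                    return Err(format!("sink full after {written} bytes"));
                }
            }
            Ok(written)
        });
        WriteStream { tx, task }
    }

    /// Queues a chunk; fails only if the worker has already exited.
    fn write(&self, chunk: &[u8]) -> Result<(), String> {
        self.tx
            .send(chunk.to_vec())
            .map_err(|_| "writer task has exited".to_string())
    }

    /// Closes the channel, waits for the worker, and returns its result.
    fn close(self) -> Result<usize, String> {
        drop(self.tx); // lets the worker's receive loop end
        self.task
            .join()
            .map_err(|_| "writer task panicked".to_string())?
    }
}
```

A caller that writes and then calls close() learns either how many bytes the worker handled or why it stopped, which a plain drop would never report.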

5. Prefer construction patterns that avoid hidden races

If possible, initialize the receiver-side machinery before creating the outer stream object. For example:

pub async fn new_async_write_stream() -> anyhow::Result<AsyncWriteStream> {
    let sink = prepare_wasi_output_sink().await?;
    let (tx, rx) = tokio::sync::mpsc::channel(32);

    let task = tokio::spawn(async move {
        run_writer_loop(rx, sink).await
    });

    Ok(AsyncWriteStream { tx, task })
}

This pattern reduces the amount of meaningful setup happening after spawn and makes the lifecycle easier to reason about.

6. Test the race explicitly

Race conditions often disappear in normal test runs, so add a regression test that forces early drop behavior.

#[tokio::test]
async fn dropping_writer_immediately_does_not_race_receiver_startup() {
    for _ in 0..1000 {
        let stream = new_async_write_stream().await.unwrap();
        drop(stream);
    }
}

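You can strengthen this by inserting yield points (for example tokio::task::yield_now().await) around the spawn and the drop, and by model-checking the synchronization with a tool like Loom when the code can be built on Loom's primitives.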
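A regression test is more useful when it asserts an observable outcome, not just the absence of a panic. The sketch below shows that idea with std threads standing in for tokio tasks: each round writes one chunk, drops the sender immediately, and then checks that the worker still received every byte. The function name and the "payload" chunk are made up for illustration.

```rust
use std::sync::mpsc;
use std::thread;

/// Runs `iterations` rounds of "write, drop immediately, check the worker's
/// output" and returns how many rounds delivered every byte.
fn early_drop_rounds_delivered(iterations: usize) -> usize {
    let mut delivered = 0;
    for round in 0..iterations {
        let (tx, rx) = mpsc::channel::<Vec<u8>>();
        let worker = thread::spawn(move || {
            // Vary the startup timing a little between rounds.
            if round % 2 == 0 {
                thread::yield_now();
            }
            // Collect everything until the channel is closed and empty.
            rx.into_iter().flatten().collect::<Vec<u8>>()
        });

        tx.send(b"payload".to_vec()).unwrap();
        drop(tx); // early drop, possibly before the worker has started

        if worker.join().unwrap() == b"payload".to_vec() {
            delivered += 1;
        }
    }
    delivered
}
```

In a test, the returned count would be compared with the number of iterations. Any shortfall means an early drop lost data.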

For this WASI issue, the best production-grade fix is usually:

  1. Create the sender/receiver pair.
  2. Spawn the receiver task.
  3. Signal readiness only after all receiver-side initialization is complete.
  4. Await that readiness before exposing or finalizing the AsyncWriteStream.
  5. Optionally keep a JoinHandle for explicit shutdown and error propagation.

This preserves async design while eliminating the startup race.

Common Edge Cases

1. The readiness signal fires too early

If you send the ready signal before acquiring the actual WASI sink or finishing receiver initialization, the race still exists. The handshake must represent true operational readiness, not merely task creation.

2. Background task errors are lost

If the spawned task fails and its JoinHandle is ignored, writes may appear to succeed while the sink is already dead. Consider surfacing task failure through:

  • An explicit close API
  • Shared error state
  • Awaiting the task during teardown
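The shared error state option can be sketched as a slot that the worker fills on failure and the writer checks before each write. This is an illustration only, using std threads and a Mutex in place of async primitives. Writer, SharedError, spawn_writer, and the "empty chunk" failure are hypothetical.

```rust
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

type SharedError = Arc<Mutex<Option<String>>>;

struct Writer {
    tx: Sender<Vec<u8>>,
    last_error: SharedError,
}

impl Writer {
    /// Fails fast if the worker has already recorded an error.
    fn write(&self, chunk: &[u8]) -> Result<(), String> {
        if let Some(err) = self.last_error.lock().unwrap().clone() {
            return Err(err);
        }
        self.tx
            .send(chunk.to_vec())
            .map_err(|_| "writer task has exited".to_string())
    }
}

fn spawn_writer() -> (Writer, JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    let last_error: SharedError = Arc::new(Mutex::new(None));
    let slot = Arc::clone(&last_error);

    let worker = thread::spawn(move || {
        for chunk in rx {
            // Hypothetical failure: the sink rejects empty chunks.
            if chunk.is_empty() {
                *slot.lock().unwrap() = Some("sink rejected empty chunk".to_string());
                return;
            }
        }
    });

    (Writer { tx, last_error }, worker)
}
```

A write that is queued before the failure still returns Ok, so this approach reports errors late. Combining it with an explicit close is what makes the final status reliable.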

3. Drop order hides partial writes

When the sender is dropped, queued chunks may still exist. Depending on the channel and write loop, some buffered data might be processed and some might not. If write completion matters, define clear flush and close semantics.
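What happens to queued chunks depends on the channel, so it is worth checking directly. The sketch below uses std::sync::mpsc, which delivers everything already queued before reporting the channel as closed. tokio's mpsc receiver is documented to behave the same way, but a write loop that is aborted or that stops on error will not drain the queue. The helper name drain_after_drop is made up.

```rust
use std::sync::mpsc;

/// Sends `chunks`, drops the sender, and only then starts receiving.
fn drain_after_drop(chunks: &[&str]) -> Vec<String> {
    let (tx, rx) = mpsc::channel::<String>();
    for chunk in chunks {
        tx.send(chunk.to_string()).unwrap();
    }
    drop(tx); // the writer goes away before the receiver has done anything

    // Yields everything still queued, then stops once the channel is
    // both closed and empty.
    rx.into_iter().collect()
}
```

If the receive loop runs to completion, nothing is lost here. Partial output therefore points at the loop being cut short, not at the channel discarding data.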

4. Backpressure changes timing

With bounded channels, a full buffer can delay writes and make the race easier to trigger in stress scenarios. With unbounded channels, memory growth can hide the bug while creating a different operational problem.
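The effect of a bounded buffer is easy to observe in isolation. This sketch uses std's sync_channel as a stand-in for a bounded tokio channel and counts how many chunks are accepted while the receiver makes no progress. The function name and the 8-byte chunk size are arbitrary choices for illustration.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Tries to queue `attempts` chunks into a channel of size `capacity`
/// while the receiver makes no progress; returns how many were accepted.
fn accepted_without_consumer(capacity: usize, attempts: usize) -> usize {
    // `_rx` is kept alive so the channel stays connected.
    let (tx, _rx) = sync_channel::<Vec<u8>>(capacity);
    let mut accepted = 0;
    for _ in 0..attempts {
        match tx.try_send(vec![0u8; 8]) {
            Ok(()) => accepted += 1,
            // A blocking `send` would wait here; that wait is the backpressure.
            Err(TrySendError::Full(_)) => break,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    accepted
}
```

Once the buffer is full, every further write depends on the receiver running, which is exactly when a receiver that has not started yet becomes visible.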

5. Cancellation during startup

If the parent future is cancelled while waiting for readiness, you may leak task state or leave a partially initialized sink. Wrap startup carefully and ensure cancellation-safe cleanup.
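One common way to get cancellation-safe cleanup is a guard whose Drop releases the partially built state. The sketch below polls a startup future once by hand, then drops it to simulate cancellation. CleanupGuard, NoopWake, and cleanup_on_cancel are hypothetical names, and std::future::pending stands in for a readiness wait that never completes.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Wake, Waker};

/// Waker that does nothing; enough for polling a future by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Sets the flag when dropped; stands in for releasing a half-built sink.
struct CleanupGuard(Arc<AtomicBool>);

impl Drop for CleanupGuard {
    fn drop(&mut self) {
        self.0.store(true, Ordering::SeqCst);
    }
}

/// Returns (cleaned up while still waiting, cleaned up after cancellation).
fn cleanup_on_cancel() -> (bool, bool) {
    let cleaned = Arc::new(AtomicBool::new(false));
    let guard = CleanupGuard(Arc::clone(&cleaned));

    let mut startup: Pin<Box<dyn Future<Output = ()>>> = Box::pin(async move {
        let _guard = guard; // owned by the future from here on
        // Stand-in for a readiness wait that never completes.
        std::future::pending::<()>().await;
    });

    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    assert!(startup.as_mut().poll(&mut cx).is_pending());
    let while_waiting = cleaned.load(Ordering::SeqCst);

    drop(startup); // cancellation: the future is dropped mid-await
    (while_waiting, cleaned.load(Ordering::SeqCst))
}
```

The guard only covers state owned by the cancelled future. A task that was already spawned keeps running, so it needs its own stop signal, such as the closed channel or an abort on its handle.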

6. Multi-runtime assumptions

If the implementation is expected to run across different executors or host embeddings, avoid depending on scheduler behavior such as immediate polling after spawn. The fix should be executor-agnostic.

FAQ

Why did this appear after moving streams to async?

Because synchronous code often performs setup in a single call path, while async code splits construction and execution across scheduled tasks. That introduces a window where the stream object exists, but the receiver task has not fully started or initialized yet.

Is keeping the writer alive longer enough to fix it?

No. Artificially delaying drop may reduce the frequency, but it does not remove the race. The correct fix is a synchronization mechanism or a constructor design that guarantees receiver readiness before the stream lifecycle can end.

Should I use a channel handshake or redesign the API?

If you need a minimal, targeted patch, use a oneshot readiness handshake. If you want the most maintainable long-term solution, redesign construction so receiver-side initialization completes before the outer AsyncWriteStream is considered ready.

The core lesson is simple: in async Rust, resource creation, task spawn, and ownership transfer must be treated as separate events. Once that is reflected in the WASI stream implementation, this race condition disappears reliably instead of only seeming fixed under favorable timing.
