How to Fix: `tls::get` runtime panic


A panic in tls::get is a classic sign that thread-local state is being accessed outside the runtime conditions it was designed for. In this Wasmtime regression, the failure shows up while exercising wasip3 filesystem support, but the real problem is deeper: code introduced by the referenced change ends up touching TLS-backed runtime data in a context where that data is not initialized, has already been torn down, or is being reached from the wrong execution boundary.

Understanding the Root Cause

The panic happens because tls::get assumes a valid runtime-local context is available. That assumption breaks when a code path introduced in the filesystem work reaches runtime internals from a place that is not inside the expected Wasmtime call scope.

In practice, this usually means one of the following:

  • A host-side callback, destructor, or async boundary runs after the runtime context has been dropped.
  • A helper introduced by the regression reads thread-local storage before the runtime has installed its per-thread state.
  • The code executes on a different thread than the one where the store or call context was established.
  • An internal convenience API uses tls::get in a place where it should instead receive context explicitly as a parameter.

That last case is typically the most important. TLS is convenient, but fragile when execution can cross host/guest boundaries, async scheduling points, resource destructors, or layered component-model adapters. Filesystem operations are especially good at exposing this because they often allocate resources, trigger cleanup paths, and cross abstraction layers that were not originally written with strict runtime-context lifetimes in mind.

So the root cause is not just “TLS panics.” The real issue is implicit runtime context lookup in a path that now executes outside the guaranteed lifetime of that context.
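To make the failure mode concrete, here is a minimal std-only sketch of a scoped thread-local slot. It is not Wasmtime's implementation: `CURRENT_CALL`, `with_call_scope`, and this `get` are hypothetical stand-ins, chosen only to show why a lookup succeeds inside a call scope and panics once that scope has ended.

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical per-thread slot; `None` means "no call scope is active".
    static CURRENT_CALL: Cell<Option<u32>> = Cell::new(None);
}

/// Installs `id` for the duration of `f`, then restores the previous value,
/// even if `f` panics.
fn with_call_scope<R>(id: u32, f: impl FnOnce() -> R) -> R {
    struct Reset(Option<u32>);
    impl Drop for Reset {
        fn drop(&mut self) {
            CURRENT_CALL.with(|slot| slot.set(self.0));
        }
    }
    // `replace` returns the previous value so nested scopes unwind correctly.
    let _reset = Reset(CURRENT_CALL.with(|slot| slot.replace(Some(id))));
    f()
}

/// Mirrors the implicit lookup: panics when no scope is active.
fn get() -> u32 {
    CURRENT_CALL
        .with(|slot| slot.get())
        .expect("tls::get called outside of a call scope")
}
```

Calling `with_call_scope(7, get)` returns 7. A closure created inside the scope but invoked after it returns hits the `expect` and panics, which is the shape of the deferred-callback and destructor cases listed above.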

Step-by-Step Solution

The most reliable fix is to remove the hidden dependency on tls::get from the failing path and pass the required context explicitly. If the code truly needs runtime state, it should receive a Store, caller handle, or equivalent context object from its immediate caller instead of recovering it through thread-local lookup.

1. Reproduce the panic consistently

Run the failing test case from the issue with backtraces enabled and capture the output:

RUST_BACKTRACE=1 cargo test wasi_filesystem -- --nocapture

If the test name differs in your checkout, search for the exact case first:

cargo test wasi -- --list

2. Identify the exact tls::get caller

Find where runtime-local access is being used in the failing stack:

rg "tls::get|with_tls|current_store|current_caller" crates/

Focus on the frame closest to the new filesystem or component-model path. The key question is: why is this code trying to recover execution context implicitly?

3. Refactor the API to accept explicit context

A typical problematic pattern looks like this:

fn filesystem_helper(arg: &Path) -> Result<Output> {
    let runtime = tls::get();
    runtime.do_filesystem_work(arg)
}

Replace it with explicit context threading:

// `FilesystemHost` is a placeholder trait that provides `do_filesystem_work`.
fn filesystem_helper<T: FilesystemHost>(cx: &mut StoreContextMut<'_, T>, arg: &Path) -> Result<Output> {
    cx.data_mut().do_filesystem_work(arg)
}

Then update the caller so the function runs only while the store context is definitely valid:

fn host_call<T: FilesystemHost>(mut store: StoreContextMut<'_, T>, path: &Path) -> Result<Output> {
    filesystem_helper(&mut store, path)
}
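The same idea can be tried without any Wasmtime dependency. The sketch below is a std-only model, assuming a hypothetical `HostState` and a `Ctx` wrapper that merely imitates the borrow shape of `StoreContextMut`. It shows the context being threaded through every layer, so no helper further down has a reason to fall back to a thread-local lookup.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical host state; stands in for the `T` stored in a store.
#[derive(Default)]
struct HostState {
    opened: Vec<PathBuf>,
}

/// Hypothetical stand-in for a store context: a borrow that is only
/// valid while the owning state is alive.
struct Ctx<'a, T> {
    data: &'a mut T,
}

impl<'a, T> Ctx<'a, T> {
    fn data_mut(&mut self) -> &mut T {
        &mut *self.data
    }
}

/// Deepest helper: takes the context as a parameter instead of looking it up.
fn record_open(cx: &mut Ctx<'_, HostState>, path: &Path) -> usize {
    let state = cx.data_mut();
    state.opened.push(path.to_path_buf());
    state.opened.len()
}

/// Mid-level helper: forwards the context it was handed.
fn filesystem_helper(cx: &mut Ctx<'_, HostState>, path: &Path) -> usize {
    record_open(cx, path)
}

/// Entry point: the only place that creates a context from the owned state.
fn host_call(state: &mut HostState, path: &Path) -> usize {
    let mut cx = Ctx { data: state };
    filesystem_helper(&mut cx, path)
}
```

Because `Ctx` borrows the state, the compiler rejects any attempt to keep using the context after the state is gone. That is the guarantee the implicit lookup cannot give.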

4. Guard teardown and destructor paths

If the panic appears during cleanup, avoid performing runtime-dependent work in Drop implementations or deferred destructors unless the required state is passed in explicitly before teardown begins.

impl Drop for ResourceState {
    fn drop(&mut self) {
        // Release only locally owned state; no runtime lookup happens here.
        self.cached_handle.take();
    }
}

If cleanup must talk to runtime state, move that logic into an explicit method:

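impl ResourceState {
    // Generic over the store's data type `T`; the caller supplies the context.
    fn close<T>(&mut self, cx: &mut StoreContextMut<'_, T>) -> Result<()> {
        // perform runtime-aware cleanup here, while `cx` is still valid
        Ok(())
    }
}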
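As a runnable illustration of that split, the following std-only sketch separates runtime-aware cleanup from `Drop`. `HostTable` and the fields on `ResourceState` are hypothetical; the table stands in for whatever runtime state the real cleanup would need to update.

```rust
/// Hypothetical runtime-side table that cleanup needs to update.
#[derive(Default)]
struct HostTable {
    open_handles: usize,
}

/// Hypothetical resource; `closed` records whether `close` already ran.
struct ResourceState {
    cached_handle: Option<u32>,
    closed: bool,
}

impl ResourceState {
    fn open(table: &mut HostTable, handle: u32) -> Self {
        table.open_handles += 1;
        ResourceState { cached_handle: Some(handle), closed: false }
    }

    /// Runtime-aware cleanup: the caller supplies the context explicitly.
    /// Calling it a second time is a no-op.
    fn close(&mut self, table: &mut HostTable) {
        if !self.closed {
            table.open_handles -= 1;
            self.cached_handle = None;
            self.closed = true;
        }
    }
}

impl Drop for ResourceState {
    fn drop(&mut self) {
        // Only locally owned state is released; the table is never touched.
        self.cached_handle.take();
    }
}
```

The trade-off is that callers must remember to invoke `close` while the context is still available. `Drop` then becomes a last-resort release of local state rather than a place where runtime access can go wrong.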

5. Verify thread and async boundaries

If the filesystem path crosses tasks or threads, ensure the runtime context is not assumed to follow automatically:

// bad: the spawned thread has its own, uninitialized TLS slot,
// so any tls::get reached from filesystem_helper panics there
std::thread::spawn(move || {
    let _ = filesystem_helper(&path);
});

Instead, extract the needed data before the boundary, or re-enter through a valid API that re-establishes the required context.
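A small std-only sketch of the "extract before the boundary" option follows. The `PREOPEN` slot and the helper names are hypothetical; the point is only that a value installed in one thread's TLS is invisible to a newly spawned thread, so anything the worker needs has to be copied out first.

```rust
use std::cell::RefCell;
use std::path::PathBuf;
use std::thread;

thread_local! {
    // Hypothetical per-thread runtime state: the preopened directory, if any.
    static PREOPEN: RefCell<Option<PathBuf>> = RefCell::new(None);
}

fn install_preopen(dir: &str) {
    PREOPEN.with(|slot| *slot.borrow_mut() = Some(PathBuf::from(dir)));
}

fn current_preopen() -> Option<PathBuf> {
    PREOPEN.with(|slot| slot.borrow().clone())
}

/// Resolves `name` on a worker thread using a snapshot taken on the
/// calling thread.
fn resolve_on_worker(name: &str) -> Option<PathBuf> {
    // Extract what the worker needs *before* crossing the boundary.
    let base = current_preopen()?;
    let name = name.to_owned();
    thread::spawn(move || base.join(name)).join().ok()
}

/// What a freshly spawned thread sees in its own TLS slot.
fn preopen_seen_by_new_thread() -> Option<PathBuf> {
    thread::spawn(current_preopen).join().unwrap()
}
```

After `install_preopen("/sandbox")` on the calling thread, `preopen_seen_by_new_thread()` still returns `None`, while `resolve_on_worker("data.txt")` succeeds because the base directory was cloned before the spawn.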

6. Add a regression test

The best long-term safeguard is a test that exercises the exact lifetime boundary that caused the panic: filesystem operation, component call, resource cleanup, and host return. That way, future refactors cannot quietly reintroduce TLS-dependent access.

#[test]
fn wasi_filesystem_does_not_touch_tls_outside_call_scope() {
    // construct store
    // invoke the filesystem path from the original repro
    // assert success or structured error, but never panic
}
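One possible way to structure such a test, sketched with std only, is to run each lifetime boundary as a separately named phase so that a failure report says which boundary panicked. The `Phase` alias and `first_panicking_phase` helper are hypothetical; in a real test each closure would drive the actual store, filesystem call, and cleanup.

```rust
use std::panic::{self, AssertUnwindSafe};

/// A named step of the scenario, e.g. "call", "cleanup", "host return".
type Phase = (&'static str, Box<dyn FnOnce()>);

/// Runs the phases in order and returns the name of the first one that
/// panics, or `None` if all of them complete.
fn first_panicking_phase(phases: Vec<Phase>) -> Option<&'static str> {
    for (name, phase) in phases {
        // Each phase is consumed by the call, so nothing it owned is
        // observed again after a panic.
        if panic::catch_unwind(AssertUnwindSafe(phase)).is_err() {
            return Some(name);
        }
    }
    None
}
```

A plain `#[test]` already fails on any panic. The helper only adds the phase name, which helps when the panic originates in cleanup rather than in the call itself.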

7. Prefer structured errors over panics

If a context lookup can legitimately fail, return an error that explains the misuse instead of panicking. Panics in runtime plumbing make regressions much harder to debug and can mask the actual ownership or lifetime bug.
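A fallible lookup might look like the sketch below. `ContextError`, `try_get`, and `set_current` are hypothetical names, not Wasmtime APIs. The only real library call relied on is std's `LocalKey::try_with`, which returns an error instead of panicking when the slot has already been destroyed.

```rust
use std::cell::Cell;
use std::fmt;

#[derive(Debug, PartialEq, Eq)]
enum ContextError {
    /// No call scope has installed a context on this thread.
    NoActiveCall,
    /// The thread-local slot was already destroyed during thread exit.
    ThreadTearingDown,
}

impl fmt::Display for ContextError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ContextError::NoActiveCall => {
                f.write_str("runtime context requested outside an active call")
            }
            ContextError::ThreadTearingDown => {
                f.write_str("runtime context requested during thread teardown")
            }
        }
    }
}

impl std::error::Error for ContextError {}

thread_local! {
    // Hypothetical slot holding the id of the active call, if any.
    static CURRENT_CALL: Cell<Option<u32>> = Cell::new(None);
}

fn set_current(id: Option<u32>) {
    CURRENT_CALL.with(|slot| slot.set(id));
}

/// Fallible counterpart to a panicking lookup.
fn try_get() -> Result<u32, ContextError> {
    CURRENT_CALL
        .try_with(|slot| slot.get())
        .map_err(|_| ContextError::ThreadTearingDown)?
        .ok_or(ContextError::NoActiveCall)
}
```

Callers can then propagate the error with `?` and surface a message that names the misuse, instead of unwinding through runtime plumbing.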

Common Edge Cases

  • Destructor-triggered panics: a resource drops after the guest call returns, and cleanup still tries to access runtime-local state.
  • Cross-thread execution: a helper works in unit tests but panics under parallel execution because the TLS slot is only initialized on the original thread.
  • Async suspension: a future resumes after the original call scope ended, so implicit runtime access is no longer valid.
  • Nested host/guest transitions: component-model adapters may introduce call paths where a lower layer assumes TLS is present but the upper layer did not establish it.
  • Partial refactors: one helper gets updated to accept explicit context, but a deeper utility still calls tls::get, leaving the panic in place.
  • Test-only blind spots: a happy-path filesystem test passes, while error cleanup, canceled operations, or resource finalization still panic.

FAQ

Why does this panic only show up in the filesystem work?

Filesystem flows often stress resource lifetime management, cleanup behavior, and component-model adapters more than simpler calls. That makes them much more likely to expose hidden dependencies on thread-local runtime state.

Is wrapping tls::get in an Option check enough?

Usually no. That may suppress the panic, but it does not fix the architectural problem. If the code requires runtime context, the safer design is to pass that context explicitly so lifetime and ownership stay correct.

What is the safest long-term fix in Wasmtime-style runtime code?

The safest fix is to remove implicit context recovery from sensitive paths, especially around host functions, async work, destructors, and resource handling. APIs should receive the store or caller context directly, and invalid access should become a normal error path rather than a panic.

If you are patching this regression in Wasmtime itself, inspect the change introduced by the referenced commit and trace every newly introduced call chain that reaches tls::get. The bug is most likely resolved not by changing TLS alone, but by moving the affected filesystem path back under a valid runtime context or by making that context an explicit function parameter all the way down.
