How to Fix: Cranelift: riscv64 having memory side-effects when trapping

Cranelift on riscv64 can commit memory side-effects before a trap, which breaks Wasm semantics and can surface as corrupted state in otherwise deterministic code.

This issue appears when Cranelift generates riscv64 machine code for an operation that may trap, but the emitted instruction sequence lets a memory write or other observable side-effect happen before the trap is raised. In WebAssembly, and therefore in Wasmtime’s execution model, a trapping instruction must not partially commit its side-effects; for example, a store that traps must leave memory unchanged. If it does commit them, the engine violates the runtime guarantees that Wasm programs and embedders rely on.

Understanding the Root Cause

The bug is fundamentally about instruction ordering and trap semantics in the Cranelift backend for riscv64. Some generated code sequences combine:

  • a memory access or store with observable side-effects, and
  • an operation that may trap, such as a bounds check failure, illegal address formation, or other fault-triggering behavior.

For Wasm execution, the expected behavior is strict: if an instruction traps, the runtime must not expose any memory modification that the trap should have prevented. The failure mode appears when the backend lowers IR so that a store executes before the check that guards it, or so that part of a write becomes visible in memory before the trap is actually raised.
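To make the failure mode concrete, here is a small, self-contained Rust model of a linear memory. It is a sketch under stated assumptions, not Wasmtime or Cranelift code: LinearMemory, Trap, and both store functions are hypothetical names, and a byte-by-byte write loop stands in for whatever machine sequence actually leaks the write. Both versions report a trap for a store that straddles the end of memory, but only the check-first version leaves memory untouched.

```rust
/// Hypothetical model of a Wasm linear memory (not Wasmtime's real type).
struct LinearMemory {
    bytes: Vec<u8>,
}

/// Marker for "this operation trapped".
#[derive(Debug, PartialEq)]
struct Trap;

impl LinearMemory {
    /// Buggy ordering: each byte is written as soon as it is known to be in
    /// bounds, so bytes before the failing one are already committed.
    fn store_write_then_check(&mut self, addr: usize, value: &[u8]) -> Result<(), Trap> {
        for (i, b) in value.iter().enumerate() {
            let slot = self.bytes.get_mut(addr + i).ok_or(Trap)?;
            *slot = *b;
        }
        Ok(())
    }

    /// Correct ordering: validate the whole access first, then write it.
    fn store_check_then_write(&mut self, addr: usize, value: &[u8]) -> Result<(), Trap> {
        let end = addr.checked_add(value.len()).ok_or(Trap)?;
        if end > self.bytes.len() {
            return Err(Trap);
        }
        self.bytes[addr..end].copy_from_slice(value);
        Ok(())
    }
}

fn main() {
    // 16-byte memory; an 8-byte store at offset 12 straddles the end.
    let mut buggy = LinearMemory { bytes: vec![0; 16] };
    let mut fixed = LinearMemory { bytes: vec![0; 16] };
    let value = [0xFF_u8; 8];

    assert_eq!(buggy.store_write_then_check(12, &value), Err(Trap));
    assert_eq!(fixed.store_check_then_write(12, &value), Err(Trap));

    // Both calls trapped, but only the buggy one modified bytes 12..16.
    println!("buggy tail = {:?}", &buggy.bytes[12..]);
    println!("fixed tail = {:?}", &fixed.bytes[12..]);
}
```

Running main prints [255, 255, 255, 255] for the buggy tail and [0, 0, 0, 0] for the fixed one. The trap is reported in both cases, which is why a test that only checks for the trap can miss this class of bug.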

On riscv64, this can be especially subtle because backend lowering must preserve the ordering guarantees implied by the source IR, not merely emit a functionally similar sequence. If Cranelift hoists, merges, or schedules memory operations around trap-capable instructions incorrectly, then a store may become visible even though the enclosing Wasm operation should have aborted.

In practice, the root cause usually falls into one or more of these categories:

  • Incorrect lowering of a trap-before-store semantic into a store-before-trap machine sequence.
  • Missing side-effect barriers in backend instruction selection or legalization.
  • Faulting address generation where intermediate operations are not modeled conservatively enough.
  • Backend-specific scheduling assumptions that are valid on one ISA but not on riscv64.

The key technical point is that this is not just a crash bug. It is a semantic correctness bug, because trap behavior must be atomic from the perspective of observable state.

Step-by-Step Solution

The safest fix is to ensure that Cranelift’s riscv64 backend preserves trap-before-side-effect ordering for all relevant instruction patterns. That usually means auditing the lowering path for faulting memory operations and rewriting any sequence where a store can happen before a required trap boundary.

1. Reproduce the issue with a minimal Wasmtime program

Start from a reduced version of the reported test case and run it on a riscv64 target with Cranelift enabled.

use wasmtime::*;

fn main() -> Result<()> {
    let mut config = Config::default();
    config.strategy(Strategy::Cranelift);

    let engine = Engine::new(&config)?;
    let module = Module::new(&engine, r#"
        (module
            (memory 1)
            (func (export "run")
                ;; Insert the reduced wasm sequence that triggers
                ;; trapping behavior near a memory side-effect.
            )
        )
    "#)?;

    let mut store = Store::new(&engine, ());
    let instance = Instance::new(&mut store, &module, &[])?;
    let run = instance.get_typed_func::<(), ()>(&mut store, "run")?;

    let result = run.call(&mut store, ());
    println!("result = {:?}", result);
    Ok(())
}

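If the bug is present, the call to run returns a trap even though the memory it targeted has already been modified. A correct engine never produces that combination, so export the memory from the module and compare the affected bytes before and after the call to confirm it.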

2. Inspect the generated Cranelift IR and machine lowering

Enable backend dumps so you can compare the intended IR ordering with the final emitted riscv64 sequence. The exact workflow depends on your local Wasmtime and Cranelift setup, but the goal is always the same: verify whether a store or other side-effecting instruction appears before the actual trap point.

# Example workflow
cargo build
RUST_LOG=cranelift=debug cargo test

# Or use whatever internal flags your checkout exposes for IR / vcode dumps.

Look for patterns like:

  • address computation split from the faulting operation,
  • store emission before explicit bounds/trap checks,
  • combined lowering where trap metadata is attached too late.
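When reading a dump, the core question is about ordering: does any store through an address execute before the explicit check that guards that address? The sketch below encodes that question over a hypothetical, heavily simplified instruction list. Inst, TrapIfInvalid, has_check, and first_unguarded_store are illustrative names, not Cranelift's VCode types, and the sketch assumes you have already mapped the real dump onto these three categories by hand. It deliberately does not flag stores with no explicit check at all, since those may rely on a hardware fault instead.

```rust
/// Hypothetical, simplified view of a lowered instruction sequence.
/// Real Cranelift VCode dumps look different; only ordering is modeled.
#[derive(Debug, PartialEq)]
enum Inst {
    /// Explicit check that traps if the access through `addr` is invalid.
    TrapIfInvalid { addr: &'static str },
    /// A store whose effect is visible as soon as it executes.
    Store { addr: &'static str },
    /// Anything without memory side-effects (address arithmetic, moves).
    Pure,
}

/// True if `range` contains an explicit trap check on `addr`.
fn has_check(range: &[Inst], addr: &str) -> bool {
    range
        .iter()
        .any(|p| matches!(p, Inst::TrapIfInvalid { addr: a } if *a == addr))
}

/// Returns the index of the first store that executes before the trap check
/// on the same address register, if there is one.
fn first_unguarded_store(seq: &[Inst]) -> Option<usize> {
    seq.iter().enumerate().find_map(|(i, inst)| match inst {
        Inst::Store { addr }
            if !has_check(&seq[..i], addr) && has_check(&seq[i + 1..], addr) => Some(i),
        _ => None,
    })
}

fn main() {
    // Store committed first, check afterwards: the pattern to look for.
    let buggy = [Inst::Pure, Inst::Store { addr: "a0" }, Inst::TrapIfInvalid { addr: "a0" }];
    // Check first, store afterwards: the intended ordering.
    let fixed = [Inst::Pure, Inst::TrapIfInvalid { addr: "a0" }, Inst::Store { addr: "a0" }];
    println!("buggy: {:?}", first_unguarded_store(&buggy)); // buggy: Some(1)
    println!("fixed: {:?}", first_unguarded_store(&fixed)); // fixed: None
}
```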

3. Fix the riscv64 lowering path

In the backend, ensure that any operation with possible trapping behavior is lowered so that no visible memory mutation can occur first. Depending on the exact buggy pattern, the fix generally looks like one of these approaches:

  • emit an explicit trap or bounds check before the store,
  • split complex lowering into a check phase and a write phase,
  • mark instructions more conservatively as having side-effects or trap behavior,
  • prevent instruction reordering across the trap boundary.
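The last two approaches amount to treating a possible trap as a barrier that memory writes cannot cross. The following sketch shows that rule as a predicate a simple scheduler might consult before swapping two adjacent instructions. Effects and can_swap are hypothetical names; Cranelift tracks side-effects with its own representation, and this model assumes data dependencies are checked separately. A store that can itself fault should set both flags.

```rust
/// Hypothetical effect summary for one instruction in a schedule.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Effects {
    /// The instruction can raise a trap (explicit check or faulting access).
    may_trap: bool,
    /// The instruction makes a memory write visible.
    writes_memory: bool,
}

/// Conservative rule for swapping two adjacent instructions `a` then `b`.
/// Data dependencies are assumed to be checked elsewhere.
fn can_swap(a: Effects, b: Effects) -> bool {
    // Moving a write across a possible trap could commit it before the trap
    // fires, so a potential trap acts as a barrier for writes.
    let write_crosses_trap =
        (a.may_trap && b.writes_memory) || (a.writes_memory && b.may_trap);
    // Keep traps in order (which trap fires is observable) and writes in order.
    let both_trap = a.may_trap && b.may_trap;
    let both_write = a.writes_memory && b.writes_memory;
    !(write_crosses_trap || both_trap || both_write)
}

fn main() {
    let bounds_check = Effects { may_trap: true, writes_memory: false };
    let store = Effects { may_trap: false, writes_memory: true };
    let addr_add = Effects { may_trap: false, writes_memory: false };

    println!("{}", can_swap(bounds_check, store)); // false: would hoist the store
    println!("{}", can_swap(addr_add, store)); // true: no trap involved
}
```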

A conceptual correction looks like this:

// Incorrect conceptual lowering
store value, [addr]      // side-effect becomes visible
trap_if_invalid addr     // trap happens too late

// Correct conceptual lowering
trap_if_invalid addr     // fail first if needed
store value, [addr]      // commit only after safety is guaranteed

If the bug originates in legalization or instruction selection, update that layer instead of patching around it later in emission. The earlier the semantic contract is preserved, the less likely it is to regress elsewhere.

4. Add a regression test at the backend or Wasmtime level

This class of bug should always ship with a regression test. Add a test that proves memory remains unchanged when the trap occurs.

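#[test]
fn riscv64_trap_does_not_commit_memory_side_effect() -> wasmtime::Result<()> {
    use wasmtime::*;
    // Illustrative module (an assumption, not the reported reproducer): an
    // 8-byte store at 65532 straddles the end of a fresh, zeroed 65536-byte memory.
    let engine = Engine::default();
    let module = Module::new(&engine, r#"(module (memory (export "mem") 1)
        (func (export "run") (i64.store (i32.const 65532) (i64.const -1))))"#)?;
    let mut store = Store::new(&engine, ());
    let instance = Instance::new(&mut store, &module, &[])?;
    let mem = instance.get_memory(&mut store, "mem").unwrap();
    let run = instance.get_typed_func::<(), ()>(&mut store, "run")?;
    assert!(run.call(&mut store, ()).is_err()); // the store must trap...
    assert_eq!(&mem.data(&store)[65532..], &[0u8; 4]); // ...and write nothing
    Ok(())
}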

Make sure the test checks both:

  • that a trap actually occurred, and
  • that the memory region targeted by the operation is still unchanged after failure.
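If the trapping operation can be driven from Rust, both checks can live in one small helper so every regression case reports failures the same way. This is a hypothetical, std-only sketch: check_trap_is_atomic is not a Wasmtime API, an Err return stands in for a trap, and watched is assumed to be the byte range the operation targets.

```rust
use std::ops::Range;

/// Hypothetical test helper (not a Wasmtime API). Runs `op` against `memory`
/// and checks both properties a regression test needs:
///   1. `op` reported a trap (returned `Err`), and
///   2. the bytes in `watched` match their snapshot from before the call.
fn check_trap_is_atomic<E>(
    memory: &mut [u8],
    watched: Range<usize>,
    op: impl FnOnce(&mut [u8]) -> Result<(), E>,
) -> Result<(), String> {
    let snapshot = memory[watched.clone()].to_vec();
    if op(&mut *memory).is_ok() {
        return Err("expected a trap, but the operation succeeded".to_string());
    }
    let after = &memory[watched.clone()];
    match snapshot.iter().zip(after).position(|(b, a)| b != a) {
        Some(i) => Err(format!("byte at offset {} changed despite the trap", watched.start + i)),
        None => Ok(()),
    }
}

fn main() {
    let mut mem = vec![0u8; 16];
    // Deliberately buggy "store": writes in-bounds bytes, then traps at 16.
    let buggy_store = |m: &mut [u8]| -> Result<(), ()> {
        for i in 12..20 {
            *m.get_mut(i).ok_or(())? = 0xFF;
        }
        Ok(())
    };
    // Prints: Err("byte at offset 12 changed despite the trap")
    println!("{:?}", check_trap_is_atomic(&mut mem, 12..16, buggy_store));
}
```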

5. Validate on real riscv64 hardware or emulator

Because this is ISA-specific, confirm the fix on a true riscv64 environment or a trusted emulator. Cross-check behavior with other architectures to ensure the patch does not introduce backend inconsistencies.

# Example validation flow
cargo test -p cranelift-codegen
cargo test -p wasmtime

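# Then run the reproducer on a riscv64 target. One option, assuming a
# riscv64 cross toolchain and qemu-user are installed (paths vary by distro):
export CARGO_TARGET_RISCV64GC_UNKNOWN_LINUX_GNU_LINKER=riscv64-linux-gnu-gcc
export CARGO_TARGET_RISCV64GC_UNKNOWN_LINUX_GNU_RUNNER="qemu-riscv64 -L /usr/riscv64-linux-gnu"
cargo test --target riscv64gc-unknown-linux-gnu -p wasmtime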

6. Prefer upgrading if the fix is already merged upstream

If you are hitting this in an application rather than contributing to the compiler backend, the best solution is often to upgrade to a Wasmtime release that contains the fix. Check the Wasmtime repository on GitHub, including the relevant issue and the release notes, before maintaining a local patch.

Common Edge Cases

  • Hidden partial writes: even if the main store looks protected, helper-generated instructions may still touch memory or state too early.
  • Signal-based traps vs explicit traps: some failures are emitted as explicit checks while others depend on hardware faults. The backend must preserve semantics for both paths.
  • Optimized builds only: the issue may disappear in debug mode and surface only with optimization because instruction scheduling changes.
  • Multiple memory operations in one lowered sequence: a trap may be correctly placed for one access but not for another adjacent temporary write.
  • Host integration confusion: embedding code may assume the trap itself caused corruption, when the actual problem is backend-generated ordering before the trap surfaced.
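The "multiple memory operations" case is easy to reproduce in a model. Below, a hypothetical lowering splits one 8-byte store into two 4-byte stores. Each half is bounds-checked on its own, standing in for a per-instruction hardware fault, yet the combined operation is still not atomic: the first half commits before the second traps. Checking the full range before writing either half fixes it. Both functions are illustrative names, not Cranelift code, and this sketch does not claim to be the exact pattern behind the reported issue.

```rust
/// Hypothetical lowering: one 8-byte store becomes two 4-byte stores, each
/// bounds-checked on its own (standing in for a per-instruction fault).
fn split_store_per_half(mem: &mut [u8], addr: usize, value: u64) -> Result<(), &'static str> {
    let bytes = value.to_le_bytes();
    for (half, chunk) in bytes.chunks(4).enumerate() {
        let start = addr + half * 4;
        // The first half can commit before the second half traps.
        let dst = mem.get_mut(start..start + 4).ok_or("trap")?;
        dst.copy_from_slice(chunk);
    }
    Ok(())
}

/// Same split, but the whole 8-byte range is validated before either half is
/// written, so a trap leaves memory untouched.
fn split_store_checked(mem: &mut [u8], addr: usize, value: u64) -> Result<(), &'static str> {
    let end = addr.checked_add(8).ok_or("trap")?;
    if end > mem.len() {
        return Err("trap");
    }
    let bytes = value.to_le_bytes();
    for (half, chunk) in bytes.chunks(4).enumerate() {
        let start = addr + half * 4;
        mem[start..start + 4].copy_from_slice(chunk);
    }
    Ok(())
}

fn main() {
    // An 8-byte store at address 12 straddles the end of a 16-byte memory.
    let mut a = vec![0u8; 16];
    let mut b = vec![0u8; 16];
    let r1 = split_store_per_half(&mut a, 12, u64::MAX);
    let r2 = split_store_checked(&mut b, 12, u64::MAX);
    println!("{:?} {:?}", r1, &a[12..]); // Err("trap") [255, 255, 255, 255]
    println!("{:?} {:?}", r2, &b[12..]); // Err("trap") [0, 0, 0, 0]
}
```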

FAQ

Does this bug mean Wasm memory is generally unsafe on riscv64?

No. It indicates a backend correctness issue in a specific code generation path. Once fixed, Wasmtime and Cranelift should again preserve expected WebAssembly trap semantics on riscv64.

Why is this more serious than a normal crash?

Because the problem is not only that execution traps. The more serious issue is that observable memory state may change before the trap, which violates the execution model and can break sandboxing assumptions, tests, and deterministic behavior.

Can I work around it without patching Cranelift?

Sometimes. You may be able to avoid the affected code path by upgrading Wasmtime, disabling the problematic optimization pattern if a temporary flag exists, or running on a different backend or architecture. But the real fix is to correct the riscv64 code generation so trap ordering is guaranteed.

For maintainers, the durable resolution is simple in principle: never allow a trapping operation on riscv64 to expose memory side-effects before the trap boundary is established. Once the lowering path enforces that rule and a regression test locks it in, this class of bug becomes much harder to reintroduce.
