How to Fix: wasmtime: `issue_3327_bnot_lowering.wast` test relies on unspecified behaviour

Why issue_3327_bnot_lowering.wast Fails on RISC-V: Removing an Unspecified SIMD Assumption in Wasmtime

This test is not exposing a broken RISC-V SIMD backend; it is exposing a test bug. The disabled issue_3327_bnot_lowering.wast case relies on behavior that the backend is free to implement differently, which means the test can pass on one architecture and fail on another while both remain WebAssembly spec compliant. The fix is to rewrite the test so it validates the guaranteed semantics of the Wasm SIMD instruction sequence rather than an artifact of a particular lowering strategy.

Understanding the Root Cause

The issue centers on a test named issue_3327_bnot_lowering.wast that was written to catch a historical lowering bug around SIMD bitwise negation, which Wasm expresses as v128.not or as an equivalent pattern such as x ^ -1. The problem is that the test appears to assume a specific machine-level lowering, or a specific interpretation of intermediate values, that the Wasm specification does not guarantee.

In Wasm SIMD, operations are defined in terms of their lane-wise or bitwise semantic result. They do not require a compiler backend to produce any particular instruction sequence. A backend may lower a bitwise NOT through:

  • a direct target instruction,
  • an XOR with an all-ones vector,
  • a sequence involving constant materialization and register moves, or
  • another equivalent canonicalization during instruction selection.

If a test checks observable program results, all of those are valid. If a test checks something architecture-specific indirectly, such as the exact handling of a constant mask, a trap-adjacent artifact, or a value whose behavior is unspecified in the test setup, then the test becomes fragile.
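To make the equivalence concrete, here is a minimal sketch that models a v128 value as a 128-bit Python integer and compares three ways of computing the inversion. The function names are illustrative only and do not correspond to Cranelift or Wasmtime APIs.

```python
MASK128 = (1 << 128) - 1  # all 128 bits set, like v128.const i32x4 -1 -1 -1 -1


def bnot_direct(v: int) -> int:
    """Model a native NOT instruction: invert every bit, keep 128 bits."""
    return ~v & MASK128


def bnot_via_xor(v: int) -> int:
    """Model the XOR-with-all-ones lowering."""
    return (v & MASK128) ^ MASK128


def bnot_via_sub(v: int) -> int:
    """Model another equivalent form: all-ones minus the value."""
    return MASK128 - (v & MASK128)


# A few hand-picked bit patterns; every strategy must agree on each of them.
samples = [0, MASK128, 0x12345678, 0xAAAAAAAA << 96, 1 << 127]
for v in samples:
    assert bnot_direct(v) == bnot_via_xor(v) == bnot_via_sub(v)
```

Because all three functions return the same bits, a test that asserts only the returned vector cannot distinguish between them, which is the property a portable test needs.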

That is why RISC-V can be correct while the test still fails. The backend likely computes the right final vector value, but the test expects behavior derived from an implementation detail that other backends happened to share accidentally.

In practical terms, this usually happens when one of the following is true:

  • The test assumes a specific lowering pattern rather than the semantic result.
  • The test depends on undefined or unspecified bits after an operation chain.
  • The test was written around a historical regression and overfit to one backend.
  • The expected output encodes a backend artifact instead of the actual SIMD instruction contract.

For Wasmtime and Cranelift, the right fix is not to special-case RISC-V. The right fix is to make the test assert only what the spec guarantees.

Step-by-Step Solution

The safest resolution is to inspect the failing WAST, identify the assumption that is not required by the spec, and replace it with a semantics-based assertion.

1. Locate the test and inspect the instruction pattern

Open the WAST file in the Wasmtime repository and identify the function that exercises the bnot-related lowering path. Look for vector operations such as XOR with all-ones, bitselect-style transforms, or constants that are intended to represent bitwise inversion.

# Example workflow inside the repository
rg "issue_3327_bnot_lowering" -n .
sed -n '1,200p' path/to/issue_3327_bnot_lowering.wast

When reading the test, ask one question: Is this asserting the final WebAssembly result, or is it assuming how Cranelift lowers the operation?

2. Compare the test expectation to the Wasm SIMD semantic rule

For a bitwise NOT pattern, the semantic result is straightforward: every bit in the input vector is inverted. That means the expected output should be derived from the bit pattern itself, not from register allocation, constant materialization, or target instruction choice.

If the test currently depends, conceptually, on a chain like this:

(v128.xor (local.get $input) (v128.const i32x4 -1 -1 -1 -1))

then the only valid assertion is the resulting inverted vector bits. Any lower-level assumption beyond that should be removed.
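One way to derive the expected lanes independently of any backend is to compute them by hand or with a few lines of script. The sketch below assumes an i32x4 view of the vector; the helper names are hypothetical.

```python
def to_i32(x: int) -> int:
    """Wrap an integer to a signed 32-bit value, as WAST prints i32 lanes."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def invert_i32x4(lanes):
    """Invert every bit of each 32-bit lane and return signed lane values."""
    return [to_i32(~(lane & 0xFFFFFFFF)) for lane in lanes]


inputs = [0, -1, 0x12345678, 0xAAAAAAAA]
print(invert_i32x4(inputs))  # prints [-1, 0, -305419897, 1431655765]
```

The result depends only on the input bits, so it is the same value every conforming backend has to produce.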

3. Rewrite the WAST to validate explicit deterministic outputs

Replace ambiguous or backend-sensitive assertions with direct input/output checks. A robust test uses concrete vector constants and verifies the exact resulting vector value.

;; Example of a semantics-focused test structure
(module
  (func (export "bnot_via_xor") (param v128) (result v128)
    (v128.xor
      (local.get 0)
      (v128.const i32x4 -1 -1 -1 -1)))

  ;; Input: 0x00000000 0xffffffff 0x12345678 0xaaaaaaaa
  ;; Output should be bitwise inverted lane-by-lane.
)

Then assert the exact result in WAST form, using values whose inversion is easy to verify.

;; Expected result for the inputs listed in the module comment above
(assert_return
  (invoke "bnot_via_xor" (v128.const i32x4 0 -1 305419896 -1431655766))
  (v128.const i32x4 -1 0 -305419897 1431655765))

This kind of test is architecture-neutral because it checks only the output required by the spec.
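If you want more than one input vector, the assertions can be generated rather than typed by hand. The following is a small sketch, assuming the exported function name bnot_via_xor from the example above; wast_assert is a hypothetical helper, not part of any Wasmtime tooling.

```python
def i32(x: int) -> int:
    """Wrap an integer to a signed 32-bit value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= 1 << 31 else x


def wast_assert(func: str, lanes) -> str:
    """Render one assert_return for a function expected to invert its input."""
    arg = " ".join(str(i32(lane)) for lane in lanes)
    exp = " ".join(str(i32(~lane)) for lane in lanes)
    return (
        "(assert_return\n"
        f'  (invoke "{func}" (v128.const i32x4 {arg}))\n'
        f"  (v128.const i32x4 {exp}))"
    )


# Emit one assertion per input vector; paste the output after the module.
for vec in ([0, -1, 0x12345678, 0xAAAAAAAA], [1, 2, 4, 8]):
    print(wast_assert("bnot_via_xor", vec))
```

Generating the expectations from the input bits keeps the test tied to the semantic rule instead of to whatever a particular backend happened to return.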

4. Remove checks that depend on unspecified behavior

If the original test is validating something indirect, remove it. Common examples include:

  • Expecting a specific canonical IR lowering shape.
  • Assuming a backend preserves temporary values in a particular form.
  • Using data patterns where interpretation differs across lane types without making that explicit.
  • Relying on disassembly-like expectations in a WAST behavioral test.

Behavioral tests should remain at the observable WebAssembly level.

5. Run the relevant test suite on multiple backends

After rewriting the test, validate that it passes across architectures, especially the one that previously had the test disabled.

# Run the specific test if supported by the test harness
cargo test issue_3327_bnot_lowering -- --nocapture

# Or run the broader WAST/spec test buckets used by Wasmtime
cargo test wast -- --nocapture

If you have access to RISC-V CI or cross-target execution, verify the test there as well. The goal is to prove that the test now checks portable semantics.

6. Re-enable the RISC-V-disabled test

Once the test no longer depends on unspecified behavior, remove the architecture-specific exclusion and run the full affected suite.

# Example: update target skip list or disabled annotation
# Then rerun validation (--workspace is the current spelling of the deprecated --all)
cargo test --workspace -- --nocapture

If the test still fails after becoming semantics-based, then you likely have a genuine backend bug. At that point, inspect the generated Cranelift IR and target lowering path rather than the test itself.

7. Document the reason in the test or commit message

This type of issue tends to recur unless the intent is written down. Add a short comment explaining that the prior version relied on unspecified behavior and that the current assertions are based only on spec-defined vector results.

;; This test intentionally checks only the final SIMD bitwise result.
;; Earlier versions relied on backend-specific lowering behavior,
;; which was not guaranteed across architectures such as RISC-V.

Common Edge Cases

Even after rewriting the test, there are several pitfalls that can still cause confusion.

Lane interpretation mismatches

A v128 value is just bits, but WAST assertions may display those bits through a lane shape such as i8x16, i16x8, or i32x4. Make sure the expected constant is written using the same interpretation as the operation under test. The underlying bits are what matter, but human-readable lane notation can mislead reviewers.
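The sketch below shows how one 16-byte value reads under different lane shapes. It assumes little-endian lane layout, which is how Wasm stores v128 values in memory; the lanes helper is illustrative only.

```python
import struct

# struct format strings for a few signed lane shapes (little-endian).
SHAPES = {"i8x16": "<16b", "i16x8": "<8h", "i32x4": "<4i", "i64x2": "<2q"}


def lanes(raw: bytes, shape: str):
    """Reinterpret the same 16 bytes under the given lane shape."""
    return list(struct.unpack(SHAPES[shape], raw))


# The expected vector from the example assertion, stored as raw bytes.
raw = struct.pack("<4i", -1, 0, -305419897, 1431655765)

for shape in SHAPES:
    print(shape, lanes(raw, shape))
```

Every line printed describes the same bits. When reviewing an expectation, convert both sides to one shape before comparing them.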

Signed vs unsigned confusion

Bitwise inversion does not care about signedness, but constant notation does. For example, -1 and 0xffffffff are the same 32-bit pattern semantically, yet expressing expected outputs inconsistently can make a correct test look wrong.
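A quick way to check whether two differently written lane constants are the same value is to compare them after masking to the lane width. This is a minimal sketch with hypothetical helper names.

```python
def same_i32_bits(a: int, b: int) -> bool:
    """Return True when a and b encode the same 32-bit pattern."""
    return (a & 0xFFFFFFFF) == (b & 0xFFFFFFFF)


def as_hex32(x: int) -> str:
    """Show a lane value as an unsigned 32-bit hex literal."""
    return f"0x{x & 0xFFFFFFFF:08x}"


print(same_i32_bits(-1, 0xFFFFFFFF))  # prints True
print(as_hex32(-305419897))           # prints 0xedcba987
```

Picking one notation for a whole test file, either signed decimal or hex, avoids this class of review confusion.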

Canonicalization differences

One backend may emit a native NOT instruction while another emits XOR with all ones. Both are correct. If your test accidentally inspects generated code or relies on implementation details exposed by debugging output, it will remain fragile and may fail on any backend that lowers the operation differently.

Vector constant construction

Some regressions are really about how constants are materialized, not the operation itself. If you want to isolate bnot lowering, choose constants that make the expected result obvious and avoid mixing unrelated transforms into the same test.

Overly broad regression tests

A regression test should be narrow. If the original issue_3327_bnot_lowering.wast combines multiple SIMD operations, split it into focused assertions so future failures clearly identify whether the bug is in inversion, constant handling, or lane reinterpretation.

FAQ

1. Why is this considered unspecified behavior instead of a backend bug?

Because the failing expectation is not mandated by the WebAssembly SIMD spec. If the backend produces the correct final vector result, it is compliant even if it uses a different lowering strategy than x86 or AArch64.

2. Should Wasmtime add a RISC-V special case for this test?

No. A special case would preserve a flawed test. The correct fix is to rewrite the test so it validates only portable, observable semantics, then re-enable it universally.

3. How can I tell whether a SIMD regression test is portable?

Ask whether the assertion depends solely on the final Wasm-visible result. If the answer involves generated instructions, temporary representation, register allocation, or backend-specific canonicalization, the test is not portable enough.

To solve this GitHub issue, update issue_3327_bnot_lowering.wast so it no longer assumes a particular Cranelift lowering or any architecture-specific artifact. Assert only the exact bitwise result required by the Wasm SIMD specification, run the test across supported targets, and then re-enable it for RISC-V. This approach fixes the root problem, improves test portability, and avoids flagging correct backend behavior as a failure.
