How to Fix: s390x: ixx.trunc_sat_fxx_u with NaN should result in 0

On s390x, ixx.trunc_sat_fxx_u can return a non-zero value, or fall into a path meant for the trapping conversions, when the input is NaN. That breaks a core WebAssembly guarantee: saturating unsigned truncation must produce 0 for NaN. If targeted tests for i64.trunc_sat_f32_u, i64.trunc_sat_f64_u, i32.trunc_sat_f32_u, and i32.trunc_sat_f64_u already pass but a broader ixx.trunc_sat_fxx_u test still fails on s390x, the issue is usually in the architecture-specific lowering or fallback sequence, which handles unordered floating-point comparisons incorrectly.

Understanding the Root Cause

The bug comes from a mismatch between WebAssembly saturating conversion semantics and the s390x code generation path for unsigned float-to-int truncation.

For WebAssembly, i32.trunc_sat_f32_u, i32.trunc_sat_f64_u, i64.trunc_sat_f32_u, and i64.trunc_sat_f64_u must behave like this:

  • If the input is NaN, return 0.
  • If the input is negative, return 0.
  • If the input is larger than the unsigned target range, return the target type’s maximum value.
  • Otherwise, truncate toward zero.
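
The four rules above can be written down as a small reference model to compare backend output against. This is a minimal sketch in Python, not any engine's implementation: the name trunc_sat_u is hypothetical, and it assumes the input has already been widened to a Python float (an f64), which is lossless for every f32 value.

```python
import math


def trunc_sat_u(x: float, bits: int) -> int:
    """Reference model of iNN.trunc_sat_fMM_u (hypothetical helper name)."""
    max_value = (1 << bits) - 1
    if math.isnan(x):
        return 0                  # NaN produces 0
    if x <= 0.0:
        return 0                  # negatives, -inf, -0.0 and +0.0
    if x >= float(1 << bits):
        return max_value          # too large, including +inf
    return math.trunc(x)          # in range: truncate toward zero


print(trunc_sat_u(float("nan"), 32))   # 0
print(trunc_sat_u(3.99, 32))           # 3
print(trunc_sat_u(float("inf"), 64))   # 18446744073709551615
```

Because the model works on exact values, the same function covers both the f32 and f64 input opcodes.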

On s390x, the failure usually appears when the backend uses floating-point compare-and-branch logic that handles unordered comparisons incorrectly. In IEEE-754, NaN is unordered: it is not equal, less than, or greater than any numeric value, including itself. If the lowering sequence only checks numeric bounds such as x <= 0 or x >= max without an explicit NaN detection path, then NaN can slip into the regular conversion instruction path.
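
To see why bounds-only logic fails, the sketch below models a u32 lowering that performs just the two numeric checks and reports which path an input takes. The function name and the path labels are illustrative assumptions, not taken from any backend.

```python
def bounds_only_path(x: float, upper: float = 4294967296.0) -> str:
    """Model a lowering that checks numeric bounds but never tests for NaN."""
    if x <= 0.0:        # False for NaN: ordered comparisons with NaN never hold
        return "zero"
    if x >= upper:      # also False for NaN
        return "max"
    return "convert"    # NaN ends up here, at the native conversion


nan = float("nan")
print(bounds_only_path(-3.0))   # zero
print(bounds_only_path(1e20))   # max
print(bounds_only_path(nan))    # convert
```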

That is where architecture behavior matters. Some native conversion instructions:

  • produce undefined or implementation-specific intermediate results for NaN,
  • set condition codes that are later misread, or
  • require a separate unordered test before conversion.

In practice, the backend often already has working logic for some specific operators, but the shared helper for ixx.trunc_sat_fxx_u may not normalize NaN first. The result is an incorrect non-zero value, a path that saturates to max instead of zero, or a mismatch between 32-bit and 64-bit cases.

The fix is simple in principle: check for NaN before any unsigned saturation logic, and return 0 immediately.

Step-by-Step Solution

The most reliable repair is to update the s390x-specific lowering, codegen helper, or runtime stub so that NaN is handled explicitly before range checks and before the final conversion instruction.

Step 1: Reproduce the failing behavior

Create or use a minimal WebAssembly test that passes NaN into each of the saturating unsigned truncation opcodes.

(module
  (func (export "test_i32_f32") (param f32) (result i32)
    (i32.trunc_sat_f32_u (local.get 0)))

  (func (export "test_i32_f64") (param f64) (result i32)
    (i32.trunc_sat_f64_u (local.get 0)))

  (func (export "test_i64_f32") (param f32) (result i64)
    (i64.trunc_sat_f32_u (local.get 0)))

  (func (export "test_i64_f64") (param f64) (result i64)
    (i64.trunc_sat_f64_u (local.get 0))))

Your expected result for every exported function with a NaN input is 0.

Step 2: Locate the shared unsigned trunc_sat lowering path

Look for the architecture-specific implementation that handles:

  • i32.trunc_sat_f32_u
  • i32.trunc_sat_f64_u
  • i64.trunc_sat_f32_u
  • i64.trunc_sat_f64_u

Typical places include:

  • a backend instruction selector,
  • a macro assembler helper,
  • a builtins/runtime conversion stub, or
  • a shared helper for trunc_sat operators reused by s390x.

Step 3: Add an explicit NaN fast-path

The correct logic should check unordered input first. Pseudocode:

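function trunc_sat_unsigned(x, target_bits):
    if isNaN(x):
        return 0

    if x <= 0:
        return 0

    max = (target_bits == 32) ? 4294967295 : 18446744073709551615
    // First value past the unsigned range: 2^32 or 2^64
    max_plus_one_threshold = (target_bits == 32) ? 4294967296.0 : 18446744073709551616.0

    if x >= max_plus_one_threshold:
        return max

    return truncate_toward_zero(x)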

For backend code, the crucial part is not the high-level shape but the ordering:

  1. NaN detection first
  2. negative-or-zero handling next
  3. overflow saturation next
  4. native conversion last

Step 4: Implement the fix in the s390x path

If your code currently branches only on numeric comparisons, patch it so unordered values branch to the zero-result label.

// Pseudocode for s390x lowering/helper behavior
Label return_zero;
Label return_max;
Label do_convert;
Label done;

// 1. Detect NaN explicitly
if (is_unordered(input, input)) goto return_zero;

// 2. Handle non-positive values for unsigned saturation
if (input <= 0.0) goto return_zero;

// 3. Handle overflow to unsigned max
if (input >= upper_saturation_threshold) goto return_max;

goto do_convert;

return_zero:
  result = 0;
  goto done;

return_max:
  result = UINT_MAX_OR_UINT64_MAX;
  goto done;

do_convert:
  result = truncate_toward_zero(input);

done:

On s390x, a common pattern for NaN detection is a floating-point compare followed by a branch that explicitly tests the unordered condition code. Another portable fallback is x != x, since only NaN is not equal to itself, though backend-level compare flags are usually preferred in code generation.
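
As a rough model of that branch structure, the Python sketch below simulates a compare that yields a four-valued condition code and then dispatches on it. It assumes the numbering used by the z/Architecture binary floating-point compare instructions (0 equal, 1 low, 2 high, 3 unordered); confirm that against the Principles of Operation before relying on it. The function names are hypothetical.

```python
import math

# Assumed condition-code numbering for a floating-point compare.
CC_EQUAL, CC_LOW, CC_HIGH, CC_UNORDERED = 0, 1, 2, 3


def fp_compare_cc(a: float, b: float) -> int:
    """Return the condition code a compare of a against b would set."""
    if math.isnan(a) or math.isnan(b):
        return CC_UNORDERED
    if a == b:
        return CC_EQUAL
    return CC_LOW if a < b else CC_HIGH


def select_path(x: float, upper: float) -> str:
    """Pick the zero, max, or convert path, testing unordered first."""
    if fp_compare_cc(x, x) == CC_UNORDERED:
        return "zero"
    if fp_compare_cc(x, 0.0) in (CC_EQUAL, CC_LOW):
        return "zero"
    if fp_compare_cc(x, upper) in (CC_EQUAL, CC_HIGH):
        return "max"
    return "convert"


print(select_path(float("nan"), 4294967296.0))   # zero
print(select_path(float("inf"), 4294967296.0))   # max
print(select_path(1.5, 4294967296.0))            # convert
```

A branch that only tests codes 0 through 2 would leave code 3 with no taken edge, which is the fall-through this step is meant to close.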

Step 5: Use the right saturation threshold

Unsigned saturation boundaries are subtle. For example:

  • i32.trunc_sat_f32_u saturates to 0xFFFFFFFF
  • i64.trunc_sat_f64_u saturates to 0xFFFFFFFFFFFFFFFF

But the comparison threshold is typically the first floating-point value that cannot truncate into the valid unsigned range. That means your compare constant may be max + 1 or an architecture-safe equivalent, not always the integer max expressed directly as a float.

// Conceptual thresholds
f32_to_u32_overflow_threshold = 4294967296.0
f64_to_u32_overflow_threshold = 4294967296.0
f32_to_u64_overflow_threshold = 18446744073709551616.0
f64_to_u64_overflow_threshold = 18446744073709551616.0

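Be careful here because representability differs between f32 and f64. 2^32 and 2^64 are powers of two, so both are exactly representable in f32 and in f64. The integer maxima are not always: 4294967295 is exact in f64 but rounds to 4294967296 in f32, and 18446744073709551615 rounds to 18446744073709551616 in both formats. A compare written as x > max against such a rounded constant lets 2^32 or 2^64 itself through to the conversion, so compare with x >= threshold instead.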
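
The representability point can be checked directly. The sketch below uses Python's struct module to round a value to f32; the helper name to_f32 is an assumption for illustration.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (f64) to the nearest f32, then widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


U32_MAX = 4294967295
U64_MAX = 18446744073709551615

# The thresholds 2**32 and 2**64 survive rounding to f32 unchanged.
print(to_f32(2.0 ** 32) == 2.0 ** 32)            # True
print(to_f32(2.0 ** 64) == 2.0 ** 64)            # True

# The integer maxima do not always survive conversion to floating point.
print(float(U32_MAX) == 4294967295.0)            # True: exact in f64
print(to_f32(float(U32_MAX)) == 4294967296.0)    # True: rounds up in f32
print(float(U64_MAX) == 18446744073709551616.0)  # True: rounds up in f64
```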

Step 6: Add regression tests

Cover all four opcode combinations and multiple NaN shapes.

(assert_return (invoke "test_i32_f32" (f32.const nan)) (i32.const 0))
(assert_return (invoke "test_i32_f64" (f64.const nan)) (i32.const 0))
(assert_return (invoke "test_i64_f32" (f32.const nan)) (i64.const 0))
(assert_return (invoke "test_i64_f64" (f64.const nan)) (i64.const 0))

Add more cases with explicit sign and payload variations. In the .wast format, nan:canonical and nan:arithmetic are result patterns rather than input constants, so inputs need a concrete sign or payload:

(assert_return (invoke "test_i32_f32" (f32.const -nan)) (i32.const 0))
(assert_return (invoke "test_i32_f32" (f32.const nan:0x200000)) (i32.const 0))
(assert_return (invoke "test_i64_f64" (f64.const -nan)) (i64.const 0))
(assert_return (invoke "test_i64_f64" (f64.const nan:0x4000000000000)) (i64.const 0))
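
Payload cases can also be generated rather than hand-written. The sketch below emits assert_return lines for the export names from Step 1, using the nan:0x... literal form of the WebAssembly text format. The selection of payloads is an assumption, chosen to cover quiet and signaling encodings of both signs.

```python
# export name -> (input type, result type, number of payload bits)
EXPORTS = {
    "test_i32_f32": ("f32", "i32", 23),
    "test_i32_f64": ("f64", "i32", 52),
    "test_i64_f32": ("f32", "i64", 23),
    "test_i64_f64": ("f64", "i64", 52),
}


def nan_literals(payload_bits: int) -> list:
    """NaN literals with the quiet bit set, clear, minimal, and all ones."""
    quiet = 1 << (payload_bits - 1)
    payloads = [quiet, quiet >> 1, 1, (1 << payload_bits) - 1]
    return [f"{sign}nan:0x{p:x}" for sign in ("", "-") for p in payloads]


def regression_lines() -> list:
    """Build one assert_return line per export and NaN literal."""
    lines = []
    for name, (float_ty, int_ty, payload_bits) in EXPORTS.items():
        for literal in nan_literals(payload_bits):
            lines.append(
                f'(assert_return (invoke "{name}" '
                f"({float_ty}.const {literal})) ({int_ty}.const 0))"
            )
    return lines


for line in regression_lines()[:2]:
    print(line)
```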

Step 7: Validate architecture-specific codegen

After patching, inspect generated code or disassembly if possible to confirm:

  • there is a dedicated unordered/NaN branch,
  • the zero path is reachable before conversion, and
  • the conversion instruction is never executed for NaN.

Step 8: Run the full conversion test matrix

Do not stop at the one failing case. Re-run:

  • signed trunc_sat conversions,
  • unsigned trunc_sat conversions,
  • f32 and f64 inputs,
  • i32 and i64 targets,
  • boundary values, infinities, subnormals, and negative zero.
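
One way to keep that matrix honest is to derive expected values from a single model instead of typing them by hand. The sketch below is an assumption-laden illustration: trunc_sat and expected_matrix are hypothetical names, the input list is a starting point rather than a complete set, and inputs are modeled as Python floats.

```python
import math


def trunc_sat(x: float, bits: int, signed: bool) -> int:
    """Model of saturating truncation for signed and unsigned targets."""
    lo = -(1 << (bits - 1)) if signed else 0
    hi = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
    if math.isnan(x):
        return 0
    if math.isinf(x):
        return hi if x > 0 else lo
    # Truncate toward zero, then clamp into the target range.
    return min(max(math.trunc(x), lo), hi)


INPUTS = [
    float("nan"), float("-inf"), float("inf"),
    -0.0, 0.0, 5e-324,            # zeros and the smallest f64 subnormal
    -1.5, 0.99,
    2.0 ** 31, 2.0 ** 32, 2.0 ** 63, 2.0 ** 64,
]


def expected_matrix() -> list:
    """Rows of (bits, signed, input, expected) for every combination."""
    return [
        (bits, signed, x, trunc_sat(x, bits, signed))
        for bits in (32, 64)
        for signed in (False, True)
        for x in INPUTS
    ]


for row in expected_matrix()[:3]:
    print(row)
```

Each row can then be compared against what the s390x build returns for the matching opcode.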

Common Edge Cases

1. Negative zero

-0.0 is not NaN, but for unsigned saturating truncation it should still return 0. Make sure your zero/negative check does not accidentally send -0.0 into a conversion path that behaves inconsistently on s390x.

2. Positive infinity

+inf should not return zero. It must saturate to the unsigned maximum for the target type. If your fix is too broad and treats all non-finite values like NaN, you will introduce a new bug.

3. NaN payload variants

Different NaN encodings, including signaling and quiet NaNs where applicable, must all produce 0. Do not rely on a single canonical NaN test alone.

4. Threshold off-by-one errors

Unsigned saturation logic is easy to get wrong near:

  • 4294967295 and 4294967296 for u32,
  • 18446744073709551615 and 18446744073709551616 for u64.

If your compare uses the wrong boundary, valid values may saturate too early or overflow into the convert instruction.
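
For boundary tests it helps to know the largest input that must still convert normally. The sketch below steps one representable value down from each threshold by decrementing the raw bit pattern, which is valid for positive finite values; the helper names are hypothetical.

```python
import math
import struct


def prev_f32(x: float) -> float:
    """Largest f32 strictly below a positive finite f32 value x."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits - 1))[0]


def prev_f64(x: float) -> float:
    """Largest f64 strictly below a positive finite f64 value x."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return struct.unpack("<d", struct.pack("<Q", bits - 1))[0]


# Largest in-range results just below each saturation threshold.
print(math.trunc(prev_f32(2.0 ** 32)))   # 4294967040
print(math.trunc(prev_f64(2.0 ** 32)))   # 4294967295
print(math.trunc(prev_f32(2.0 ** 64)))   # 18446742974197923840
print(math.trunc(prev_f64(2.0 ** 64)))   # 18446744073709549568
```

Only the f64 to u32 case can actually produce the integer maximum by ordinary truncation; in the other three, the maximum is reachable only through saturation.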

5. Shared helper mismatch

If some opcodes already pass and others fail, there may be multiple lowering paths. One helper may already handle NaN correctly while another older helper does not. Search for duplicated logic instead of assuming one central implementation.

6. Condition code interpretation on s390x

A floating-point compare on s390x reports unordered as its own condition code value (3), distinct from equal (0), low (1), and high (2). If your branch mask only tests for less-than, equal, or greater-than, the unordered case can fall through accidentally.

7. Runtime fallback inconsistency

JIT and interpreter or baseline and optimizing tiers can diverge. If the bug appears only in one execution mode, make sure the fix is applied to every relevant lowering or runtime stub.
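
A differential check is one way to catch this kind of divergence. The sketch below compares two callables over the same inputs; both functions are stand-ins written for illustration, and the wrong value returned for NaN in the unfixed one is arbitrary, not what any real tier produces.

```python
import math

U32_MAX = 0xFFFFFFFF


def reference_u32(x: float) -> int:
    """Stand-in for a tier with the NaN check in place."""
    if math.isnan(x) or x <= 0.0:
        return 0
    if x >= 4294967296.0:
        return U32_MAX
    return math.trunc(x)


def unfixed_tier_u32(x: float) -> int:
    """Stand-in for a tier that still lacks the NaN check."""
    if x <= 0.0:
        return 0
    if x >= 4294967296.0:
        return U32_MAX
    if math.isnan(x):
        return U32_MAX    # arbitrary wrong value modeling the bad conversion
    return math.trunc(x)


def find_divergences(tier_a, tier_b, inputs) -> list:
    """Return every input on which the two tiers disagree."""
    return [x for x in inputs if tier_a(x) != tier_b(x)]


inputs = [float("nan"), -0.0, 0.5, 1.0, 4294967295.0, float("inf")]
bad = find_divergences(reference_u32, unfixed_tier_u32, inputs)
print(len(bad), math.isnan(bad[0]))   # 1 True
```

In a real engine the two callables would wrap the same exported function executed under each tier or execution mode.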

FAQ

Why should NaN return 0 for trunc_sat_u instead of trapping?

Because WebAssembly saturating truncation is defined to avoid traps. For the unsigned variants, NaN and negative inputs produce 0, while inputs too large for the target type saturate to its unsigned maximum.

Why do some truncation tests pass while ixx.trunc_sat_fxx_u still fails on s390x?

This usually means different opcodes are using different backend implementations. One path may already include explicit NaN handling, while the failing path relies on a conversion instruction or compare sequence that does not treat unordered inputs correctly.

Is checking x != x a safe way to detect NaN?

Yes, at a semantic level it is correct because only NaN is not equal to itself. However, in a compiler backend or macro assembler, using the architecture’s explicit unordered compare handling is often cleaner and more predictable for code generation on s390x.

Fixing this issue comes down to one rule: never let NaN reach the unsigned truncation instruction path on s390x. Detect it early, return 0, and back the change with regression tests covering every trunc_sat width combination.
