How to Fix: fuzz: different results for `f64sqrt`

WebAssembly f64.sqrt mismatch: why fuzzing finds different results and how to fix it

A fuzz failure on f64.sqrt usually means your runtime, compiler backend, or interpreter is not preserving the exact WebAssembly floating-point semantics. The minimal module in this issue looks harmless, but square root is one of the classic places where NaN handling, rounding behavior, signed zero, and host CPU differences can leak through and produce divergent results.

Reproducing the bug

The reported module is a direct wrapper around f64.sqrt:

(module
  (type (;0;) (func (param f64) (result f64)))
  (func (;0;) (type 0) (param f64) (result f64)
    local.get 0
    f64.sqrt
  )
  (export "test" (func 0))
)

If a fuzzer reports different outputs across engines or execution modes, the failure is typically triggered by one of these input classes:

  • Negative values that should produce NaN
  • -0.0, where sign preservation matters
  • NaN payloads, where canonicalization rules may differ
  • Subnormal numbers, especially on hardware or compiler paths that flush denormals
  • Backend-specific math intrinsics that do not exactly match Wasm requirements
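To make these input classes concrete, here is a minimal sketch that sorts an f64 input into the class it belongs to. The names `input_class`, `f64_from_bits`, and `classify_input` are illustrative and do not come from any particular engine or fuzzer.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical labels for the input classes listed above. */
typedef enum {
  IN_ORDINARY,      /* +0.0, positive normal values, +infinity */
  IN_NEGATIVE,      /* negative nonzero values: the result is NaN */
  IN_NEGATIVE_ZERO, /* -0.0: the sign bit must survive */
  IN_NAN,           /* any NaN pattern */
  IN_SUBNORMAL      /* positive subnormal values */
} input_class;

/* Build a double from its raw 64-bit pattern. */
static double f64_from_bits(uint64_t u) {
  double x;
  memcpy(&x, &u, sizeof x);
  return x;
}

static input_class classify_input(double x) {
  if (isnan(x)) return IN_NAN;
  if (x == 0.0) return signbit(x) ? IN_NEGATIVE_ZERO : IN_ORDINARY;
  if (x < 0.0) return IN_NEGATIVE;
  if (fpclassify(x) == FP_SUBNORMAL) return IN_SUBNORMAL;
  return IN_ORDINARY;
}
```

Logging the class of each failing fuzz input is a cheap way to see whether the mismatches cluster in one category.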

Understanding the Root Cause

The root problem is that WebAssembly floating-point operations are specified more strictly than many native code paths are implemented. Even though sqrt looks like a direct hardware instruction, several layers can introduce observable differences.

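First, NaN semantics are subtle. For negative inputs other than -0.0, including negative infinity, f64.sqrt returns NaN. But not all NaNs are equal at the bit level, and the WebAssembly specification itself leaves the sign bit of a result NaN (and, when the input is a NaN with a non-canonical payload, the payload) non-deterministic. Some engines preserve a payload, some canonicalize it, and some backend lowering paths return whatever NaN pattern the host produces. If your test compares raw bits instead of semantic NaN equivalence, fuzzing will flag a mismatch immediately.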
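The sketch below classifies a result by its bit pattern, assuming the WebAssembly spec's distinction between a canonical NaN (only the top payload bit set) and an arithmetic NaN (top payload bit set, other payload bits arbitrary). The enum and function names are made up for illustration.

```c
#include <stdint.h>

typedef enum { NOT_NAN, NAN_CANONICAL, NAN_ARITHMETIC, NAN_OTHER } nan_kind;

/* Classify an f64 bit pattern. The sign bit is ignored on purpose,
   because both signs are acceptable for a result NaN. */
static nan_kind classify_nan_bits(uint64_t bits) {
  const uint64_t exp_mask  = 0x7ff0000000000000ULL;
  const uint64_t frac_mask = 0x000fffffffffffffULL;
  const uint64_t quiet_bit = 0x0008000000000000ULL;
  uint64_t frac = bits & frac_mask;

  if ((bits & exp_mask) != exp_mask || frac == 0) return NOT_NAN;
  if (frac == quiet_bit) return NAN_CANONICAL;
  if (frac & quiet_bit) return NAN_ARITHMETIC;
  return NAN_OTHER; /* quiet bit clear: a signaling NaN pattern */
}
```

Under that assumption, an oracle for `sqrt(-1.0)` would accept both `0x7ff8000000000000` and `0xfff8000000000000`, since both classify as `NAN_CANONICAL`.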

Second, signed zero matters. In IEEE-754 and WebAssembly, sqrt(-0.0) = -0.0. An implementation that normalizes negative zero to positive zero will diverge from a more compliant engine. This is easy to miss if tests print numeric output instead of checking bit patterns.
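A small sketch of why value comparison hides this bug: `-0.0` and `+0.0` compare equal as numbers but differ in their bit patterns. The helper names are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a double as its raw 64-bit pattern. */
static uint64_t f64_bits(double x) {
  uint64_t u;
  memcpy(&u, &x, sizeof u);
  return u;
}

/* Numeric comparison: treats -0.0 and +0.0 as the same value. */
static int same_value(double a, double b) { return a == b; }

/* Bit comparison: distinguishes -0.0 from +0.0. */
static int same_bits(double a, double b) { return f64_bits(a) == f64_bits(b); }
```

A test that uses `same_value` on the result of `sqrt(-0.0)` passes even when the engine returns `+0.0`; only `same_bits` catches the lost sign.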

Third, subnormal handling can break determinism. Some CPUs or compiler configurations enable flush-to-zero (FTZ) or denormals-are-zero (DAZ) modes. When that happens, a tiny input may be treated as zero before the square root executes, so the result is a zero instead of a small normal number. These failures tend to show up only under fuzzing and are hard to reproduce in higher-level tests.
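As a worked example, the smallest positive subnormal is 2^-1074 (bit pattern `0x0000000000000001`), and its exact square root is 2^-537, a normal number. A path that treats subnormal inputs as zero returns +0.0 instead. The sketch below emulates that input treatment in software so a harness can predict what a misconfigured path would see; the function names are hypothetical.

```c
#include <stdint.h>

#define F64_SIGN_MASK 0x8000000000000000ULL
#define F64_EXP_MASK  0x7ff0000000000000ULL
#define F64_FRAC_MASK 0x000fffffffffffffULL

/* True when the pattern encodes a subnormal: zero exponent, nonzero fraction. */
static int is_subnormal_bits(uint64_t bits) {
  return (bits & F64_EXP_MASK) == 0 && (bits & F64_FRAC_MASK) != 0;
}

/* Software emulation of denormals-are-zero, for illustration only:
   a subnormal input is replaced by a zero with the same sign. */
static uint64_t daz_emulate_bits(uint64_t bits) {
  return is_subnormal_bits(bits) ? (bits & F64_SIGN_MASK) : bits;
}
```

If a failing fuzz input satisfies `is_subnormal_bits` and the wrong result is a zero, the floating-point environment is the first thing to check.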

Fourth, the issue may be in the compiler lowering pipeline. If your engine translates Wasm f64.sqrt into a generic host math library call, that path may behave differently from a dedicated IEEE-compliant machine instruction path. This happens especially when mixing interpreter execution, JIT backends, C library calls, and platform-specific optimizations.

Finally, fuzzers often expose inconsistencies between interpreter vs JIT, debug vs release builds, or x86 vs ARM. That does not always mean the sqrt instruction itself is wrong; it may mean one execution path is canonicalizing results and another is not.

Step-by-Step Solution

The fix is to make your f64.sqrt implementation and validation logic match WebAssembly rules exactly, including NaN and signed-zero behavior.

1. Reproduce with bit-level assertions

Do not only compare printed floating-point values. Inspect the raw 64-bit result.

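// Test validation helpers; wasm_f64_sqrt and fail come from your harness
#include <stdint.h>
#include <string.h>

double wasm_f64_sqrt(double input);
void fail(double input, uint64_t actual_bits, uint64_t expected_bits);

// Reinterpret the double as its raw 64-bit pattern
uint64_t bits(double x) {
  uint64_t u;
  memcpy(&u, &x, sizeof(u));
  return u;
}

// Fail unless the result matches the expected bit pattern exactly
void assert_f64sqrt(double input, uint64_t expected_bits) {
  double out = wasm_f64_sqrt(input);
  if (bits(out) != expected_bits) {
    fail(input, bits(out), expected_bits);
  }
}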

Include targeted checks for -0.0, negative finite numbers, infinities, and NaNs.

assert_f64sqrt(-0.0, 0x8000000000000000ULL); // expect -0.0
assert_is_nan(wasm_f64_sqrt(-1.0));
assert_f64sqrt(+0.0, 0x0000000000000000ULL);
assert_f64sqrt(INFINITY, 0x7ff0000000000000ULL);

2. Canonicalize NaN handling consistently

If your engine policy is to return a canonical NaN for operations like sqrt, apply that rule consistently across all execution modes. If your implementation preserves payloads, then your interpreter, JIT, and reference tests must all follow the same rule.

// Example canonicalization helper
uint64_t canonicalize_nan_f64(uint64_t bits) {
  uint64_t exp_mask = 0x7ff0000000000000ULL;
  uint64_t frac_mask = 0x000fffffffffffffULL;
  if ((bits & exp_mask) == exp_mask && (bits & frac_mask) != 0) {
    return 0x7ff8000000000000ULL;
  }
  return bits;
}

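Use this only if it matches your engine’s intended semantics and test oracle. Note that the helper also clears the sign bit, so a negative NaN comes back as the positive canonical NaN.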

3. Preserve signed zero explicitly

If your current path uses a host helper that loses -0.0, add a fast path before calling the math primitive.

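#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

// Engine-specific primitives, declared here so the snippet compiles on its own
double host_sqrt(double x);
double canonical_nan_f64(void);

// True only for the exact bit pattern of -0.0
bool is_negative_zero(double x) {
  uint64_t u;
  memcpy(&u, &x, sizeof(u));
  return u == 0x8000000000000000ULL;
}

double wasm_f64_sqrt(double x) {
  if (is_negative_zero(x)) {
    return x; // preserve -0.0 exactly
  }

  double out = host_sqrt(x);

  // Optional canonicalization if your engine requires it
  if (isnan(out)) {
    return canonical_nan_f64();
  }
  return out;
}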

4. Avoid non-deterministic host math paths

If your backend lowers Wasm sqrt through a platform library with inconsistent behavior, switch to the most direct and deterministic path available for that target. In practice this means:

  • Prefer a dedicated machine sqrt instruction when it matches Wasm semantics
  • Disable unsafe floating-point optimizations such as -ffast-math
  • Ensure no pass rewrites sqrt in ways that alter NaN or zero-sign behavior
  • Verify that JIT and interpreter use the same post-processing rules

# Good build hygiene for deterministic floating point
CFLAGS="-fno-fast-math"
CXXFLAGS="-fno-fast-math"

5. Check floating-point environment assumptions

Make sure the runtime is not executing with flush-to-zero or other altered floating-point environment settings that impact subnormals.

// Pseudocode: ensure runtime does not enable FTZ/DAZ for Wasm execution
void enter_wasm_fp_mode() {
  disable_flush_to_zero();
  disable_denormals_are_zero();
}

This is particularly important when native host code and Wasm execution share the same thread-local floating-point state.
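On x86 with SSE, FTZ and DAZ are bits in the MXCSR register, which can be read and written with `_mm_getcsr` and `_mm_setcsr` from `<xmmintrin.h>`. The sketch below is one way the pseudocode above could look on that target. The function names and the save/restore design are assumptions, and other architectures (for example AArch64, which keeps the equivalent control in FPCR) are not handled.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(_M_X64) || defined(__SSE__)
#include <xmmintrin.h>
#define HAVE_MXCSR 1
#else
#define HAVE_MXCSR 0
#endif

/* MXCSR control bits: denormals-are-zero and flush-to-zero. */
enum { MXCSR_DAZ = 0x0040, MXCSR_FTZ = 0x8000 };

/* Clear FTZ and DAZ for the current thread. Returns the previous MXCSR
   value so the caller can restore it. Does nothing on other targets. */
static uint32_t enter_wasm_fp_mode(void) {
#if HAVE_MXCSR
  uint32_t saved = _mm_getcsr();
  _mm_setcsr(saved & ~(uint32_t)(MXCSR_FTZ | MXCSR_DAZ));
  return saved;
#else
  return 0;
#endif
}

/* Restore the MXCSR value captured by enter_wasm_fp_mode. */
static void leave_wasm_fp_mode(uint32_t saved) {
#if HAVE_MXCSR
  _mm_setcsr(saved);
#else
  (void)saved;
#endif
}

/* 1 if FTZ and DAZ are both clear, 0 if either is set,
   -1 if this target is not covered by the sketch. */
static int ftz_daz_cleared(void) {
#if HAVE_MXCSR
  return (_mm_getcsr() & (MXCSR_FTZ | MXCSR_DAZ)) == 0;
#else
  return -1;
#endif
}
```

Saving and restoring the previous value keeps host code that relies on FTZ for performance unaffected once control leaves Wasm.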

6. Add regression tests for all problematic classes

Create a focused test suite around the bug instead of relying only on broad fuzz coverage.

(module
  (func (export "test") (param f64) (result f64)
    local.get 0
    f64.sqrt
  )
)

// Suggested regression inputs
-0.0
+0.0
-1.0
4.0
NaN
+infinity
smallest_positive_subnormal
largest_subnormal

For NaN-focused tests, compare according to your engine’s intended Wasm NaN policy, not generic host formatting.

7. Compare interpreter and JIT output on the same corpus

If the mismatch is internal, build a harness that runs the same inputs through each execution path and compares raw bits after normalization.

for input in corpus:
    interp = normalize(run_interpreter(input))
    jit = normalize(run_jit(input))
    assert interp == jit

This quickly shows which execution path diverges and on which inputs, which narrows the search to decoding, lowering, machine code generation, or result normalization.

Common Edge Cases

  • sqrt(-0.0) returning +0.0 instead of preserving the sign bit
  • sqrt(negative finite) producing different NaN bit patterns across backends
  • Signaling NaN inputs being quieted differently by different execution paths
  • Subnormal inputs behaving differently when flush-to-zero is enabled
  • Fast-math compiler flags rewriting floating-point behavior in non-Wasm-safe ways
  • Host library calls differing from direct hardware instruction semantics
  • Bitwise test oracles flagging harmless NaN payload differences when semantic equality was intended

If your fuzzing framework is cross-engine, define up front whether it expects canonical NaNs or only checks for isNaN. That choice determines whether the bug is in the engine or in the test oracle.

FAQ

Why does this happen only in fuzzing and not in normal tests?

Fuzzers hit rare floating-point values such as NaNs, negative zero, and subnormals. Standard unit tests often cover only normal positive inputs like 4.0 or 9.0, which do not expose semantic mismatches.

Should NaN results be compared by raw bits or by isNaN?

It depends on your engine and oracle. If your implementation guarantees canonical NaN output, compare raw bits after canonicalization. If the spec contract for your layer allows payload variation, compare using isNaN instead of exact bits.

What is the most common implementation mistake for this issue?

The most common mistakes are losing the sign of -0.0, relying on a host math path with different NaN canonicalization, or compiling with unsafe floating-point optimizations that violate WebAssembly expectations.

The practical takeaway is simple: treat f64.sqrt as a spec-sensitive operation, not just a direct call to native sqrt. Once you standardize NaN handling, preserve signed zero, disable unsafe math rewrites, and test using raw bits where appropriate, this fuzz issue becomes reproducible and fixable.
