How to Fix: [GC] `global.set` or `table.set` with large i31 reference results in `illegal hardware instruction`

Storing a large i31ref with global.set or table.set can crash the process with an illegal hardware instruction when the engine misclassifies the out-of-range tagged value during GC reference handling.

This issue shows up when a WebAssembly module creates an i31 reference from a large i32 value and stores it through global.set, table.set, or related GC write paths. Smaller values work, but once the payload crosses an implementation-specific threshold, execution can crash at the machine level instead of failing safely. The fix is to ensure the compiler and runtime both preserve the correct tagging, sign-extension, and representable range rules for ref.i31.

Reproducing the Bug

The issue is triggered by constructing an i31ref from a large integer and storing it in a mutable reference location such as a global or table.

(module
  (global $g (mut anyref)
    (i32.const 262136)
    (ref.i31)
  )
  (func (export "f")
    (i32.const 262136)
    (ref.i31)
    global.set $g
  )
)

In affected builds, values near a threshold may succeed while the next value crashes. That pattern is a strong signal that the bug lives in the engine’s value representation or lowering pipeline, not in the Wasm text itself.

Understanding the Root Cause

i31ref is not an ordinary boxed integer. It is an unboxed GC reference that packs a 31-bit integer payload into a tagged, reference-sized value; the payload is read back as signed or unsigned by i31.get_s or i31.get_u. That means every stage of the engine must agree on three things:

  • how the value is tagged,
  • whether the source i32 is masked or sign-extended,
  • and whether storage paths treat it as a real heap reference, an immediate tagged value, or both.
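
To make those three points concrete, here is a minimal sketch of one possible tagged layout. The layout itself (64-bit word, low bit as the i31 tag, payload in bits 1 to 31) and every name in it are assumptions for illustration, not the representation of any particular engine.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical layout (an assumption, not a specific engine's):
//   0             -> null reference
//   low bit == 1  -> immediate i31, payload in bits 1..31
//   low bit == 0  -> aligned heap pointer
typedef uint64_t tagged_ref;

#define I31_TAG ((tagged_ref)1)
#define I31_PAYLOAD_MASK UINT32_C(0x7FFFFFFF)

// ref.i31: keep the low 31 bits of the i32, then tag.
static inline tagged_ref encode_i31(uint32_t raw) {
  return ((tagged_ref)(raw & I31_PAYLOAD_MASK) << 1) | I31_TAG;
}

static inline bool is_i31(tagged_ref r) { return (r & I31_TAG) != 0; }

// i31.get_u: zero-extend the 31-bit payload.
static inline uint32_t i31_get_u(tagged_ref r) {
  return (uint32_t)(r >> 1) & I31_PAYLOAD_MASK;
}

// i31.get_s: treat bit 30 of the payload as the sign bit.
static inline int32_t i31_get_s(tagged_ref r) {
  uint32_t p = i31_get_u(r);
  return (int32_t)(p & UINT32_C(0x3FFFFFFF)) -
         (int32_t)(p & UINT32_C(0x40000000));
}
```

Whatever layout an engine actually uses, the useful property is the same: one helper owns the encoding, and every reader decodes through the matching helper instead of re-deriving the bit positions.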

This crash usually happens when the compiler generates a fast path for global.set or table.set that assumes the incoming GC value already matches the machine-level reference encoding. For smaller values, that assumption accidentally holds. For larger values, one of the upper bits flips into a position that changes how the runtime interprets the object.

Typical failure modes include:

  • Missing range normalization before ref.i31 encoding.
  • Incorrect sign handling when converting a 32-bit integer into a 31-bit tagged payload.
  • Write barrier or store logic treating an i31ref like a heap pointer and touching invalid memory.
  • JIT lowering mismatch between value creation and value storage.

The result is not a clean WebAssembly trap. Instead, the engine emits or executes invalid machine instructions or dereferences impossible tagged addresses, which surfaces as an illegal hardware instruction.

In short, the bug is caused by a mismatch between the semantic contract of ref.i31 and the engine’s low-level handling of tagged GC references during store operations.
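
The "works for small values, breaks for large ones" pattern can be reproduced in a toy model. The sketch below assumes a hypothetical 64-bit tagged word and a store path that accepts an immediate only when nothing sits above bit 31. In this toy the threshold is the i32 top bit, which is not the threshold from the report; the names and the layout are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t tagged_ref;

// Hypothetical buggy producer: tags the raw i32 bits without first
// wrapping them to 31 bits, so bit 31 of the input leaks into bit 32.
static tagged_ref encode_unchecked(int32_t raw) {
  return ((uint64_t)(uint32_t)raw << 1) | 1u;
}

// Producer that wraps to 31 bits before tagging.
static tagged_ref encode_normalized(int32_t raw) {
  return ((uint64_t)((uint32_t)raw & UINT32_C(0x7FFFFFFF)) << 1) | 1u;
}

// Hypothetical consumer: a store path that treats a word as an immediate
// only if the tag bit is set and no bits are set above bit 31.
static bool store_path_accepts(tagged_ref r) {
  return (r & 1u) != 0 && (r >> 32) == 0;
}
```

Both producers agree on 262136, so a test that only uses small inputs never sees the difference. With -1, the unchecked producer yields a word the store path no longer recognizes as an immediate, which is the kind of creation-versus-storage mismatch described above.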

Step-by-Step Solution

The reliable fix is to normalize i31 values at creation time and make every store path explicitly support immediate tagged references.

1. Enforce correct i31 payload encoding

When lowering ref.i31, do not pass through a raw i32 unchecked. Convert it into the exact internal form expected by the runtime.

// Pseudocode for lowering ref.i31
int32_t raw = pop_i32();
int32_t normalized = sign_extend_31(raw);
TaggedRef value = encode_i31ref(normalized);
push_ref(value);

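The key point is that sign_extend_31 must match the specification and the runtime encoding. In the GC proposal, ref.i31 keeps only the low 31 bits of the i32, and i31.get_s or i31.get_u later decides how that payload is extended. A common implementation pattern is:

#include <stdint.h>
static inline int32_t sign_extend_31(int32_t x) {
  // Shift left as unsigned (signed overflow on << is undefined in C), then
  // rely on the arithmetic right shift that mainstream compilers provide.
  return (int32_t)((uint32_t)x << 1) >> 1;
}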

2. Fix global.set and table.set store paths

Any path storing anyref, eqref, or other GC-capable references must recognize that an i31ref may be an immediate tagged value rather than a heap object.

// Pseudocode for storing a GC reference
void store_gc_ref(Location dst, TaggedRef ref) {
  if (is_i31ref(ref)) {
    dst.write(ref);
    return;
  }

  if (is_heap_ref(ref)) {
    write_barrier(dst, ref);
    dst.write(ref);
    return;
  }

  if (is_null_ref(ref)) {
    dst.write(ref);
    return;
  }

  trap_or_abort("invalid GC reference representation");
}

If the current implementation blindly runs a heap-object-only write barrier, that is likely the bug.
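
A compilable version of that store path might look like the sketch below. It assumes a hypothetical layout (0 is null, low bit set means i31, anything else is an 8-byte-aligned heap pointer) and uses a counter as a stand-in for real write barrier work, so all names here are illustrative rather than taken from an actual engine.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical layout: 0 = null, low bit 1 = i31, otherwise heap pointer.
typedef uint64_t tagged_ref;

// Stand-in for real barrier work: just counts how often it runs.
static unsigned barrier_calls;

static void write_barrier(tagged_ref *slot, tagged_ref ref) {
  (void)slot;
  (void)ref;
  barrier_calls++;
}

static bool is_null_ref(tagged_ref r) { return r == 0; }
static bool is_i31ref(tagged_ref r) { return (r & 1u) != 0; }

// Stores ref into *slot. Returns false, leaving the slot untouched, when
// ref is neither null, an immediate, nor an aligned heap pointer.
static bool store_gc_ref(tagged_ref *slot, tagged_ref ref) {
  if (is_null_ref(ref) || is_i31ref(ref)) {
    *slot = ref;  // immediates and null never reach the barrier
    return true;
  }
  if ((ref & 7u) != 0) {
    return false;  // misaligned: not a valid heap pointer in this layout
  }
  write_barrier(slot, ref);
  *slot = ref;
  return true;
}
```

Returning a status instead of aborting is a choice made for this sketch so the rejection path can be tested; an engine would more likely assert or trap there.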

3. Audit JIT and interpreter consistency

If the engine has multiple execution tiers, all of them must encode and store i31ref identically.

Checklist:

  • Baseline compiler uses the same encode_i31ref() helper.
  • Optimizing compiler preserves 31-bit signed semantics.
  • Interpreter store path accepts immediate tagged refs.
  • Table initialization and global initialization use shared logic.
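
One way to audit tier consistency is a small differential check that feeds the same inputs to each tier's encoder and reports the first disagreement. In the sketch below both encoders are hypothetical stand-ins: one wraps to 31 bits, the other is a duplicated copy with an off-by-one mask. A real audit would plug in the engine's actual helpers.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*i31_encoder)(int32_t);

// Stand-in for the shared helper: wraps to 31 bits, then tags.
static uint64_t encode_shared(int32_t raw) {
  return ((uint64_t)((uint32_t)raw & UINT32_C(0x7FFFFFFF)) << 1) | 1u;
}

// Stand-in for a duplicated copy that drifted: the mask is one bit short.
static uint64_t encode_drifted(int32_t raw) {
  return ((uint64_t)((uint32_t)raw & UINT32_C(0x3FFFFFFF)) << 1) | 1u;
}

// Returns the index of the first input on which the two encoders
// disagree, or n if they agree on every input.
static size_t first_disagreement(i31_encoder a, i31_encoder b,
                                 const int32_t *inputs, size_t n) {
  for (size_t i = 0; i < n; i++) {
    if (a(inputs[i]) != b(inputs[i])) {
      return i;
    }
  }
  return n;
}
```

The two stand-ins agree on every input below 2^30, so the input list needs values with bit 30 set, or negative values, before the check can catch the drift.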

4. Add a regression test for boundary values

This issue is a classic boundary bug, so tests should cover values just below, at, and above the threshold that previously crashed.

(module
  (global $g (mut anyref) ref.null any)
  (func (export "set") (param $x i32)
    local.get $x
    ref.i31
    global.set $g
  )
)

// Suggested test inputs
0
1
-1
262135
262136
1073741823
-1073741824

Also test table storage:

(module
  (table 4 anyref)
  (func (export "set") (param $i i32) (param $x i32)
    local.get $i
    local.get $x
    ref.i31
    table.set
  )
)
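
A regression test also needs to know what answer to expect. The sketch below is a small oracle for the value i31.get_s should return after ref.i31, plus a helper that counts mismatches for a given round trip. The roundtrip hook is a hypothetical function pointer that a harness would wire to "set the global or table slot, read it back, apply i31.get_s"; the last two inputs are additions that fall outside the i31 range.

```c
#include <stddef.h>
#include <stdint.h>

// Expected i31.get_s result after ref.i31 on x: keep the low 31 bits,
// then treat bit 30 as the sign bit.
static int32_t expected_get_s(int32_t x) {
  uint32_t low = (uint32_t)x & UINT32_C(0x7FFFFFFF);
  return (int32_t)(low & UINT32_C(0x3FFFFFFF)) -
         (int32_t)(low & UINT32_C(0x40000000));
}

// The suggested regression inputs, plus two i32 values outside the i31 range.
static const int32_t TEST_INPUTS[] = {
  0, 1, -1, 262135, 262136, 1073741823, -1073741824,
  1073741824, INT32_MIN,
};
#define TEST_INPUT_COUNT (sizeof TEST_INPUTS / sizeof TEST_INPUTS[0])

// Counts the inputs whose round trip differs from the oracle.
static size_t count_mismatches(int32_t (*roundtrip)(int32_t)) {
  size_t bad = 0;
  for (size_t i = 0; i < TEST_INPUT_COUNT; i++) {
    if (roundtrip(TEST_INPUTS[i]) != expected_get_s(TEST_INPUTS[i])) {
      bad++;
    }
  }
  return bad;
}

// Example of a broken round trip: returns the raw i32 without wrapping.
static int32_t identity_roundtrip(int32_t x) { return x; }
```

The identity round trip passes every in-range input and fails only the two out-of-range ones, which is why the input list should not stop at the i31 limits.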

5. Validate with sanitizer and debug assertions

Add debug-only checks anywhere a GC reference is decoded or written.

assert(is_valid_tagged_ref(ref));
assert(!is_heap_ref(ref) || is_valid_heap_object(ref));

This turns a machine-level crash into an actionable failure much earlier in the pipeline.
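
As a rough illustration of what such a validity check could contain, the sketch below assumes a hypothetical layout (0 is null, low bit set means i31 with nothing above bit 31, anything else is an 8-byte-aligned pointer inside a known heap range). The heap_range struct and both function names are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t tagged_ref;

// Hypothetical heap bounds [lo, hi); a real engine would ask its allocator.
struct heap_range {
  uint64_t lo;
  uint64_t hi;
};

static bool is_valid_tagged_ref(tagged_ref r, struct heap_range heap) {
  if (r == 0) {
    return true;  // null
  }
  if (r & 1u) {
    return (r >> 32) == 0;  // i31: no bits above the 31-bit payload
  }
  // Heap reference: aligned and inside the heap.
  return (r & 7u) == 0 && r >= heap.lo && r < heap.hi;
}

// Debug-only check: compiles to nothing when NDEBUG is defined.
static inline void debug_check_ref(tagged_ref r, struct heap_range heap) {
  assert(is_valid_tagged_ref(r, heap));
  (void)r;
  (void)heap;
}
```

Calling a check like this at every decode and store site costs nothing in release builds and points directly at the first place a malformed reference appears.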

6. Land the patch with a focused commit message

A good commit message for this issue would be:

Fix i31ref encoding in GC store paths

Normalize ref.i31 to signed 31-bit form and update global/table
store logic to handle immediate tagged references without applying
heap-only write barrier logic. Add regression tests for large i31ref
boundary values.

Common Edge Cases

  • Negative values: Bugs often appear only for positive boundary values, but negative i31 payloads can break if sign extension is wrong.
  • Table initialization: Even if table.set is fixed, elem segment initialization may still use an older store path.
  • Global initializers: The module may crash at instantiation time, not call time, if the bug exists in constant initialization logic.
  • Write barrier integration: Some runtimes assume every non-null GC value is a heap pointer. That assumption is invalid for i31ref.
  • Tiered compilation: The interpreter may work while the optimizing JIT crashes, or the reverse, if helper routines are duplicated.
  • Reference subtype confusion: anyref, eqref, and internal boxed reference types may share storage logic but not validation logic.
  • Pointer compression or NaN-boxing: Engines using compact tagged layouts are especially vulnerable to off-by-one tag bit mistakes.

FAQ

Why does the bug appear only for larger values?

Because small values may still fit the engine’s accidental fast-path assumptions. Once higher bits are set, the encoded i31ref may overlap with tag bits or pointer classification bits, exposing the bug.

Is this a WebAssembly spec issue or an engine implementation issue?

This is an engine implementation issue. The Wasm program is valid if ref.i31, global.set, and table.set are implemented according to the GC proposal semantics.

Should the engine trap instead of crashing?

Yes. Even if the value were mishandled internally, the engine should never reach an illegal hardware instruction. A crash indicates broken internal validation, incorrect code generation, or unsafe runtime assumptions.

To resolve this GitHub issue, make ref.i31 creation normalize to a signed 31-bit payload, ensure global.set and table.set accept immediate tagged references without heap-only handling, and add regression tests around the failing boundary values. That combination fixes the current crash and prevents future regressions in the engine’s GC reference representation pipeline.
