How to Fix: Cranelift: incorrect narrow `sdiv` lowering on AArch64

Updated June 10, 2026 6 min read

Aldawsari

Cranelift on AArch64 Can Miscompile Narrow sdiv: Here’s the Correct Fix

When an i8 or i16 signed division is lowered incorrectly on AArch64, the generated machine code can compute the wrong result because the operands are not properly sign-extended before the hardware divide. This issue shows up clearly in a minimal Cranelift IR test like v2 = sdiv v0, v1 for i8, where the backend may treat narrow values as if their upper bits were already valid for a 32-bit divide.

Table of Contents

What the Bug Looks Like
Understanding the Root Cause
Step-by-Step Solution
Common Edge Cases
FAQ

What the Bug Looks Like

The issue is specific to narrow signed integer division on AArch64, especially for types like i8 and i16. AArch64 does not provide a native 8-bit or 16-bit signed divide instruction. Instead, division is performed using wider registers, typically via 32-bit or 64-bit divide instructions such as sdiv wX, wY, wZ or sdiv xX, xY, xZ.

That means a Cranelift sdiv on an i8 must first be lowered into a sequence that preserves the signed meaning of the original 8-bit values. If the backend skips explicit sign extension, values like 0x80 intended as -128 may instead be interpreted as 128 once widened, producing incorrect results.

function %div8(i8, i8) -> i8 {
block0(v0: i8, v1: i8):
  v2 = sdiv v0, v1
  return v2
}

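For signed division, the correct lowering (ignoring the zero-divisor and overflow checks that Cranelift's sdiv also requires) is closer to:

sxtb w0, w0        // sign-extend the i8 lhs into all 32 bits
sxtb w1, w1        // sign-extend the i8 rhs
sdiv w2, w0, w1    // 32-bit signed divide
                   // the low 8 bits of w2 are the i8 result (the ireduce step)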

If Cranelift instead zero-extends the operands, or divides using whatever happens to be in the upper register bits, the quotient is wrong for negative narrow values.

Understanding the Root Cause

The root cause is a mismatch between Cranelift IR narrow integer semantics and AArch64 instruction requirements. In Cranelift, an i8 value is semantically only 8 bits wide. On AArch64, however, arithmetic divide instructions operate on 32-bit or 64-bit registers. The backend must therefore make the narrow signed value valid in that wider register width before issuing sdiv.

The technical failure usually comes from one of these lowering mistakes:

The narrow operands are widened without an explicit sign extension.
The lowering path assumes upper bits are already normalized.
The backend reuses values produced by earlier operations where the upper 24 or 16 bits are not guaranteed to match the sign bit.
The final narrowing is handled, but the pre-divide operand preparation is wrong.

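This is especially dangerous on AArch64 because the backend makes no guarantee about the register bits above a narrow value's width. After an i8 operation lowered to a 32-bit instruction, the upper 24 bits of the w register may hold a carry or leftover data rather than copies of the sign bit. SDIV reads all 32 bits, so the backend must explicitly normalize narrow signed operands (SXTB for i8, SXTH for i16) before dividing.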
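To see why the upper bits matter, here is a small Rust model that treats a w register as a plain u32. It assumes, for illustration, that an i8 iadd was lowered to a 32-bit ADD, so adding 0xFF (-1) and 0xFD (-3) leaves a carry in bit 8: the register holds 0x0000_01FC even though the i8 value is 0xFC (-4). All function names are hypothetical and model instruction semantics only; this is not Cranelift code.

```rust
/// Hypothetical model of an i8 `iadd` lowered to a 32-bit ADD: the low byte
/// is the correct i8 result, but bits 8..31 are not cleared or sign-filled.
fn add_i8_in_w(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}

/// Model of SXTB: sign-extend from bit 7, ignoring whatever sits in bits 8..31.
fn sxtb(w: u32) -> i32 {
    (w as u8) as i8 as i32
}

/// Buggy lowering: 32-bit SDIV on the raw register contents.
fn sdiv_raw(wn: u32, wm: u32) -> i32 {
    (wn as i32).wrapping_div(wm as i32)
}

/// Fixed lowering: normalize both operands with SXTB, then 32-bit SDIV.
fn sdiv_sxtb(wn: u32, wm: u32) -> i32 {
    sxtb(wn).wrapping_div(sxtb(wm))
}
```

With the carry left in bit 8, the raw divide computes 508 / 3 = 169 (low byte 0xA9), while the SXTB version computes -4 / 3 = -1 (low byte 0xFF), which is the correct i8 quotient.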

Consider this example:

i8 lhs = -2   // 0xFE
i8 rhs =  2   // 0x02

The mathematically correct result is -1. But if 0xFE is widened incorrectly to 0x000000FE instead of 0xFFFFFFFE, the hardware divide computes 254 / 2 = 127, which narrows to 0x7F rather than 0xFF.
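The arithmetic above can be reproduced with a short Rust sketch that emulates "widen, 32-bit SDIV, keep the low byte" under both widening strategies. The function names are illustrative, and the sketch assumes a nonzero divisor and no i8::MIN / -1 case, since those trap paths are a separate concern.

```rust
/// Buggy widening: zero-extend the i8 bit pattern (0xFE -> 0x0000_00FE).
fn widen_zext(x: i8) -> i32 {
    (x as u8) as i32
}

/// Correct widening: sign-extend (0xFE -> 0xFFFF_FFFE).
fn widen_sext(x: i8) -> i32 {
    x as i32
}

/// Emulate "widen both operands, 32-bit SDIV, keep the low 8 bits".
/// Assumes rhs != 0 and not (lhs == i8::MIN && rhs == -1).
fn narrow_sdiv(lhs: i8, rhs: i8, widen: fn(i8) -> i32) -> i8 {
    // Rust's `/` on i32 truncates toward zero, like AArch64 SDIV.
    let quot = widen(lhs) / widen(rhs);
    // Truncating cast keeps the low 8 bits, like ireduce.i8.
    quot as i8
}
```

Calling narrow_sdiv(-2, 2, widen_zext) yields 127 (0x7F), and narrow_sdiv(-2, 2, widen_sext) yields -1 (0xFF), matching the two outcomes described above.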

So the bug is not in division itself. It is in the lowering contract: narrow signed operands must be sign-extended before AArch64 SDIV.

Step-by-Step Solution

The correct fix is to update the AArch64 lowering path for narrow sdiv so that all inputs narrower than the legal divide width are explicitly sign-extended before the divide instruction is emitted.

Identify the lowering code path for AArch64 integer division.
Detect when the Cranelift input type is narrower than 32 bits, such as i8 or i16.
Insert explicit sign extension of both operands to i32 before emitting sdiv.
Emit the divide using a legal AArch64 width.
Reduce the result back to the original narrow type if required by the IR value type.
Add regression tests using negative operands to catch incorrect widening.

A backend lowering strategy should conceptually look like this:

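// Pseudocode for correct lowering of signed divide on AArch64.
// sextend_i32, aarch64_sdiv32, aarch64_sdiv64 and ireduce_to_ty are
// illustrative helpers, not actual Cranelift APIs. Every width also needs
// zero-divisor and MIN / -1 trap checks, which are omitted here.
fn lower_sdiv_narrow(lhs: Value, rhs: Value, ty: Type) -> Value {
    match ty {
        I8 | I16 => {
            // No native 8/16-bit divide: sign-extend (SXTB/SXTH) first.
            let lhs_wide = sextend_i32(lhs, ty);
            let rhs_wide = sextend_i32(rhs, ty);
            let quot = aarch64_sdiv32(lhs_wide, rhs_wide);
            // The low bits of the 32-bit quotient are the narrow result.
            ireduce_to_ty(quot, ty)
        }
        I32 => aarch64_sdiv32(lhs, rhs),
        I64 => aarch64_sdiv64(lhs, rhs),
        _ => unreachable!("sdiv is only defined on integer types"),
    }
}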
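To make the narrow branch of the pseudocode concrete, the following Rust sketch models its semantics on raw 32-bit register values, including the trap checks the pseudocode leaves out. The Ty enum and the function names are hypothetical stand-ins, and None simply marks the cases where Cranelift's sdiv is specified to trap; this is a semantic model, not backend code.

```rust
/// Hypothetical stand-in for Cranelift's narrow integer types.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Ty {
    I8,
    I16,
}

impl Ty {
    fn bits(self) -> u32 {
        match self {
            Ty::I8 => 8,
            Ty::I16 => 16,
        }
    }
}

/// Sign-extend the low `ty.bits()` bits of a 32-bit register (like SXTB/SXTH).
fn sextend_i32(reg: u32, ty: Ty) -> i32 {
    let shift = 32 - ty.bits();
    // Move the narrow sign bit up to bit 31, then shift back arithmetically.
    ((reg << shift) as i32) >> shift
}

/// Model of the fixed narrow lowering. Returns None where Cranelift's sdiv
/// must trap: a zero divisor, or MIN / -1 for the narrow type.
fn lower_sdiv_narrow(lhs: u32, rhs: u32, ty: Ty) -> Option<u32> {
    let a = sextend_i32(lhs, ty);
    let b = sextend_i32(rhs, ty);
    let narrow_min = -(1i32 << (ty.bits() - 1));
    if b == 0 || (a == narrow_min && b == -1) {
        return None;
    }
    // 32-bit SDIV; cannot overflow i32 because a and b fit in 16 bits.
    let quot = a / b;
    // ireduce: keep only the low `ty.bits()` bits of the quotient.
    let mask = (1u32 << ty.bits()) - 1;
    Some((quot as u32) & mask)
}
```

Because sextend_i32 reads only the low bits, garbage in the upper part of the register (for example 0xDEAD_00FE for an i8 value of -2) does not affect the result.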

If your lowering infrastructure uses legalized intermediate ops rather than manual instruction selection, the fix can be expressed by forcing a legal expansion sequence before the machine instruction phase:

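; Legalization-oriented idea, written as valid CLIF
function %div8_widened(i8, i8) -> i8 {
block0(v0: i8, v1: i8):
  v2 = sextend.i32 v0   ; from i8 (or i16)
  v3 = sextend.i32 v1
  v4 = sdiv v2, v3      ; at i32 width -128 / -1 no longer traps: keep an i8 overflow check
  v5 = ireduce.i8 v4
  return v5
}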

In many backends, the bug exists because the narrow operation is directly mapped to a machine node without first inserting the semantic extension required by the target ISA. The durable fix is to make the extension explicit in lowering, not implicit in register allocation or operand emission.

A practical regression test set should include signed values that fail under zero-extension:

test run
target aarch64

function %div8(i8, i8) -> i8 {
block0(v0: i8, v1: i8):
  v2 = sdiv v0, v1
  return v2
}

; Each run line checks one case against the expected i8 quotient:
; run: %div8(-2, 2) == -1
; run: %div8(-128, 2) == -64
; run: %div8(-7, 3) == -2
; run: %div8(7, -3) == -2
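Beyond a handful of hand-picked cases, i8 is small enough to check exhaustively. The Rust sketch below compares a model of "widen, 32-bit SDIV, keep the low byte" against Rust's own i8::checked_div, which truncates toward zero and returns None on division by zero or overflow, the same cases in which Cranelift specifies that sdiv traps. The model and helper names are illustrative assumptions; the point is that an oracle like this flags the zero-extension bug immediately.

```rust
/// Widening strategies for an i8 bit pattern held in a u8.
fn sext(x: u8) -> i32 {
    x as i8 as i32
}
fn zext(x: u8) -> i32 {
    x as i32
}

/// Model of "widen both operands, 32-bit SDIV, keep the low byte".
/// Returns None for the trap cases (zero divisor, i8::MIN / -1).
fn model_sdiv_i8(lhs: u8, rhs: u8, widen: fn(u8) -> i32) -> Option<u8> {
    // Trap checks use the true signed i8 values, independent of widening.
    let (sa, sb) = (lhs as i8, rhs as i8);
    if sb == 0 || (sa == i8::MIN && sb == -1) {
        return None;
    }
    let (a, b) = (widen(lhs), widen(rhs));
    Some((a / b) as u8)
}

/// Count all (lhs, rhs) pairs where the model disagrees with i8::checked_div.
fn count_mismatches(widen: fn(u8) -> i32) -> usize {
    let mut bad = 0;
    for lhs in 0..=255u8 {
        for rhs in 0..=255u8 {
            let expected = (lhs as i8).checked_div(rhs as i8).map(|q| q as u8);
            if model_sdiv_i8(lhs, rhs, widen) != expected {
                bad += 1;
            }
        }
    }
    bad
}
```

The sign-extending model agrees on all 65,536 pairs, while the zero-extending one disagrees on many, starting with (-2, 2).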

If the backend has separate lowering for signed and unsigned division, make sure only the signed operations (sdiv, and srem, which has the same widening requirement) use sign extension. The udiv and urem paths must use zero extension instead.
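The reverse mistake is just as easy to demonstrate. This Rust sketch, with illustrative function names and the assumption of a nonzero divisor, models narrow udiv with zero extension and with (incorrect) sign extension.

```rust
/// Correct narrow udiv model: zero-extend (like UXTB), 32-bit UDIV,
/// keep the low byte. Assumes rhs != 0.
fn udiv8_zext(lhs: u8, rhs: u8) -> u8 {
    ((lhs as u32) / (rhs as u32)) as u8
}

/// Incorrect: sign-extending 0xC8 (200) produces 0xFFFF_FFC8, and an
/// unsigned 32-bit divide of that pattern yields a meaningless low byte.
fn udiv8_sext_bug(lhs: u8, rhs: u8) -> u8 {
    let a = lhs as i8 as i32 as u32;
    let b = rhs as i8 as i32 as u32;
    (a / b) as u8
}
```

For 200 / 2, the zero-extending version returns 100, while the sign-extending version computes 0xFFFF_FFC8 / 2 = 0x7FFF_FFE4 and returns 0xE4 (228).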

Implementation Checklist

[ ] Audit AArch64 lowering for sdiv on i8/i16
[ ] Ensure operands are explicitly sextended to i32
[ ] Emit 32-bit SDIV for widened narrow inputs
[ ] Reduce quotient back to original narrow type
[ ] Add regression tests with negative operands
[ ] Verify udiv lowering still uses zero-extension where appropriate

Common Edge Cases

Fixing the main bug is necessary, but several nearby edge cases can still break correctness if not tested carefully.

Negative narrow operands: This is the primary failure mode. Values such as 0x80, 0xFE, and 0xFF must be interpreted as signed negative values after widening.
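Division by zero: Cranelift's sdiv traps on a zero divisor, but AArch64 SDIV does not trap; it silently returns 0. The lowering therefore needs an explicit check, and the narrowing fix must keep that check in place.
INT_MIN / -1 behavior: Cranelift's sdiv also traps when the quotient is not representable, which for i8 means -128 / -1. After sign extension this becomes a 32-bit -128 / -1 = 128, which SDIV computes without any signal, so the overflow check must compare against the narrow type's minimum (-128 for i8, -32768 for i16), not the 32-bit minimum.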
i16 paths: The same bug pattern often appears for i16 if the backend logic was shared or copied.
Incorrect reuse of partially defined registers: Even if some earlier instruction happened to write a narrow value, the backend must not assume the full register already contains the proper sign pattern.
Confusion with udiv: Applying sign extension to unsigned division would introduce a new bug. Keep the signed and unsigned lowering rules separate.
Optimization passes removing explicit extension: If a later combine pass drops the sextend because it thinks it is redundant, the original bug can reappear in a different form.
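The two trap-related edge cases can be sketched in Rust as follows. The first function models the documented AArch64 32-bit SDIV behavior (a zero divisor yields 0, and i32::MIN / -1 wraps to i32::MIN); the second is a hypothetical check sequence for already sign-extended narrow operands. The Trap enum and function names are assumptions for illustration, not Cranelift's trap-code API.

```rust
/// Model of AArch64 32-bit SDIV: it never traps. A zero divisor yields 0,
/// and i32::MIN / -1 wraps to i32::MIN.
fn aarch64_sdiv_w(n: i32, m: i32) -> i32 {
    if m == 0 {
        0
    } else {
        n.wrapping_div(m)
    }
}

#[derive(Debug, PartialEq)]
enum Trap {
    DivByZero,
    IntOverflow,
}

/// Hypothetical pre-divide checks for a narrow sdiv. `bits` is the narrow
/// width (8 or 16); `a` and `b` are already sign-extended to 32 bits.
fn checked_narrow_sdiv(a: i32, b: i32, bits: u32) -> Result<i32, Trap> {
    if b == 0 {
        return Err(Trap::DivByZero);
    }
    let narrow_min = -(1i32 << (bits - 1));
    if a == narrow_min && b == -1 {
        // -128 / -1 = 128 fits in 32 bits, so the hardware gives no signal;
        // the check has to use the narrow minimum.
        return Err(Trap::IntOverflow);
    }
    Ok(aarch64_sdiv_w(a, b))
}
```

Note that aarch64_sdiv_w(-128, -1) happily returns 128; only the explicit narrow check turns that case into the trap the IR requires.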

A good test matrix should cover i8 and i16, positive and negative combinations, divisors of 1 and -1, and values near signed boundaries.

; Recommended coverage ideas
; i8:   -128, -127, -2, -1, 0, 1, 2, 127
; i16:  -32768, -1, 1, 32767
; divisors: -3, -2, -1, 1, 2, 3
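One way to turn the coverage list into regression tests is to generate Cranelift runtest lines, whose format is `; run: %fn(args) == expected`, with the expected quotients computed by Rust's i16::checked_div. This sketch assumes a test function named %div16 with the same shape as the i8 one (a hypothetical name), and it skips pairs that must trap, which need separate tests.

```rust
/// Build `; run:` lines for a hypothetical `%div16` function.
/// Expected values come from i16::checked_div (truncating toward zero);
/// pairs that must trap (zero divisor, i16::MIN / -1) are skipped.
fn run_directives(values: &[i16], divisors: &[i16]) -> Vec<String> {
    let mut lines = Vec::new();
    for &lhs in values {
        for &rhs in divisors {
            if let Some(q) = lhs.checked_div(rhs) {
                lines.push(format!("; run: %div16({}, {}) == {}", lhs, rhs, q));
            }
        }
    }
    lines
}
```

For the i16 values and divisors listed above, this produces 23 lines (24 pairs minus -32768 / -1), beginning with `; run: %div16(-32768, -3) == 10922`.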

FAQ

1. Why does this bug affect `i8` and `i16` more than `i32`?

Because AArch64 provides integer divide instructions only for 32-bit and 64-bit registers; there is no native 8-bit or 16-bit signed divide. Narrow types must be widened first, and if that widening is done incorrectly, signed semantics are lost. An i32 already fills a w register, so no sign-extension step is needed before a 32-bit divide.

2. Should the fix use sign extension before or after lowering to machine instructions?

Before emitting the final AArch64 SDIV. The important part is that the operands reaching the machine divide are already semantically valid signed 32-bit values. Whether this is done during legalization or instruction selection depends on backend architecture, but it must be explicit.

3. Does this same issue apply to `udiv`?

Not in the same way. udiv requires zero extension, not sign extension. The bug described here is specifically about signed division where failing to propagate the sign bit changes the mathematical value before division.

In short, the reliable backend rule is simple: for narrow signed division on AArch64, sign-extend both operands to a legal divide width, perform SDIV, then reduce the result back to the original type. That preserves Cranelift IR semantics and prevents silent miscompilation for negative narrow integers.