How to Fix: Cranelift: `srem.i8` giving wrong result on x86_64

Updated June 10, 2026 • 7 min read

Aldawsari

Cranelift’s x86_64 lowering for srem.i8 can silently produce the wrong value when the dividend reaches the division sequence through a bmask-style path without first being sign-extended. The classic symptom is a case equivalent to srem -1, -1 returning the value of the left-hand operand instead of the mathematically correct remainder, 0.

Problem Overview • Understanding the Root Cause • Step-by-Step Solution • Common Edge Cases • FAQ

Problem Overview

This bug appears in the x86_64 backend when lowering signed remainder for 8-bit integers. At a high level, signed remainder on x86 is tricky because no single instruction maps directly onto an i8 IR value. Instead, the compiler must set up registers exactly the way IDIV expects them.

For signed division and remainder, x86 does not divide arbitrary 8-bit virtual values in isolation. It divides an implicit wider signed value held in fixed registers. That means the lowering must preserve the sign of the original i8 input all the way into the division setup. If the value is accidentally treated as a masked unsigned byte, -1 becomes 255, and the generated code computes a remainder from the wrong mathematical input.

That is why a fuzzed case resembling srem -1, -1 is so useful: it should always evaluate to 0, so any non-zero result strongly suggests broken sign handling during legalization or lowering.
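As a reference point, the intended results can be checked against Rust's own i8 arithmetic. The sketch below is only a model of the expected semantics, not Cranelift code: the function name `srem_i8_reference` is hypothetical, and returning `None` for a zero divisor is an assumption that stands in for a trap.

```rust
/// Hypothetical reference model of `srem.i8`: a truncated remainder whose
/// sign follows the dividend. `None` stands in for a division-by-zero trap.
fn srem_i8_reference(a: i8, b: i8) -> Option<i8> {
    if b == 0 {
        return None;
    }
    // `wrapping_rem` yields 0 for `i8::MIN % -1` instead of panicking.
    Some(a.wrapping_rem(b))
}

fn main() {
    // The fuzzed case: the remainder of -1 divided by -1 is 0.
    assert_eq!(srem_i8_reference(-1, -1), Some(0));
    // The sign of the result follows the dividend.
    assert_eq!(srem_i8_reference(-5, 2), Some(-1));
    assert_eq!(srem_i8_reference(5, -2), Some(1));
    // A zero divisor has no arithmetic result.
    assert_eq!(srem_i8_reference(1, 0), None);
    println!("reference checks passed");
}
```

Any lowering that disagrees with this model on a non-zero divisor is a candidate for the sign-handling bug described here.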

Understanding the Root Cause

The root problem is a mismatch between IR signed semantics and the x86_64 machine-level preparation for division.

In Cranelift IR, srem.i8 a, b means:

signed_remainder((int8_t)a, (int8_t)b)

But on x86, signed division uses implicit registers and requires a correctly sign-extended dividend. For the byte form of IDIV, the dividend is the 16-bit value in AX, with the quotient written to AL and the remainder to AH. The backend must therefore ensure that the high bits carry the sign of the original value, rather than being left zero-extended or bit-masked.

The failure mode usually looks like this:

An i8 constant such as -1 is materialized through a pattern equivalent to a bitmask.
That pattern preserves the low 8 bits as 0xFF but loses the fact that the source is signed.
When lowering srem, the backend feeds this value into the x86 signed division sequence without restoring the proper signed interpretation.
The machine code effectively operates on 255 instead of -1.
The computed remainder is therefore incorrect.
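The failure sequence above can be made concrete with a small model of the byte-sized IDIV register contract: a signed 16-bit dividend in AX, the quotient in AL, the remainder in AH. This is a simplified sketch, not Cranelift's lowering. The function `idiv8` is hypothetical, and `None` stands in for the hardware divide error.

```rust
/// Simplified model of x86 `IDIV r/m8`. The dividend is the signed 16-bit
/// value in AX; the result is (AL = quotient, AH = remainder).
/// `None` models the divide error raised for a zero divisor or for a
/// quotient that does not fit in 8 bits.
fn idiv8(ax: i16, divisor: i8) -> Option<(i8, i8)> {
    if divisor == 0 {
        return None;
    }
    // Compute in i32 so the model itself cannot overflow.
    let dividend = ax as i32;
    let d = divisor as i32;
    let quotient = i8::try_from(dividend / d).ok()?;
    // |remainder| < |divisor| <= 128, so it always fits in i8.
    let remainder = (dividend % d) as i8;
    Some((quotient, remainder))
}

fn main() {
    let lhs: i8 = -5;
    let rhs: i8 = 2;

    // Correct setup: sign-extend the byte into AX (what CBW does).
    let ax_signed = lhs as i16; // 0xFFFB, still -5
    // Broken setup: only the low byte survives and the high byte is zero.
    let ax_masked = (lhs as u8) as i16; // 0x00FB, now 251

    assert_eq!(idiv8(ax_signed, rhs), Some((-2, -1))); // correct remainder
    assert_eq!(idiv8(ax_masked, rhs), Some((125, 1))); // wrong remainder
    assert_eq!(idiv8(-1, -1), Some((1, 0)));
    println!("idiv8 model checks passed");
}
```

The same bit pattern in the low byte gives two different remainders, and the only difference is how the high byte of the dividend was filled.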

Why does the issue sometimes disappear when both operands come from function arguments? Because argument values may flow through code paths that already preserve signedness, unlike constant materialization or certain legalization transforms. In other words, the bug is often not in the abstract srem rule itself, but in how specific input-producing nodes interact with the signed division lowering.

In compiler backend terms, this is usually one of these concrete mistakes:

Using a zero-extension where a sign-extension is required.
Matching a generic i8 producer as though it were semantically unsigned.
Failing to normalize the dividend into the exact register form expected before IDIV.
Extracting the remainder from the right register portion, but after an invalid setup of the dividend.

The key insight is simple: signed remainder is only correct if the dividend is sign-extended correctly before x86 division lowering.

Step-by-Step Solution

The fix is to make the signedness explicit during lowering and to prevent i8 values from entering the division path through an unsigned-mask representation.

1. Reproduce the bug with a minimal test

Create or add a Cranelift test case that isolates the bad pattern.

function %f() -> i8 system_v {
block0:
    v0 = iconst.i8 -1
    v1 = iconst.i8 -1
    v2 = srem v0, v1
    return v2
}

The expected result is 0.

Also add a variant using parameters to compare behavior:

function %f(i8, i8) -> i8 system_v {
block0(v0: i8, v1: i8):
    v2 = srem v0, v1
    return v2
}

2. Inspect the x86_64 lowering path for `srem.i8`

Review the signed division/remainder lowering in the x64 backend. Look for the point where the dividend is prepared for IDIV. The incorrect version often resembles one of these anti-patterns:

// Wrong idea: preserve only low 8 bits
val = val & 0xff

// Wrong idea: zero-extend before signed division
wide = uextend(val)

// Wrong idea: treat byte producer as raw bits without signed recovery
emit_idiv(wide, divisor)

The corrected approach must preserve signed semantics:

// Correct idea: sign-extend the i8 dividend first
wide_dividend = sextend_i8_to_dividend_width(val)
wide_divisor  = sextend_i8_to_operand_width(divisor)
emit_signed_divrem(wide_dividend, wide_divisor)
extract_remainder()

3. Force sign extension before division setup

If the lowering currently accepts an arbitrary i8 producer, normalize it first. Pseudocode:

fn lower_srem_i8(lhs: Value, rhs: Value) -> Value {
    let lhs_signed = sign_extend_i8(lhs);
    let rhs_signed = sign_extend_i8(rhs);

    let dividend = prepare_x86_signed_dividend(lhs_signed);
    let divisor = prepare_x86_divisor(rhs_signed);

    let rem = emit_idiv_and_get_remainder(dividend, divisor);
    return truncate_to_i8(rem);
}

This is the most important compiler invariant for the fix: never lower signed remainder from an i8 source that has only been masked or zero-extended.
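Because i8 has only 256 values, the invariant can be checked exhaustively. The sketch below models the fixed and the broken paths in plain Rust and compares both against `wrapping_rem` for every non-zero divisor. Both function names are hypothetical, and the 32-bit working width is an assumption chosen for illustration.

```rust
/// Hypothetical model of the fixed lowering: sign-extend both operands,
/// take the remainder at the wider width, then truncate back to i8.
fn srem_i8_sign_extended(lhs: i8, rhs: i8) -> i8 {
    let wide_lhs = lhs as i32; // i8 -> i32 with `as` sign-extends
    let wide_rhs = rhs as i32;
    (wide_lhs % wide_rhs) as i8
}

/// Hypothetical model of the broken path: the dividend keeps only its
/// low 8 bits, so -1 is seen as 255.
fn srem_i8_masked_dividend(lhs: i8, rhs: i8) -> i8 {
    let wide_lhs = (lhs as u8) as i32; // zero-extension
    let wide_rhs = rhs as i32;
    (wide_lhs % wide_rhs) as i8
}

fn main() {
    let mut mismatches = 0u32;
    for lhs in i8::MIN..=i8::MAX {
        for rhs in i8::MIN..=i8::MAX {
            if rhs == 0 {
                continue; // division by zero is handled separately
            }
            let expected = lhs.wrapping_rem(rhs);
            assert_eq!(srem_i8_sign_extended(lhs, rhs), expected);
            if srem_i8_masked_dividend(lhs, rhs) != expected {
                mismatches += 1;
            }
        }
    }
    assert!(mismatches > 0);
    println!("masked-dividend model disagrees on {mismatches} input pairs");
}
```

The sign-extended model agrees with the reference on every pair, while the masked model diverges on some inputs with a negative dividend, such as -5 and 2.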

4. Fix constant and `bmask`-like producers if needed

If the bad value originates earlier, the backend may need to canonicalize constants or legalize masked values before division lowering. For example:

// Problematic producer shape (bmask takes a value operand, not an immediate;
// v9 is a placeholder for whatever value feeds it)
v0 = bmask.i8 v9
v1 = srem v0, v2

Replace the effective semantics with a signed normalization step in lowering:

v0_signed = sign_extend_i8(v0)
v2_signed = sign_extend_i8(v2)
v3 = srem_lowered_signed(v0_signed, v2_signed)

If your lowering framework uses pattern matching, add a rule that explicitly distinguishes:

signed-producing byte values
unsigned or masked byte values

and route only the properly sign-normalized form into the signed division sequence.

5. Add regression tests for constants and arguments

Do not stop at the original fuzz case. Add tests covering the common failure matrix:

; constants
srem.i8 -1, -1   ; expect 0
srem.i8 -5, 2    ; expect -1
srem.i8 5, -2    ; expect 1

; argument-driven
srem.i8 arg0, arg1

; mixed forms
srem.i8 iconst(-1), arg0
srem.i8 arg0, iconst(-1)

Also include values around the sign boundary:

-128, -1, 0, 1, 127
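Writing the boundary matrix by hand is error-prone, so a small generator can help. The sketch below emits one line per operand pair in the `; run:` comment style used by Cranelift run tests. The exact directive syntax and the function name `srem_i8` are assumptions to check against your checkout. Expected values come from Rust's `wrapping_rem`, so the -128, -1 entry assumes a result of 0 rather than a trap.

```rust
/// Boundary values named in the text.
const BOUNDARY: [i8; 5] = [-128, -1, 0, 1, 127];

/// Builds one run line per (dividend, divisor) pair, skipping zero
/// divisors, which belong in a separate trap test.
fn run_lines(func: &str) -> Vec<String> {
    let mut lines = Vec::new();
    for &a in &BOUNDARY {
        for &b in &BOUNDARY {
            if b == 0 {
                continue;
            }
            let expected = a.wrapping_rem(b);
            lines.push(format!("; run: %{func}({a}, {b}) == {expected}"));
        }
    }
    lines
}

fn main() {
    for line in run_lines("srem_i8") {
        println!("{line}");
    }
}
```

The output can be pasted under the argument-driven test function, which exercises the same lowering with values that are not visible to constant folding.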

6. Verify machine code expectations

After patching, inspect the generated x86_64 sequence and confirm that:

The dividend is sign-extended, not merely masked.
The divisor is prepared with the correct signed width semantics.
The remainder is extracted from the correct architectural location after signed division.

A good review question is: If the input is 0xFF, does this path still mean -1 by the time IDIV executes? If the answer is no, the bug is still present.

7. Document the backend invariant

Leave a short comment in the lowering code so the bug does not reappear during future refactors.

// Signed i8 div/rem on x86 must operate on a sign-extended dividend.
// Do not feed masked/zero-extended byte values into IDIV lowering,
// or values like 0xFF will be treated as 255 instead of -1.

Common Edge Cases

Once srem.i8 is fixed, several neighboring cases deserve attention.

1. Division by zero

srem x, 0 has no arithmetic result, and Cranelift defines it to trap. Ensure the division-by-zero trap still fires, and is still reported correctly, after the lowering change.

2. `-128 % -1` and signed overflow-adjacent behavior

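For two’s-complement signed arithmetic, the paired division case INT_MIN / -1 is historically dangerous on x86: IDIV raises a divide error when the quotient does not fit the destination, so -128 / -1 faults even though the mathematical remainder is 0. Even if the original issue is about remainder, verify that the shared signed div/rem path handles i8::MIN correctly.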
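One common way to keep the remainder well defined here is to short-circuit a divisor of -1 before the divide executes. The sketch below shows that idea in plain Rust. `srem_i8_guarded` is a hypothetical name, the zero-divisor `None` stands in for a trap, and whether your backend uses this exact guard is an assumption to verify.

```rust
/// Hypothetical guarded remainder. A divisor of -1 always gives a
/// remainder of 0, so the divide (and its possible overflow) is skipped.
fn srem_i8_guarded(lhs: i8, rhs: i8) -> Option<i8> {
    match rhs {
        0 => None,                 // stands in for the division-by-zero trap
        -1 => Some(0),             // x % -1 == 0 for every x
        _ => lhs.checked_rem(rhs), // cannot overflow once -1 is excluded
    }
}

fn main() {
    // Rust's checked_rem reports the same overflow that IDIV faults on.
    assert_eq!(i8::MIN.checked_rem(-1), None);
    // The guarded form returns the mathematical remainder instead.
    assert_eq!(srem_i8_guarded(i8::MIN, -1), Some(0));
    assert_eq!(srem_i8_guarded(-1, -1), Some(0));
    assert_eq!(srem_i8_guarded(-5, 2), Some(-1));
    assert_eq!(srem_i8_guarded(7, 0), None);
    println!("guard checks passed");
}
```

The guard is independent of the sign-extension fix, so regression tests should cover both: a correct dividend setup and a safe path for the -1 divisor.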

3. Incorrect reuse of unsigned lowering

It is tempting to share setup logic between urem and srem. That usually works only up to the point where signedness matters. If the implementation reuses unsigned preparation code, byte values with the high bit set will break.
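A quick way to see where shared setup stops being safe is to run the same bit patterns through both interpretations. This is an illustrative sketch with hypothetical helper names, not backend code.

```rust
/// Remainder of two bytes interpreted as unsigned values.
fn urem_bits(a: u8, b: u8) -> u8 {
    a % b
}

/// Remainder of the same two bytes interpreted as signed values,
/// returned as raw bits.
fn srem_bits(a: u8, b: u8) -> u8 {
    (a as i8).wrapping_rem(b as i8) as u8
}

fn main() {
    // High bit clear on both operands: the interpretations agree.
    assert_eq!(urem_bits(5, 2), srem_bits(5, 2));

    // 0xFB is 251 unsigned but -5 signed, so the results diverge.
    assert_eq!(urem_bits(0xFB, 2), 0x01); // 251 % 2 == 1
    assert_eq!(srem_bits(0xFB, 2), 0xFF); // -5 % 2 == -1
    println!("urem/srem comparison passed");
}
```

Tests that only use small positive bytes will therefore pass even when the signed path has silently been routed through unsigned setup.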

4. Truncation after correct computation

Even with proper signed division, the backend can still fail if it extracts a wider remainder and truncates it incorrectly or too early. Keep the full signed value valid until the final result conversion.

5. Constant folding mismatch

If the optimizer folds srem.i8 correctly but the backend lowers the same expression differently, you get inconsistent behavior between optimized and non-optimized pipelines. Regression tests should hit both constant and non-constant forms.

6. Legalization across type boundaries

If i8 operations are legalized through i16, i32, or i64, ensure the transition uses sign extension for signed ops. A single accidental zero-extension in the legalization chain can recreate the bug.

FAQ

Why does this bug show up specifically with `srem.i8` on x86_64?

Because byte-sized signed division on x86 is not a simple direct operation on an isolated SSA value. The backend must prepare implicit machine registers exactly right, and that preparation is highly sensitive to whether the source byte is treated as signed or unsigned.

Why does `-1` become a problematic value here?

In raw bits, -1 as an i8 is 0xFF. If lowering preserves only the bits and loses the signed interpretation, the machine path can treat it as 255 instead of -1. That completely changes the remainder result.

Is the right fix to special-case constants like `-1`?

No. The robust fix is to enforce the correct sign-extension invariant for all i8 signed division and remainder lowering. Special-casing constants may hide the fuzzed example, but it will not solve the general backend correctness issue.

Conclusion

The bug is not really about srem as a mathematical operation. It is about losing signed byte semantics during x86_64 lowering. Once the dividend and divisor are normalized through proper sign extension before the signed division sequence, cases like srem -1, -1 return the correct value, and the backend becomes resilient against the broader family of masked-byte miscompilations.

The practical fix is straightforward: sign-extend first, lower second, test constants and arguments, and lock it down with regressions.