How to Fix: Cranelift: Wrong result for `rotl.i16` with `i128` shift value on riscv64

Cranelift riscv64 bug: fixing wrong rotl.i16 results when the shift value is i128

This bug is a classic type-lowering mismatch: the rotate-left is well defined in the IR, but when Cranelift lowers it for riscv64, a shift operand whose type differs from the rotated value's type can bypass the expected masking or narrowing and produce the wrong machine-level result. A .clif test that pairs a small value type with a much wider shift type (or the reverse) is exactly the kind of input that exposes it.

The issue can be reproduced with a Cranelift test like this, targeting riscv64:

test interpret
test run
target riscv64

function %a(i16, i128) -> i128 system_v {
block0(v0: i16, v1: i128):
    v2 = rotl v1, v0
    return v2
}

The issue title mentions rotl.i16 with an i128 shift value, while this test rotates an i128 value by an i16 count. Both are mixed-width rotates, and the backend hazard is the same: a rotate requires the shift count to be normalized to the bit-width of the value being rotated. When that normalization is skipped, applied at the wrong width, or lowered inconsistently for RISC-V helper sequences, the generated code rotates by an invalid count and returns the wrong result.

Understanding the Root Cause

At the IR level, a rotate-left operation follows a well-defined rule:

rotl(x, y) = (x << (y mod N)) | (x >> (N - (y mod N)))

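Where N is the bit-width of x, >> is a logical (zero-filling) shift, and the result is truncated to N bits. When y mod N is 0, the right-hand term shifts every bit out and the result is simply x. For example: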

  • If x is i16, the shift count must be reduced modulo 16.
  • If x is i128, the shift count must be reduced modulo 128.
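
The rule above can be written as a small reference model. This is a minimal Rust sketch, not Cranelift code: the name rotl_ref and the choice to carry every width in a u128 are assumptions made for illustration.

```rust
/// Reference model of `rotl` on an N-bit value held in the low bits of a u128.
/// `bits` is N and must be a power of two between 8 and 128.
/// Hypothetical helper for illustration; not part of Cranelift.
fn rotl_ref(x: u128, count: u128, bits: u32) -> u128 {
    assert!(bits.is_power_of_two() && (8..=128).contains(&bits));
    // All-ones mask covering the low `bits` bits of the value.
    let value_mask = if bits == 128 { u128::MAX } else { (1u128 << bits) - 1 };
    let x = x & value_mask;
    // y mod N; equivalent to `count & (bits - 1)` because N is a power of two.
    let n = (count % bits as u128) as u32;
    if n == 0 {
        return x; // rotating by a multiple of N is the identity
    }
    // (x << n) | (x >> (N - n)), truncated back to N bits.
    ((x << n) | (x >> (bits - n))) & value_mask
}
```

With this model, rotl_ref(0x8001, 17, 16) and rotl_ref(0x8001, 1, 16) give the same answer, because 17 and 1 are equal modulo 16.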

The bug appears when Cranelift lowers a rotate for riscv64 and the shift operand type is much wider than the value width used by the rotate logic. In practice, one of these backend mistakes usually causes the failure:

  1. The shift count is not masked at all before constructing the rotate sequence.
  2. The shift count is masked to the wrong width, such as 64 instead of 16 or 128.
  3. The shift count is truncated too late, after intermediate arithmetic has already used the wider i128 value.
  4. The lowering path assumes operand widths match, which is not true for mixed-width IR patterns.

On riscv64, this matters because the base ISA has no rotate instruction (rol and ror are part of the Zbb extension), so Cranelift often expands rotate operations into a sequence of shifts, subtracts, masks, and ORs rather than relying on a single native rotate instruction for every case. If the shift amount stays an i128 for too long, the expanded sequence can compute the wrong effective count, which puts incorrect bits directly into the result.

In short, the root cause is: the rotate count must be canonicalized to the destination/value bit-width before backend expansion. Mixed-width inputs make this easy to get wrong unless the lowering code explicitly narrows and masks the count.
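
To make the failure mode concrete, here is a hedged Rust model of a 16-bit rotate expanded into 64-bit shifts. The helpers sll and srl imitate RV64 register shifts, which use only the low 6 bits of the shift register; rotl16_unmasked and rotl16_masked are hypothetical names, and neither is taken from Cranelift's actual lowering.

```rust
/// Model of RV64 `sll`: only the low 6 bits of the shift amount are used.
fn sll(v: u64, amt: u64) -> u64 {
    v.wrapping_shl(amt as u32)
}

/// Model of RV64 `srl`: only the low 6 bits of the shift amount are used.
fn srl(v: u64, amt: u64) -> u64 {
    v.wrapping_shr(amt as u32)
}

/// Buggy expansion (illustrative): the raw count is used, so `16 - count`
/// wraps around for any count above 16.
fn rotl16_unmasked(x: u16, count: u64) -> u16 {
    let v = x as u64; // value zero-extended into a 64-bit register
    (sll(v, count) | srl(v, 16u64.wrapping_sub(count))) as u16
}

/// Fixed expansion (illustrative): the count is reduced modulo 16 first.
fn rotl16_masked(x: u16, count: u64) -> u16 {
    let v = x as u64;
    let n = count & 15;
    let inv = n.wrapping_neg() & 15; // (0 - n) mod 16
    (sll(v, n) | srl(v, inv)) as u16
}
```

For x = 0x8001 and a count of 17, the unmasked version shifts every set bit out of the low 16 bits on both sides, while the masked version agrees with a rotate by 1.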

Step-by-Step Solution

The fix is to make the riscv64 lowering path normalize the rotate count using the width of the value being rotated, not the width of the shift operand. The safest implementation strategy is:

  1. Determine the bit-width of the rotate input.
  2. Convert or extract the shift count into the legal register width used by the lowering code.
  3. Apply a mask of bit_width - 1 when the width is a power of two, which it is for standard integer types.
  4. Build the rotate sequence using that normalized count.
  5. Add regression tests for mixed-width rotate operands on riscv64.

1. Reproduce the failure locally

Create or reuse a regression test that isolates the bug:

test interpret
test run
target riscv64

function %rot_bug(i16, i128) -> i128 system_v {
block0(v0: i16, v1: i128):
    ; v0 is the shift count, v1 is the value being rotated
    v2 = rotl v1, v0
    return v2
}

Then run the Cranelift filetests or the relevant test target in the Wasmtime/Cranelift workspace.

2. Inspect the lowering path for rotate on riscv64

Find the backend code responsible for lowering or legalizing rotl/rotr for integer values. You are looking for logic that resembles:

masked = shift & (bits - 1)
left = value << masked
right = value >> (bits - masked)
result = left | right

If the implementation instead uses the raw shift operand directly, or computes bits - shift before masking/truncation, that is the bug.

3. Normalize the shift count before expansion

A correct lowering strategy should conceptually look like this:

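fn lower_rotl(value, shift, ty_bits):
    // Bring the count into a register width the lowering code can use
    let narrowed_shift = convert_shift_to_legal_width(shift)
    // Reduce the count modulo the width of the rotated value
    let masked_shift = narrowed_shift & (ty_bits - 1)
    // (0 - n) mod ty_bits; this is 0 when masked_shift is 0
    let inv_shift = (0 - masked_shift) & (ty_bits - 1)

    let lhs = ishl(value, masked_shift)
    let rhs = ushr(value, inv_shift)
    return bor(lhs, rhs)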

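That inv_shift pattern is often safer than directly computing ty_bits - masked_shift, because the result always stays in the range 0 to ty_bits - 1. For a zero shift the subtraction yields ty_bits itself, a full-width count whose handling differs between shift implementations, whereas the negate-and-mask form yields 0 and the rotate degenerates to value | value.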
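
The zero-shift corner case can be checked directly. The following Rust sketch compares the two ways of computing the inverse count for a 64-bit rotate; inv_by_subtraction, inv_by_negation, and rotl64 are illustrative names, and Rust's checked_shr stands in for "a shift that rejects out-of-range counts".

```rust
/// `bits - masked` form: for masked == 0 this asks for a shift by the full width.
fn inv_by_subtraction(masked: u32, bits: u32) -> u32 {
    bits - masked
}

/// Negate-and-mask form: stays inside 0..bits for every masked count.
fn inv_by_negation(masked: u32, bits: u32) -> u32 {
    masked.wrapping_neg() & (bits - 1)
}

/// 64-bit rotate-left built from the negate-and-mask inverse count.
fn rotl64(x: u64, count: u64) -> u64 {
    let masked = (count & 63) as u32;
    let inv = inv_by_negation(masked, 64);
    // Both shift amounts are at most 63, so neither shift is out of range.
    (x << masked) | (x >> inv)
}
```

The two forms agree for every masked count from 1 to 63 and differ only at 0, where the subtraction produces 64 and 1u64.checked_shr(64) returns None.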

4. Make sure the mask matches the rotated value type

This is the most important correction. If the rotate is over an i128 value, use 127 as the mask. If it is over an i16 value, use 15. Do not derive the mask from the shift operand type.

// Correct concept
let width = value_type_bits(value)
let mask = width - 1
let normalized_shift = shift & mask

// Incorrect concept
let mask = shift_type_bits(shift) - 1

This distinction is exactly where mixed-width IR can break backend assumptions.
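
As a quick illustration of the difference, the Rust sketch below derives the count both ways for an i16 value rotated by an i128 shift operand. Both function names are hypothetical and exist only to contrast the two masks.

```rust
/// Mask derived from the rotated value's width (the correct choice).
fn count_from_value_width(shift: u128, value_bits: u32) -> u32 {
    (shift & (value_bits as u128 - 1)) as u32
}

/// Mask derived from the shift operand's width (the mistake described above).
fn count_from_shift_width(shift: u128, shift_bits: u32) -> u32 {
    (shift & (shift_bits as u128 - 1)) as u32
}
```

For a shift of 17, the value-width mask (15) gives a count of 1, while the shift-width mask (127) leaves the count at 17, which is out of range for a 16-bit rotate sequence.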

5. Reduce wide shift operands before passing them on

Because riscv64 is a 64-bit target, an i128 shift operand may be represented through multiple registers or split values during lowering. If your backend helper sequence only consumes one machine register as the shift amount, ensure the shift count is first reduced to the rotate domain and then passed in legal form.

For example, conceptually:

if value_bits == 128:
    normalized_shift = shift & 127
else if value_bits == 64:
    normalized_shift = shift & 63
else if value_bits == 32:
    normalized_shift = shift & 31
else if value_bits == 16:
    normalized_shift = shift & 15
else if value_bits == 8:
    normalized_shift = shift & 7

The backend can then split or lower the rotate itself as needed, but the count is already guaranteed correct.
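
One consequence worth spelling out: because the largest mask is 127, the normalized count depends only on the low 64 bits of an i128 shift operand. The Rust sketch below models that, assuming the count is held as a (low, high) register pair; split_count and normalized_count are hypothetical helpers, not Cranelift functions.

```rust
/// Split an i128 shift count into the (low, high) 64-bit halves that a
/// 64-bit target would hold in two registers.
fn split_count(shift: u128) -> (u64, u64) {
    (shift as u64, (shift >> 64) as u64)
}

/// Reduce the count to the rotate domain using only the low register.
/// This is valid because `value_bits - 1` is at most 127, so the mask
/// never reaches into the high half.
fn normalized_count(shift: u128, value_bits: u32) -> u32 {
    let (lo, _hi) = split_count(shift);
    (lo & (value_bits as u64 - 1)) as u32
}
```

The high half can be dropped only after this reasoning has been applied; dropping it is safe for masking, but not for any comparison made on the raw count.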

6. Add a regression test for this exact bug

Add a dedicated filetest that combines:

  • riscv64 target
  • a rotate operation
  • a small or mixed-width shift operand
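  • an i128 value path if that is where the failure occurs

test interpret
test run
target riscv64

function %rotl_i128_shift_mix(i16, i128) -> i128 system_v {
block0(v0: i16, v1: i128):
    v2 = rotl v1, v0
    return v2
}

; Run cases with arguments in (shift, value) order, covering
; shifts 0, 1, 15, 16, 127, and 128 applied to the value 1:
; run: %rotl_i128_shift_mix(0, 1) == 1
; run: %rotl_i128_shift_mix(1, 1) == 2
; run: %rotl_i128_shift_mix(15, 1) == 0x8000
; run: %rotl_i128_shift_mix(16, 1) == 0x10000
; run: %rotl_i128_shift_mix(127, 1) == 0x80000000_00000000_00000000_00000000
; run: %rotl_i128_shift_mix(128, 1) == 1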

Include both small and boundary counts. If the failure you are chasing has the shape given in the issue title, a narrow rotated value with a wide shift operand, add a test with that exact signature as well. The point of the regression is to lock in the backend rule: the shift amount is always interpreted modulo the rotated value width.
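
Expected values for additional run cases can be derived outside Cranelift. The Rust sketch below uses u128::rotate_left as an independent oracle for the boundary counts listed above; the names BOUNDARY_SHIFTS, expected_rotl_i128, and expectation_table are assumptions for illustration.

```rust
/// Boundary counts from the regression-test checklist.
const BOUNDARY_SHIFTS: [u16; 6] = [0, 1, 15, 16, 127, 128];

/// Expected result of rotating an i128 value left by an i16 count.
/// The count is reduced modulo 128 explicitly before calling the oracle.
fn expected_rotl_i128(value: u128, shift: u16) -> u128 {
    value.rotate_left(u32::from(shift) % 128)
}

/// One (shift, expected result) pair per boundary count.
fn expectation_table(value: u128) -> Vec<(u16, u128)> {
    BOUNDARY_SHIFTS
        .iter()
        .map(|&s| (s, expected_rotl_i128(value, s)))
        .collect()
}
```

Calling expectation_table with a value that has bits set near both ends, rather than just 1, makes it easier to spot a lowering that loses wrapped bits.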

7. Validate with interpreter and backend execution

The interpreter should already reflect the correct IR semantics. Your goal is to make the riscv64 backend produce the same result as interpretation.

# Example workflow
cargo test -p cranelift-filetests
cargo test -p cranelift-codegen
# Run any riscv64-specific test suites available in the workspace

After the patch, the backend-generated result must match the interpreter for every rotate boundary case.

Common Edge Cases

Once you fix this bug, several neighboring cases should also be reviewed because they fail for the same reason: inconsistent shift normalization.

1. Shift counts equal to the type width

A rotate by exactly 16 on i16, or by 128 on i128, should behave like a rotate by 0 and return the input unchanged. If the backend does not mask the count, this case commonly returns zero or a partially shifted value instead.

2. Very large shift counts

Counts like 255, 256, or larger i128 values must still collapse correctly modulo the rotate width: 255 is 15 modulo 16 and 127 modulo 128, and 256 is 0 modulo both. Getting these right is a strong signal that masking is done properly.

3. Signed versus unsigned shift-count handling

The shift count should be treated as a bit pattern for modulo masking purposes. If sign-extension leaks into the lowering path, negative-looking values may generate incorrect counts before masking.
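
A small Rust sketch shows one way sign-extension can leak in. It assumes a hypothetical lowering that decides whether a 128-bit rotate crosses the 64-bit half boundary; all four function names are illustrative.

```rust
/// Masked count after sign-extending the i16 count to 64 bits.
fn masked_sext(count: i16) -> u64 {
    (count as i64 as u64) & 127
}

/// Masked count after zero-extending the i16 count to 64 bits.
fn masked_zext(count: i16) -> u64 {
    (count as u16 as u64) & 127
}

/// Buggy decision: compares the sign-extended raw count as a signed number.
fn crosses_half_signed_raw(count: i16) -> bool {
    (count as i64) >= 64
}

/// Correct decision: mask the bit pattern first, then compare.
fn crosses_half_masked(count: i16) -> bool {
    masked_zext(count) >= 64
}
```

Once the mask is applied, the two extensions agree for every i16 count. The raw signed comparison does not: a count of -1 has an effective value of 127, yet it compares as less than 64.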

4. rotr may have the same bug

If rotl is lowered through custom backend logic, the same code path or a mirrored implementation may exist for rotr. Fix and test both operations together when possible.
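
The mirrored operation needs the same normalization. Below is a minimal Rust sketch of a 16-bit rotate-right built the same way; rotr16 is a hypothetical name, and the code illustrates the pattern rather than Cranelift's implementation.

```rust
/// 16-bit rotate-right from two shifts, with the count reduced modulo 16 first.
fn rotr16(x: u16, count: u32) -> u16 {
    let n = count & 15;
    let inv = n.wrapping_neg() & 15; // (0 - n) mod 16
    // The shift directions are swapped relative to rotl.
    (x >> n) | (x << inv)
}
```

A useful cross-check for tests is the identity rotr(x, n) = rotl(x, (16 - n) mod 16), which ties the two operations to a single expected-value table.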

5. Narrow types legalized through wider intermediates

Backends sometimes legalize i8 or i16 operations via i32 or i64 intermediates. If the rotate is emulated at a wider width without restoring the original semantics, the result may be correct for some counts and wrong for others.
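
Here is a hedged Rust illustration of that failure for an i8 rotate computed in a 32-bit register. Both function names are hypothetical; the first shows the mistake, the second one way to keep 8-bit semantics.

```rust
/// Wrong: rotate the zero-extended value at 32 bits, then truncate.
/// Bits that should wrap from bit 7 to bit 0 wrap at bit 31 instead.
fn rotl8_via_wide_rotate(x: u8, count: u32) -> u8 {
    (x as u32).rotate_left(count & 31) as u8
}

/// Right: compute in a 32-bit register but keep 8-bit rotate semantics.
fn rotl8_emulated(x: u8, count: u32) -> u8 {
    let v = x as u32;
    let n = count & 7;
    let inv = n.wrapping_neg() & 7; // (0 - n) mod 8
    ((v << n) | (v >> inv)) as u8
}
```

For x = 0x81 the wide rotate is right at a count of 0 and wrong at a count of 1, where the top bit is lost instead of wrapping into bit 0.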

6. Split-lane handling for i128 on 64-bit targets

Because riscv64 is not a native 128-bit integer machine, any rotate over i128 may be implemented as a pair of 64-bit operations. The shift count still has to be reduced modulo 128 before deciding whether to rotate within halves or exchange halves.
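
As a sketch of that ordering, the Rust code below rotates a 128-bit value held as two 64-bit halves. It assumes a (lo, hi) representation and uses the hypothetical names rotl128_halves and rotl128; it models the technique and is not the sequence Cranelift emits.

```rust
/// Rotate a 128-bit value, given as (lo, hi) 64-bit halves, left by `count`.
fn rotl128_halves(lo: u64, hi: u64, count: u128) -> (u64, u64) {
    // Reduce modulo 128 first; only then decide how to treat the halves.
    let n = (count & 127) as u32;
    // A count of 64 or more starts by exchanging the halves.
    let (lo, hi) = if n >= 64 { (hi, lo) } else { (lo, hi) };
    let k = n & 63; // remaining rotate amount, in 0..64
    if k == 0 {
        return (lo, hi);
    }
    // Each output half keeps its own bits shifted up and receives the
    // top k bits of the other half.
    let new_lo = (lo << k) | (hi >> (64 - k));
    let new_hi = (hi << k) | (lo >> (64 - k));
    (new_lo, new_hi)
}

/// Convenience wrapper that splits a u128, rotates, and joins the halves.
fn rotl128(v: u128, count: u128) -> u128 {
    let (lo, hi) = rotl128_halves(v as u64, (v >> 64) as u64, count);
    ((hi as u128) << 64) | lo as u128
}
```

If the comparison against 64 were made on the unmasked count, a count such as 128 would exchange the halves even though the correct result is the input unchanged.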

FAQ

Why does this bug appear on riscv64 but not necessarily in the interpreter?

The interpreter executes the IR semantics directly, including the correct modulo behavior for rotate counts. The riscv64 backend has to lower that operation into legal machine instructions, and the bug occurs in that lowering step.

Why is an i128 shift operand dangerous for rotate lowering?

Because the shift operand width can be much larger than the width relevant to the rotate. If backend code accidentally uses the raw i128 value, or masks it using the wrong width, the resulting shift sequence does not match the rotate semantics.

What is the safest general fix pattern for rotate lowering?

Always normalize the shift count first using the bit-width of the value being rotated, then build the rotate from shifts and ORs. In other words: mask early, lower later.

For Cranelift maintainers, the key takeaway is simple: rotate correctness depends on the value type, not the shift operand type. Once the riscv64 lowering path enforces that rule consistently, this issue and its close variants disappear, and the added regression test ensures they stay fixed.
