How to Fix: Cranelift: Wrong result for `iabs.i16` on riscv64
When Cranelift lowers iabs for a 16-bit value on riscv64, the generated machine code can compute the absolute value at the wrong bit width. The failure is subtle but dangerous: the sign of the i16 operand can be taken from the state of the 64-bit register rather than from bit 15, so a value produced by a logical shift can yield a wrong i16 result at runtime even though the interpreter returns the correct one.
Problem overview
The failing pattern is small but revealing:
test interpret
test run
target riscv64

function %a(i16, i64) -> i16 system_v {
block0(v0: i16, v1: i64):
    v2 = ushr v0, v1
    v3 = iabs v2
    return v3
}
At a high level, ushr produces a value that should be treated as an unsigned logical result in the lane width of i16. After that, iabs should operate on the same logical 16-bit domain. On riscv64, however, the lowering path can accidentally perform the absolute-value sequence in a wider register context without first rebuilding the correct 16-bit sign semantics. That breaks the contract of the IR operation.
Understanding the Root Cause
The root cause is a mismatch between IR type semantics and RISC-V register semantics.
In Cranelift IR, an i16 value conceptually lives in 16 bits. In riscv64, general-purpose registers are 64 bits wide, and narrow integer values often exist in registers with either unspecified upper bits or with extension performed by earlier operations. If the backend expands iabs.i16 as if it were a native 64-bit absolute value, it may inspect bit 63 instead of bit 15 when determining the sign, or it may preserve upper garbage/sign-extension from a previous step.
A common lowering strategy for absolute value is logically equivalent to:
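mask = x >> (bits - 1)      // arithmetic shift: 0 if x >= 0, all ones if x < 0
result = (x ^ mask) - mask  // leaves x unchanged when mask is 0, negates x when mask is all ones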
That is correct only if x is already normalized to the exact lane width. For i16, the backend must ensure the sign bit being tested is bit 15, not a widened or stale sign bit from a 64-bit register representation.
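As a minimal sketch of that formula at the correct width, the following Rust function applies it to a value that is already a genuine i16. The function name is hypothetical and the code models the arithmetic only; it is not Cranelift backend code.

```rust
/// Branchless absolute value on a value that is already a true 16-bit
/// signed integer, so the sign bit being tested is bit 15.
fn abs_i16_branchless(x: i16) -> i16 {
    // `>>` on a signed Rust integer is an arithmetic shift:
    // mask is 0 for non-negative x and -1 (all ones) for negative x.
    let mask = x >> 15;
    // XOR with all ones flips every bit; subtracting -1 adds one.
    // Together they form a two's-complement negation when x is negative.
    (x ^ mask).wrapping_sub(mask)
}
```

Because the subtraction wraps, the one unrepresentable input (-32768) maps back to itself, which matches Rust's `i16::wrapping_abs`.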
Why does the provided test expose this? Because ushr on a narrow type can create values whose high 48 bits in the physical register are not guaranteed to match the intended signed interpretation of the i16 result. If iabs runs before an explicit narrow-width sign-extension or masking step, the backend may compute absolute value from the wrong sign source.
In short, the bug happens because narrow integer legalization is incomplete: iabs.i16 is being lowered with incorrect assumptions about how a 16-bit value is represented in a 64-bit RISC-V register.
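The failure mode described above can be modelled in plain Rust. This is a sketch under a stated assumption: the i16 value is taken to sit zero-extended in a 64-bit register (one plausible state after a logical shift), and the absolute-value sequence tests bit 63. All function names are hypothetical.

```rust
/// Models an absolute-value sequence that tests bit 63 of a 64-bit register.
fn abs_on_full_register(reg: u64) -> u64 {
    let x = reg as i64;
    let mask = x >> 63; // sign taken from bit 63, not bit 15
    (x ^ mask).wrapping_sub(mask) as u64
}

/// Reads an i16 result out of the low 16 bits of a register.
fn low16_as_i16(reg: u64) -> i16 {
    reg as u16 as i16
}

/// Assumed scenario: the operand arrives zero-extended, so a negative i16
/// such as -1 (0xFFFF) looks like the positive 64-bit value 65535.
fn buggy_iabs_i16(zero_extended: u64) -> i16 {
    low16_as_i16(abs_on_full_register(zero_extended))
}
```

With the register holding 0xFFFF, the model returns -1 where 16-bit semantics require 1, because the 64-bit sequence sees a positive number and leaves it untouched.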
Step-by-Step Solution
The fix is to normalize the input to true 16-bit signed form before applying the absolute-value sequence, or to implement a dedicated lowering path that explicitly operates on 16-bit semantics.
1. Reproduce the failure
Add or preserve the regression test:
test interpret
test run
target riscv64

function %a(i16, i64) -> i16 system_v {
block0(v0: i16, v1: i64):
    v2 = ushr v0, v1
    v3 = iabs v2
    return v3
}
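This matters because the two directives check different things: test interpret runs the function in the Cranelift interpreter, which follows IR semantics directly, while test run compiles the function for the target and executes the generated code. For the file to assert anything, it also needs ; run: lines that give concrete arguments and the expected result for each call.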
2. Inspect the riscv64 lowering for iabs
Look for the code path that expands integer absolute value for non-native narrow widths. The buggy logic usually resembles this pattern:
// Pseudocode: incorrect if x is not first normalized to i16 semantics
mask = srai x, 63
res = sub (xor x, mask), mask
or an equivalent sequence emitted on the full register width.
The problem is not the formula itself; the problem is the missing sign-extension from 16 bits before using it.
3. Normalize to the lane width before computing absolute value
Use a canonical narrow-width preparation step. For i16, that usually means one of the following backend-safe forms:
// Option A: explicit sign-extension to XLEN from 16 bits
x16 = sign_extend_i16_to_xlen(x)
mask = srai x16, 15
res = sub (xor x16, mask), mask
res = reduce_to_i16(res)
// Option B: shift-left then arithmetic shift-right to rebuild i16 sign
x16 = slli x, 48
x16 = srai x16, 48
mask = srai x16, 15
res = sub (xor x16, mask), mask
res = sext_or_trunc_to_i16(res)
On riscv64, the exact helper names depend on the Cranelift backend structure, but the key invariant is the same: the operand must be sign-correct as i16 before abs is computed.
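Option B can be checked with a small Rust model that treats a `u64` as the physical register. This is a sketch of the arithmetic, not of the backend: `sext16` and `iabs_i16_normalized` are hypothetical names, and the shifts stand in for the slli/srai pair.

```rust
/// Rebuilds the 16-bit sign across the whole register:
/// shift left by 48 (slli), then arithmetic shift right by 48 (srai).
fn sext16(reg: u64) -> i64 {
    ((reg << 48) as i64) >> 48
}

/// Absolute value computed only after the operand has been normalized,
/// so the upper 48 bits of the incoming register cannot affect the result.
fn iabs_i16_normalized(reg: u64) -> i16 {
    let x = sext16(reg);
    let mask = x >> 15; // bits 63..15 now all equal the i16 sign bit
    let res = (x ^ mask).wrapping_sub(mask);
    res as i16 // keep the low 16 bits as the i16 result
}
```

Whatever the upper 48 bits contain on entry, the result depends only on the low 16 bits, which is the invariant the lowering needs.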
4. Prefer legalized lowering by type width
If the backend has a generic integer-abs lowering, split it by type class so narrow integer widths get explicit handling:
match ty_bits {
    8 => lower_iabs_narrow(x, 8),
    16 => lower_iabs_narrow(x, 16),
    32 => lower_iabs_narrow(x, 32),
    64 => lower_iabs_xlen(x),
    // Wider types such as i128 need their own path before reaching this match.
    _ => unreachable!(),
}
And then:
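// Sketch only: Value and the helper functions are illustrative, not real backend APIs.
fn lower_iabs_narrow(x: Value, bits: u8) -> Value {
    // Rebuild the sign from bit (bits - 1) so the upper register bits can be trusted.
    let x_norm = sign_extend_from_n_bits(x, bits);
    // 0 for non-negative inputs, all ones for negative inputs.
    let mask = srai(x_norm, bits - 1);
    let res = sub(xor(x_norm, mask), mask);
    truncate_or_retype_to_n_bits(res, bits)
}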
This makes the backend behavior align with Cranelift IR semantics rather than host register width.
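The width-parameterized lowering can be exercised outside the compiler with a register-level model. In this sketch a `u64` plays the role of the physical register; `sign_extend_from_n_bits` borrows its name from the pseudocode above, and `iabs_narrow_model` is a hypothetical name. Neither is a real Cranelift API.

```rust
/// Sign-extends the low `bits` bits of `reg` to 64 bits.
fn sign_extend_from_n_bits(reg: u64, bits: u32) -> i64 {
    assert!(bits >= 1 && bits <= 64);
    let shift = 64 - bits;
    ((reg << shift) as i64) >> shift
}

/// Models iabs for an integer of width `bits` held in a 64-bit register.
/// Returns the result with everything above `bits` cleared.
fn iabs_narrow_model(reg: u64, bits: u32) -> u64 {
    let x = sign_extend_from_n_bits(reg, bits);
    let mask = x >> (bits - 1); // 0 or all ones, taken from the type's sign bit
    let res = (x ^ mask).wrapping_sub(mask) as u64;
    if bits == 64 {
        res
    } else {
        res & ((1u64 << bits) - 1)
    }
}
```

A model like this is cheap to test exhaustively for 8 and 16 bits, which makes it a convenient oracle when reviewing the emitted instruction sequence.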
5. Validate with focused tests
Add more cases around sign boundaries to ensure the bug is truly fixed:
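function %abs16_case0() -> i16 system_v {
block0:
    v0 = iconst.i16 0
    v1 = iabs v0
    return v1
}
; run: %abs16_case0() == 0

function %abs16_case1() -> i16 system_v {
block0:
    v0 = iconst.i16 -1
    v1 = iabs v0
    return v1
}
; run: %abs16_case1() == 1

function %abs16_case2() -> i16 system_v {
block0:
    v0 = iconst.i16 -32768
    v1 = iabs v0
    return v1
}
; iabs wraps: 32768 is not representable in i16, so the result stays -32768
; run: %abs16_case2() == -32768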
Also keep combinations with ushr, because that was the trigger pattern in the original issue.
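When writing expected values for ushr plus iabs combinations, a small reference model helps avoid arithmetic slips. The sketch below assumes that the shift amount is masked to the type width (so only the low 4 bits of the amount matter for i16) and that iabs wraps; both function names are hypothetical.

```rust
/// Reference model for `ushr.i16`: logical right shift on the 16-bit pattern.
/// Assumption: the shift amount is masked to the type width (amt & 15).
fn ushr_i16(x: i16, amt: i64) -> i16 {
    let shift = (amt & 15) as u32;
    ((x as u16) >> shift) as i16
}

/// Reference model for the regression pattern: ushr followed by wrapping iabs.
fn ushr_then_iabs(x: i16, amt: i64) -> i16 {
    ushr_i16(x, amt).wrapping_abs()
}
```

The interesting inputs are those where bit 15 is still set after the shift, for example a shift amount of 0 or 16 applied to a negative value, because only then does iabs have a negative i16 operand.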
6. Run target-specific tests
cargo test -p cranelift-codegen riscv64
cargo test -p cranelift-filetests
If you have a local setup for filetests or ISA-specific execution, run the exact .clif regression as both interpreter and backend execution.
7. Submit the regression test with the code fix
For backend correctness issues, the regression test is as important as the patch. It prevents future changes in legalization or instruction selection from reintroducing the same narrow-type bug.
Common Edge Cases
1. The minimum signed value
For i16, the absolute value of -32768 cannot be represented as a positive i16. Cranelift documents iabs as a wrapping operation, so the expected result is -32768 again. Make sure your fix preserves that behavior instead of silently saturating to 32767 or trapping.
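Rust's standard library exposes the possible policies for this input side by side, which makes the distinction concrete. The helper name below is hypothetical; the three methods are standard `i16` methods.

```rust
/// Returns the result of three different abs policies applied to i16::MIN:
/// wrapping (the behavior a wrapping iabs should match), checked, and saturating.
fn min_value_behaviours() -> (i16, Option<i16>, i16) {
    (
        i16::MIN.wrapping_abs(),   // wraps back to -32768
        i16::MIN.checked_abs(),    // reports the overflow as None
        i16::MIN.saturating_abs(), // clamps to 32767
    )
}
```

A regression test that expects 32767 for this input would be encoding the saturating policy, not the wrapping one.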
2. Other narrow integer widths
If iabs.i16 is broken due to missing width normalization, iabs.i8 and possibly iabs.i32 on some lowering paths deserve review too. Bugs like this often appear as a family, not a one-off.
3. Values produced by logical operations
Instructions such as ushr, band, and some selects can leave upper bits in a state that is valid for the physical register but invalid for the intended narrow signed interpretation. Any later signed operation must re-establish lane-correct sign behavior.
4. Canonicalization mismatches
If one optimization pass assumes narrow values are already extended while another pass assumes upper bits are irrelevant, backend codegen can become inconsistent. This issue is a classic symptom of that mismatch.
5. Target-specific helper instructions
Some architectures have direct narrow-width operations or extension helpers; others require synthetic shift sequences. The implementation detail differs, but the correctness rule does not: always compute signed behavior from the declared IR width.
FAQ
Why does the interpreter pass while run fails on riscv64?
The interpreter executes pure Cranelift IR semantics, where i16 is truly 16-bit. The backend must map that to 64-bit registers on riscv64. The failure occurs in lowering or legalization, not in the IR itself.
Is the bug in ushr or in iabs?
The visible wrong result appears at iabs. ushr simply creates a narrow value pattern that exposes the backend mistake. The real problem is that iabs is computed without first restoring correct 16-bit signed semantics.
What is the safest long-term fix?
The safest fix is to centralize narrow integer normalization in lowering for signed operations like iabs, sshr, comparisons, and sign-sensitive transforms. That reduces the chance of one instruction family handling i16 differently from another.
Bottom line: this bug is caused by treating a logical i16 as if it were already a valid signed xlen-sized register value. Normalize first, then compute the absolute value, and lock the fix in with a riscv64 regression test.