How to Fix: Cranelift: Segmentation fault on riscv64 related to `call` instruction
Cranelift segfault on riscv64: why a call can corrupt returns and how to fix it
A segmentation fault in a Cranelift-generated riscv64 function that includes a call is usually a calling-convention bug, not a random backend crash. In this case, the trigger is the interaction between multi-value returns, implicit sret, vector-like values such as i8x16, and the RISC-V lowering rules for call arguments and return storage. When the backend assigns registers or stack slots incorrectly for the call sequence, the generated machine code can overwrite a live pointer, read from an invalid address, or restore the wrong value into the return path, producing a runtime segfault.
Understanding the Root Cause
The issue centers on how Cranelift lowers a CLIF call instruction for the riscv64 ABI. The provided test enables enable_multi_ret_implicit_sret=true, which means functions that conceptually return multiple values may be transformed so that some return data is written through a hidden structure return pointer instead of being returned only in registers.
That transformation is safe only if the backend preserves three invariants:
- The hidden sret pointer must be passed in the correct ABI location.
- Any call-clobbered registers that still hold live values must be saved or re-materialized.
- Stack layout and alignment for large or vector-like values such as i8x16 must match what the callee expects.
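The following is a toy model of the signature rewrite that `enable_multi_ret_implicit_sret` implies. When the return values do not fit in the return registers, a hidden pointer parameter is prepended and the callee writes its results through it. Several parts are assumptions for illustration, not Cranelift's actual implementation:
- the `Ty` and `Sig` types;
- the choice to model `i8x16` as two 64-bit halves;
- the "all returns go through memory" policy.

The budget of two return registers reflects the RISC-V LP64 convention of returning integers in `a0` and `a1`.

```rust
// Toy model of implicit-sret legalization. Types, the register budget, and
// the "all returns through memory" policy are illustrative assumptions.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Ty {
    I64,
    I8x16,
}

#[derive(Debug, Clone, PartialEq)]
struct Sig {
    params: Vec<Ty>,
    rets: Vec<Ty>,
    /// True if a hidden return-buffer pointer was added as the first param.
    has_implicit_sret: bool,
}

/// Number of 64-bit return registers a value would need
/// (assumption: a 128-bit vector is modeled as two 64-bit halves).
fn ret_slots(ty: Ty) -> usize {
    match ty {
        Ty::I64 => 1,
        Ty::I8x16 => 2,
    }
}

/// If the returns do not fit in `ret_regs` registers, prepend a hidden
/// pointer parameter (an I64) and return everything through memory.
fn legalize_returns(sig: &Sig, ret_regs: usize) -> Sig {
    let needed: usize = sig.rets.iter().map(|&t| ret_slots(t)).sum();
    if needed <= ret_regs {
        return sig.clone();
    }
    let mut params = vec![Ty::I64]; // hidden sret pointer comes first
    params.extend_from_slice(&sig.params);
    Sig { params, rets: Vec::new(), has_implicit_sret: true }
}
```

For the signature in this issue, `(i64, i64, i8x16) -> i64, i64, i8x16` needs four return slots against a budget of two. The model therefore turns it into a function taking `(ptr, i64, i64, i8x16)` with no register returns. The hidden pointer is the value that invariant 1 protects.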
On riscv64, this becomes tricky because argument lowering combines integer registers, stack slots, and special handling for values that do not map cleanly to a scalar register class. If a call uses a hidden return buffer while another argument like i8x16 is also present, the lowering phase may compute the wrong offsets or assign overlapping locations. The result is classic memory corruption:
- the return buffer pointer is overwritten before the call,
- the callee writes return data to an invalid address,
- the caller restores a bad value after the call, or
- register allocation assumes a value survives the call when the ABI says it does not.
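One cheap way to catch the "overlapping locations" failure is to check the assigned call-site locations pairwise. The sketch below uses a hypothetical `Loc` type in place of whatever location representation a backend uses. Registers are numbered like RISC-V `x0..x31`, so `a0` is `x10`, and stack locations are byte ranges in the outgoing argument area.

```rust
// Pairwise overlap check over ABI locations assigned at a call site.
// `Loc` is a hypothetical stand-in for a backend's location type.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Loc {
    /// Integer register, numbered like RISC-V x0..x31.
    Reg(u8),
    /// Outgoing stack area: byte offset from sp and size in bytes.
    Stack { offset: u32, size: u32 },
}

fn overlaps(a: Loc, b: Loc) -> bool {
    match (a, b) {
        (Loc::Reg(x), Loc::Reg(y)) => x == y,
        (Loc::Stack { offset: o1, size: s1 }, Loc::Stack { offset: o2, size: s2 }) => {
            // Half-open byte ranges [o, o + s) intersect.
            o1 < o2 + s2 && o2 < o1 + s1
        }
        _ => false,
    }
}

/// Returns the indices of the first pair of arguments whose locations
/// overlap, or None if every argument has its own location.
fn find_overlap(locs: &[Loc]) -> Option<(usize, usize)> {
    for i in 0..locs.len() {
        for j in (i + 1)..locs.len() {
            if overlaps(locs[i], locs[j]) {
                return Some((i, j));
            }
        }
    }
    None
}
```

An assertion like this in a debug build of the lowering code turns silent memory corruption into an immediate, attributable failure.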
The visible symptom is a segfault, but the underlying bug is usually one of these backend mistakes:
- incorrect ABI lowering for implicit sret on RISC-V,
- missing register clobber modeling around calls,
- wrong handling of SIMD/vector-like types in call signatures, or
- bad legalization when a type is split into multiple machine values.
In short: the call is not inherently broken; the crash happens because the backend emits a call sequence whose argument/return mapping does not respect the riscv64 calling convention under implicit sret.
Step-by-Step Solution
The most reliable fix is to isolate the bad lowering pattern, reduce it to a minimal reproducer, and then correct call lowering for the affected signature shape. If you are maintaining Cranelift itself, follow the steps below.
1. Reproduce with a minimized CLIF test
Start from the failing test case and reduce it until the crash still occurs with the smallest possible function that contains:
- enable_multi_ret_implicit_sret=true
- a riscv64 target
- at least one call
- the problematic argument or return type, such as i8x16
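```clif
set enable_multi_ret_implicit_sret=true
target riscv64

function u1:0(i64, i64, i8x16) -> i64, i64, i8x16 system_v {
    ; A call must reference a declared function; the callee name is illustrative.
    fn0 = %callee(i64, i64, i8x16) -> i64, i64, i8x16 system_v

block0(v0: i64, v1: i64, v2: i8x16):
    v3, v4, v5 = call fn0(v0, v1, v2)
    return v3, v4, v5
}
```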
If the exact original signature is more complex, keep reducing until removing any single detail makes the crash disappear. That tells you which ABI combination is actually responsible.
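The "remove one detail at a time" loop can be automated. The sketch below is a simplified greedy reducer, a cousin of delta debugging rather than any specific Cranelift tool. It drops each item in turn and keeps a removal whenever the crash still reproduces, stopping when no single removal preserves it. In practice, the `still_crashes` predicate would be something you write yourself, for example compiling and running a candidate CLIF file.

```rust
/// Greedy one-at-a-time reduction: keep removing single items while the
/// failure still reproduces. The result is 1-minimal: removing any one
/// remaining item makes the failure disappear.
fn minimize<T: Clone>(items: &[T], still_crashes: impl Fn(&[T]) -> bool) -> Vec<T> {
    let mut current = items.to_vec();
    loop {
        let mut reduced = false;
        for i in 0..current.len() {
            let mut candidate = current.clone();
            candidate.remove(i);
            if still_crashes(candidate.as_slice()) {
                // This detail was irrelevant; drop it and start over.
                current = candidate;
                reduced = true;
                break;
            }
        }
        if !reduced {
            return current;
        }
    }
}
```

With a predicate that only "crashes" when implicit sret, a riscv64 target, a call, and an `i8x16` value are all present, the reducer discards every other detail. What remains is exactly the ABI combination this step is trying to identify.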
2. Compare generated code before and after the call
Dump the legalized IR, VCode, or final assembly for the reproducer and inspect argument placement. You want to verify:
- where the hidden sret pointer is inserted,
- which registers are used for integer arguments,
- whether the i8x16 value is passed in registers or spilled to memory, and
- whether any live value remains in a call-clobbered register.
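```shell
cargo test -p cranelift-codegen riscv64 -- --nocapture
RUST_LOG=cranelift_codegen=trace cargo test -p cranelift-codegen your_test_name -- --nocapture
```
Trace output from `RUST_LOG` only appears if the test harness initializes a logger. For CLIF files, the `clif-util` tool from the `cranelift-tools` crate can run filetests (`clif-util test`) and compile CLIF for a chosen target (`clif-util compile`). This is usually the quickest way to inspect the exact lowering pipeline.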
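When reading the dump, it helps to know which registers the integer arguments should occupy. The sketch below encodes the standard RISC-V LP64 integer convention:
- integer arguments use `a0` through `a7`;
- a hidden struct-return pointer is passed as an implicit first argument, so it takes `a0`;
- further integer arguments go to the stack in 8-byte slots.

Only integer and pointer arguments are modeled; the placement of the `i8x16` value is deliberately left out because it depends on the backend's vector handling.

```rust
/// Expected integer-argument locations under the standard RISC-V LP64
/// convention: a0..a7, then 8-byte stack slots. A hidden sret pointer,
/// when present, is the implicit first argument and takes a0.
fn expected_int_arg_locations(num_int_args: usize, has_sret: bool) -> Vec<String> {
    const ARG_REGS: [&str; 8] = ["a0", "a1", "a2", "a3", "a4", "a5", "a6", "a7"];
    let total = num_int_args + usize::from(has_sret);
    (0..total)
        .map(|i| {
            if i < ARG_REGS.len() {
                ARG_REGS[i].to_string()
            } else {
                // XLEN = 64, so each stack-passed integer uses 8 bytes.
                format!("stack+{}", (i - ARG_REGS.len()) * 8)
            }
        })
        .collect()
}
```

For the reproducer's two `i64` arguments plus a hidden sret pointer, the expected integer locations are `a0` (sret), `a1`, and `a2`. If the dump shows a user argument in `a0`, or the sret pointer anywhere else, that is a strong lead.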
3. Audit the riscv64 ABI signature lowering
The bug is typically in the code that converts a high-level function signature into ABI-assigned locations. Review the logic that handles:
- multi-return lowering,
- insertion of the hidden sret argument,
- classification of i8x16 or other non-scalar types,
- alignment and size of stack-passed values, and
- call-site versus callee-side consistency.
A correct fix usually looks like one of the following:
- force unsupported vector-like values through a stack path with proper alignment,
- split legal values consistently on both caller and callee sides,
- reserve the hidden return pointer before assigning user-visible arguments, or
- mark more registers as clobbered so register allocation does not keep live data there across the call.
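```rust
// Pseudocode: ensure the hidden sret pointer is assigned first
// (the RISC-V psABI passes it as an implicit first argument in a0).
if sig.uses_implicit_sret() {
    abi_args.push(assign_int_reg_or_stack(hidden_sret_ptr));
}
for arg in user_args {
    match classify_riscv64_arg(arg) {
        IntReg => abi_args.push(assign_int_reg_or_stack(arg)),
        StackOnly => abi_args.push(assign_stack_slot(arg, proper_align(arg))),
        // Every part gets its own location, in the same order on caller and callee.
        Split(parts) => {
            for part in parts {
                abi_args.push(assign_int_reg_or_stack(part));
            }
        }
    }
}
```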
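The pseudocode above can be turned into a small runnable model. Everything here is an illustrative assumption, not Cranelift's real riscv64 ABI code:
- the `ArgClass` categories;
- the eight-register budget;
- the 8-byte integer stack slots;
- the caller-supplied size and alignment for stack-only values.

The property it demonstrates is the one that matters: assigning the sret pointer first fixes it in `a0`. A shared assigner also guarantees that register and stack locations are never handed out twice.

```rust
// Toy ABI assigner mirroring the pseudocode. All names are hypothetical.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Loc {
    Reg(&'static str),
    Stack { offset: u32 },
}

#[derive(Debug, Clone)]
enum ArgClass {
    /// Fits in one integer register.
    IntReg,
    /// Must go on the stack with the given size and alignment.
    StackOnly { size: u32, align: u32 },
    /// Split into this many register-sized parts.
    Split(usize),
}

const INT_ARG_REGS: [&str; 8] = ["a0", "a1", "a2", "a3", "a4", "a5", "a6", "a7"];

struct Assigner {
    next_reg: usize,
    stack_offset: u32,
}

impl Assigner {
    /// Reserve a stack slot, rounding the offset up to `align` first.
    fn stack_slot(&mut self, size: u32, align: u32) -> Loc {
        self.stack_offset = (self.stack_offset + align - 1) / align * align;
        let loc = Loc::Stack { offset: self.stack_offset };
        self.stack_offset += size;
        loc
    }

    /// One 8-byte integer value: next free register, else an 8-byte stack slot.
    fn int_reg_or_stack(&mut self) -> Loc {
        if self.next_reg < INT_ARG_REGS.len() {
            self.next_reg += 1;
            Loc::Reg(INT_ARG_REGS[self.next_reg - 1])
        } else {
            self.stack_slot(8, 8)
        }
    }
}

/// Assigns locations for every argument; the sret pointer (if any) goes first.
fn assign_args(uses_implicit_sret: bool, args: &[ArgClass]) -> Vec<Vec<Loc>> {
    let mut a = Assigner { next_reg: 0, stack_offset: 0 };
    let mut out = Vec::new();
    if uses_implicit_sret {
        out.push(vec![a.int_reg_or_stack()]);
    }
    for arg in args {
        let locs = match arg {
            ArgClass::IntReg => vec![a.int_reg_or_stack()],
            ArgClass::StackOnly { size, align } => vec![a.stack_slot(*size, *align)],
            ArgClass::Split(parts) => (0..*parts).map(|_| a.int_reg_or_stack()).collect(),
        };
        out.push(locs);
    }
    out
}
```

Running the same function on both the caller and callee side is the design point. If both sides derive locations from one assigner, they cannot disagree about where the sret pointer or a split value lives.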
4. Validate call clobbers and live ranges
Even with correct ABI placement, the crash can still happen if the register allocator or lowering code assumes a register survives a call when the ABI allows the callee to overwrite it. Check that all caller-saved registers are modeled correctly.
```rust
// Pseudocode: values live across a call must not stay in caller-saved registers.
for value in live_across_call {
    if assigned_to_caller_saved_reg(value) {
        spill_or_move_to_safe_location(value);
    }
}
```
This is especially important if the hidden sret pointer or split pieces of an i8x16 value are kept live across the call sequence.
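The check in step 4 depends on knowing which registers the callee may clobber. Under the standard RISC-V calling convention, the integer registers `ra`, `t0` through `t6`, and `a0` through `a7` are caller-saved, while `s0` through `s11` are callee-saved. The sketch below encodes only the integer registers; floating-point and vector registers are left out. The value names in the usage example are hypothetical.

```rust
/// True if the standard RISC-V calling convention lets a callee clobber
/// this integer register: ra, t0-t6, a0-a7. (s0-s11 are callee-saved.)
fn is_caller_saved(reg: &str) -> bool {
    if reg == "ra" {
        return true;
    }
    if let Some(n) = reg.strip_prefix('t').and_then(|s| s.parse::<u32>().ok()) {
        return n <= 6;
    }
    if let Some(n) = reg.strip_prefix('a').and_then(|s| s.parse::<u32>().ok()) {
        return n <= 7;
    }
    false
}

/// Given (value, assigned register) pairs for values live across a call,
/// returns the values that must be spilled or moved before the call.
fn values_needing_spill<'a>(live_across_call: &[(&'a str, &str)]) -> Vec<&'a str> {
    live_across_call
        .iter()
        .filter(|(_, reg)| is_caller_saved(reg))
        .map(|(value, _)| *value)
        .collect()
}
```

For example, a hidden sret pointer still held in `a0` after the call is exactly the kind of value this flags. The callee is free to overwrite `a0`, so the caller must keep its own copy elsewhere.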
5. Add a regression test
Once fixed, add a target-specific regression test so the exact lowering pattern cannot break again.
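```clif
test run
set enable_multi_ret_implicit_sret=true
target riscv64

function %callee(i64, i64, i8x16) -> i64, i64, i8x16 {
block0(v0: i64, v1: i64, v2: i8x16):
    return v0, v1, v2
}

function %caller(i64, i64, i8x16) -> i64, i64, i8x16 {
    ; Calls go through a declared function reference, not the bare name.
    fn0 = %callee(i64, i64, i8x16) -> i64, i64, i8x16

block0(v0: i64, v1: i64, v2: i8x16):
    v3, v4, v5 = call fn0(v0, v1, v2)
    return v3, v4, v5
}
; Add "; run:" directives for %caller with concrete inputs and expected outputs.
```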
Good regression tests for this issue should verify more than “does not crash.” They should exercise:
- calls with hidden sret,
- mixed scalar and vector-like arguments,
- multiple return values, and
- both direct and indirect call forms if supported.
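Those coverage points multiply quickly, so it can help to enumerate them rather than write tests ad hoc. The sketch below generates the cross product of call form, argument mix, and return list as CLIF-style signature strings. The specific mixes are illustrative choices, and each string would still need to be expanded into a full test file. The three-value return list is the one intended to exercise the implicit-sret path.

```rust
/// Enumerates regression-test shapes: call form x argument mix x returns.
/// The lists below are illustrative; extend them as new cases are found.
fn test_matrix() -> Vec<String> {
    let call_forms = ["call", "call_indirect"];
    let arg_mixes = ["i64, i64", "i64, i64, i8x16", "i8x16, i64"];
    let ret_lists = ["i64", "i64, i64, i8x16"];
    let mut cases = Vec::new();
    for form in call_forms {
        for args in arg_mixes {
            for rets in ret_lists {
                cases.push(format!("{form}: ({args}) -> {rets}"));
            }
        }
    }
    cases
}
```

Twelve shapes from three small lists is a reminder of why the direct-call fix alone is not enough. The indirect form and the reordered `i8x16, i64` arguments reach different paths through argument lowering.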
6. Use a practical workaround if you need an immediate unblock
If you cannot patch Cranelift immediately, the safest short-term workaround is to avoid the exact ABI shape that triggers the bug. Options include:
- disable enable_multi_ret_implicit_sret if feasible,
- rewrite the function to return a single scalar or pointer,
- replace i8x16 in the public call boundary with scalar pieces, or
- pass large aggregate-like data via explicit memory buffers instead of implicit returns.
```rust
// Instead of returning multiple values directly:
//     fn(a, b, vec) -> (x, y, vec2)
// use an explicit out-pointer style API:
//     fn(out: *mut RetBuf, a: i64, b: i64, vec_parts...)
```
This does not fix the backend, but it can eliminate the broken lowering path in production builds.
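Here is a concrete version of that out-pointer shape, written at the Rust level as a sketch. `RetBuf`, `split_vec`, `join_vec`, and `compute` are hypothetical names. The 16-byte vector crosses the boundary as two little-endian `u64` halves, and all results are written into a caller-provided buffer, so no implicit sret or vector argument is needed at the call.

```rust
/// Caller-provided result buffer; #[repr(C)] gives it a stable layout.
#[repr(C)]
#[derive(Debug, Default, PartialEq)]
struct RetBuf {
    x: i64,
    y: i64,
    vec_lo: u64,
    vec_hi: u64,
}

/// Splits 16 bytes into little-endian low/high halves.
fn split_vec(v: [u8; 16]) -> (u64, u64) {
    let mut lo = [0u8; 8];
    let mut hi = [0u8; 8];
    lo.copy_from_slice(&v[..8]);
    hi.copy_from_slice(&v[8..]);
    (u64::from_le_bytes(lo), u64::from_le_bytes(hi))
}

/// Inverse of `split_vec`; both sides must use the same convention.
fn join_vec(lo: u64, hi: u64) -> [u8; 16] {
    let mut v = [0u8; 16];
    v[..8].copy_from_slice(&lo.to_le_bytes());
    v[8..].copy_from_slice(&hi.to_le_bytes());
    v
}

/// Out-pointer style callee: only scalar arguments, results written to `out`.
/// Mirrors the identity-returning callee from the regression test.
extern "C" fn compute(out: &mut RetBuf, a: i64, b: i64, vec_lo: u64, vec_hi: u64) {
    out.x = a;
    out.y = b;
    out.vec_lo = vec_lo;
    out.vec_hi = vec_hi;
}
```

Because `split_vec` and `join_vec` are each other's inverse, the round trip through `compute` reproduces the original vector bytes. That is the same caller/callee consistency the backend fix has to guarantee.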
Common Edge Cases
Even after fixing the main bug, several adjacent cases can still fail if they are not covered by tests.
- Indirect calls: a function pointer call may use a slightly different lowering path than a direct call and reintroduce the same corruption.
- Type splitting: if i8x16 is split into multiple machine values, caller and callee must split it identically.
- Stack alignment: RISC-V stack alignment bugs often appear only under optimization or with additional spilled temporaries.
- Tail calls: if tail-call lowering exists, hidden sret handling can be invalid there even when normal calls are fixed.
- Debug versus release behavior: a bad live range may "work" without optimization and crash in optimized builds (for example, Cranelift's opt_level=none versus opt_level=speed) because register pressure changes.
- Cross-signature mismatches: if the declared CLIF signature and the lowered machine ABI disagree, the call may appear valid in IR but still crash at runtime.
A strong fix should test at least one variant for each of these edge cases on riscv64.
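For the stack-alignment case, the invariant to test is simple. The RISC-V psABI requires the stack pointer to be 16-byte aligned, so after laying out spill slots the frame size must be rounded up to a multiple of 16. The sketch below lays out slots in order with per-slot alignment and checks that property. The in-order, no-reordering layout policy is a simplifying assumption.

```rust
/// Round `x` up to a multiple of `align` (a power of two).
fn align_up(x: u32, align: u32) -> u32 {
    debug_assert!(align.is_power_of_two());
    (x + align - 1) & !(align - 1)
}

/// Lays out (size, align) slots in order and returns each slot's offset
/// plus the total frame size rounded up to 16 bytes for sp alignment.
fn layout_frame(slots: &[(u32, u32)]) -> (Vec<u32>, u32) {
    let mut offset = 0;
    let mut offsets = Vec::with_capacity(slots.len());
    for &(size, align) in slots {
        offset = align_up(offset, align);
        offsets.push(offset);
        offset += size;
    }
    (offsets, align_up(offset, 16))
}
```

Consider an 8-byte spill, a 16-byte `i8x16` spill, and another 8-byte spill. The vector lands at offset 16 rather than 8, and the 40 bytes of slots round up to a 48-byte frame. An extra spilled temporary shifts these offsets, which is why alignment bugs often surface only under higher register pressure.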
FAQ
Why does this crash only on riscv64 and not on x86_64?
Because calling conventions are target-specific. The same CLIF function can lower correctly on x86_64 but fail on riscv64 if the RISC-V backend mishandles hidden sret arguments, register classes, or stack layout.
Is the call instruction itself invalid in CLIF?
No. The call instruction is valid. The problem is how the backend lowers that call for a specific signature under implicit sret and mixed value types.
What is the safest long-term fix?
The best long-term fix is to correct riscv64 ABI lowering for implicit sret and add regression tests that cover multi-return calls with vector-like arguments. Workarounds such as avoiding i8x16 at the ABI boundary are useful, but they should not replace a backend fix.
For maintainers, the actionable takeaway is simple: inspect the call-site ABI assignment, verify hidden sret placement, ensure proper caller-saved register handling, and lock the result in with a regression test. That is the path from a mysterious segfault to a stable Cranelift riscv64 backend.