How to Fix: Cranelift: Wrong result for function with `tail` calling convention on AArch64
AArch64 tail calls in Cranelift can silently return the wrong value when the caller and callee disagree about how arguments and return values flow through registers and stack slots. The failure is especially nasty because the generated code may run without crashing while still corrupting values such as i128, narrow integers such as i8, or floating-point arguments passed through the tail calling convention.
Understanding the Root Cause
This Cranelift bug appears when a function compiled for AArch64 uses the tail calling convention and performs a tail call with a signature containing values that stress ABI lowering, such as f64, i128, and integers narrower than a machine register, such as i8.
In a normal call, Cranelift lowers arguments into the target ABI locations, emits a call instruction, and then handles the callee return values after control comes back. In a tail call, the current function does not return after the call. Instead, it reuses the current stack frame or tears it down and jumps directly to the callee. That optimization is only correct if the outgoing call frame exactly matches what the callee expects.
The wrong-result behavior comes from a mismatch in one of these areas:
- ABI argument placement: large values such as i128 may be split across multiple registers or register-plus-stack locations.
- Register clobbering: preparing a tail call can overwrite a register that still contains a later outgoing argument.
- Stack slot overlap: outgoing stack arguments may be copied into locations that overlap with still-needed incoming arguments.
- Narrow integer extension: values narrower than a register, such as i8, must be zero-extended or sign-extended consistently on both sides of the call, following any uext or sext annotation on the parameter.
- Tail-call frame finalization: the stack pointer, frame pointer, and return address handling differ from a regular call and must be emitted in the correct order.
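The stack slot overlap case can be illustrated outside the compiler. The sketch below models a stack area as a slice of 64-bit words and contrasts a naive front-to-back copy with a direction-aware copy. The function names are hypothetical and this is not Cranelift's implementation; it only shows why copy order matters when the outgoing argument area overlaps the incoming one.

```rust
/// Naive front-to-back copy of `len` words from `src` to `dst`.
/// When `dst` is above `src` and the ranges overlap, later source
/// words are overwritten before they are read.
pub fn copy_forward(stack: &mut [u64], src: usize, dst: usize, len: usize) {
    for i in 0..len {
        stack[dst + i] = stack[src + i];
    }
}

/// Overlap-safe copy: copy front-to-back when moving down,
/// back-to-front when moving up, so no word is clobbered before use.
pub fn copy_overlap_safe(stack: &mut [u64], src: usize, dst: usize, len: usize) {
    if dst <= src {
        for i in 0..len {
            stack[dst + i] = stack[src + i];
        }
    } else {
        for i in (0..len).rev() {
            stack[dst + i] = stack[src + i];
        }
    }
}
```

With four argument words shifted up by two slots, the naive copy repeats the first two words, while the direction-aware copy preserves all four.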
Fuzz-generated CLIF tests are good at exposing this class of bug because they combine unusual signatures with many ABI boundaries. A minimal reproducer usually has this shape:
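test interpret
test run
target aarch64

function %callee(f64, i128, i128, i8) -> i128 tail {
block0(v0: f64, v1: i128, v2: i128, v3: i8):
    ; arithmetic that makes corrupted arguments visible
    return v2
}

function %caller(f64, i128, i128, i8) -> i128 tail {
    ; the callee must be declared before it can be tail-called
    fn0 = %callee(f64, i128, i128, i8) -> i128 tail

block0(v0: f64, v1: i128, v2: i128, v3: i8):
    return_call fn0(v0, v1, v2, v3)
}
; illustrative inputs; the expected result is the third argument
; run: %caller(0x1.0, 1, 2, 3) == 2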
The exact failing test may differ, but the important trigger is the combination of AArch64, tail calls, and ABI-heavy parameter movement.
Step-by-Step Solution
The fix should be approached as a backend correctness issue, not as a CLIF test workaround. The goal is to make AArch64 tail-call lowering preserve all outgoing arguments until they are safely placed in their final ABI locations.
1. Reproduce the failure with the CLIF test
Save the failing test as a dedicated file under the Cranelift filetests directory, for example:
cranelift/filetests/filetests/aarch64/tail-call-i128.clif
Run the filetest against the AArch64 target:
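cargo run -p cranelift-tools --bin clif-util -- test cranelift/filetests/filetests/aarch64/tail-call-i128.clif
If the issue is present, the test run result will differ from the test interpret result, so one of the two checks fails. Because test run executes the compiled code natively, it needs an AArch64 host or emulator to exercise the bug.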
2. Reduce the test while preserving the ABI pressure
Before changing backend code, reduce the CLIF to the smallest signature that still fails. Keep the types that create the bug:
test interpret
test run
target aarch64

function %callee(f64, i128, i128, i8) -> i128 tail {
block0(v0: f64, v1: i128, v2: i128, v3: i8):
    return v2
}

function %caller(f64, i128, i128, i8) -> i128 tail {
    fn0 = %callee(f64, i128, i128, i8) -> i128 tail

block0(v0: f64, v1: i128, v2: i128, v3: i8):
    return_call fn0(v0, v1, v2, v3)
}
; run: %caller(0x1.0, 1, 2, 3) == 2
This makes the bug easier to diagnose because any wrong result points directly at tail-call argument lowering rather than unrelated arithmetic.
3. Inspect the generated AArch64 code
Generate the lowered assembly or disassembly for the reduced test and inspect how arguments are moved before the tail jump:
cargo run -p cranelift-tools --bin clif-util -- compile --target aarch64 -D path/to/tail-call-i128.clif
Look for suspicious instruction ordering such as:
- A register used as a source after it has already been overwritten.
- Stack arguments copied in an order that overwrites another pending source.
- An i128 value where only one 64-bit half is moved correctly.
- A tail jump emitted before all ABI locations are fully populated.
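The first two checks in the list above can be mechanized for register moves. The following is a minimal sketch, assuming moves are written as hypothetical (destination, source) register-number pairs transcribed by hand from the disassembly. It tracks which original value each register holds after the emitted sequence and reports the first intended destination that ends up with the wrong value. It is a standalone illustration, not a Cranelift tool.

```rust
/// Check an emitted sequential move list against the intended parallel moves.
/// Each pair is `(dst, src)`, using register numbers 0..32.
/// Returns the first destination that ends up holding the wrong value,
/// or `None` if the sequence implements the intended moves.
pub fn first_mismatch(intended: &[(u8, u8)], emitted: &[(u8, u8)]) -> Option<u8> {
    // holds[r] records which register's ORIGINAL value currently lives in r.
    let mut holds = [0u8; 32];
    for (r, slot) in holds.iter_mut().enumerate() {
        *slot = r as u8;
    }
    // Replay the emitted instructions in order.
    for &(dst, src) in emitted {
        holds[dst as usize] = holds[src as usize];
    }
    // Every intended destination must now hold its source's original value.
    intended
        .iter()
        .find(|&&(dst, src)| holds[dst as usize] != src)
        .map(|&(dst, _)| dst)
}
```

For a swap of x0 and x1, the naive two-instruction sequence is flagged at x0, while a three-instruction sequence through a scratch register passes.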
4. Fix tail-call argument shuffling
The robust fix is to treat outgoing tail-call argument placement like a parallel move problem. All source values must conceptually move at the same time, even if the emitted machine instructions are sequential.
The backend should use a temporary register or stack spill when a destination overlaps with a still-needed source:
; Conceptual problem:
; arg0 source is x0, destination is x1
; arg1 source is x1, destination is x0
; Incorrect sequential move:
mov x1, x0
mov x0, x1
; Correct move with temporary:
mov x16, x0
mov x0, x1
mov x1, x16
For Cranelift, this usually means auditing the AArch64 call-lowering path responsible for return_call or tail call emission and ensuring it uses the same safe move-resolution logic as regular ABI argument setup, with tail-call-specific stack handling applied afterward.
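To make the parallel move idea concrete, here is a minimal, self-contained sketch of a move sequentializer. It assumes every destination is written at most once and that the scratch register is not used by any input move. The `Move` type and function names are hypothetical, and this is not the code Cranelift or its register allocator uses; it only demonstrates the technique of emitting unblocked moves first and breaking cycles through a scratch register.

```rust
/// A register move `dst <- src`, with registers named by small integers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Move {
    pub dst: u8,
    pub src: u8,
}

/// Turn a set of simultaneous moves into a safe sequential order.
/// Assumes destinations are distinct and `scratch` appears in no input move.
pub fn sequentialize(moves: &[Move], scratch: u8) -> Vec<Move> {
    // Moves from a register to itself need no instruction.
    let mut pending: Vec<Move> = moves.iter().copied().filter(|m| m.dst != m.src).collect();
    let mut out = Vec::new();
    while !pending.is_empty() {
        // A move is safe once no other pending move still reads its destination.
        let safe = (0..pending.len()).find(|&i| {
            pending
                .iter()
                .enumerate()
                .all(|(j, m)| j == i || m.src != pending[i].dst)
        });
        match safe {
            Some(i) => out.push(pending.remove(i)),
            None => {
                // Only cycles remain: save one source in the scratch register
                // and redirect its readers there, which unblocks the cycle.
                let saved = pending[0].src;
                out.push(Move { dst: scratch, src: saved });
                for m in pending.iter_mut() {
                    if m.src == saved {
                        m.src = scratch;
                    }
                }
            }
        }
    }
    out
}

/// Execute a sequential move list on a simulated register file.
pub fn apply(seq: &[Move], regs: &mut [u64]) {
    for m in seq {
        regs[m.dst as usize] = regs[m.src as usize];
    }
}
```

For the x0/x1 swap with x16 as scratch, this yields the same three-instruction sequence as the corrected assembly. A production resolver would also need to handle stack slots and separate register classes, which this sketch leaves out.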
5. Validate stack and register ABI locations
For AArch64, confirm that the implementation respects the platform ABI rules:
- General-purpose arguments use integer registers first, then stack slots.
- Floating-point arguments use floating-point/SIMD registers where applicable.
- i128 values are represented as multiple machine words and must preserve both halves.
- Stack alignment remains valid before the tail jump.
The important ordering rule is: resolve all outgoing argument moves first, finalize the frame second, and emit the tail jump last.
; Safe high-level lowering order
resolve_outgoing_tail_call_arguments()
restore_or_adjust_tail_call_frame()
emit_tail_jump_to_callee()
6. Add a regression test
Once the backend fix is in place, keep the reduced CLIF test in the AArch64 filetests suite. The regression test should include both interpreter and run modes so future changes catch wrong-code behavior:
test interpret
test run
target aarch64
Use a return value that depends on the argument most likely to be corrupted, such as the second i128 argument or a value derived from both halves of it.
7. Run focused and broader validation
Start with the new filetest:
cargo run -p cranelift-tools --bin clif-util -- test cranelift/filetests/filetests/aarch64/tail-call-i128.clif
Then run the wider Cranelift test suite if time allows:
cargo test -p cranelift-codegen
cargo test -p cranelift-filetests
A correct fix should make the interpreter and compiled execution agree while preserving existing AArch64 call and return behavior.
Common Edge Cases
- Register cycles: argument swaps such as x0-to-x1 and x1-to-x0 require a temporary register or spill slot.
- Mixed integer and floating-point arguments: f64 values may travel through FP registers while i128 values use integer registers, so both register classes must be handled independently.
- Stack-passed i128 values: if an i128 spills to the stack, both 64-bit lanes must be copied in the correct order without overlap corruption.
- Narrow integer types: values narrower than a register, such as i8, must not accidentally retain stale high bits after lowering.
- Caller and callee signature mismatch: tail calls are only valid when the ABI-level transition can be represented safely; incompatible signatures should be rejected or lowered conservatively.
- Platform-specific tail-call restrictions: some AArch64 environments impose additional constraints for unwind info, frame pointers, pointer authentication, or branch target identification.
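The stale-high-bits edge case for narrow integers is easy to demonstrate in isolation. The sketch below treats a 64-bit register as a `u64` and shows how zero extension and sign extension normalize a narrow value regardless of what the upper bits contain. The helper names are hypothetical and the widths are illustrative; this is not Cranelift's lowering code.

```rust
/// Zero-extend the low `bits` bits of a 64-bit register value.
pub fn zero_extend(reg: u64, bits: u32) -> u64 {
    assert!((1..=64).contains(&bits));
    if bits == 64 {
        reg
    } else {
        reg & ((1u64 << bits) - 1)
    }
}

/// Sign-extend the low `bits` bits of a 64-bit register value.
pub fn sign_extend(reg: u64, bits: u32) -> i64 {
    assert!((1..=64).contains(&bits));
    let shift = 64 - bits;
    // Move the narrow value's sign bit to bit 63, then shift back arithmetically.
    ((reg << shift) as i64) >> shift
}
```

If the caller leaves garbage above bit 7 and the callee compares the full register, the two sides see different values unless one of these extensions is applied consistently.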
FAQ
Why does this only happen with tail calls?
A regular call can use a normal outgoing call sequence and then return to the caller. A tail call replaces the current frame and jumps directly to the callee, so argument movement, frame teardown, and control transfer happen in a tighter sequence. If any argument source is overwritten too early, the callee receives the wrong value.
Why do i128 arguments make the bug easier to trigger?
On AArch64, an i128 value is too large for one 64-bit general-purpose register. It must be split across multiple ABI locations. That increases the chance of register overlap, partial copies, stack fallback, or incorrect move ordering.
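The split can be sketched in a few lines. The helpers below are hypothetical names that divide an `i128` into low and high 64-bit halves and reassemble it, assuming the little-endian convention in which the low half occupies the first location. They also make the partial-copy failure visible: if only one half is moved, the reassembled value is wrong.

```rust
/// Split an i128 into (low, high) 64-bit halves.
pub fn split_i128(v: i128) -> (u64, u64) {
    let bits = v as u128;
    (bits as u64, (bits >> 64) as u64)
}

/// Reassemble an i128 from its low and high 64-bit halves.
pub fn join_i128(lo: u64, hi: u64) -> i128 {
    (((hi as u128) << 64) | (lo as u128)) as i128
}
```

A lowering that copies the low half but leaves a stale high half produces a value that differs from the original in the upper 64 bits, which is exactly the kind of corruption a returned i128 makes observable.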
Is the CLIF test wrong, or is the backend wrong?
If test interpret produces the expected result and test run produces a different result for the same CLIF, the backend is wrong. The interpreter executes Cranelift IR semantics directly, while the run test exercises generated machine code. A mismatch points to lowering, register allocation, or ABI emission.
Can this be fixed by disabling tail calls?
Disabling tail calls avoids the immediate wrong result, but it does not fix the compiler bug. The correct solution is to repair AArch64 tail-call lowering so outgoing ABI arguments are moved safely and the final tail jump observes the correct calling convention.