How to Fix: Cranelift: Wrong result on `tail` function with stack probes
This bug is a classic code generation correctness failure: a function using `tail` returns the wrong result only when stack probing and stack slots are both present. Disable probing, or remove the stack slots, and the failure disappears. That pattern points directly at a bad interaction between the prologue/epilogue stack layout, tail-call lowering, and the assumptions Cranelift makes about the outgoing frame.
Problem Overview
The issue appears when a Cranelift-compiled function performs a tail call while the backend also emits stack probes. Under normal tail-call semantics, the current frame must be torn down or transformed so control can jump into the callee without returning through the current function. But stack probing changes how the frame is materialized: the backend may insert probe sequences, adjust the stack in chunks, and reserve space for stack slots earlier or differently than the tail-call lowering expects.
When those mechanisms disagree, Cranelift can generate machine code that reuses an invalid stack state at the tail jump site. The result is not necessarily a crash. More dangerously, it can be a silent wrong result, because arguments, return address expectations, or stack-resident values are no longer where the generated code assumes they are.
Understanding the Root Cause
The technical root cause is that tail-call lowering requires a precise stack discipline, but stack probes mutate that discipline by introducing additional stack adjustments and memory touches during frame setup.
In practice, a bug of this kind usually comes from one of these backend mistakes:
- The tail-call path assumes the stack pointer is in its canonical pre-frame or post-frame position, but the probe-expanded prologue leaves it in a different state.
- The frame layout logic accounts for stack slots when computing local storage, but the tail-call emission does not fully reverse or normalize those adjustments before the jump.
- The probe sequence is treated as transparent by the tail-call code, even though it changes the effective frame size and may require a different teardown order.
- The backend emits a tail jump while values still live in probed stack memory or in slots whose offsets were computed relative to a frame state that no longer holds.
Why does the testcase pass when probing is disabled? Because the simpler prologue keeps the stack layout consistent with tail-call lowering assumptions. Why does it also pass when stack slots are removed? Because without local stack storage, the frame may become trivial enough that the stack pointer ends up matching the tail-call path again.
At the implementation level, the bug is typically fixed by ensuring that the tail-call emission path uses the final legal stack state after considering probes, dynamic frame allocation, and stack slot reservation. In other words, the backend must guarantee that a tail call is emitted only after the current frame has been restored to exactly the state required by the calling convention.
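To make the mismatch concrete, here is a toy model in Rust of the failure pattern described above. It is only a sketch under stated assumptions: `ToyFrame`, `allocated_bytes`, and the idea that a probed prologue allocates in whole `probe_step` chunks are all hypothetical and chosen for illustration. They do not describe Cranelift's actual frame layout code.

```rust
/// Toy frame description. Hypothetical; not a Cranelift type.
#[derive(Clone, Copy)]
struct ToyFrame {
    slot_bytes: u64, // bytes needed for stack slots
    probes: bool,    // whether the prologue probes the stack
    probe_step: u64, // assumed probe granularity, e.g. one page
}

/// Bytes the toy prologue actually subtracts from SP.
/// Assumption: a probed prologue allocates in whole `probe_step` chunks.
fn allocated_bytes(f: ToyFrame) -> u64 {
    if f.probes {
        (f.slot_bytes + f.probe_step - 1) / f.probe_step * f.probe_step
    } else {
        f.slot_bytes
    }
}

/// Buggy teardown: gives back only the slot bytes it knows about.
fn sp_at_jump_buggy(entry_sp: u64, f: ToyFrame) -> u64 {
    entry_sp - allocated_bytes(f) + f.slot_bytes
}

/// Fixed teardown: gives back exactly what the prologue took.
fn sp_at_jump_fixed(entry_sp: u64, f: ToyFrame) -> u64 {
    entry_sp - allocated_bytes(f) + allocated_bytes(f)
}
```

With 128 bytes of slots and a 4096-byte probe step, the buggy teardown leaves SP 3968 bytes below its entry value at the jump. With probes off, or with no slots, both versions agree, which mirrors the pass/fail pattern of the testcase.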
Step-by-Step Solution
The safest fix is to make tail-call lowering explicitly aware of probed frames and to prevent tail-call emission unless the backend can prove the stack is restored correctly.
1. Reproduce the failure with the minimal configuration
Start by confirming the interaction matrix:
# Expected: wrong result only when probes and stack slots are both enabled
probe-stack=enabled  + stack slots    + tail => wrong result
probe-stack=disabled + stack slots    + tail => passes
probe-stack=enabled  + no stack slots + tail => passes
This confirms the bug is in the combined lowering path, not in tail calls alone.
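If you want to drive that matrix from a small harness, a sketch along these lines may help. `Config`, `all_configs`, and `failing_configs` are hypothetical names, and the `passes` callback is a placeholder for however you actually compile and run the testcase. Nothing here calls into Cranelift.

```rust
/// One cell of the interaction matrix. Hypothetical type for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Config {
    probes: bool,
    stack_slots: bool,
}

/// All four probe/slot combinations, in a fixed order.
fn all_configs() -> Vec<Config> {
    let mut out = Vec::new();
    for probes in [false, true] {
        for stack_slots in [false, true] {
            out.push(Config { probes, stack_slots });
        }
    }
    out
}

/// Returns the configurations for which `passes` reports a failure.
/// `passes` stands in for compiling and running the tail-call testcase.
fn failing_configs(passes: impl Fn(Config) -> bool) -> Vec<Config> {
    all_configs().into_iter().filter(|c| !passes(*c)).collect()
}
```

If the bug matches the description above, `failing_configs` should report exactly one entry: probes and stack slots both enabled. Any other failing cell suggests a different root cause.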
2. Inspect prologue and tail-call machine emission
Look for the target backend code responsible for:
- Frame layout computation
- Stack probe insertion
- Tail-call lowering
- Stack pointer restoration before tail jump
The critical question is: does the emitted tail call jump with the stack pointer matching the ABI-required caller state?
// Pseudocode for the invariant you need before emitting a tail call:
fn emit_tail_call(...) {
    restore_sp_to_tailcall_legal_state();
    materialize_outgoing_args();
    jump_to_callee();
}
3. Normalize the stack before the tail jump
The core fix is usually to restore the stack pointer as if the current frame had been fully torn down, including any adjustments introduced by probing.
// Incorrect pattern
allocate_frame();
probe_stack();
use_stack_slots();
// ...
jump_tail(callee); // SP may still reflect current frame

// Correct pattern
allocate_frame();
probe_stack();
use_stack_slots();
// ...
release_stack_slots();
undo_probe_related_sp_adjustments();
restore_frame_base_if_needed();
prepare_tail_args_in_abi_locations();
jump_tail(callee);
If your backend already has a shared epilogue helper, reuse that logic or factor it into a routine tail-call lowering can call safely.
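One way to picture the shared-helper idea is the sketch below. The `Inst` enum and the `emit_*` functions are hypothetical stand-ins, not Cranelift's machine instruction types or ABI code. The point is only that the return path and the tail-call path emit the same teardown prefix, so they cannot drift apart.

```rust
/// Abstract instructions for the sketch. Hypothetical; not Cranelift's types.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Inst {
    AddSp(u32),         // release `n` bytes of frame
    RestoreFp,          // restore the saved frame pointer
    Ret,                // ordinary return
    Jump(&'static str), // tail jump to a named callee
}

/// Frame teardown shared by both exits. `frame_bytes` must be the total
/// the prologue subtracted from SP, including probe-related adjustments.
fn emit_teardown(frame_bytes: u32, uses_fp: bool, out: &mut Vec<Inst>) {
    if frame_bytes != 0 {
        out.push(Inst::AddSp(frame_bytes));
    }
    if uses_fp {
        out.push(Inst::RestoreFp);
    }
}

/// Ordinary epilogue: teardown, then return.
fn emit_return(frame_bytes: u32, uses_fp: bool) -> Vec<Inst> {
    let mut out = Vec::new();
    emit_teardown(frame_bytes, uses_fp, &mut out);
    out.push(Inst::Ret);
    out
}

/// Tail-call exit: the same teardown, then a jump instead of a return.
fn emit_tail_call(frame_bytes: u32, uses_fp: bool, callee: &'static str) -> Vec<Inst> {
    let mut out = Vec::new();
    emit_teardown(frame_bytes, uses_fp, &mut out);
    out.push(Inst::Jump(callee));
    out
}
```

Because both exits go through `emit_teardown`, a change to how the prologue sizes the frame only has to be mirrored in one place. Outgoing argument setup is left out of this sketch.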
4. Guard unsupported combinations
If a full correctness fix is not yet available for a target, add a conservative legality check: disable tail-call lowering for probed frames with stack slots. This is preferable to silently producing wrong code.
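// Hypothetical frame summary for illustration; not a Cranelift type.
struct FrameInfo {
    has_stack_probes: bool,
    has_stack_slots: bool,
}

// Conservative check: refuse tail calls for the known-bad combination.
fn tail_call_allowed(frame: &FrameInfo) -> bool {
    !(frame.has_stack_probes && frame.has_stack_slots)
}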
This is not the final optimization-friendly solution, but it is a valid short-term correctness patch.
5. Add regression tests at the IR level
You want tests that lock down all relevant dimensions:
- Tail call with fixed-size stack slots
- Tail call with stack probing enabled
- Tail call with both enabled
- Tail call with different frame sizes around the probe threshold
- Tail call on every affected ISA backend
; Pseudocode shape for a regression test.
; Both functions use the `tail` calling convention.
function %callee(i64) -> i64 tail {
block0(v0: i64):
    return v0
}

function %tail(i64) -> i64 tail {
    ss0 = explicit_slot 64
    ss1 = explicit_slot 64
    fn0 = %callee(i64) -> i64 tail

block0(v0: i64):
    stack_store v0, ss0
    v1 = stack_load.i64 ss0
    return_call fn0(v1)
}
Then run the same test with stack probing enabled for the target configuration that originally failed.
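To cover frame sizes on both sides of the probe threshold, you could generate the slot sizes rather than hand-pick them. The sketch below is an assumption-laden helper: `sizes_around_threshold` and `slot_decl` are hypothetical names, and the threshold and alignment are parameters you supply for your target, not constants taken from Cranelift.

```rust
/// Slot sizes straddling a probe threshold.
/// `threshold` and `align` are caller-supplied assumptions, not Cranelift constants.
fn sizes_around_threshold(threshold: u32, align: u32) -> Vec<u32> {
    assert!(align > 0 && threshold > 2 * align && threshold % align == 0);
    vec![
        align,             // small frame, far below the threshold
        threshold - align, // just below
        threshold,         // exactly at
        threshold + align, // just above
        2 * threshold,     // needs more than one probe step
    ]
}

/// Renders the slot declaration line for one test variant.
fn slot_decl(index: usize, size: u32) -> String {
    format!("ss{index} = explicit_slot {size}")
}
```

Each generated size can then replace the slot size in the test shape above, giving one variant per size.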
6. Verify generated assembly
After patching, inspect the generated assembly around the prologue and tail jump. You should see:
- Probes inserted correctly for large frames
- Locals addressed with valid offsets
- No stale frame state at the tail jump
- The stack pointer restored to the ABI-expected state before transferring control
; Conceptual expected flow
prologue
probe loop or probe sequence
... local stack usage ...
restore sp
move tail args
jmp callee
7. Land the fix with a clear commit message
A strong commit message should describe the exact invariant that was violated, for example:
cranelift: restore canonical stack state before tail calls in probed frames

Tail-call lowering assumed a non-probed frame layout and could emit a tail
jump while stack-slot allocations were still active. This caused wrong results
when stack probing and local stack slots were both present. Normalize the stack
state before tail-call emission and add regression coverage.
Common Edge Cases
- Probe threshold boundaries: a frame just below the threshold may work, while one just above it fails because the probing path is different.
- Dynamic stack allocation: if the function also uses variable-sized stack space, tail-call legality becomes even stricter.
- Different ABIs per target: x86_64, aarch64, and other backends may have different stack alignment and tail-call constraints.
- Frame pointer preservation: if the backend uses a frame pointer, both SP and FP must be consistent before the tail jump.
- Outgoing stack arguments: some tail calls require argument placement on the stack, which can overlap dangerously with current-frame locals if teardown is incomplete.
- Shrink-wrapping or late prologue insertion: any pass that moves prologue/epilogue logic can reintroduce the bug unless the tail-call invariant is enforced centrally.
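The outgoing-argument hazard in the list above can be illustrated with a small overlap check. This is a simplified model, not a description of any real ABI: offsets are measured relative to SP on function entry, negative offsets lie inside the current frame, and `Range`, `overlaps`, and `outgoing_arg_range` are hypothetical helpers.

```rust
/// Half-open byte range, as offsets relative to SP on function entry.
/// Negative offsets are inside the current frame. Hypothetical helper type.
#[derive(Debug, Clone, Copy)]
struct Range {
    start: i64,
    end: i64,
}

/// True if the two half-open ranges share at least one byte.
fn overlaps(a: Range, b: Range) -> bool {
    a.start < b.end && b.start < a.end
}

/// Where outgoing stack arguments land if they are written SP-relative
/// while `leftover` bytes of the current frame are still allocated.
fn outgoing_arg_range(leftover: i64, arg_bytes: i64) -> Range {
    Range { start: -leftover, end: -leftover + arg_bytes }
}
```

In this model, a live local at offsets -64 to -32 is clobbered by 16 bytes of outgoing arguments when 64 bytes of frame are left over, because the arguments land at -64 to -48. With the frame fully torn down (`leftover` of 0) the arguments land at 0 to 16 and the local is untouched.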
FAQ
Why does this produce a wrong result instead of crashing?
Because the generated code can still be executable while using a miscomputed stack layout. Arguments or local values may be read from the wrong offsets, producing valid but incorrect data.
Is disabling stack probes a real fix?
No. It is only a diagnostic workaround. The real fix is to make tail-call lowering and stack probing agree on the final stack state, or to reject unsupported cases explicitly.
Should Cranelift disable tail calls whenever stack slots exist?
Not generally. Tail calls with stack slots are valid if the backend correctly tears down the frame and restores the ABI-mandated state before the jump. The bug is in the lowering interaction, not in the concept of tail calls with locals.
The key takeaway is simple: a tail call is only correct if the current frame is gone in ABI terms. Once stack probing enters the picture, Cranelift must treat that requirement as a hard backend invariant, not an assumption.