How to Fix: Segmentation Fault on riscv64 with opt_level=speed

A segmentation fault on riscv64 that appears only with opt_level=speed usually means the backend is emitting invalid machine code after an optimization pass has changed control flow, return handling, or stack layout assumptions. Here, the combination of speed-focused optimization, frame pointer preservation, and multi-return implicit sret lowering can expose a backend bug in which the generated prologue, epilogue, or return-value plumbing becomes inconsistent on the riscv64 target.

Understanding the Root Cause

The failing test case is small, but the flags tell the real story:

test optimize
    set opt_level=speed
    set preserve_frame_pointers=true
    set enable_multi_ret_implicit_sret=true

function u1:4(i64)...

On riscv64, a segmentation fault in generated code usually traces back to one of these low-level failures:

  • incorrect stack frame layout
  • bad handling of the hidden sret pointer
  • register allocation conflicts introduced by aggressive optimization
  • an epilogue/prologue mismatch when frame pointers are preserved
  • mis-lowered multi-value returns that overwrite argument or return registers

The most likely root cause here is the interaction between multi-return implicit sret and the riscv64 calling convention under speed optimization. When the compiler decides to lower multiple return values through an implicit structure return pointer, it changes how values are passed and where they are written. If a later optimization pass assumes the original return convention, or if the backend fails to keep the hidden pointer alive across rewritten blocks, the generated code may dereference an invalid address and crash with a segmentation fault.
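To make the hidden-pointer idea concrete, here is a minimal Rust sketch of the same two-value function written in both return styles. It is an illustration only, not Cranelift's actual lowering; the names RetArea, ret_direct, and ret_via_sret are hypothetical.

```rust
/// Caller-allocated return area (hypothetical): two i64 fields at byte offsets 0 and 8.
#[repr(C)]
#[derive(Debug, Default, PartialEq)]
struct RetArea {
    field0: i64,
    field1: i64,
}

/// Direct convention: both results come back as plain values.
fn ret_direct(v0: i64) -> (i64, i64) {
    (v0, 1)
}

/// sret-style convention: the caller passes a pointer to storage and the
/// callee writes each result through it. In machine code, if that pointer
/// were overwritten before the last store, the write would land at an
/// arbitrary address.
fn ret_via_sret(out: &mut RetArea, v0: i64) {
    out.field0 = v0; // store at offset 0
    out.field1 = 1; // store at offset 8
}

fn main() {
    let mut area = RetArea::default();
    ret_via_sret(&mut area, 42);
    // Both conventions must agree on the returned values.
    assert_eq!((area.field0, area.field1), ret_direct(42));
    println!("{:?}", area);
}
```

Rust's borrow rules keep `out` valid here; a backend has no such guard, which is why the pointer's liveness across the stores has to be tracked explicitly.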

The flag preserve_frame_pointers=true increases the chance of exposing the bug because it forces a more specific prologue/epilogue shape. On some backends, that changes register pressure and stack slot placement enough to surface a latent codegen issue that does not appear with default settings.

In short, the bug happens because backend lowering and optimization disagree about how the function should return values on riscv64, and the mismatch becomes visible only under the speed optimization pipeline.

Step-by-Step Solution

The fix is to isolate which transformation is producing invalid riscv64 code, then either disable the problematic lowering path or patch the backend so the hidden sret value, stack slots, and preserved frame pointer rules remain consistent.

1. Reproduce the failure with the minimal test

Start by keeping the test case as small as possible and verify that the crash depends on the exact flag combination.

test optimize
    set opt_level=speed
    set preserve_frame_pointers=true
    set enable_multi_ret_implicit_sret=true
    target riscv64

function u1:4(i64) -> i64, i64 {
block0(v0: i64):
    v1 = iconst.i64 1
    return v0, v1
}

If removing any one of these settings makes the crash disappear, you have confirmed an interaction bug rather than a generic parser or frontend problem.
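The one-flag-at-a-time check can be scripted. The sketch below is a hypothetical helper, not part of Cranelift's tooling: it prints the full trigger header followed by one variant per flag with that flag reset. It assumes the baseline values are none and false, and it writes the target as a separate `target riscv64` line after the settings, which is how Cranelift filetests normally select an ISA.

```rust
/// The three settings from the failing test: (name, triggering value, assumed baseline).
const FLAGS: [(&str, &str, &str); 3] = [
    ("opt_level", "speed", "none"),
    ("preserve_frame_pointers", "true", "false"),
    ("enable_multi_ret_implicit_sret", "true", "false"),
];

/// Build one test header; the flag at index `reset` (if any) is put back to its baseline.
fn header(reset: Option<usize>) -> String {
    let mut out = String::from("test optimize\n");
    for (i, (name, trigger, baseline)) in FLAGS.iter().enumerate() {
        let value = if Some(i) == reset { baseline } else { trigger };
        out.push_str(&format!("set {}={}\n", name, value));
    }
    out.push_str("target riscv64\n");
    out
}

/// The full trigger first, then one variant per flag.
fn variants() -> Vec<String> {
    let mut all = vec![header(None)];
    all.extend((0..FLAGS.len()).map(|i| header(Some(i))));
    all
}

fn main() {
    for v in variants() {
        println!("{}", v);
    }
}
```

Append the function body from the minimal test to each header and run them in turn; the variant that stops crashing points at the flag that matters.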

2. Compare behavior across optimization levels

Run the same function under each of the other optimization settings, one at a time.

set opt_level=none
set opt_level=speed_and_size

If the failure only reproduces with opt_level=speed, the issue is likely in a pass that is enabled or made more aggressive only in that pipeline, such as:

  • return-value coalescing
  • register allocation under high pressure
  • stack slot reuse
  • tail duplication or block layout rewriting

3. Disable implicit multi-return sret as a temporary workaround

If you need an immediate unblock for builds or CI, disable the feature that triggers the invalid lowering path.

test optimize
    set opt_level=speed
    set preserve_frame_pointers=true
    set enable_multi_ret_implicit_sret=false
    target riscv64

This is the safest short-term mitigation because it avoids the exact return-lowering mode that appears to be corrupting generated code.

4. Verify frame pointer interaction

Next, test whether preserving frame pointers is contributing to the problem.

test optimize
    set opt_level=speed
    set preserve_frame_pointers=false
    set enable_multi_ret_implicit_sret=true
    target riscv64

If this avoids the crash, inspect backend logic that computes:

  • CFA and frame offsets
  • saved register locations
  • hidden return-pointer placement
  • epilogue restoration order

That usually means the riscv64 backend is not correctly reconciling frame pointer preservation with implicit sret lowering.

5. Inspect the generated IR and machine code

Look for suspicious rewrites around the return sequence. Specifically check whether:

  • the hidden sret pointer is still available at the final store
  • return registers overlap with temporary registers
  • stack offsets remain valid after register allocation
  • prologue and epilogue save/restore the same registers

; Pseudocode pattern to validate
; expected: hidden_sret_ptr remains valid until all return fields are stored
store ret_val0, [hidden_sret_ptr + 0]
store ret_val1, [hidden_sret_ptr + 8]
ret

If the generated code instead writes through a clobbered register, reuses the frame pointer register incorrectly, or restores a register before the final store, that is the direct source of the segmentation fault.
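The clobber check described above can be expressed as a small scan over the return sequence. The following Rust sketch works on a toy instruction model, not on real riscv64 machine code or Cranelift's internal representation; the Inst type and the register names ptr and tmp are hypothetical.

```rust
/// Toy instruction set (hypothetical), just enough to model a return sequence.
#[derive(Debug, Clone, PartialEq)]
enum Inst {
    /// Overwrites register `dst` with some unrelated value.
    Def { dst: &'static str },
    /// Stores a return field through `base` at byte offset `offset`.
    Store { base: &'static str, offset: i32 },
    Ret,
}

/// Returns (instruction index, store offset) of the first store that goes
/// through `sret` after `sret` has been overwritten, or None if every
/// store happens while the pointer is still intact.
fn first_store_through_clobbered(insts: &[Inst], sret: &str) -> Option<(usize, i32)> {
    let mut clobbered = false;
    for (i, inst) in insts.iter().enumerate() {
        match inst {
            Inst::Def { dst } if *dst == sret => clobbered = true,
            Inst::Store { base, offset } if *base == sret && clobbered => {
                return Some((i, *offset));
            }
            _ => {}
        }
    }
    None
}

fn main() {
    let bad = [
        Inst::Store { base: "ptr", offset: 0 },
        Inst::Def { dst: "ptr" }, // pointer overwritten too early
        Inst::Store { base: "ptr", offset: 8 },
        Inst::Ret,
    ];
    match first_store_through_clobbered(&bad, "ptr") {
        Some((i, off)) => println!("instruction {} stores to offset {} through a clobbered pointer", i, off),
        None => println!("pointer stays intact until the last store"),
    }
}
```

When reading a real disassembly, the same question applies: between the point where the hidden pointer arrives and the last store of a return field, does anything write to the register that holds it?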

6. Patch the backend lowering logic

The real fix is typically one of the following backend changes:

  • mark the implicit sret pointer as live until the final return store
  • prevent register allocation from assigning conflicting physical registers
  • adjust riscv64 calling-convention lowering for multi-return functions
  • insert correct stack slot constraints when preserve_frame_pointers=true
  • disable a specific optimization for this lowering path until a full fix lands

// Pseudocode for a defensive backend fix; the helper names are hypothetical, not existing backend APIs
if target == Riscv64 && opt_level == Speed && enable_multi_ret_implicit_sret {
    ensure_hidden_sret_pointer_is_pinned();
    forbid_conflict_with_frame_pointer_related_regs();
    validate_prologue_epilogue_offsets();
}
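One of those checks, validating that the prologue and epilogue agree, is easy to prototype outside the compiler. The Rust sketch below is a hypothetical stand-alone checker, not backend code: it compares (register, stack offset) pairs for saves and restores and reports any register whose slots disagree. The register names and offsets in main are illustrative values only.

```rust
use std::collections::BTreeMap;

/// Returns the registers that are saved and restored at different offsets,
/// saved but never restored, or restored without having been saved.
fn mismatched_slots(saves: &[(&str, i32)], restores: &[(&str, i32)]) -> Vec<String> {
    let saved: BTreeMap<&str, i32> = saves.iter().copied().collect();
    let restored: BTreeMap<&str, i32> = restores.iter().copied().collect();
    let mut bad = Vec::new();
    // Every saved register must be restored from the same slot.
    for (reg, off) in &saved {
        if restored.get(reg) != Some(off) {
            bad.push(reg.to_string());
        }
    }
    // A restore without a matching save reads uninitialized stack.
    for reg in restored.keys() {
        if !saved.contains_key(reg) {
            bad.push(reg.to_string());
        }
    }
    bad
}

fn main() {
    // Illustrative frame: return address and frame pointer in a 16-byte frame.
    let prologue = [("ra", 8), ("fp", 0)];
    let epilogue = [("ra", 8), ("fp", 16)]; // fp restored from the wrong slot
    println!("{:?}", mismatched_slots(&prologue, &epilogue));
}
```

A check of this shape could run as a debug assertion after frame layout, so that a mismatch fails at compile time instead of surfacing later as a segmentation fault.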

7. Add a regression test

Once fixed, lock it down with a dedicated test that keeps the same trigger conditions.

test optimize
    set opt_level=speed
    set preserve_frame_pointers=true
    set enable_multi_ret_implicit_sret=true
    target riscv64

function u1:4(i64) -> i64, i64 {
block0(v0: i64):
    v1 = iconst.i64 1
    return v0, v1
}

; Verify: no crash, correct lowering, valid return handling

A good regression test should remain minimal and should specifically exercise the combination of riscv64, speed optimization, and implicit sret multi-return lowering.

Common Edge Cases

  • Crash disappears in debug builds: lower optimization often avoids the problematic code path, which can hide the backend defect.
  • Only multi-value returns fail: single-value returns may use a different calling convention path and never touch implicit sret lowering.
  • Only riscv64 is affected: other architectures may have enough return registers or a different ABI, so the same bug does not surface there.
  • Frame pointer settings change the symptom: with frame pointers disabled, the same bug may show up as wrong output instead of a hard crash because register pressure changes.
  • Tail-call or block-layout optimizations complicate debugging: the faulting instruction may be far removed from the original return statement in the IR.
  • ABI-sensitive tests fail intermittently: if the bug depends on register allocation order, small unrelated code changes can make the segmentation fault appear or disappear.

FAQ

Why does this crash only with opt_level=speed?

Because the speed pipeline enables more aggressive transformations that can change register lifetimes, stack slot reuse, and return lowering. If the riscv64 backend has a hidden ABI bug, those optimizations are often what expose it.

What does enable_multi_ret_implicit_sret=true actually change?

It tells the compiler to lower multi-value returns through an implicit structure return pointer instead of relying only on direct return registers. That introduces a hidden pointer argument and extra memory writes, which increases the chance of ABI or register-liveness mistakes.

What is the best temporary workaround while waiting for a proper fix?

Disable enable_multi_ret_implicit_sret for riscv64 when using opt_level=speed, or fall back to a less aggressive optimization level. If you do not depend on frame-pointer-based stack traces, also test whether setting preserve_frame_pointers=false avoids the faulty lowering path.

The key takeaway is simple: this segmentation fault is most likely caused not by the source function itself but by a riscv64 backend code generation bug triggered by the combination of speed optimization, preserved frame pointers, and implicit sret-based multi-return lowering. Reproduce it with a minimal test, isolate the triggering flag, and then harden the return-lowering path; that last step is the long-term fix.
