How to Fix: cranelift-jit: Panic when calling colocated function in riscv64

Cranelift JIT Panic on RISC-V64 When Calling a Colocated Function: Root Cause and Fix

A panic in cranelift-jit on riscv64 when one generated function calls another colocated function is usually a code generation and relocation mismatch, not a random runtime failure. The failure shows up because the JIT assumes a call sequence or addressability rule that does not hold on RISC-V 64-bit, especially when direct calls are emitted under placement assumptions that are invalid once machine code is finalized in memory.

The issue is easiest to reproduce with a minimal .clif file containing two functions where one function calls another in the same JIT module on a riscv64gc target. In that setup, Cranelift may treat the callee as close enough for a direct branch encoding, but the final lowering, relocation, or emitted trampoline path can violate RISC-V calling constraints and trigger a panic during compilation or execution.

Understanding the Root Cause

On RISC-V, call instructions are more restrictive than on some other architectures. A direct jal call encodes a 21-bit signed PC-relative offset, so it reaches only about ±1 MiB, while the two-instruction auipc + jalr pair reaches roughly ±2 GiB. If the compiler backend marks a function reference as colocated, it is effectively saying: this target is expected to be placed near enough that a more direct and cheaper call sequence is valid.

That assumption becomes dangerous in a JIT for three reasons:

  1. Final code placement happens late. The backend may lower a call before executable memory is allocated and before exact distances between functions are known.
  2. Colocation metadata can be too optimistic. Two functions belonging to the same JIT session are not automatically guaranteed to fit the encoding constraints required by a short direct call on riscv64.
  3. Relocation handling differs by backend path. If the selected relocation kind, call lowering, or veneer/trampoline generation path does not fully support the colocated case, the backend can hit an internal invariant and panic.
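
The range limits behind these points can be checked with plain arithmetic. The sketch below is not Cranelift code: `fits_jal` and `fits_auipc_jalr` are hypothetical helper names, and the bounds follow from the RISC-V base ISA encodings (a signed 21-bit, 2-byte-aligned offset for jal, and a sign-extended upper part plus a signed 12-bit lower part for the auipc + jalr pair).

```rust
/// Hypothetical helper: can a single `jal` at `caller` reach `callee`?
/// `jal` encodes a signed 21-bit, 2-byte-aligned offset: [-1 MiB, +1 MiB - 2].
fn fits_jal(caller: u64, callee: u64) -> bool {
    let offset = callee.wrapping_sub(caller) as i64;
    offset % 2 == 0 && (-(1i64 << 20)..(1i64 << 20)).contains(&offset)
}

/// Hypothetical helper: can an `auipc` + `jalr` pair at `caller` reach `callee`?
/// The pair adds a sign-extended upper-20-bit part and a signed 12-bit part,
/// giving [-2 GiB - 2 KiB, +2 GiB - 2 KiB - 1] on RV64.
fn fits_auipc_jalr(caller: u64, callee: u64) -> bool {
    let offset = callee.wrapping_sub(caller) as i64;
    let min = -(1i64 << 31) - (1i64 << 11);
    let max = (1i64 << 31) - (1i64 << 11) - 1;
    (min..=max).contains(&offset)
}
```

A JIT that places two functions in separately mapped regions can easily exceed the first window while staying inside the second, which is why a colocated flag alone does not justify the short form.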

In practice, the crash often comes from this chain:

  1. A function reference is marked colocated.
  2. The riscv64 backend chooses a direct near-call style.
  3. The actual relocation or emission phase cannot legally encode that target with the expected form.
  4. Instead of falling back to a safe indirect or long-range sequence, the implementation reaches an unimplemented or invalid state and panics.

This is why the bug appears specifically when calling a colocated function and specifically on riscv64. The architecture’s branch/call encoding constraints expose backend assumptions that may stay invisible on x86-64 or AArch64.

From a compiler engineering perspective, the correct fix is usually one of these:

  • Do not use the colocated fast path for riscv64 in JIT mode unless the backend can prove the relocation is valid.
  • Lower colocated calls to a safe long-range sequence.
  • Introduce a veneer, stub, or trampoline when the target cannot be encoded directly.
  • Normalize JIT function references so backend call lowering uses a relocation kind that the riscv64 emitter fully supports.
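
The common thread in these options is that call selection should degrade to a safe form instead of reaching a panic. A minimal sketch of that policy follows, assuming a hypothetical `CallForm` enum and `select_call_form` function that do not exist in Cranelift; the distance is modeled as `None` when final placement is not yet known.

```rust
/// Call shapes a riscv64 backend could pick from (names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CallForm {
    /// Single `jal`, reaches about +/-1 MiB.
    Jal,
    /// `auipc` + `jalr`, reaches about +/-2 GiB.
    AuipcJalr,
    /// Load the absolute address into a register, then `jalr`.
    Indirect,
}

/// Picks the cheapest form that is known to be encodable.
/// `distance` is `None` when final placement is not yet known.
/// Distances are assumed to be 2-byte aligned.
fn select_call_form(distance: Option<i64>) -> CallForm {
    const JAL_MIN: i64 = -(1 << 20);
    const JAL_MAX: i64 = (1 << 20) - 2;
    const PAIR_MIN: i64 = -(1 << 31) - (1 << 11);
    const PAIR_MAX: i64 = (1 << 31) - (1 << 11) - 1;
    match distance {
        Some(d) if (JAL_MIN..=JAL_MAX).contains(&d) => CallForm::Jal,
        Some(d) if (PAIR_MIN..=PAIR_MAX).contains(&d) => CallForm::AuipcJalr,
        // Unknown or out-of-range targets fall back instead of panicking.
        _ => CallForm::Indirect,
    }
}
```

The design choice here is that the catch-all arm is the slow but always-valid form, so a new or unexpected case costs performance rather than correctness.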

Step-by-Step Solution

The safest way to fix this issue is to remove the invalid assumption that colocated JIT calls on riscv64 can always use the direct near-call lowering path. The exact patch depends on the Cranelift version, but the engineering approach stays consistent.

1. Reproduce the failure with a focused test

Create or preserve a regression test using the reduced .clif program from the issue. The goal is to keep a minimal case where one function calls another under target riscv64gc.

test run
target riscv64gc

function %b() {
block0:
    return
}

function %a(i16) {
    ; CLIF calls go through a declared function reference, not a bare name.
    fn0 = colocated %b()
block0(v0: i16):
    call fn0()
    return
}

If your local test harness supports file-based execution, add it to the appropriate Cranelift test suite so the panic is always checked in CI.

2. Inspect how the call is lowered on riscv64

Look in the riscv64 backend for the code path that lowers a direct function call. You want to find where a FunctionRef or external name marked as colocated causes direct call selection.

// Pseudocode
match callee_ref {
    Callee::Direct(func_ref) if flags.colocated(func_ref) => {
        // current fast path may emit a direct call form
    }
    _ => {
        // generic or indirect path
    }
}

In the broken implementation, this branch often assumes the backend can always materialize a valid direct call relocation for the target.

3. Disable the unsafe colocated fast path for riscv64 JIT calls

If the backend cannot guarantee direct encoding safety, force a conservative lowering path. That may mean materializing the callee address into a register and using an indirect call such as jalr.

// Safer pseudocode for riscv64
if target_arch == Riscv64 && is_jit_mode && is_colocated_call {
    // Materialize function address with relocation-friendly sequence
    tmp = load_function_address(callee)
    emit_jalr(tmp)
} else {
    emit_default_call(callee)
}

This approach trades a small performance cost for correctness: the call no longer depends on the callee being within direct-call range, which removes the range assumption behind the panic.

4. Prefer relocation-safe address materialization

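On riscv64, long-range or unknown-distance targets are commonly handled by materializing the address with a relocated auipc-based instruction pair before issuing jalr.

# Conceptual assembly shape
# %pcrel_lo takes the label of the matching auipc, not the target symbol.
# (An ld in place of addi is only correct when loading the address from a GOT entry.)
1:  auipc t0, %pcrel_hi(target)
    addi  t0, t0, %pcrel_lo(1b)
    jalr  ra, t0, 0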

The exact sequence depends on how the backend represents JIT relocations, but the core rule is simple: do not emit a direct call encoding unless range and relocation semantics are guaranteed.
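
When a JIT applies this kind of relocation itself, it has to split the PC-relative offset into the upper 20 bits carried by auipc and the signed lower 12 bits carried by the second instruction. The sketch below shows that arithmetic under stated assumptions: `split_pcrel` and `patch_call_pair` are hypothetical names, the second instruction is taken to be a jalr (I-type immediate), and the caller is assumed to have already verified that the offset is in range.

```rust
/// Splits a PC-relative offset into the `auipc` upper part (already shifted
/// into bits 31..12) and the 12-bit lower part for an I-type instruction.
/// Offsets in the top 2 KiB of the i32 range are not encodable on RV64 and
/// must be rejected by the caller before getting here.
fn split_pcrel(offset: i32) -> (u32, u32) {
    // The low 12 bits are sign-extended by the hardware, so add 0x800 first
    // to round the upper part up whenever the lower part will be negative.
    let hi20 = (offset as u32).wrapping_add(0x800) & 0xffff_f000;
    let lo12 = (offset as u32).wrapping_sub(hi20) & 0xfff;
    (hi20, lo12)
}

/// Writes the split offset into an existing `auipc` / `jalr` pair,
/// preserving opcode and register fields.
fn patch_call_pair(auipc: u32, jalr: u32, offset: i32) -> (u32, u32) {
    let (hi20, lo12) = split_pcrel(offset);
    // auipc keeps bits 11..0 (rd, opcode); jalr keeps bits 19..0.
    ((auipc & 0x0000_0fff) | hi20, (jalr & 0x000f_ffff) | (lo12 << 20))
}
```

For example, an offset of 0x1800 splits into an upper part of 0x2000 and a lower part of 0x800, which the hardware reads as -2048, so the sum is again 0x1800.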

5. If available, use or add a trampoline mechanism

Some JIT runtimes solve this class of problem by routing calls through a nearby stub. If Cranelift already provides a function-call trampoline abstraction for the JIT layer, make riscv64 colocated calls use it where direct encoding is not safe.

// Conceptual flow
caller -> local trampoline -> final callee

// Benefits:
// - stable relocation model
// - easier patching
// - safe distance handling

This is especially useful if executable pages can be allocated in separate regions over time.
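
As an illustration of what such a stub could contain, the sketch below builds a small position-independent trampoline that loads a 64-bit absolute target stored next to it and jumps there. This is an assumed layout, not Cranelift's; `build_trampoline` is a hypothetical name, the stub clobbers t0, and it must live in memory that is readable as well as executable.

```rust
/// Builds a 24-byte stub that jumps to `target`.
/// Layout (assumes the stub is placed at an 8-byte-aligned address):
///   +0   auipc t0, 0        ; t0 = address of the stub
///   +4   ld    t0, 16(t0)   ; load the 64-bit target stored below
///   +8   jalr  x0, 0(t0)    ; jump without touching ra
///   +12  nop                ; padding so the literal is 8-byte aligned
///   +16  .8byte target
fn build_trampoline(target: u64) -> [u8; 24] {
    const AUIPC_T0_0: u32 = 0x0000_0297;
    const LD_T0_16_T0: u32 = 0x0102_b283;
    const JR_T0: u32 = 0x0002_8067;
    const NOP: u32 = 0x0000_0013;

    let mut stub = [0u8; 24];
    for (i, word) in [AUIPC_T0_0, LD_T0_16_T0, JR_T0, NOP].iter().enumerate() {
        // RISC-V instructions are stored little-endian.
        stub[i * 4..i * 4 + 4].copy_from_slice(&word.to_le_bytes());
    }
    stub[16..24].copy_from_slice(&target.to_le_bytes());
    stub
}
```

Because the caller only needs to reach the stub, and retargeting means rewriting the 8-byte literal rather than re-encoding instructions, the distance to the final callee stops mattering.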

6. Add a regression test specifically for colocated calls on riscv64

Do not stop at a generic JIT test. Add a test that proves the backend no longer panics when a same-module function call is marked or treated as colocated.

// Test intent
// 1. compile two functions for riscv64gc
// 2. function A calls function B
// 3. execute A
// 4. assert no panic and correct return behavior

If the test framework allows backend-level verification, also check the emitted call form is the conservative one you expect.
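
One way to check the emitted call form is to decode the instruction at the call site and classify it. The helper below is a sketch under assumptions: `call_shape_at` and `CallShape` are hypothetical names, the call site offset is known to the test, and the instructions at that offset are uncompressed 4-byte encodings.

```rust
#[derive(Debug, PartialEq, Eq)]
enum CallShape {
    DirectJal,
    AuipcJalrPair,
    Other,
}

/// Reads one little-endian 32-bit instruction word, if it is in bounds.
fn read_word(code: &[u8], at: usize) -> Option<u32> {
    let b = code.get(at..at.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Classifies the call sequence starting at byte offset `at`.
fn call_shape_at(code: &[u8], at: usize) -> CallShape {
    let first = match read_word(code, at) {
        Some(word) => word,
        None => return CallShape::Other,
    };
    let rd = (first >> 7) & 0x1f;
    match first & 0x7f {
        // jal (opcode 0x6f) with rd = ra (x1) is a direct call.
        0x6f if rd == 1 => CallShape::DirectJal,
        // auipc (0x17) followed by jalr (0x67) through the same register.
        0x17 => match read_word(code, at + 4) {
            Some(next) if next & 0x7f == 0x67 && (next >> 15) & 0x1f == rd => {
                CallShape::AuipcJalrPair
            }
            _ => CallShape::Other,
        },
        _ => CallShape::Other,
    }
}
```

A regression test could then assert that the call site in function A classifies as `AuipcJalrPair` (or whatever conservative form the fix selects) rather than `DirectJal`.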

7. Validate with architecture-specific runs

Because this is a backend issue, validation should include actual riscv64 execution or a trusted emulator environment. At minimum:

  • Run the reduced regression test.
  • Run the full Cranelift JIT test suite for riscv64.
  • Run fuzz-generated call patterns with multiple function definitions.
  • Check both debug and release builds.

# Example workflow
cargo test -p cranelift-jit
cargo test -p cranelift-codegen riscv64
# then run architecture-specific regression coverage in your CI or emulator

8. Document the backend invariant

Leave a short comment in the riscv64 lowering code so future maintainers understand why the conservative path exists.

// On riscv64 JIT, colocated does not guarantee a directly encodable call.
// Use relocation-safe address materialization plus jalr unless range and
// relocation form are proven valid.

That note can prevent the same regression from being reintroduced during later optimization work.

Common Edge Cases

Even after fixing the immediate panic, several related cases should be checked carefully.

1. Multiple JIT allocation regions

If code memory is allocated in separate chunks, two functions created close together logically may still end up far apart physically. That makes colocation hints unreliable for direct call encoding.

2. Lazy compilation or function finalization order

If function A is compiled before function B has a stable executable address, the call relocation path must tolerate deferred resolution. A direct branch form may fail here even if it sometimes works in eager compilation.
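
Deferred resolution can be modeled as a queue of call sites that are patched only once the callee has an address. The following is an illustrative model, not Cranelift's relocation machinery: `RelocQueue`, `PendingCall`, and the numeric function ids are all hypothetical.

```rust
use std::collections::HashMap;

/// A call site waiting for its callee's final address.
struct PendingCall {
    site: u64,   // address of the call instruction
    callee: u32, // hypothetical function id
}

#[derive(Default)]
struct RelocQueue {
    addresses: HashMap<u32, u64>,
    pending: Vec<PendingCall>,
}

impl RelocQueue {
    /// Remembers a call whose target may not be placed yet.
    fn record_call(&mut self, site: u64, callee: u32) {
        self.pending.push(PendingCall { site, callee });
    }

    /// Records the final address of a function.
    fn define(&mut self, func: u32, address: u64) {
        self.addresses.insert(func, address);
    }

    /// Returns `(site, pc_relative_offset)` for every call whose callee is
    /// now placed, and keeps the rest queued for a later pass.
    fn resolve_ready(&mut self) -> Vec<(u64, i64)> {
        let mut ready = Vec::new();
        let mut still_pending = Vec::new();
        for call in std::mem::take(&mut self.pending) {
            match self.addresses.get(&call.callee) {
                Some(&addr) => ready.push((call.site, addr.wrapping_sub(call.site) as i64)),
                None => still_pending.push(call),
            }
        }
        self.pending = still_pending;
        ready
    }
}
```

The offsets returned here are only known at resolution time, so the range check for a direct call form belongs at this point rather than at lowering.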

3. Cross-module calls mistaken as local

A bug in symbol classification can mark a non-local target as colocated. That can lead to the same panic pattern, but the real problem is incorrect symbol metadata.

4. Tail calls and special calling conventions

If the backend has separate lowering for tail calls, libcalls, or ABI-specific helper calls, those paths may still contain the same invalid assumption even after normal calls are fixed.

5. Relocation support gaps in object and JIT modes

Cranelift can support both object emission and JIT execution. A fix that works in object mode may not be enough in JIT mode if relocation application logic differs.

6. Internal invariant checks during address materialization

Some backend panics are not caused by the final call instruction itself, but by verifier assumptions about register usage, temporary registers, or legal instruction forms during address materialization. Test the full compile pipeline, not just execution.

7. Fuzz-generated tiny functions

Very small functions often trigger optimization shortcuts. After the fix, ensure tiny void-return callees still use the safe path and do not regress into the broken direct-call lowering.

FAQ

Why does this happen only on riscv64 and not everywhere?

Because RISC-V call encodings have different range and relocation constraints than architectures like x86-64. A backend shortcut that is harmless elsewhere can become invalid on riscv64 when exact target placement is not known at lowering time.

Is marking a function as colocated always wrong in JIT mode?

No. Colocated is a useful optimization hint, but it must only enable direct call lowering when the backend and runtime can actually guarantee the target is encodable with the selected instruction sequence. On riscv64 JIT, that guarantee may not hold.

What is the safest fix if I need correctness immediately?

Use a relocation-safe indirect call sequence for riscv64 JIT function references, even for colocated targets. Materialize the address, then call through jalr. It may be slightly less efficient than a direct near call, but it avoids invalid encoding assumptions and backend panics.

The key takeaway is simple: colocated is a hint, not proof. For cranelift-jit on riscv64, the robust fix is to avoid direct call lowering unless the backend can prove the relocation and distance are valid after code placement. That turns a panic-prone optimization into a correct, architecture-aware implementation.
