How to Fix: Cranelift: aarch64: user-controlled recursion in `lower_fmla`

Updated June 9, 2026 7 min read

Aldawsari

7 min read

Cranelift aarch64: Fixing user-controlled recursion in `lower_fmla`

A malicious or simply unlucky instruction shape can drive unbounded recursive rewriting inside Cranelift’s AArch64 lowering path when lower_fmla repeatedly peels fneg from one multiplicand, flips the operation, and then re-enters a rule set that can reconstruct the same state. In a compiler backend, that is dangerous because a crafted IR pattern can trigger a stack overflow, denial of service, or non-terminating compilation.

Table of Contents

Understanding the Root Cause
Step-by-Step Solution
Common Edge Cases
FAQ

This tutorial explains why the recursion happens, how to patch it safely, and what to test before merging the fix.

Understanding the Root Cause

The bug lives in instruction lowering logic for fused multiply-add patterns on AArch64. A typical optimization rule says: if one multiplicand is an fneg, strip the negation, reverse the fused operation, and continue lowering. That transformation is correct only if it makes measurable progress toward a terminal form.

The problem appears when the rewrite system does not enforce a monotonic decrease in complexity. For example:

fmla(a, fneg(b), c) becomes a logically reversed fused form.
The reversed form is routed back through lower_fmla or a sibling rule.
Another rule observes a negation pattern and transforms it again.
The new expression is structurally equivalent to the original one, or cycles through a small set of equivalent states.

That creates user-controlled recursion because the IR producer controls whether the compiler sees exactly the operand combinations that keep the rewrite loop alive.

In practical compiler terms, this is a classic termination bug:

No recursion depth limit.
No visited-state guard.
No proof that each rewrite reduces the expression to a simpler canonical representation.

If the lowering pipeline allows recursive entry on semantically equivalent forms, an attacker can produce a tiny input that causes disproportionately expensive compilation behavior.

A simplified version of the problematic pattern looks like this:

lower_fmla(x, y, z, op) -> if y is fneg(v) then lower_fmla(x, v, z, reverse(op))

If another branch later reintroduces or reinterprets the same negation shape, the recursion never converges.

Step-by-Step Solution

The safest fix is to replace recursive normalization with a non-recursive canonicalization step, or to allow the transformation only once before dispatching into terminal lowering code. The key idea is simple: normalize negations without calling the same high-level lowering routine again on an equivalent pattern.

1. Identify the recursive rewrite path

Find the lower_fmla rules that:

Match fneg on one multiplicand.
Flip from one fused op variant to another.
Call back into lower_fmla or another rule that can return to it.

The vulnerable logic usually resembles this:

if is_fneg(mul_lhs) {
    return lower_fmla(unwrapped_lhs, mul_rhs, acc, reverse_op(op));
}
if is_fneg(mul_rhs) {
    return lower_fmla(mul_lhs, unwrapped_rhs, acc, reverse_op(op));
}

2. Canonicalize operands first

Instead of recursive rewrites, extract negation state into local flags and lower once. This guarantees bounded execution.

struct CanonicalFmlaInputs {
    lhs: Value,
    rhs: Value,
    acc: Value,
    negate_product: bool,
}

fn canonicalize_fmla_inputs(lhs: Value, rhs: Value, acc: Value) -> CanonicalFmlaInputs {
    let mut lhs_out = lhs;
    let mut rhs_out = rhs;
    let mut negate_product = false;

    if let Some(v) = peel_fneg(lhs_out) {
        lhs_out = v;
        negate_product = !negate_product;
    }

    if let Some(v) = peel_fneg(rhs_out) {
        rhs_out = v;
        negate_product = !negate_product;
    }

    CanonicalFmlaInputs {
        lhs: lhs_out,
        rhs: rhs_out,
        acc,
        negate_product,
    }
}

Then perform a single lowering decision:

fn lower_fmla(lhs: Value, rhs: Value, acc: Value, op: FusedOp) -> InstOutput {
    let inputs = canonicalize_fmla_inputs(lhs, rhs, acc);

    let final_op = if inputs.negate_product {
        reverse_op(op)
    } else {
        op
    };

    lower_fmla_terminal(inputs.lhs, inputs.rhs, inputs.acc, final_op)
}

This works because the rule now performs at most two fneg peels and exactly one terminal dispatch. There is no recursive re-entry.

3. Introduce a terminal lowering function

Split the API so canonicalization and emission are separate phases. That prevents future changes from accidentally reintroducing recursion.

fn lower_fmla_terminal(lhs: Value, rhs: Value, acc: Value, op: FusedOp) -> InstOutput {
    match op {
        FusedOp::Add => emit_fmadd(lhs, rhs, acc),
        FusedOp::Sub => emit_fmsub(lhs, rhs, acc),
        FusedOp::NegAdd => emit_fnmadd(lhs, rhs, acc),
        FusedOp::NegSub => emit_fnmsub(lhs, rhs, acc),
    }
}

Make sure lower_fmla_terminal does not call back into lower_fmla.

4. If recursion must remain, add a hard progress invariant

If architectural constraints require recursive style, enforce a proven progress metric, such as a count of peelable negations or a recursion budget. This is less elegant than canonicalization, but still safer than open-ended recursion.

fn lower_fmla_guarded(lhs: Value, rhs: Value, acc: Value, op: FusedOp, budget: u8) -> InstOutput {
    assert!(budget > 0, "lower_fmla recursion budget exhausted");

    if let Some(v) = peel_fneg(lhs) {
        return lower_fmla_terminal(v, rhs, acc, reverse_op(op));
    }

    if let Some(v) = peel_fneg(rhs) {
        return lower_fmla_terminal(lhs, v, acc, reverse_op(op));
    }

    lower_fmla_terminal(lhs, rhs, acc, op)
}

Notice the important difference: even in the guarded version, once a negation is peeled, the code goes to a terminal path, not back into the same normalization function.

5. Add regression tests for cyclic patterns

Test both correctness and termination. Include cases where:

Left multiplicand is fneg.
Right multiplicand is fneg.
Both multiplicands are fneg.
Nested IR combines negation with fused op forms that used to bounce between rules.

#[test]
fn lower_fmla_single_neg_does_not_recurse() {
    let ir = build_fmla_with_negated_lhs();
    let result = compile_aarch64(ir);
    assert!(result.is_ok());
}

#[test]
fn lower_fmla_double_neg_canonicalizes_once() {
    let ir = build_fmla_with_both_negated();
    let result = compile_aarch64(ir);
    assert!(result.is_ok());
}

#[test]
fn lower_fmla_malicious_cycle_input_terminates() {
    let ir = build_cycle_triggering_ir();
    let result = compile_aarch64(ir);
    assert!(result.is_ok());
}

6. Add a fuzz target if one does not already exist

Because this is a compiler rewriting termination bug, fuzzing is especially effective. Generate floating-point expression trees with frequent fneg, fma, and operand permutations. The goal is not just wrong-code detection, but ensuring the compiler always terminates within expected resource limits.

loop {
    let ir = fuzz_ir_with_patterns([
        Op::Fneg,
        Op::Fma,
        Op::Fmul,
        Op::Fadd,
        Op::Fsub,
    ]);

    assert_compilation_terminates(ir);
}

7. Document the invariant in code comments

This class of bug often returns when later refactors restore a convenient recursive call. Add a comment explaining that lower_fmla must not recursively normalize equivalent fused-op states.

// Important: do not re-enter lower_fmla after peeling fneg.
// Negation handling must be canonicalized exactly once, then lowered
// through a terminal emission path to guarantee termination.

Common Edge Cases

Fixing the recursion is necessary, but a robust patch also handles nearby cases that can silently break semantics or performance.

Double negation

If both multiplicands are wrapped in fneg, the product sign flips twice. Your canonicalization must cancel the two negations correctly. Otherwise, you may fix the recursion but introduce wrong-code generation.

Negation on the accumulator instead of multiplicands

fma(a, b, fneg(c)) is not always interchangeable with negating the product side. Keep accumulator handling separate unless the backend proves equivalence for the exact fused instruction variant.

NaN and signed-zero behavior

Floating-point transforms are subtle. While fused forms are mathematically related, backend selection must still preserve the target’s expected behavior for NaN propagation, rounding, and negative zero under Cranelift’s semantics.

Pattern matching across aliases

If fneg can appear through instruction aliases, value wrappers, or legalization-generated nodes, make sure peel_fneg recognizes only the safe canonical form or normalizes aliases first.

Shared helper recursion

Even after removing direct self-calls, an indirect cycle may remain if lower_fmla calls lower_fmul, which calls another helper, which calls lower_fmla. Audit the full lowering chain for mutual recursion.

Debug versus release behavior

In debug builds, stack overflow may be easy to reproduce. In release builds, the same bug may look like a hang or high CPU consumption. Regression tests should assert termination, not just absence of panic.

FAQ

1. Why is this considered user-controlled recursion?

Because the input IR determines whether the lowering logic sees the exact operand pattern that triggers recursive rewriting. An external crate, test case, or fuzzed input can therefore influence compiler control flow and force non-termination.

2. Is adding a recursion limit enough?

No. A depth limit is a useful emergency guard, but the real fix is to ensure lowering has a terminating canonical form. Otherwise, you risk hidden correctness issues, brittle behavior, or future regressions when limits change.

3. What is the best long-term fix for Cranelift here?

The best approach is to split normalization from instruction emission. Canonicalize all sign-related operand state once, compute the final fused-op variant, and then lower through a terminal function that cannot bounce back into the same rewrite rules.

In short, the issue is not that peeling fneg is inherently wrong. The issue is that the current rewrite shape can re-enter itself without a guaranteed decrease in complexity. Converting that logic into a single-pass canonicalization step removes the recursion vector, keeps AArch64 fused-op lowering efficient, and closes a compiler denial-of-service bug at the source.

Cranelift aarch64: Fixing user-controlled recursion in lower_fmla